2.0.2-f (#430)
* cn batch slider
* cn batch slider
* cn batch slider
* fix tensorart link
* xyz for forge (#432)
* forge info
* Update README.md
* Add option for default save formats (#422)
* xyz plot support
---------
Co-authored-by: Andray <33491867+light-and-ray@users.noreply.github.com>
Co-authored-by: zappityzap <zappityzap@proton.me>
* remove shit
* save some change
* save some change
* save some change
* save some change
* move import upward
* import
* fix mm
* restore i2i batch
* sync ui
* doc. still need to do a series of changes in infv2v, mm and utils
* finished?
* finished?
* finished?
* finished?
---------
Co-authored-by: Andray <33491867+light-and-ray@users.noreply.github.com>
Co-authored-by: zappityzap <zappityzap@proton.me>

forge/master v2.0.2-f
parent
7723f9dedf
commit
b20f7519a1
52
README.md
|
|
@ -1,39 +1,45 @@
|
|||
# AnimateDiff for Stable Diffusion WebUI Forge
|
||||
This branch is specifically designed for [Stable Diffusion WebUI Forge](https://github.com/lllyasviel/stable-diffusion-webui-forge) by lllyasviel. See [here](docs/how-to-use.md#preparation) for how to install forge and this extension. See [Update](#update) for current status.
|
||||
|
||||
This extension aim for integrating [AnimateDiff](https://github.com/guoyww/AnimateDiff/) w/ [CLI](https://github.com/s9roll7/animatediff-cli-prompt-travel) into [lllyasviel's Forge Adaption of AUTOMATIC1111 Stable Diffusion WebUI](https://github.com/lllyasviel/stable-diffusion-webui-forge) and form the most easy-to-use AI video toolkit. You can generate GIFs in exactly the same way as generating images after enabling this extension.
|
||||
This extension integrates [AnimateDiff](https://github.com/guoyww/AnimateDiff/) with [CLI](https://github.com/s9roll7/animatediff-cli-prompt-travel) into [lllyasviel's Forge Adaptation of AUTOMATIC1111 Stable Diffusion WebUI](https://github.com/lllyasviel/stable-diffusion-webui-forge) to form an easy-to-use AI video toolkit. You can generate GIFs in exactly the same way as generating images after enabling this extension.
|
||||
|
||||
This extension implements AnimateDiff in a different way. It makes heavy use of [Unet Patcher](https://github.com/lllyasviel/stable-diffusion-webui-forge?tab=readme-ov-file#unet-patcher), so that you do not need to reload your model weights if you don't want to, and I can almostly get rif of monkey-patching WebUI and ControlNet.
|
||||
This extension implements AnimateDiff in a different way. It makes heavy use of [Unet Patcher](https://github.com/lllyasviel/stable-diffusion-webui-forge?tab=readme-ov-file#unet-patcher), so that you do not need to reload your model weights if you don't want to, and I can almost entirely get rid of monkey-patching WebUI and ControlNet.
|
||||
|
||||
You might also be interested in another extension I created: [Segment Anything for Stable Diffusion WebUI](https://github.com/continue-revolution/sd-webui-segment-anything). This extension will also be redesigned for forge later.
|
||||
You might also be interested in another extension I created: [Segment Anything for Stable Diffusion WebUI](https://github.com/continue-revolution/sd-webui-segment-anything). This extension will also be redesigned for forge later.
|
||||
|
||||
[TusiArt](https://tusiart.com/) (for users physically inside P.R.China mainland) and [TensorArt](https://tusiart.com/) (for others) offers online service of this extension.
|
||||
[TusiArt](https://tusiart.com/) (for users inside mainland P.R. China) and [TensorArt](https://tensor.art/) (for others) offer an online service of this extension.
|
||||
|
||||
|
||||
## Table of Contents
|
||||
[Update](#update) | [TODO](#todo) | [Model Zoo](#model-zoo) | [Documentation](#documentation) | [Tutorial](#tutorial) | [Thanks](#thanks) | [Star History](#star-history) | [Sponsor](#sponsor)
|
||||
[Update](#update) | [Future Plan](#future-plan) | [Model Zoo](#model-zoo) | [Documentation](#documentation) | [Tutorial](#tutorial) | [Thanks](#thanks) | [Star History](#star-history) | [Sponsor](#sponsor)
|
||||
|
||||
|
||||
## Update
|
||||
- [v2.0.0-f](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.0-f) in `02/05/2024`: txt2img, prompt travel, infinite generation, all kinds of optimizations have been proven to be working properly and elegantly.
|
||||
- [v2.0.1-f](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.1-f) in `02/11/2024`: [ControlNet V2V](docs/features.md#controlnet-v2v) in txt2img panel is working properly and elegantly. You can also try adding mask and inpaint.
|
||||
- [v2.0.2-f](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.2-f) in `03/18/2024`: Motion LoRA, i2i batch and GroupNorm hack have been restored. Motion LoRA is built on [KohakuBlueleaf](https://github.com/KohakuBlueleaf)'s [LyCORIS](https://github.com/KohakuBlueleaf/a1111-sd-webui-lycoris) extension. The GroupNorm hack currently lives in [this](https://github.com/lllyasviel/stable-diffusion-webui-forge/tree/conrevo/gn-patcher-for-early-ad) branch.
|
||||
|
||||
We believe that all features in the OG A1111 version (except IP-Adapter prompt travel / SparseCtrl / ControlNet keyframe / FreeInit) are now available in the Forge version. We will synchronize ControlNet updates from the OG A1111 version, add SparseCtrl and Magic Animate, and add more parameters as soon as we can.
|
||||
|
||||
## TODO
|
||||
- [ ] MotionLoRA and i2i batch are still under heavy construction, but I expect to release a working version within a week.
|
||||
- [ ] When all previous features are working properly, I will soon release SparseCtrl, Magic Animate and Moore Animate Anyone.
|
||||
- [ ] An official video tutorial will be available on YouTube and bilibili.
|
||||
- [ ] A bunch of new models / advanced parameters / new features may be implemented soon.
|
||||
- [ ] All problems in the master branch will be fixed soon, but new feature updates for OG A1111 + Mikubill ControlNet extension may be postponed until I have time to rewrite the ControlNet extension.
|
||||
BREAKING CHANGE:
|
||||
- You need PyTorch >= 2.0.0 to run this extension.
|
||||
- You need the [a1111-sd-webui-lycoris](https://github.com/KohakuBlueleaf/a1111-sd-webui-lycoris) extension to run Motion LoRA. Consider ALL LoRAs as LyCORIS models (specify ALL LoRAs as `<lyco:whatever:x.y>` instead of `<lora:whatever:x.y>`, not only when you use AnimateDiff) if you install the LyCORIS extension in Forge.
|
||||
- You need to download [Motion LoRA](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora_v2), [Hotshot-XL](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/motion_module/mm_sdxl_hs.safetensors?download=true), [AnimateDiff V3 Motion Adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora_v2/mm_sd15_v3_adapter.safetensors?download=true), [SparseCtrl](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/control) from [my HuggingFace repository](https://huggingface.co/conrevo/AnimateDiff-A1111).
|
||||
|
||||
## Future Plan
|
||||
Although [OpenAI Sora](https://openai.com/sora) is far better at following complex text prompts and generating complex scenes, we believe that OpenAI will NOT open source Sora or any other product they released recently. My current plan is to continue developing this extension until an open-source video model is released with a strong ability to generate complex scenes, easy customization, and a good ecosystem like SD1.5's.
|
||||
|
||||
We will try our best to bring interesting research into both WebUI and Forge as long as we can. Not all research will be implemented. You are welcome to submit a feature request if you find something interesting. We are also open to learning from other equivalent software.
|
||||
|
||||
That said, due to the notorious difficulty of maintaining [sd-webui-controlnet](https://github.com/Mikubill/sd-webui-controlnet), we do NOT plan to implement ANY new research into WebUI if it touches "reference control", such as [Magic Animate](https://github.com/magic-research/magic-animate). Such features will be Forge only. Also, some advanced features in [ControlNet Forge Integrated](https://github.com/lllyasviel/stable-diffusion-webui-forge/tree/main/extensions-builtin/sd_forge_controlnet), such as ControlNet per-frame mask, will also be Forge only. I really hope I will have the bandwidth to rework sd-webui-controlnet, but it requires a huge amount of time.
|
||||
|
||||
|
||||
## Model Zoo
|
||||
I am maintaining a [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main) to provide all official models in fp16 & safetensors format. You are highly recommended to use my link. You MUST use my link to download adapter for V3. You may still use the old links if you want, for all models except adapter for V3.
|
||||
I am maintaining a [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main) to provide all official models in fp16 & safetensors format. You are highly recommended to use my links. You MUST use my links to download [Motion LoRA](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora_v2), [Hotshot-XL](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/motion_module/mm_sdxl_hs.safetensors?download=true), [AnimateDiff V3 Motion Adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora_v2/mm_sd15_v3_adapter.safetensors?download=true), [SparseCtrl](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/control). You may still use the old links for all other models if you want.
|
||||
|
||||
- "Official" models by [@guoyww](https://github.com/guoyww): [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI) | [HuggingFace](https://huggingface.co/guoyww/animatediff/tree/main) | [CivitAI](https://civitai.com/models/108836)
|
||||
- "Stabilized" community models by [@manshoety](https://huggingface.co/manshoety): [HuggingFace](https://huggingface.co/manshoety/AD_Stabilized_Motion/tree/main)
|
||||
- "TemporalDiff" models by [@CiaraRowles](https://huggingface.co/CiaraRowles): [HuggingFace](https://huggingface.co/CiaraRowles/TemporalDiff/tree/main)
|
||||
- "HotShotXL" models by [@hotshotco](https://huggingface.co/hotshotco/): [HuggingFace](https://huggingface.co/hotshotco/Hotshot-XL/tree/main)
|
||||
|
||||
|
||||
## Documentation
|
||||
|
|
@ -44,20 +50,20 @@ I am maintaining a [huggingface repo](https://huggingface.co/conrevo/AnimateDiff
|
|||
|
||||
|
||||
## Tutorial
|
||||
TODO
|
||||
There are a lot of wonderful video tutorials on YouTube and bilibili; check those out for now. A series of updates is on the way, and I do not want to record my own tutorial before I am satisfied. An official tutorial will come when I am satisfied with the available features.
|
||||
|
||||
|
||||
## Thanks
|
||||
I thank researchers from [Shanghai AI Lab](https://www.shlab.org.cn/), especially [@guoyww](https://github.com/guoyww) for creating AnimateDiff. I also thank [@neggles](https://github.com/neggles) and [@s9roll7](https://github.com/s9roll7) for creating and improving [AnimateDiff CLI Prompt Travel](https://github.com/s9roll7/animatediff-cli-prompt-travel). This extension would not have been possible without these creative works.
|
||||
|
||||
I also thank community developers, especially
|
||||
- [@zappityzap](https://github.com/zappityzap) who developed the majority of the [output features](https://github.com/continue-revolution/sd-webui-animatediff/blob/master/scripts/animatediff_output.py)
|
||||
We thank all developers and community users who contribute to this repository in many ways, especially
|
||||
- [@guoyww](https://github.com/guoyww) for creating AnimateDiff
|
||||
- [@limbo0000](https://github.com/limbo0000) for responding to my questions about AnimateDiff
|
||||
- [@neggles](https://github.com/neggles) and [@s9roll7](https://github.com/s9roll7) for developing [AnimateDiff CLI Prompt Travel](https://github.com/s9roll7/animatediff-cli-prompt-travel)
|
||||
- [@zappityzap](https://github.com/zappityzap) for developing the majority of the [output features](https://github.com/continue-revolution/sd-webui-animatediff/blob/master/scripts/animatediff_output.py)
|
||||
- [@thiswinex](https://github.com/thiswinex) for developing FreeInit
|
||||
- [@lllyasviel](https://github.com/lllyasviel) for adding me as a collaborator of sd-webui-controlnet and offering technical support for Forge
|
||||
- [@KohakuBlueleaf](https://github.com/KohakuBlueleaf) for helping with FP8 and LCM development
|
||||
- [@TDS4874](https://github.com/TDS4874) and [@opparco](https://github.com/opparco) for resolving the grey issue, which significantly improved performance
|
||||
- [@lllyasviel](https://github.com/lllyasviel) for offering forge technical support
|
||||
|
||||
and many others who have contributed to this extension.
|
||||
|
||||
I also thank community users, especially [@streamline](https://twitter.com/kaizirod) who provided dataset and workflow of ControlNet V2V. His workflow is extremely amazing and definitely worth checking out.
|
||||
- [@streamline](https://twitter.com/kaizirod) for providing ControlNet V2V dataset and workflow. His workflow is extremely amazing and definitely worth checking out.
|
||||
|
||||
|
||||
## Star History
|
||||
|
|
|
|||
|
|
@ -1,5 +1,4 @@
|
|||
# Demo
|
||||
> Unfortunately, for unknown reasons, you will NOT be able to reproduce the videos here (created by OG A1111) in Forge.
|
||||
|
||||
## Basic Usage
|
||||
| AnimateDiff | Extension | img2img |
|
||||
|
|
|
|||
|
|
@ -54,24 +54,24 @@ Example input parameter fill-in:
|
|||
There are a lot of amazing demos online; here I provide a very simple one. The dataset is from [streamline](https://twitter.com/kaizirod), but the workflow is an arbitrary setup by me. You can find many more amazing examples (and potentially available workflows / infotexts) on Reddit, Twitter, YouTube and Bilibili. The easiest way to share a workflow created with my software is to share one output frame with infotext.
|
||||
| input | output |
|
||||
| --- | --- |
|
||||
| <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/ff066808-fc00-43e1-a2a6-b16e41dad603'> | <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/dc3d833b-a113-4278-9e48-5f2a8ee06704'> |
|
||||
| <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/ff066808-fc00-43e1-a2a6-b16e41dad603'> | <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/5aab1f9f-245d-45e9-ba71-1b902bc6ea40'> |
|
||||
|
||||
|
||||
## Model Spec
|
||||
> Currently GroupNorm is not hacked, so Hotshot-XL and AnimateDiff V1 will have performance degradation.
|
||||
>
|
||||
> Due to system change, you will not be able to apply Motion LoRA at this time. I will try to adapt to the new system later.
|
||||
> BREAKING CHANGE: You need to download [Motion LoRA](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora_v2), [Hotshot-XL](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/motion_module/mm_sdxl_hs.safetensors?download=true), [AnimateDiff V3 Motion Adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora_v2/mm_sd15_v3_adapter.safetensors?download=true), [SparseCtrl](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/control) from [my HuggingFace repository](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora_v2) instead of the original one.
|
||||
|
||||
### Motion LoRA
|
||||
[Download](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora) and use them like any other LoRA you use (example: download Motion LoRA to `stable-diffusion-webui/models/Lora` and add `<lora:mm_sd15_v2_lora_PanLeft:0.8>` to your positive prompt). **Motion LoRAs can only be applied to V2 motion modules**.
|
||||
> Motion LoRA in Forge is built with [a1111-sd-webui-lycoris](https://github.com/KohakuBlueleaf/a1111-sd-webui-lycoris). If you want to use Motion LoRA in Forge, all you need to do is install the LyCORIS extension and replace every `<lora:` in your positive prompt with `<lyco:`. Always do this, whether or not you are using AnimateDiff.
|
||||
|
||||
[Download](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora_v2) and use them like any other LoRA you use (example: download Motion LoRA to `stable-diffusion-webui/models/Lora` and add `<lyco:mm_sd15_v2_lora_PanLeft:0.8>` to your positive prompt). **Motion LoRAs can only be applied to V2 motion modules**.
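As an illustration, the `<lora:` to `<lyco:` tag rewrite can be automated with a few lines (a hypothetical helper for your own scripts, not part of the extension):

```python
import re

def lora_tags_to_lyco(prompt: str) -> str:
    """Rewrite every <lora:name:weight> tag to <lyco:name:weight>.

    The LyCORIS extension in Forge expects the <lyco:...> syntax for
    ALL LoRAs, including Motion LoRAs.
    """
    return re.sub(r"<lora:([^>]+)>", r"<lyco:\1>", prompt)

prompt = "a scenic valley, <lora:mm_sd15_v2_lora_PanLeft:0.8>"
print(lora_tags_to_lyco(prompt))
# a scenic valley, <lyco:mm_sd15_v2_lora_PanLeft:0.8>
```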
|
||||
|
||||
### V3
|
||||
V3 has identical state dict keys as V1 but slightly different inference logic (GroupNorm is not hacked for V3). You may optionally use [adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) for V3, in the same way as how you apply LoRA. You MUST use [my link](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) instead of the [official link](https://huggingface.co/guoyww/animatediff/resolve/main/v3_sd15_adapter.ckpt?download=true). The official adapter won't work for A1111 due to state dict incompatibility.
|
||||
AnimateDiff V3 has state dict keys identical to V1's but slightly different inference logic (GroupNorm is not hacked for V3). You may optionally use the [adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) for V3, in the same way as you apply LoRA. You MUST use [my link](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) instead of the [official link](https://huggingface.co/guoyww/animatediff/resolve/main/v3_sd15_adapter.ckpt?download=true). The official adapter won't work for A1111 due to state dict incompatibility.
|
||||
|
||||
### SDXL
|
||||
[AnimateDiff-XL](https://github.com/guoyww/AnimateDiff/tree/sdxl) and [HotShot-XL](https://github.com/hotshotco/Hotshot-XL) have architecture identical to AnimateDiff-SD1.5. The only differences are:
|
||||
- HotShot-XL is trained with 8 frames instead of 16 frames. You are recommended to set `Context batch size` to 8 for HotShot-XL.
|
||||
- AnimateDiff-XL is still trained with 16 frames. You do not need to change `Context batch size` for AnimateDiffXL.
|
||||
- AnimateDiff-XL is still trained with 16 frames. You do not need to change `Context batch size` for AnimateDiff-XL.
|
||||
- AnimateDiff-XL & HotShot-XL have fewer layers compared to AnimateDiff-SD1.5 because of SDXL.
|
||||
- AnimateDiff-XL is trained with higher resolution compared to HotShot-XL.
|
||||
|
||||
|
|
@ -79,4 +79,4 @@ Although AnimateDiff-XL & HotShot-XL have identical structure as AnimateDiff-SD1
|
|||
|
||||
Technically all features available for AnimateDiff + SD1.5 are also available for (AnimateDiff / HotShot) + SDXL. However, I have not tested all of them. I have tested infinite context generation and prompt travel; I have not tested ControlNet. If you find any bug, please report it to me.
|
||||
|
||||
Unfortunately, neither of these 2 motion modules are as good as those for SD1.5, and there is NOTHING I can do about it (they are just poorly trained). I strongly discourage anyone from applying SDXL for video generation. You will be VERY disappointed if you do that.
|
||||
Unfortunately, neither of these 2 motion modules is as good as those for SD1.5, and there is NOTHING I can do about it (they are just poorly trained). Also, there seems to be no ControlNet comparable to what [@lllyasviel](https://github.com/lllyasviel) trained for SD1.5. I strongly discourage anyone from applying SDXL to video generation. You will be VERY disappointed if you do.
|
||||
|
|
|
|||
|
|
@ -25,7 +25,7 @@ Then install [sd-forge-animatediff](https://github.com/continue-revolution/sd-fo
|
|||
1. Go to txt2img if you want to try txt2vid and img2img if you want to try img2vid.
|
||||
1. Choose an SD checkpoint, write prompts, set configurations such as image width/height. If you want to generate multiple GIFs at once, please [change batch number, instead of batch size](performance.md#batch-size).
|
||||
1. Enable AnimateDiff extension, set up [each parameter](#parameters), then click `Generate`.
|
||||
1. You should see the output GIF on the output gallery. You can access GIF output at `stable-diffusion-webui/outputs/{txt2img or img2img}-images/AnimateDiff/{yy-mm-dd}`. You can also access image frames at `stable-diffusion-webui/outputs/{txt2img or img2img}-images/{yy-mm-dd}`. You may choose to save frames for each generation into separate directories in `Settings/AnimateDiff`.
|
||||
1. You should see the output GIF in the output gallery. You can access the GIF output and image frames at `stable-diffusion-webui/outputs/{txt2img or img2img}-images/AnimateDiff/{yy-mm-dd}`. You may choose to save frames for each generation into the original txt2img / img2img output directory by unchecking a checkbox in `Settings/AnimateDiff`.
|
||||
|
||||
## API
|
||||
It is quite similar to the way you use ControlNet. The API will return a video in base64 format. In `format`, `PNG` means saving frames to your file system without returning all the frames. If you want your API call to return all frames, add `Frame` to the `format` list. For the most up-to-date parameters, please read [here](https://github.com/continue-revolution/sd-webui-animatediff/blob/master/scripts/animatediff_ui.py#L26).
|
||||
|
|
@ -60,6 +60,8 @@ It is quite similar to the way you use ControlNet. API will return a video in ba
|
|||
},
|
||||
```
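As a concrete illustration of the request described above, a txt2img payload with AnimateDiff enabled might be assembled like this (a sketch only — the field names inside `args` are assumptions; the linked `animatediff_ui.py` is authoritative):

```python
# Sketch of a txt2img API payload enabling AnimateDiff. Field names inside
# "args" are assumptions; check animatediff_ui.py for the real parameter list.
payload = {
    "prompt": "a cat walking, masterpiece",
    "steps": 20,
    "alwayson_scripts": {
        "AnimateDiff": {
            "args": [{
                "enable": True,
                "model": "mm_sd15_v3.safetensors",
                "video_length": 16,
                "fps": 8,
                "format": ["GIF", "PNG"],  # add "Frame" to receive all frames in the response
            }]
        }
    },
}

# Send with, e.g.:
#   requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
ad_args = payload["alwayson_scripts"]["AnimateDiff"]["args"][0]
print(ad_args["video_length"])  # 16
```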
|
||||
|
||||
If you wish to specify different conditional hints for different ControlNet units, the only additional thing you need to do is to specify `batch_image_dir` and `batch_mask_dir` parameters in your ControlNet JSON API parameters. The expected input format is exactly the same as [how to use ControlNet in WebUI](features.md#controlnet-v2v).
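A hypothetical ControlNet unit entry sketching those two parameters (the `module`/`model` values and directory paths are placeholders, not verified values):

```python
# Hypothetical ControlNet unit for the JSON API: per-unit batch directories
# tell the extension where that unit's conditional hints (and masks) live.
controlnet_unit = {
    "module": "canny",
    "model": "control_v11p_sd15_canny",
    "batch_image_dir": "/path/to/hint_frames",  # one hint image per video frame
    "batch_mask_dir": "/path/to/masks",         # optional, e.g. for inpaint units
}
print(sorted(controlnet_unit))
```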
|
||||
|
||||
|
||||
## Parameters
|
||||
1. **Save format** — Format of the output. Choose at least one of "GIF"|"MP4"|"WEBP"|"WEBM"|"PNG". Check "TXT" if you want infotext, which will live in the same directory as the output GIF. Infotext is also accessible via `stable-diffusion-webui/params.txt` and is embedded in outputs of all formats.
|
||||
|
|
|
|||
|
|
@ -5,7 +5,7 @@
|
|||
Optimizations can be significantly helpful if you want to improve speed and reduce VRAM usage.
|
||||
|
||||
### Attention
|
||||
In forge, we will always apply scaled dot product attention from PyTorch.
|
||||
We will always apply scaled dot product attention from PyTorch.
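For reference, scaled dot product attention computes softmax(QKᵀ/√d)·V. A minimal pure-Python sketch of that operation (single head, no masking — an illustration of the math, not the fused PyTorch kernel):

```python
import math

def sdpa(q, k, v):
    """Scaled dot-product attention for one head, on plain lists of vectors.

    Computes softmax(q @ k^T / sqrt(d)) @ v, the operation that
    torch.nn.functional.scaled_dot_product_attention fuses efficiently.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # numerically stable softmax over the scores
        m = max(scores)
        exp = [math.exp(s - m) for s in scores]
        z = sum(exp)
        weights = [e / z for e in exp]
        # weighted sum of value vectors
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

# Two identical keys -> equal weights -> output is the mean of the values.
print(sdpa([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
# [[2.0, 3.0]]
```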
|
||||
|
||||
### FP8
|
||||
FP8 requires torch >= 2.1.0. Add `--unet-in-fp8-e4m3fn` to command line arguments if you want fp8.
|
||||
|
|
@ -16,9 +16,11 @@ FP8 requires torch >= 2.1.0. Add `--unet-in-fp8-e4m3fn` to command line argument
|
|||
- apply [LCM LoRA](https://civitai.com/models/195519/lcm-lora-weights-stable-diffusion-acceleration-module)
|
||||
- apply low CFG denoising strength (1-2 is recommended)
|
||||
|
||||
I have [PR-ed](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/14583) this sampler into Stable Diffusion WebUI, so you no longer need this extension for the LCM sampler. The LCM sampler has been removed from this repository.
|
||||
|
||||
|
||||
## VRAM
|
||||
> These are for OG A1111. Information about forge will be updated soon.
|
||||
> These are for OG A1111. It is meaningless to measure VRAM consumption in Forge because [@lllyasviel](https://github.com/lllyasviel) implemented batch VAE decode based on your available VRAM.
|
||||
|
||||
Actual VRAM usage depends on your image size and context batch size. You can try reducing image size to reduce VRAM usage. You are discouraged from changing context batch size, because it conflicts with the training specification.
|
||||
|
||||
|
|
@ -39,4 +41,4 @@ Batch size on WebUI will be replaced by GIF frame number internally: 1 full GIF
|
|||
|
||||
Batch number is NOT the same as batch size. In the A1111 WebUI, batch number sits above batch size. Batch number means the number of sequential steps, while batch size means the number of parallel steps. You do not have to worry too much when you increase batch number, but you do need to worry about your VRAM when you increase your batch size (which, in this extension, is the video frame number). You do not need to change batch size at all when you are using this extension.
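A tiny numeric sketch of that accounting (the numbers are hypothetical, and context batching within a pass is ignored for simplicity):

```python
# Batch number multiplies sequentially; video frame number takes the place
# of batch size and multiplies in parallel within each generation.
batch_number = 3    # sequential generations -> 3 separate GIFs
video_length = 16   # frames per GIF (the internal "batch size")

total_frames = batch_number * video_length       # work done overall
parallel_frames = video_length                   # what VRAM must hold at once

print(total_frames)    # 48
print(parallel_frames) # 16
```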
|
||||
|
||||
We might develop an approach to support batch size on WebUI in the near future.
|
||||
We might develop an approach to support batch size on WebUI, but this has very low priority and we cannot commit to a specific date.
|
||||
|
|
|
|||
|
|
@ -261,7 +261,7 @@ class PositionalEncoding(nn.Module):
|
|||
self.register_buffer('pe', pe)
|
||||
|
||||
def forward(self, x):
|
||||
x = x + self.pe[:, :x.size(1)].type(x.dtype)
|
||||
x = x + self.pe[:, :x.size(1)].to(x)
|
||||
return self.dropout(x)
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -1,5 +1,5 @@
|
|||
from typing import List, Tuple
|
||||
from fastapi import params
|
||||
|
||||
import gradio as gr
|
||||
|
||||
from modules import script_callbacks, scripts
|
||||
|
|
@ -13,10 +13,12 @@ from scripts.animatediff_logger import logger_animatediff as logger
|
|||
from scripts.animatediff_mm import mm_animatediff as motion_module
|
||||
from scripts.animatediff_prompt import AnimateDiffPromptSchedule
|
||||
from scripts.animatediff_output import AnimateDiffOutput
|
||||
from scripts.animatediff_xyz import patch_xyz, xyz_attrs
|
||||
from scripts.animatediff_ui import AnimateDiffProcess, AnimateDiffUiGroup
|
||||
from scripts.animatediff_settings import on_ui_settings
|
||||
from scripts.animatediff_infotext import update_infotext, infotext_pasted
|
||||
from scripts.animatediff_utils import get_animatediff_arg
|
||||
from scripts.animatediff_i2ibatch import animatediff_hook_i2i_batch, animatediff_unhook_i2i_batch
|
||||
|
||||
script_dir = scripts.basedir()
|
||||
motion_module.set_script_dir(script_dir)
|
||||
|
|
@ -40,7 +42,6 @@ class AnimateDiffScript(scripts.Script):
|
|||
def ui(self, is_img2img):
|
||||
unit = AnimateDiffUiGroup().render(
|
||||
is_img2img,
|
||||
motion_module.get_model_dir(),
|
||||
self.infotext_fields,
|
||||
self.paste_field_names
|
||||
)
|
||||
|
|
@ -51,6 +52,11 @@ class AnimateDiffScript(scripts.Script):
|
|||
if p.is_api:
|
||||
params = get_animatediff_arg(p)
|
||||
motion_module.set_ad_params(params)
|
||||
|
||||
# apply XYZ settings
|
||||
params.apply_xyz()
|
||||
xyz_attrs.clear()
|
||||
|
||||
if params.enable:
|
||||
logger.info("AnimateDiff process start.")
|
||||
motion_module.load(params.model)
|
||||
|
|
@ -86,7 +92,11 @@ class AnimateDiffScript(scripts.Script):
|
|||
logger.info("AnimateDiff process end.")
|
||||
|
||||
|
||||
patch_xyz()
|
||||
|
||||
script_callbacks.on_ui_settings(on_ui_settings)
|
||||
script_callbacks.on_after_component(AnimateDiffUiGroup.on_after_component)
|
||||
script_callbacks.on_cfg_denoiser(AnimateDiffInfV2V.animatediff_on_cfg_denoiser)
|
||||
script_callbacks.on_infotext_pasted(infotext_pasted)
|
||||
script_callbacks.on_before_ui(animatediff_hook_i2i_batch)
|
||||
script_callbacks.on_script_unloaded(animatediff_unhook_i2i_batch)
|
||||
|
|
|
|||
|
|
@ -7,7 +7,7 @@ import numpy as np
|
|||
import torch
|
||||
import hashlib
|
||||
from PIL import Image, ImageOps, UnidentifiedImageError
|
||||
from modules import processing, shared, scripts, devices, masking, sd_samplers, images
|
||||
from modules import processing, shared, scripts, devices, masking, sd_samplers, images, img2img
|
||||
from modules.processing import (StableDiffusionProcessingImg2Img,
|
||||
process_images,
|
||||
create_binary_mask,
|
||||
|
|
@ -20,6 +20,7 @@ from modules.sd_samplers_common import images_tensor_to_samples, approximation_i
|
|||
from modules.sd_models import get_closet_checkpoint_match
|
||||
|
||||
from scripts.animatediff_logger import logger_animatediff as logger
|
||||
from scripts.animatediff_utils import get_animatediff_arg, get_controlnet_units
|
||||
|
||||
|
||||
def animatediff_i2i_init(self, all_prompts, all_seeds, all_subseeds): # only hack this when i2i-batch with batch mask
|
||||
|
|
@ -174,9 +175,16 @@ def animatediff_i2i_init(self, all_prompts, all_seeds, all_subseeds): # only hac
|
|||
self.image_conditioning = self.img2img_image_conditioning(image * 2 - 1, self.init_latent, image_masks) # let's ignore this image_masks which is related to inpaint model with different arch
|
||||
|
||||
|
||||
def amimatediff_i2i_batch(
|
||||
def animatediff_i2i_batch(
|
||||
p: StableDiffusionProcessingImg2Img, input_dir: str, output_dir: str, inpaint_mask_dir: str,
|
||||
args, to_scale=False, scale_by=1.0, use_png_info=False, png_info_props=None, png_info_dir=None):
|
||||
ad_params = get_animatediff_arg(p)
|
||||
if not ad_params.enable:
|
||||
return img2img.original_i2i_batch(p, input_dir, output_dir, inpaint_mask_dir, args, to_scale, scale_by, use_png_info, png_info_props, png_info_dir)
|
||||
ad_params.is_i2i_batch = True
|
||||
if not ad_params.video_path and not ad_params.video_source:
|
||||
ad_params.video_path = input_dir
|
||||
|
||||
output_dir = output_dir.strip()
|
||||
processing.fix_seed(p)
|
||||
|
||||
|
|
@ -189,10 +197,21 @@ def amimatediff_i2i_batch(
|
|||
|
||||
if is_inpaint_batch:
|
||||
assert len(inpaint_masks) == 1 or len(inpaint_masks) == len(images), 'The number of masks must be 1 or equal to the number of images.'
|
||||
logger.info(f"\n[i2i batch] Inpaint batch is enabled. {len(inpaint_masks)} masks found.")
|
||||
logger.info(f"[i2i batch] Inpaint batch is enabled. {len(inpaint_masks)} masks found.")
|
||||
if len(inpaint_masks) > 1: # batch mask
|
||||
p.init = MethodType(animatediff_i2i_init, p)
|
||||
|
||||
cn_units = get_controlnet_units(p)
|
||||
for idx, cn_unit in enumerate(cn_units):
|
||||
# batch path broadcast
|
||||
if (cn_unit.input_mode.name == 'SIMPLE' and cn_unit.image is None) or \
|
||||
(cn_unit.input_mode.name == 'BATCH' and not cn_unit.batch_image_dir) or \
|
||||
(cn_unit.input_mode.name == 'MERGE' and not cn_unit.batch_input_gallery):
|
||||
cn_unit.input_mode = cn_unit.input_mode.__class__.BATCH
|
||||
if "inpaint" in cn_unit.module:
|
||||
cn_unit.batch_mask_dir = inpaint_mask_dir
|
||||
logger.info(f"ControlNetUnit-{idx} is an inpaint unit without cond_hint specification. We have set batch_images = {cn_unit.batch_image_dir}.")
|
||||
|
||||
logger.info(f"[i2i batch] Will process {len(images)} images, creating {p.n_iter} new videos.")
|
||||
|
||||
# extract "default" params to use in case getting png info fails
|
||||
|
|
@ -301,3 +320,17 @@ def amimatediff_i2i_batch(
|
|||
batch_results.infotexts = batch_results.infotexts[:int(shared.opts.img2img_batch_show_results_limit)]
|
||||
|
||||
return batch_results
|
||||
|
||||
|
||||
def animatediff_hook_i2i_batch():
|
||||
if getattr(img2img, "original_i2i_batch", None) is None:
|
||||
logger.info("AnimateDiff Hooking i2i_batch")
|
||||
img2img.original_i2i_batch = img2img.process_batch
|
||||
img2img.process_batch = animatediff_i2i_batch
|
||||
|
||||
|
||||
def animatediff_unhook_i2i_batch():
|
||||
if getattr(img2img, "original_i2i_batch", None) is not None:
|
||||
logger.info("AnimateDiff Unhooking i2i_batch")
|
||||
img2img.process_batch = img2img.original_i2i_batch
|
||||
img2img.original_i2i_batch = None
|
||||
|
|
|
|||
|
|
@@ -1,9 +1,13 @@
-from re import A
+import gc
+import numpy as np
 import torch
 
+from ldm_patched.modules.model_management import get_torch_device, soft_empty_cache
 from modules import shared
+from modules.sd_samplers_cfg_denoiser import pad_cond
 from modules.script_callbacks import CFGDenoiserParams
 from scripts.animatediff_logger import logger_animatediff as logger
+from scripts.animatediff_mm import mm_animatediff as motion_module
 
 
 class AnimateDiffInfV2V:
@@ -75,18 +79,16 @@ class AnimateDiffInfV2V:
 
     @staticmethod
     def animatediff_on_cfg_denoiser(cfg_params: CFGDenoiserParams):
-        from scripts.animatediff_mm import mm_animatediff as motion_module
         ad_params = motion_module.ad_params
         if ad_params is None or not ad_params.enable:
             return
 
         ad_params.step = cfg_params.denoiser.step
-        if getattr(ad_params, "text_cond", None) is None:
+        if cfg_params.denoiser.step == 0:
             prompt_closed_loop = (ad_params.video_length > ad_params.batch_size) and (ad_params.closed_loop in ['R+P', 'A'])
             ad_params.text_cond = ad_params.prompt_scheduler.multi_cond(cfg_params.text_cond, prompt_closed_loop)
 
         #TODO: move this to cond modifier patch
-        from modules import shared
-        from modules.sd_samplers_cfg_denoiser import pad_cond
         def pad_cond_uncond(cond, uncond):
             empty = shared.sd_model.cond_stage_model_empty_prompt
             num_repeats = (cond.shape[1] - uncond.shape[1]) // empty.shape[1]
@@ -100,10 +102,6 @@ class AnimateDiffInfV2V:
 
     @staticmethod
     def mm_sd_forward(apply_model, info):
-        from scripts.animatediff_mm import mm_animatediff as motion_module
-        from ldm_patched.modules.model_management import get_torch_device, soft_empty_cache
-        import gc
-
         logger.debug("Running special forward for AnimateDiff")
         x_out = torch.zeros_like(info["input"])
         ad_params = motion_module.ad_params
@@ -47,8 +47,8 @@ class AnimateDiffMM:
         if unet_manual_cast(unet_dtype(), get_torch_device()) is not None:
             mm_config["operations"] = manual_cast
         self.mm = MotionWrapper(**mm_config)
-        missed_keys = self.mm.load_state_dict(mm_state_dict)
-        logger.warn(f"Missing keys {missed_keys}")
+        self.mm.load_state_dict(mm_state_dict)
+        self.set_layer_mapping(shared.sd_model)
 
 
     def inject(self, sd_model, model_name="mm_sd15_v3.safetensors"):
@@ -56,20 +56,18 @@ class AnimateDiffMM:
         sd_ver = "SDXL" if sd_model.is_sdxl else "SD1.5"
         assert sd_model.is_sdxl == self.mm.is_xl, f"Motion module incompatible with SD. You are using {sd_ver} with {self.mm.mm_type}."
 
         # TODO: What's the best way to do GroupNorm32 forward function hack?
         if self.mm.enable_gn_hack():
-            logger.warning(f"{sd_ver} GroupNorm32 forward function is NOT hacked. Performance will be degraded. Please use newer motion module")
-            # from ldm_patched.ldm.modules.diffusionmodules import model as diffmodel
-            # self.gn32_original_forward = diffmodel.Normalize
-            # gn32_original_forward = self.gn32_original_forward
-
-            # def groupnorm32_mm_forward(self, x):
-            #     x = rearrange(x, "(b f) c h w -> b c f h w", b=2)
-            #     x = gn32_original_forward(self, x)
-            #     x = rearrange(x, "b c f h w -> (b f) c h w", b=2)
-            #     return x
-
-            # diffmodel.Normalize = groupnorm32_mm_forward
+            try:
+                from einops import rearrange
+                def groupnorm32_mm_forward(gn32_original_forward, x, transformer_options={}):
+                    x = rearrange(x, "(b f) c h w -> b c f h w", f=self.ad_params.batch_size)
+                    x = gn32_original_forward(x)
+                    x = rearrange(x, "b c f h w -> (b f) c h w", f=self.ad_params.batch_size)
+                    return x
+                unet.set_groupnorm_wrapper(groupnorm32_mm_forward)
+                logger.info(f"{sd_ver} GroupNorm32 forward function is hacked.")
+            except:
+                logger.warning(f"{sd_ver} GroupNorm32 forward function is NOT hacked. Performance will be degraded. Please use newer motion module")
 
         logger.info(f"Injecting motion module {model_name} into {sd_ver} UNet.")
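The GroupNorm hack above regroups the fused `(batch * frames)` axis into separate batch and frame axes before normalization, so GroupNorm statistics are computed per video rather than per frame, then flattens it back. A minimal numpy sketch of the same axis shuffle (the extension itself uses `einops.rearrange` on torch tensors; the helper names and shapes here are illustrative, not from the repo):

```python
import numpy as np

def regroup_for_groupnorm(x: np.ndarray, frames: int) -> np.ndarray:
    """Equivalent of einops '(b f) c h w -> b c f h w' for a given frame count."""
    bf, c, h, w = x.shape
    b = bf // frames
    # split the fused batch axis, then move the frame axis behind channels
    return x.reshape(b, frames, c, h, w).transpose(0, 2, 1, 3, 4)

def flatten_after_groupnorm(x: np.ndarray) -> np.ndarray:
    """Equivalent of 'b c f h w -> (b f) c h w'."""
    b, c, f, h, w = x.shape
    return x.transpose(0, 2, 1, 3, 4).reshape(b * f, c, h, w)

# 2 videos of 16 frames, 4 channels, 2x2 spatial
x = np.arange(2 * 16 * 4 * 2 * 2, dtype=np.float32).reshape(32, 4, 2, 2)
y = regroup_for_groupnorm(x, frames=16)
assert y.shape == (2, 4, 16, 2, 2)
assert np.array_equal(flatten_after_groupnorm(y), x)  # round trip is lossless
```

The round trip being lossless is the point: the wrapper only changes which elements GroupNorm averages over, never the values themselves.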
@@ -96,10 +94,13 @@ class AnimateDiffMM:
 
         def mm_cn_forward(model, inner_model, hint, **kwargs):
             controls = []
-            for i in range(0, hint.shape[0], 2 * self.ad_params.batch_size):
-                current_kwargs = {k: (v[i:i + 2 * self.ad_params.batch_size].to(get_torch_device())
+            control_batch_size = shared.opts.data.get("animatediff_control_batch_size", 0)
+            if control_batch_size == 0:
+                control_batch_size = 2 * self.ad_params.batch_size
+            for i in range(0, hint.shape[0], control_batch_size):
+                current_kwargs = {k: (v[i:i + control_batch_size].to(get_torch_device())
                                      if type(v) == torch.Tensor else v) for k, v in kwargs.items()}
-                current_kwargs["hint"] = hint[i:i + 2 * self.ad_params.batch_size].to(get_torch_device())
+                current_kwargs["hint"] = hint[i:i + control_batch_size].to(get_torch_device())
                 current_ctrl = inner_model(**current_kwargs)
                 if len(controls) == 0:
                     controls = [[c.cpu() if type(c) == torch.Tensor else c] for c in current_ctrl]
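`mm_cn_forward` bounds ControlNet VRAM use by slicing the hint (and any tensor kwargs) into chunks of `control_batch_size`, running the inner model per chunk, and accumulating results on CPU; a value of 0 for the new `animatediff_control_batch_size` option means "use the old default of 2 * context batch size". A stdlib sketch of that chunk-and-accumulate pattern, with plain lists standing in for tensors (the function and parameter names here are hypothetical):

```python
def run_in_chunks(items, process, chunk_size=0, default_chunk=4):
    """Process `items` in fixed-size chunks and concatenate the results.

    chunk_size == 0 falls back to `default_chunk`, mirroring how a zero
    animatediff_control_batch_size falls back to 2 * context batch size.
    """
    if chunk_size == 0:
        chunk_size = default_chunk
    out = []
    for i in range(0, len(items), chunk_size):
        # each chunk is processed independently, so peak memory is bounded
        # by the chunk size rather than the full sequence length
        out.extend(process(items[i:i + chunk_size]))
    return out

doubled = run_in_chunks(list(range(10)), lambda chunk: [v * 2 for v in chunk], chunk_size=3)
assert doubled == [v * 2 for v in range(10)]
```

Smaller chunks trade throughput for peak memory, which is why the slider in the settings exposes the size rather than hard-coding it.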
@@ -122,7 +123,10 @@ class AnimateDiffMM:
         unet.set_model_unet_function_wrapper(AnimateDiffInfV2V.mm_sd_forward)
         unet.add_block_inner_modifier(mm_block_modifier)
         unet.set_memory_peak_estimation_modifier(mm_memory_estimator)
-        unet.set_controlnet_model_function_wrapper(mm_cn_forward)
+        if shared.opts.data.get("animatediff_disable_control_wrapper", False):
+            logger.warning("ControlNet wrapper is disabled. Be cautious that you may run out of VRAM.")
+        else:
+            unet.set_controlnet_model_function_wrapper(mm_cn_forward)
         sd_model.forge_objects.unet = unet
 
 
@@ -145,4 +149,12 @@ class AnimateDiffMM:
         sd_model.forge_objects.unet = unet
 
 
+    def set_layer_mapping(self, sd_model):
+        if hasattr(sd_model, 'network_layer_mapping'):
+            for name, module in self.mm.named_modules():
+                network_name = name.replace(".", "_")
+                sd_model.network_layer_mapping[network_name] = module
+                module.network_layer_name = network_name
+
+
 mm_animatediff = AnimateDiffMM()
@@ -1,6 +1,7 @@
 import base64
 import datetime
 from pathlib import Path
+import traceback
 
 import imageio.v3 as imageio
 import numpy as np
@@ -18,7 +19,9 @@ from scripts.animatediff_ui import AnimateDiffProcess
 class AnimateDiffOutput:
     def output(self, p: StableDiffusionProcessing, res: Processed, params: AnimateDiffProcess):
         video_paths = []
-        logger.info("Merging images into GIF.")
+        first_frames = []
+        from_xyz = any("xyz_grid" in frame.filename for frame in traceback.extract_stack())
+        logger.info(f"Saving output formats: {', '.join(params.format)}")
         date = datetime.datetime.now().strftime('%Y-%m-%d')
         output_dir = Path(f"{p.outpath_samples}/AnimateDiff/{date}")
         output_dir.mkdir(parents=True, exist_ok=True)
@@ -27,7 +30,9 @@ class AnimateDiffOutput:
             # frame interpolation replaces video_list with interpolated frames
             # so make a copy instead of a slice (reference), to avoid modifying res
             frame_list = [image.copy() for image in res.images[i : i + params.video_length]]
+            if from_xyz:
+                first_frames.append(res.images[i].copy())
 
             seq = images.get_next_sequence_number(output_dir, "")
             filename_suffix = f"-{params.request_id}" if params.request_id else ""
             filename = f"{seq:05}-{res.all_seeds[(i-res.index_of_first_image)]}{filename_suffix}"
@@ -43,9 +48,16 @@ class AnimateDiffOutput:
 
         res.images = video_paths if not p.is_api else (self._encode_video_to_b64(video_paths) + (frame_list if 'Frame' in params.format else []))
 
+        # replace results with first frame of each video so xyz grid draws correctly
+        if from_xyz:
+            res.images = first_frames
+
+        if shared.opts.data.get("animatediff_frame_extract_remove", False):
+            self._remove_frame_extract(params)
+
 
     def _remove_frame_extract(self, params: AnimateDiffProcess):
-        if params.video_source and params.video_path and not shared.opts.data.get("animatediff_frame_extract_path", None) and Path(params.video_path).exists():
+        if params.video_source and params.video_path and Path(params.video_path).exists():
             logger.info(f"Removing extracted frames from {params.video_path}")
             import shutil
             shutil.rmtree(params.video_path)
@@ -367,4 +379,4 @@ class AnimateDiffOutput:
             targetpath = f"{date}/(unknown)"
             client.upload_file(file_path, bucket, targetpath)
             logger.info(f"{file_path} saved to s3 in bucket: {bucket}")
-            return f"http://{host}:{port}/{bucket}/{targetpath}"
+        return f"http://{host}:{port}/{bucket}/{targetpath}"
@@ -130,7 +130,8 @@ class AnimateDiffPromptSchedule:
         if isinstance(cond, torch.Tensor):
             return torch.stack(cond_list).to(cond.dtype).to(cond.device)
         else:
-            return {k: torch.stack(v).to(cond[k].dtype).to(cond[k].device) for k, v in cond_list.items()}
+            from modules.prompt_parser import DictWithShape
+            return DictWithShape({k: torch.stack(v).to(cond[k].dtype).to(cond[k].device) for k, v in cond_list.items()}, None)
 
 
     @staticmethod
@@ -9,6 +9,25 @@ def on_ui_settings():
     s3_selection =("animatediff", "AnimateDiff AWS")
 
     # default option specification
+    shared.opts.add_option(
+        "animatediff_disable_control_wrapper",
+        shared.OptionInfo(
+            False,
+            "Disable ControlNet wrapper for AnimateDiff (this requires a large VRAM)",
+            gr.Checkbox,
+            section=section
+        )
+    )
+    shared.opts.add_option(
+        "animatediff_control_batch_size",
+        shared.OptionInfo(
+            0,
+            "ControlNet batch size for AnimateDiff (default: 2 * context_batch_size)",
+            gr.Slider,
+            {"minimum": 0, "maximum": 128, "step": 4},
+            section=section,
+        ),
+    )
     shared.opts.add_option(
         "animatediff_model_path",
         shared.OptionInfo(
|
@ -29,10 +48,20 @@ def on_ui_settings():
|
|||
section=section
|
||||
).needs_restart()
|
||||
)
|
||||
shared.opts.add_option(
|
||||
"animatediff_save_to_custom",
|
||||
shared.OptionInfo(
|
||||
True,
|
||||
"Save frames to stable-diffusion-webui/outputs/{ txt|img }2img-images/AnimateDiff/{gif filename}/{date} "
|
||||
"instead of stable-diffusion-webui/outputs/{ txt|img }2img-images/{date}/.",
|
||||
gr.Checkbox,
|
||||
section=section
|
||||
)
|
||||
)
|
||||
shared.opts.add_option(
|
||||
"animatediff_frame_extract_path",
|
||||
shared.OptionInfo(
|
||||
"GIF",
|
||||
None,
|
||||
"Path to save extracted frames",
|
||||
gr.Textbox,
|
||||
{"placeholder": "Leave empty to use default path: tmp/animatediff-frames"},
|
||||
|
|
@@ -40,15 +69,24 @@ def on_ui_settings():
         )
     )
     shared.opts.add_option(
-        "animatediff_save_to_custom",
+        "animatediff_frame_extract_remove",
         shared.OptionInfo(
-            True,
-            "Save frames to stable-diffusion-webui/outputs/{ txt|img }2img-images/AnimateDiff/{gif filename}/{date} "
-            "instead of stable-diffusion-webui/outputs/{ txt|img }2img-images/{date}/.",
+            False,
+            "Always remove extracted frames after processing",
             gr.Checkbox,
             section=section
         )
     )
+    shared.opts.add_option(
+        "animatediff_default_frame_extract_method",
+        shared.OptionInfo(
+            "ffmpeg",
+            "Default frame extraction method",
+            gr.Radio,
+            {"choices": ["ffmpeg", "opencv"]},
+            section=section
+        )
+    )
 
     # traditional video optimization specification
     shared.opts.add_option(
@@ -10,6 +10,7 @@ from modules.launch_utils import git
 from modules.processing import StableDiffusionProcessing, StableDiffusionProcessingImg2Img
 
 from scripts.animatediff_mm import mm_animatediff as motion_module
+from scripts.animatediff_xyz import xyz_attrs
 from scripts.animatediff_logger import logger_animatediff as logger
 from scripts.animatediff_utils import get_controlnet_units, extract_frames_from_video
 
@@ -138,6 +139,11 @@ class AnimateDiffProcess:
         ), "At least one saving format should be selected."
 
 
+    def apply_xyz(self):
+        for k, v in xyz_attrs.items():
+            setattr(self, k, v)
+
+
     def set_p(self, p: StableDiffusionProcessing):
         self._check()
         if self.video_length < self.batch_size:
@@ -153,9 +159,6 @@ class AnimateDiffProcess:
         p.do_not_save_samples = True
 
         cn_units = get_controlnet_units(p)
-        if not cn_units:
-            return
-
         min_batch_in_cn = -1
         for cn_unit in cn_units:
             # batch path broadcast
@@ -168,7 +171,7 @@ class AnimateDiffProcess:
                 cn_unit.batch_image_dir = self.video_path
 
             # mask path broadcast
-            if cn_unit.input_mode.name == 'BATCH' and not cn_unit.batch_mask_dir and self.mask_path:
+            if cn_unit.input_mode.name == 'BATCH' and self.mask_path and not cn_unit.batch_mask_dir:
                 cn_unit.batch_mask_dir = self.mask_path
 
             # find minimun control images in CN batch
@@ -186,7 +189,7 @@ class AnimateDiffProcess:
             cur_batch_modifier = getattr(cn_unit, "batch_modifiers", [])
             cur_batch_modifier.append(cn_batch_modifler)
             cn_unit.batch_modifiers = cur_batch_modifier
-            self.post_setup_cn_for_i2i_batch(p)
+        self.post_setup_cn_for_i2i_batch(p)
         logger.info(f"AnimateDiff + ControlNet will generate {self.video_length} frames.")
 
 
@@ -220,43 +223,58 @@ class AnimateDiffProcess:
 class AnimateDiffUiGroup:
     txt2img_submit_button = None
     img2img_submit_button = None
+    setting_sd_model_checkpoint = None
+    animatediff_ui_group = []
 
     def __init__(self):
         self.params = AnimateDiffProcess()
+        AnimateDiffUiGroup.animatediff_ui_group.append(self)
 
 
-    def render(self, is_img2img: bool, model_dir: str, infotext_fields, paste_field_names):
+    def get_model_list(self):
+        model_dir = motion_module.get_model_dir()
         if not os.path.isdir(model_dir):
             os.mkdir(model_dir)
+        def get_sd_rm_tag():
+            if shared.sd_model.is_sdxl:
+                return ["sd1"]
+            elif shared.sd_model.is_sd2:
+                return ["sd1, xl"]
+            elif shared.sd_model.is_sd1:
+                return ["xl"]
+            else:
+                return []
+        return [f for f in os.listdir(model_dir) if f != ".gitkeep" and not any(tag in f for tag in get_sd_rm_tag())]
+
+
+    def refresh_models(self, *inputs):
+        new_model_list = self.get_model_list()
+        dd = inputs[0]
+        if dd in new_model_list:
+            selected = dd
+        elif len(new_model_list) > 0:
+            selected = new_model_list[0]
+        else:
+            selected = None
+        return gr.Dropdown.update(choices=new_model_list, value=selected)
+
+
+    def render(self, is_img2img: bool, infotext_fields, paste_field_names):
         elemid_prefix = "img2img-ad-" if is_img2img else "txt2img-ad-"
-        model_list = [f for f in os.listdir(model_dir) if f != ".gitkeep"]
         with gr.Accordion("AnimateDiff", open=False):
             gr.Markdown(value="Please click [this link](https://github.com/continue-revolution/sd-webui-animatediff/blob/forge/master/docs/how-to-use.md#parameters) to read the documentation of each parameter.")
-            with gr.Row():
-
-                def refresh_models(*inputs):
-                    new_model_list = [
-                        f for f in os.listdir(model_dir) if f != ".gitkeep"
-                    ]
-                    dd = inputs[0]
-                    if dd in new_model_list:
-                        selected = dd
-                    elif len(new_model_list) > 0:
-                        selected = new_model_list[0]
-                    else:
-                        selected = None
-                    return gr.Dropdown.update(choices=new_model_list, value=selected)
-
             with gr.Row():
+                model_list = self.get_model_list()
                 self.params.model = gr.Dropdown(
                     choices=model_list,
-                    value=(self.params.model if self.params.model in model_list else None),
+                    value=(self.params.model if self.params.model in model_list else (model_list[0] if len(model_list) > 0 else None)),
                     label="Motion module",
                     type="value",
                     elem_id=f"{elemid_prefix}motion-module",
                 )
                 refresh_model = ToolButton(value="\U0001f504")
-                refresh_model.click(refresh_models, self.params.model, self.params.model)
+                refresh_model.click(self.refresh_models, self.params.model, self.params.model)
 
             self.params.format = gr.CheckboxGroup(
                 choices=supported_save_formats,
@@ -435,3 +453,13 @@ class AnimateDiffUiGroup:
             AnimateDiffUiGroup.img2img_submit_button = component
             return
 
+        if elem_id == "setting_sd_model_checkpoint":
+            for group in AnimateDiffUiGroup.animatediff_ui_group:
+                component.change(  # this step cannot succeed. I don't know why.
+                    fn=group.refresh_models,
+                    inputs=[group.params.model],
+                    outputs=[group.params.model],
+                    queue=False,
+                )
+            return
+
@@ -1,3 +1,4 @@
 import os
+import cv2
 import subprocess
 from pathlib import Path
@@ -8,6 +9,22 @@ from modules.processing import StableDiffusionProcessing
 
 from scripts.animatediff_logger import logger_animatediff as logger
 
+def generate_random_hash(length=8):
+    import hashlib
+    import secrets
+
+    # Generate a random number or string
+    random_data = secrets.token_bytes(32)  # 32 bytes of random data
+
+    # Create a SHA-256 hash of the random data
+    hash_object = hashlib.sha256(random_data)
+    hash_hex = hash_object.hexdigest()
+
+    # Get the first 10 characters
+    if length > len(hash_hex):
+        length = len(hash_hex)
+    return hash_hex[:length]
+
+
 def get_animatediff_arg(p: StableDiffusionProcessing):
     """
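`generate_random_hash` only needs a short collision-resistant suffix for the frame-extraction directory; hashing fresh random bytes and truncating the hex digest achieves that. For reference, the standard library can produce the same kind of suffix in one call. A minimal sketch (the helper name `random_suffix` is hypothetical, not from the repo):

```python
import secrets

def random_suffix(length: int = 8) -> str:
    # token_hex(n) returns 2*n lowercase hex characters, so request enough
    # entropy for the desired length and trim the surplus.
    return secrets.token_hex((length + 1) // 2)[:length]

s = random_suffix()
assert len(s) == 8
assert all(c in "0123456789abcdef" for c in s)
```

Either approach is fine for uniqueness here; SHA-256 over `token_bytes` adds no security beyond what the random bytes already provide.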
@@ -22,6 +39,7 @@ def get_animatediff_arg(p: StableDiffusionProcessing):
     if isinstance(animatediff_arg, dict):
         from scripts.animatediff_ui import AnimateDiffProcess
         animatediff_arg = AnimateDiffProcess(**animatediff_arg)
+        p.script_args = list(p.script_args)
         p.script_args[script.args_from] = animatediff_arg
     return animatediff_arg
 
@@ -32,14 +50,24 @@ def get_controlnet_units(p: StableDiffusionProcessing):
     Get controlnet arguments from `p`.
     """
     if not p.scripts:
-        return None
+        return []
 
     for script in p.scripts.alwayson_scripts:
         if script.title().lower() == "controlnet":
             cn_units = p.script_args[script.args_from:script.args_to]
-            return [x for x in cn_units if x.enabled]
+            if p.is_api and len(cn_units) > 0 and isinstance(cn_units[0], dict):
+                from lib_controlnet.external_code import ControlNetUnit
+                from lib_controlnet.enums import InputMode
+                cn_units_dataclass = [ControlNetUnit.from_dict(cn_unit_dict) for cn_unit_dict in cn_units]
+                for cn_unit_dataclass in cn_units_dataclass:
+                    if cn_unit_dataclass.image is None:
+                        cn_unit_dataclass.input_mode = InputMode.BATCH
+                p.script_args[script.args_from:script.args_to] = cn_units_dataclass
+
+            return [x for x in cn_units if x.enabled] if not p.is_api else cn_units
 
-    return None
+    return []
@@ -80,9 +108,14 @@ def extract_frames_from_video(params):
     params.video_path = shared.opts.data.get(
         "animatediff_frame_extract_path",
         f"{data_path}/tmp/animatediff-frames")
-    params.video_path += f"{params.video_source}-{generate_random_hash()}"
+    if not params.video_path:
+        params.video_path = f"{data_path}/tmp/animatediff-frames"
+    params.video_path = os.path.join(params.video_path, f"{Path(params.video_source).stem}-{generate_random_hash()}")
     try:
-        ffmpeg_extract_frames(params.video_source, params.video_path)
+        if shared.opts.data.get("animatediff_default_frame_extract_method", "ffmpeg") == "opencv":
+            cv2_extract_frames(params.video_source, params.video_path)
+        else:
+            ffmpeg_extract_frames(params.video_source, params.video_path)
     except Exception as e:
         logger.error(f"[AnimateDiff] Error extracting frames via ffmpeg: {e}, fall back to OpenCV.")
         cv2_extract_frames(params.video_source, params.video_path)
@@ -0,0 +1,125 @@
+from types import ModuleType
+from typing import Optional
+
+from modules import scripts
+
+from scripts.animatediff_logger import logger_animatediff as logger
+
+xyz_attrs: dict = {}
+
+def patch_xyz():
+    xyz_module = find_xyz_module()
+    if xyz_module is None:
+        logger.warning("XYZ module not found.")
+        return
+    MODULE = "[AnimateDiff]"
+    xyz_module.axis_options.extend([
+        xyz_module.AxisOption(
+            label=f"{MODULE} Enabled",
+            type=str_to_bool,
+            apply=apply_state("enable"),
+            choices=choices_bool),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Motion Module",
+            type=str,
+            apply=apply_state("model")),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Video length",
+            type=int_or_float,
+            apply=apply_state("video_length")),
+        xyz_module.AxisOption(
+            label=f"{MODULE} FPS",
+            type=int_or_float,
+            apply=apply_state("fps")),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Use main seed",
+            type=str_to_bool,
+            apply=apply_state("use_main_seed"),
+            choices=choices_bool),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Closed loop",
+            type=str,
+            apply=apply_state("closed_loop"),
+            choices=lambda: ["N", "R-P", "R+P", "A"]),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Batch size",
+            type=int_or_float,
+            apply=apply_state("batch_size")),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Stride",
+            type=int_or_float,
+            apply=apply_state("stride")),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Overlap",
+            type=int_or_float,
+            apply=apply_state("overlap")),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Interp",
+            type=str_to_bool,
+            apply=apply_state("interp"),
+            choices=choices_bool),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Interp X",
+            type=int_or_float,
+            apply=apply_state("interp_x")),
+        xyz_module.AxisOption(
+            label=f"{MODULE} Video path",
+            type=str,
+            apply=apply_state("video_path")),
+        xyz_module.AxisOptionImg2Img(
+            label=f"{MODULE} Latent power",
+            type=int_or_float,
+            apply=apply_state("latent_power")),
+        xyz_module.AxisOptionImg2Img(
+            label=f"{MODULE} Latent scale",
+            type=int_or_float,
+            apply=apply_state("latent_scale")),
+        xyz_module.AxisOptionImg2Img(
+            label=f"{MODULE} Latent power last",
+            type=int_or_float,
+            apply=apply_state("latent_power_last")),
+        xyz_module.AxisOptionImg2Img(
+            label=f"{MODULE} Latent scale last",
+            type=int_or_float,
+            apply=apply_state("latent_scale_last")),
+    ])
+
+
+def apply_state(k, key_map=None):
+    def callback(_p, v, _vs):
+        if key_map is not None:
+            v = key_map[v]
+        xyz_attrs[k] = v
+
+    return callback
+
+
+def str_to_bool(string):
+    string = str(string)
+    if string in ["None", ""]:
+        return None
+    elif string.lower() in ["true", "1"]:
+        return True
+    elif string.lower() in ["false", "0"]:
+        return False
+    else:
+        raise ValueError(f"Could not convert string to boolean: {string}")
+
+
+def int_or_float(string):
+    try:
+        return int(string)
+    except ValueError:
+        return float(string)
+
+
+def choices_bool():
+    return ["False", "True"]
+
+
+def find_xyz_module() -> Optional[ModuleType]:
+    for data in scripts.scripts_data:
+        if data.script_class.__module__ in {"xyz_grid.py", "xy_grid.py"} and hasattr(data, "module"):
+            return data.module
+
+    return None
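The xyz axis options above parse grid cell values from strings via the small `str_to_bool` and `int_or_float` helpers. A quick standalone check of their behavior (the helpers are copied verbatim from the new file; only the assertions are added):

```python
def str_to_bool(string):
    string = str(string)
    if string in ["None", ""]:
        return None
    elif string.lower() in ["true", "1"]:
        return True
    elif string.lower() in ["false", "0"]:
        return False
    else:
        raise ValueError(f"Could not convert string to boolean: {string}")

def int_or_float(string):
    try:
        return int(string)
    except ValueError:
        return float(string)

# booleans accept several spellings; empty/None-ish inputs map to None
assert str_to_bool("True") is True and str_to_bool("0") is False
assert str_to_bool("") is None
# numeric cells stay int when possible, fall back to float otherwise
assert int_or_float("16") == 16 and isinstance(int_or_float("16"), int)
assert int_or_float("1.5") == 1.5
```

Keeping `int_or_float` int-preserving matters because values like batch size and stride are used as tensor slice sizes downstream.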