* 1.13.0

* readme

* Update README.md
Chengsong Zhang 2023-12-19 04:09:32 -06:00 committed by GitHub
parent b37a648d80
commit 83dc4d0f0c
6 changed files with 59 additions and 46 deletions


@ -13,10 +13,12 @@ You might also be interested in another extension I created: [Segment Anything f
- [API](#api)
- [WebUI Parameters](#webui-parameters)
- [Img2GIF](#img2gif)
- [Motion LoRA](#motion-lora)
- [Prompt Travel](#prompt-travel)
- [ControlNet V2V](#controlnet-v2v)
- [SDXL](#sdxl)
- [Model Spec](#model-spec)
- [Motion LoRA](#motion-lora)
- [V3](#v3)
- [SDXL](#sdxl)
- [Optimizations](#optimizations)
- [Attention](#attention)
- [FP8](#fp8)
@ -29,7 +31,8 @@ You might also be interested in another extension I created: [Segment Anything f
- [Basic Usage](#basic-usage)
- [Motion LoRA](#motion-lora-1)
- [Prompt Travel](#prompt-travel-1)
- [AnimateDiff + SDXL](#animatediff--sdxl)
- [AnimateDiff V3](#animatediff-v3)
- [AnimateDiff SDXL](#animatediff-sdxl)
- [ControlNet V2V](#controlnet-v2v-1)
- [Tutorial](#tutorial)
- [Thanks](#thanks)
@ -57,12 +60,13 @@ You might also be interested in another extension I created: [Segment Anything f
- `2023/10/19`: [v1.9.3](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.9.3): Support webp output format. See [#233](https://github.com/continue-revolution/sd-webui-animatediff/pull/233) for more information.
- `2023/10/21`: [v1.9.4](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.9.4): Save prompt travel to output images, `Reverse` merged to `Closed loop` (See [WebUI Parameters](#webui-parameters)), remove `TimestepEmbedSequential` hijack, remove `hints.js`, better explanation of several context-related parameters.
- `2023/10/25`: [v1.10.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.10.0): Support img2img batch. You need ControlNet installed to make it work properly (you do not need to enable ControlNet). See [ControlNet V2V](#controlnet-v2v) for more information.
- `2023/10/29`: [v1.11.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.11.0): Support [HotShot-XL](https://github.com/hotshotco/Hotshot-XL) for SDXL. See [SDXL](#sdxl) for more information.
- `2023/10/29`: [v1.11.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.11.0): [HotShot-XL](https://github.com/hotshotco/Hotshot-XL) supported. See [SDXL](#sdxl) for more information.
- `2023/11/06`: [v1.11.1](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.11.1): Optimize VRAM for ControlNet V2V, patch [encode_pil_to_base64](https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/api/api.py#L104-L133) so that the API can return a video, save frames to `AnimateDiff/yy-mm-dd/`, recover from assertion errors, optional [request id](#api) for API.
- `2023/11/10`: [v1.12.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.12.0): [AnimateDiff for SDXL](https://github.com/guoyww/AnimateDiff/tree/sdxl) supported. See [SDXL](#sdxl) for more information. You need to add `--disable-safe-unpickle` to your command line arguments to get rid of the bad file error.
- `2023/11/10`: [v1.12.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.12.0): [AnimateDiff for SDXL](https://github.com/guoyww/AnimateDiff/tree/sdxl) supported. See [SDXL](#sdxl) for more information.
- `2023/11/16`: [v1.12.1](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.12.1): FP8 precision and LCM sampler supported. See [Optimizations](#optimizations) for more information. You can also optionally upload videos to AWS S3 storage by configuring appropriately via `Settings/AnimateDiff AWS`.
- `2023/12/19`: [v1.13.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.13.0): [AnimateDiff V3](https://github.com/guoyww/AnimateDiff?tab=readme-ov-file#202312-animatediff-v3-and-sparsectrl) supported. See [V3](#v3) for more information. Also: release all official models in fp16 & safetensors format [here](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main), add option to disable LCM sampler in `Settings/AnimateDiff`, remove patch [encode_pil_to_base64](https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/api/api.py#L104-L133) because A1111 [v1.7.0](https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/v1.7.0) now supports video return for API.
For future update plan, please query [#366](https://github.com/continue-revolution/sd-webui-animatediff/pull/366).
For the future update plan, please refer to [#366](https://github.com/continue-revolution/sd-webui-animatediff/pull/366). `v1.13.x` is the last feature release of `v1`. SparseCtrl, Magic Animate and other control methods will be supported in `v2` via updates to both this repo and sd-webui-controlnet.
## How to Use
@ -152,9 +156,9 @@ It is quite similar to the way you use ControlNet. API will return a video in ba
Please read
- [Img2GIF](#img2gif) for extra parameters on img2gif panel.
- [Motion LoRA](#motion-lora) for how to use Motion LoRA.
- [Prompt Travel](#prompt-travel) for how to trigger prompt travel.
- [ControlNet V2V](#controlnet-v2v) for how to use ControlNet V2V.
- [Model Spec](#model-spec) for how to use Motion LoRA, V3 and SDXL.
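Since the API returns videos base64-encoded, decoding the payload to a file is straightforward; below is a minimal sketch (the helper name, and the assumption that the video arrives as a plain base64 string, are illustrative rather than part of the extension's API):

```python
import base64

def save_b64_video(b64_video: str, path: str) -> None:
    # Hypothetical helper: decode a base64-encoded video string
    # (as returned by the WebUI API) and write the raw bytes to disk.
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_video))
```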
## Img2GIF
@ -169,10 +173,6 @@ init_latent = init_latent * init_alpha + random_tensor * (1 - init_alpha)
If you upload a last frame: your `init_latent` will be changed in a similar way. Read [this code](https://github.com/continue-revolution/sd-webui-animatediff/tree/v1.5.0/scripts/animatediff_latent.py#L28-L65) to understand how it works.
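The formula above is a per-element linear interpolation between your encoded init image and random noise; a minimal pure-Python sketch of the idea (the function name is illustrative, and the real extension operates on latent tensors rather than lists):

```python
import random

def blend_init_latent(init_latent, init_alpha):
    # init_alpha = 1.0 keeps the init latent untouched;
    # init_alpha = 0.0 replaces it entirely with Gaussian noise.
    return [x * init_alpha + random.gauss(0, 1) * (1 - init_alpha)
            for x in init_latent]

blend_init_latent([1.0, 2.0, 3.0], init_alpha=1.0)  # → [1.0, 2.0, 3.0]
```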
## Motion LoRA
[Download](https://huggingface.co/guoyww/animatediff) and use them like any other LoRA you use (example: download motion lora to `stable-diffusion-webui/models/Lora` and add `<lora:v2_lora_PanDown:0.8>` to your positive prompt). **Motion LoRA only supports V2 motion modules**.
## Prompt Travel
Write positive prompt following the example below.
@ -203,8 +203,16 @@ For people who want to inpaint videos: enter a folder which contains two sub-fol
AnimateDiff in img2img batch will be available in [v1.10.0](https://github.com/continue-revolution/sd-webui-animatediff/pull/224).
## SDXL
## Model Spec
### Motion LoRA
[Download](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora) and use them like any other LoRA you use (example: download motion lora to `stable-diffusion-webui/models/Lora` and add `<lora:mm_sd15_v2_lora_PanLeft:0.8>` to your positive prompt). **Motion LoRA only supports V2 motion modules**.
### V3
V3 has the same state dict keys as V1 but slightly different inference logic (GroupNorm is not hacked for V3). This extension identifies V3 by checking whether both "v3" and "sd15" are substrings of the model filename (for example, `v3_sd15_mm.ckpt` and `mm_sd15_v3.safetensors` both qualify). You should NOT rename the official V3 motion module (whether downloaded from my link or from the official link), and you should make sure that the filenames of V3 community models contain both `v3` and `sd15`, while the filenames of V1 community models do not contain both at the same time. Other motion modules are identified by inspecting the state dict, so they are not affected.
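The filename rule described above amounts to a two-substring check; a sketch (the helper name is hypothetical):

```python
def looks_like_v3(filename: str) -> bool:
    # A motion module is treated as V3 only when both "v3" and "sd15"
    # appear somewhere in its filename (matching is case-sensitive).
    return "v3" in filename and "sd15" in filename

looks_like_v3("v3_sd15_mm.ckpt")         # True
looks_like_v3("mm_sd15_v3.safetensors")  # True
looks_like_v3("mm_sd_v15_v2.ckpt")       # False, so it stays V1
```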
You may optionally use the [adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) for V3, in the same way you use any other LoRA. You MUST use [my link](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) instead of the [official link](https://huggingface.co/guoyww/animatediff/resolve/main/v3_sd15_adapter.ckpt?download=true); the official adapter won't work with A1111 due to state dict incompatibility.
### SDXL
[AnimateDiffXL](https://github.com/guoyww/AnimateDiff/tree/sdxl) and [HotShot-XL](https://github.com/hotshotco/Hotshot-XL) have an architecture identical to AnimateDiff-SD1.5. The only two differences are:
- HotShot-XL is trained with 8 frames instead of 16, so it is recommended to set `Context batch size` to 8 for HotShot-XL.
- AnimateDiffXL is still trained with 16 frames, so you do not need to change `Context batch size` for AnimateDiffXL.
@ -215,8 +223,6 @@ Although AnimateDiffXL & HotShot-XL have identical structure with AnimateDiff-SD
Technically, all features available for AnimateDiff + SD1.5 are also available for (AnimateDiff / HotShot) + SDXL. However, I have not tested all of them. I have tested infinite context generation and prompt travel; I have not tested ControlNet. If you find any bugs, please report them to me.
For download link, please read [Model Zoo](#model-zoo). For VRAM usage, please read [VRAM](#vram). For demo, please see [demo](#animatediff--sdxl).
## Optimizations
Optimizations can be significantly helpful if you want to improve speed and reduce VRAM usage. With [attention optimization](#attention), [FP8](#fp8) and unchecking `Batch cond/uncond` in `Settings/Optimization`, I am able to run 4 x ControlNet + AnimateDiff + Stable Diffusion to generate 36 frames of 1024 * 1024 images with 18GB VRAM.
@ -248,12 +254,12 @@ Benefits of using this extension instead of [sd-webui-lcm](https://github.com/0x
## Model Zoo
- `mm_sd_v14.ckpt` & `mm_sd_v15.ckpt` & `mm_sd_v15_v2.ckpt` & `mm_sdxl_v10_beta.ckpt` by [@guoyww](https://github.com/guoyww): [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI) | [HuggingFace](https://huggingface.co/guoyww/animatediff/tree/main) | [CivitAI](https://civitai.com/models/108836)
- `mm_sd_v14.safetensors` & `mm_sd_v15.safetensors` & `mm_sd_v15_v2.safetensors` by [@neph1](https://github.com/neph1): [HuggingFace](https://huggingface.co/guoyww/animatediff/tree/refs%2Fpr%2F3)
- `mm_sd_v14.fp16.safetensors` & `mm_sd_v15.fp16.safetensors` & `mm_sd_v15_v2.fp16.safetensors` by [@neggles](https://huggingface.co/neggles/): [HuggingFace](https://huggingface.co/neggles/)
- `mm-Stabilized_high.pth` & `mm-Stabilized_mid.pth` by [@manshoety](https://huggingface.co/manshoety): [HuggingFace](https://huggingface.co/manshoety/AD_Stabilized_Motion/tree/main)
- `temporaldiff-v1-animatediff.ckpt` by [@CiaraRowles](https://huggingface.co/CiaraRowles): [HuggingFace](https://huggingface.co/CiaraRowles/TemporalDiff/tree/main)
- `hsxl_temporal_layers.safetensors` & `hsxl_temporal_layers.f16.safetensors` by [@hotshotco](https://huggingface.co/hotshotco/): [HuggingFace](https://huggingface.co/hotshotco/Hotshot-XL/tree/main)
I maintain a [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main) that provides all official models in fp16 & safetensors format. I highly recommend using my links; you MUST use my link to download the adapter for V3. You may still use the old links if you want, for all models except the V3 adapter.
- "Official" models by [@guoyww](https://github.com/guoyww): [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI) | [HuggingFace](https://huggingface.co/guoyww/animatediff/tree/main) | [CivitAI](https://civitai.com/models/108836)
- "Stabilized" community models by [@manshoety](https://huggingface.co/manshoety): [HuggingFace](https://huggingface.co/manshoety/AD_Stabilized_Motion/tree/main)
- "TemporalDiff" models by [@CiaraRowles](https://huggingface.co/CiaraRowles): [HuggingFace](https://huggingface.co/CiaraRowles/TemporalDiff/tree/main)
- "HotShotXL" models by [@hotshotco](https://huggingface.co/hotshotco/): [HuggingFace](https://huggingface.co/hotshotco/Hotshot-XL/tree/main)
## VRAM
@ -296,7 +302,12 @@ We are currently developing approach to support batch size on WebUI in the near
The prompt is similar to [above](#prompt-travel).
### AnimateDiff + SDXL
### AnimateDiff V3
You should be able to read infotext to understand how I generated this sample.
![00024-3973810345](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/5f3e3858-8033-4a16-94b0-4dbc0d0a67fc)
### AnimateDiff SDXL
You should be able to read infotext to understand how I generated this sample.
![00025-1668075705](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/6d32daf9-51c6-490f-a942-db36f84f23cf)


@ -15,19 +15,23 @@ import math
 class MotionModuleType(Enum):
     AnimateDiffV1 = "AnimateDiff V1, Yuwei Guo, Shanghai AI Lab"
     AnimateDiffV2 = "AnimateDiff V2, Yuwei Guo, Shanghai AI Lab"
+    AnimateDiffV3 = "AnimateDiff V3, Yuwei Guo, Shanghai AI Lab"
     AnimateDiffXL = "AnimateDiff SDXL, Yuwei Guo, Shanghai AI Lab"
     HotShotXL = "HotShot-XL, John Mullan, Natural Synthetics Inc"

     @staticmethod
-    def get_mm_type(state_dict: dict):
+    def get_mm_type(state_dict: dict, filename: str = ""):
         keys = list(state_dict.keys())
         if any(["mid_block" in k for k in keys]):
             return MotionModuleType.AnimateDiffV2
         elif any(["temporal_attentions" in k for k in keys]):
             return MotionModuleType.HotShotXL
         elif any(["down_blocks.3" in k for k in keys]):
-            return MotionModuleType.AnimateDiffV1
+            if "v3" in filename and "sd15" in filename:
+                return MotionModuleType.AnimateDiffV3
+            else:
+                return MotionModuleType.AnimateDiffV1
         else:
             return MotionModuleType.AnimateDiffXL
@ -43,10 +47,11 @@ class MotionWrapper(nn.Module):
     def __init__(self, mm_name: str, mm_hash: str, mm_type: MotionModuleType):
         super().__init__()
         self.is_v2 = mm_type == MotionModuleType.AnimateDiffV2
+        self.is_v3 = mm_type == MotionModuleType.AnimateDiffV3
         self.is_hotshot = mm_type == MotionModuleType.HotShotXL
         self.is_adxl = mm_type == MotionModuleType.AnimateDiffXL
         self.is_xl = self.is_hotshot or self.is_adxl
-        max_len = 32 if (self.is_v2 or self.is_adxl) else 24
+        max_len = 32 if (self.is_v2 or self.is_adxl or self.is_v3) else 24
         in_channels = (320, 640, 1280) if (self.is_hotshot or self.is_adxl) else (320, 640, 1280, 1280)
         self.down_blocks = nn.ModuleList([])
         self.up_blocks = nn.ModuleList([])
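The detection order in `get_mm_type` above can be exercised without loading a real checkpoint; here is a standalone re-implementation sketch that takes mock state-dict keys and returns the guessed architecture as a plain string (for illustration only):

```python
def guess_mm_type(keys, filename=""):
    # Same precedence as MotionModuleType.get_mm_type:
    # mid_block keys -> V2; temporal_attentions -> HotShot-XL;
    # down_blocks.3 -> V1 or V3 (filename decides); otherwise SDXL.
    if any("mid_block" in k for k in keys):
        return "AnimateDiffV2"
    if any("temporal_attentions" in k for k in keys):
        return "HotShotXL"
    if any("down_blocks.3" in k for k in keys):
        if "v3" in filename and "sd15" in filename:
            return "AnimateDiffV3"
        return "AnimateDiffV1"
    return "AnimateDiffXL"

guess_mm_type(["down_blocks.3.motion_modules.0"], "mm_sd15_v3.safetensors")  # "AnimateDiffV3"
```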


@ -210,6 +210,15 @@ def on_ui_settings():
             section=section
         )
     )
+    shared.opts.add_option(
+        "animatediff_disable_lcm",
+        shared.OptionInfo(
+            False,
+            "Disable LCM",
+            gr.Checkbox,
+            section=section
+        )
+    )
     shared.opts.add_option(
         "animatediff_s3_enable",
         shared.OptionInfo(


@ -117,6 +117,9 @@ class AnimateDiffLCM:
     @staticmethod
     def hack_kdiff_ui():
+        if shared.opts.data.get("animatediff_disable_lcm", False):
+            return
         if AnimateDiffLCM.lcm_ui_injected:
             logger.info(f"LCM UI already injected.")
             return


@ -39,7 +39,7 @@ class AnimateDiffMM:
         logger.info(f"Loading motion module {model_name} from {model_path}")
         model_hash = hashes.sha256(model_path, f"AnimateDiff/{model_name}")
         mm_state_dict = sd_models.read_state_dict(model_path)
-        model_type = MotionModuleType.get_mm_type(mm_state_dict)
+        model_type = MotionModuleType.get_mm_type(mm_state_dict, model_name)
         logger.info(f"Guessed {model_name} architecture: {model_type}")
         self.mm = MotionWrapper(model_name, model_hash, model_type)
         missed_keys = self.mm.load_state_dict(mm_state_dict)
@ -67,7 +67,7 @@ class AnimateDiffMM:
         if self.mm.is_v2:
             logger.info(f"Injecting motion module {model_name} into {sd_ver} UNet middle block.")
             unet.middle_block.insert(-1, self.mm.mid_block.motion_modules[0])
-        elif not self.mm.is_adxl:
+        elif not (self.mm.is_adxl or self.mm.is_v3):
             logger.info(f"Hacking {sd_ver} GroupNorm32 forward function.")
             if self.mm.is_hotshot:
                 from sgm.modules.diffusionmodules.util import GroupNorm32
@ -137,7 +137,7 @@ class AnimateDiffMM:
         if self.mm.is_v2:
             logger.info(f"Removing motion module from {sd_ver} UNet middle block.")
             unet.middle_block.pop(-2)
-        elif not self.mm.is_adxl:
+        elif not (self.mm.is_adxl or self.mm.is_v3):
             logger.info(f"Restoring {sd_ver} GroupNorm32 forward function.")
             if self.mm.is_hotshot:
                 from sgm.modules.diffusionmodules.util import GroupNorm32


@ -16,9 +16,6 @@ from scripts.animatediff_ui import AnimateDiffProcess
 class AnimateDiffOutput:
-    api_encode_pil_to_base64_hooked = False
-
     def output(self, p: StableDiffusionProcessing, res: Processed, params: AnimateDiffProcess):
         video_paths = []
         logger.info("Merging images into GIF.")
@ -41,22 +38,10 @@ class AnimateDiffOutput:
             frame_list = self._interp(p, params, frame_list, filename)
             video_paths += self._save(params, frame_list, video_path_prefix, res, i)
-        if len(video_paths) > 0:
-            if p.is_api:
-                if not AnimateDiffOutput.api_encode_pil_to_base64_hooked:
-                    # TODO: remove this hook when WebUI is updated to v1.7.0
-                    logger.info("Hooking api.encode_pil_to_base64 to encode video to base64")
-                    AnimateDiffOutput.api_encode_pil_to_base64_hooked = True
-                    from modules.api import api
-                    api_encode_pil_to_base64 = api.encode_pil_to_base64
-                    def hooked_encode_pil_to_base64(image):
-                        if isinstance(image, str):
-                            return image
-                        return api_encode_pil_to_base64(image)
-                    api.encode_pil_to_base64 = hooked_encode_pil_to_base64
-                res.images = self._encode_video_to_b64(video_paths) + (frame_list if 'Frame' in params.format else [])
-            else:
-                res.images = video_paths
+        if len(video_paths) == 0:
+            return
+        res.images = video_paths if not p.is_api else (self._encode_video_to_b64(video_paths) + (frame_list if 'Frame' in params.format else []))

     def _add_reverse(self, params: AnimateDiffProcess, frame_list: list):