parent 69a2395ec5
commit b7ce6550cd
@@ -2,8 +2,6 @@
 
 > I have recently added a non-commercial [license](https://creativecommons.org/licenses/by-nc-sa/4.0/) to this extension. If you want to use this extension for commercial purposes, please contact me via email.
 
-> It seems that WebUI v1.9.0 has some major mess-up. Please do not use this WebUI version. You can use either v1.8.0 or v1.9.3 (latest).
-
 This extension aims to integrate [AnimateDiff](https://github.com/guoyww/AnimateDiff/) with [CLI](https://github.com/s9roll7/animatediff-cli-prompt-travel) into [AUTOMATIC1111 Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) with [ControlNet](https://github.com/Mikubill/sd-webui-controlnet), forming the most easy-to-use AI video toolkit. You can generate GIFs in exactly the same way as generating images after enabling this extension.
 
 This extension implements AnimateDiff in a different way. It inserts motion modules into UNet at runtime, so that you do not need to reload your model weights if you don't want to.
@@ -18,13 +16,15 @@ You might also be interested in another extension I created: [Segment Anything f
 
 
 ## Update
-- [v2.0.0-a](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.0-a) in `03/02/2023`: The whole extension has been reworked to make it easier to maintain.
+- [v2.0.0-a](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.0-a) in `03/02/2024`: The whole extension has been reworked to make it easier to maintain.
   - Prerequisite: WebUI >= 1.8.0 & ControlNet >= 1.1.441 & PyTorch >= 2.0.0
   - New features:
     - ControlNet inpaint / IP-Adapter prompt travel / SparseCtrl / ControlNet keyframe, see [ControlNet V2V](docs/features.md#controlnet-v2v)
     - FreeInit, see [FreeInit](docs/features.md#FreeInit)
   - Minor: mm filter based on sd version (click refresh button if you switch between SD1.5 and SDXL) / display extension version in infotext
   - Breaking change: You must use Motion LoRA, Hotshot-XL and the AnimateDiff V3 Motion Adapter from my [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main).
+- [v2.0.1-a](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.1-a) in `07/12/2024`: Support [AnimateLCM](https://github.com/G-U-N/AnimateLCM) from MMLab@CUHK. See [here](docs/features.md#animatelcm) for instructions.
 
 
 ## Future Plan
 Although [OpenAI Sora](https://openai.com/sora) is far better at following complex text prompts and generating complex scenes, we believe that OpenAI will NOT open source Sora or any of the other products they have released recently. My current plan is to continue developing this extension until an open-source video model is released with a strong ability to generate complex scenes, easy customization, and a good ecosystem like SD1.5.
@@ -44,7 +44,7 @@ I am maintaining a [huggingface repo](https://huggingface.co/conrevo/AnimateDiff
 
 ## Documentation
 - [How to Use](docs/how-to-use.md) -> [Preparation](docs/how-to-use.md#preparation) | [WebUI](docs/how-to-use.md#webui) | [API](docs/how-to-use.md#api) | [Parameters](docs/how-to-use.md#parameters)
-- [Features](docs/features.md) -> [Img2Vid](docs/features.md#img2vid) | [Prompt Travel](docs/features.md#prompt-travel) | [ControlNet V2V](docs/features.md#controlnet-v2v) | [ [Model Spec](docs/features.md#model-spec) -> [Motion LoRA](docs/features.md#motion-lora) | [V3](docs/features.md#v3) | [SDXL](docs/features.md#sdxl) ]
+- [Features](docs/features.md) -> [Img2Vid](docs/features.md#img2vid) | [Prompt Travel](docs/features.md#prompt-travel) | [ControlNet V2V](docs/features.md#controlnet-v2v) | [ [Model Spec](docs/features.md#model-spec) -> [Motion LoRA](docs/features.md#motion-lora) | [V3](docs/features.md#v3) | [SDXL](docs/features.md#sdxl) | [AnimateLCM](docs/features.md#animatelcm) ]
 - [Performance](docs/performance.md) -> [ [Optimizations](docs/performance.md#optimizations) -> [Attention](docs/performance.md#attention) | [FP8](docs/performance.md#fp8) | [LCM](docs/performance.md#lcm) ] | [VRAM](docs/performance.md#vram) | [Batch Size](docs/performance.md#batch-size)
 - [Demo](docs/demo.md) -> [Basic Usage](docs/demo.md#basic-usage) | [Motion LoRA](docs/demo.md#motion-lora) | [Prompt Travel](docs/demo.md#prompt-travel) | [AnimateDiff V3](docs/demo.md#animatediff-v3) | [AnimateDiff XL](docs/demo.md#animatediff-xl) | [ControlNet V2V](docs/demo.md#controlnet-v2v)
 
@@ -109,6 +109,11 @@ There are a lot of amazing demo online. Here I provide a very simple demo. The d
 ### V3
 AnimateDiff V3 has identical state dict keys as V1 but slightly different inference logic (GroupNorm is not hacked for V3). You may optionally use the [adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) for V3, in the same way as you apply a LoRA. You MUST use [my link](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) instead of the [official link](https://huggingface.co/guoyww/animatediff/resolve/main/v3_sd15_adapter.ckpt?download=true). The official adapter won't work for A1111 due to state dict incompatibility.
 
+### AnimateLCM
+- You can download the motion module from [here](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/motion_module/mm_sd15_AnimateLCM.safetensors?download=true). The [original weights](https://huggingface.co/wangfuyun/AnimateLCM/resolve/main/AnimateLCM_sd15_t2v.ckpt?download=true) should also work, but I recommend using my safetensors fp16 version.
+- You should also download the Motion LoRA from [here](https://huggingface.co/wangfuyun/AnimateLCM/resolve/main/AnimateLCM_sd15_t2v_lora.safetensors?download=true) and use it like any other LoRA.
+- You should use the LCM sampler and a low CFG scale (typically 1-2).
+
 ### SDXL
 [AnimateDiff-XL](https://github.com/guoyww/AnimateDiff/tree/sdxl) and [HotShot-XL](https://github.com/hotshotco/Hotshot-XL) have an identical architecture to AnimateDiff-SD1.5. The only differences are
 - HotShot-XL is trained with 8 frames instead of 16, so it is recommended to set `Context batch size` to 8 for HotShot-XL.
@@ -15,6 +15,7 @@ class MotionModuleType(Enum):
     AnimateDiffV2 = "AnimateDiff V2, Yuwei Guo, Shanghai AI Lab"
     AnimateDiffV3 = "AnimateDiff V3, Yuwei Guo, Shanghai AI Lab"
     AnimateDiffXL = "AnimateDiff SDXL, Yuwei Guo, Shanghai AI Lab"
+    AnimateLCM = "AnimateLCM, Fu-Yun Wang, MMLab@CUHK"
     SparseCtrl = "SparseCtrl, Yuwei Guo, Shanghai AI Lab"
     HotShotXL = "HotShot-XL, John Mullan, Natural Synthetics Inc"
 
@@ -23,6 +24,8 @@ class MotionModuleType(Enum):
     def get_mm_type(state_dict: dict[str, torch.Tensor]):
         keys = list(state_dict.keys())
         if any(["mid_block" in k for k in keys]):
+            if not any(["pe" in k for k in keys]):
+                return MotionModuleType.AnimateLCM
             return MotionModuleType.AnimateDiffV2
         elif any(["down_blocks.3" in k for k in keys]):
             if 32 in next((state_dict[key] for key in state_dict if 'pe' in key), None).shape:
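The detection heuristic in this hunk tells AnimateLCM apart from AnimateDiff V2 by key names alone: both checkpoints contain `mid_block` entries, but AnimateLCM ships no positional-encoding (`pe`) keys. A standalone sketch of that logic, assuming nothing beyond the key names (the helper name and example keys are illustrative, not from the extension):

```python
# Sketch of the key-based detection added in this hunk. Plain string lists
# stand in for real state-dict keys; only the names matter to the heuristic.

def guess_mm_type(keys: list[str]) -> str:
    """Classify a motion module by its state-dict key names."""
    if any("mid_block" in k for k in keys):
        # AnimateLCM also has a mid_block, but no positional encodings.
        if not any("pe" in k for k in keys):
            return "AnimateLCM"
        return "AnimateDiffV2"
    return "other"

# AnimateDiff V2 checkpoints expose both mid_block weights and "pe" buffers.
v2_keys = ["mid_block.motion_modules.0.pe", "mid_block.motion_modules.0.proj.weight"]
# AnimateLCM keeps the mid_block but drops the "pe" entries.
lcm_keys = ["mid_block.motion_modules.0.proj.weight"]

print(guess_mm_type(v2_keys))   # AnimateDiffV2
print(guess_mm_type(lcm_keys))  # AnimateLCM
```

Because the check is pure substring matching, it is cheap and runs before any module is built, which is why the loader can pick the right architecture without metadata in the file.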
@@ -49,7 +52,7 @@ class MotionWrapper(nn.Module):
         self.mm_name = mm_name
         self.mm_type = mm_type
         self.mm_hash = mm_hash
-        max_len = 24 if self.enable_gn_hack() else 32
+        max_len = 64 if mm_type == MotionModuleType.AnimateLCM else (24 if self.enable_gn_hack() else 32)
         in_channels = (320, 640, 1280) if self.is_xl else (320, 640, 1280, 1280)
         self.down_blocks = nn.ModuleList([])
         self.up_blocks = nn.ModuleList([])
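The new `max_len` expression packs three cases into one nested conditional: 64 positions for AnimateLCM, 24 when the GroupNorm hack applies, 32 otherwise. Pulled out as a helper it is easier to read; this is a minimal restatement only, and the enum and function names here are invented for the sketch, not taken from the extension:

```python
# Illustrative restatement of the max_len selection in this hunk.
from enum import Enum, auto

class MMType(Enum):
    ANIMATEDIFF_V1 = auto()  # GroupNorm hack enabled -> 24-frame encodings
    ANIMATEDIFF_V2 = auto()
    ANIMATELCM = auto()      # wrapper allocates room for 64 positions

def max_position_len(mm_type: MMType, gn_hack: bool) -> int:
    """Mirror the nested conditional: AnimateLCM wins over the gn-hack branch."""
    if mm_type is MMType.ANIMATELCM:
        return 64
    return 24 if gn_hack else 32

print(max_position_len(MMType.ANIMATELCM, gn_hack=False))     # 64
print(max_position_len(MMType.ANIMATEDIFF_V1, gn_hack=True))  # 24
print(max_position_len(MMType.ANIMATEDIFF_V2, gn_hack=False)) # 32
```

Note the ordering matters: the AnimateLCM check must come first, since the inline expression evaluates it before ever consulting `enable_gn_hack()`.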
@@ -59,7 +62,7 @@ class MotionWrapper(nn.Module):
             else:
                 self.down_blocks.append(MotionModule(c, num_mm=2, max_len=max_len, operations=operations))
                 self.up_blocks.insert(0,MotionModule(c, num_mm=3, max_len=max_len, operations=operations))
-        if mm_type in [MotionModuleType.AnimateDiffV2]:
+        if self.is_v2:
             self.mid_block = MotionModule(1280, num_mm=1, max_len=max_len, operations=operations)
 
 
@@ -83,7 +86,7 @@ class MotionWrapper(nn.Module):
 
     @property
    def is_v2(self):
-        return self.mm_type == MotionModuleType.AnimateDiffV2
+        return self.mm_type in [MotionModuleType.AnimateDiffV2, MotionModuleType.AnimateLCM]
 
 
 class MotionModule(nn.Module):
@@ -49,7 +49,7 @@ class AnimateDiffMM:
         logger.info(f"Guessed {model_name} architecture: {model_type}")
         mm_config = dict(mm_name=model_name, mm_hash=model_hash, mm_type=model_type)
         self.mm = MotionWrapper(**mm_config)
-        self.mm.load_state_dict(mm_state_dict)
+        self.mm.load_state_dict(mm_state_dict, strict=not model_type==MotionModuleType.AnimateLCM)
         self.mm.to(device).eval()
         if not shared.cmd_opts.no_half:
             self.mm.half()
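The loader now passes `strict=False` only for AnimateLCM, because its checkpoint lacks keys (the positional encodings) that the freshly built `MotionWrapper` allocates. Note the precedence: `not model_type==MotionModuleType.AnimateLCM` parses as `not (model_type == ...)`, so `strict` is `False` exactly for AnimateLCM. A dependency-free toy of what strict vs. non-strict loading means (the helper below is hypothetical; the extension itself relies on `torch.nn.Module.load_state_dict`):

```python
# Toy model of strict vs non-strict state-dict loading, without torch.
# Hypothetical helper; shown only to illustrate why strict=False is needed
# when a checkpoint is missing keys the module expects.

def load_state_dict(model: dict, checkpoint: dict, strict: bool = True) -> list:
    """Copy checkpoint values into model; return the keys the checkpoint lacked."""
    missing = [k for k in model if k not in checkpoint]
    if strict and missing:
        raise RuntimeError(f"Missing keys in state_dict: {missing}")
    for k, v in checkpoint.items():
        if k in model:
            model[k] = v
    return missing

model = {"down.weight": 0.0, "down.pe": 0.0}  # wrapper allocates pe buffers
animatelcm_ckpt = {"down.weight": 1.0}        # checkpoint ships no pe keys

# strict=True would raise; strict=False loads what exists and reports the rest.
missing = load_state_dict(model, animatelcm_ckpt, strict=False)
print(missing)  # ['down.pe']
```

Keeping `strict=True` for every other module type preserves the old safety net: a genuinely corrupt or mismatched checkpoint still fails loudly instead of loading partially.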