parent 69a2395ec5
commit b7ce6550cd
@@ -2,8 +2,6 @@
 
 > I have recently added a non-commercial [license](https://creativecommons.org/licenses/by-nc-sa/4.0/) to this extension. If you want to use this extension for commercial purposes, please contact me via email.
 
-> It seems that WebUI v1.9.0 has some major mess-up. Please do not use this WebUI version. You can use either v1.8.0 or v1.9.3 (latest).
-
 This extension aims to integrate [AnimateDiff](https://github.com/guoyww/AnimateDiff/) with [CLI](https://github.com/s9roll7/animatediff-cli-prompt-travel) into [AUTOMATIC1111 Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) with [ControlNet](https://github.com/Mikubill/sd-webui-controlnet), forming the most easy-to-use AI video toolkit. You can generate GIFs in exactly the same way as generating images after enabling this extension.
 
 This extension implements AnimateDiff in a different way. It inserts motion modules into UNet at runtime, so that you do not need to reload your model weights if you don't want to.
@@ -18,13 +16,15 @@ You might also be interested in another extension I created: [Segment Anything f
 
 
 ## Update
-- [v2.0.0-a](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.0-a) in `03/02/2023`: The whole extension has been reworked to make it easier to maintain.
+- [v2.0.0-a](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.0-a) in `03/02/2024`: The whole extension has been reworked to make it easier to maintain.
   - Prerequisite: WebUI >= 1.8.0 & ControlNet >= 1.1.441 & PyTorch >= 2.0.0
   - New features:
     - ControlNet inpaint / IP-Adapter prompt travel / SparseCtrl / ControlNet keyframe, see [ControlNet V2V](docs/features.md#controlnet-v2v)
     - FreeInit, see [FreeInit](docs/features.md#FreeInit)
   - Minor: mm filter based on sd version (click refresh button if you switch between SD1.5 and SDXL) / display extension version in infotext
   - Breaking change: You must use Motion LoRA, Hotshot-XL and the AnimateDiff V3 Motion Adapter from my [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main).
+- [v2.0.1-a](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.1-a) in `07/12/2024`: Support [AnimateLCM](https://github.com/G-U-N/AnimateLCM) from MMLab@CUHK. See [here](docs/features.md#animatelcm) for instructions.
 
 
 ## Future Plan
 Although [OpenAI Sora](https://openai.com/sora) is far better at following complex text prompts and generating complex scenes, we believe that OpenAI will NOT open source Sora or any of the other products they have released recently. My current plan is to continue developing this extension until an open-source video model is released with a strong ability to generate complex scenes, easy customization, and a good ecosystem like SD1.5.
@@ -44,7 +44,7 @@ I am maintaining a [huggingface repo](https://huggingface.co/conrevo/AnimateDiff
 
 ## Documentation
 - [How to Use](docs/how-to-use.md) -> [Preparation](docs/how-to-use.md#preparation) | [WebUI](docs/how-to-use.md#webui) | [API](docs/how-to-use.md#api) | [Parameters](docs/how-to-use.md#parameters)
-- [Features](docs/features.md) -> [Img2Vid](docs/features.md#img2vid) | [Prompt Travel](docs/features.md#prompt-travel) | [ControlNet V2V](docs/features.md#controlnet-v2v) | [ [Model Spec](docs/features.md#model-spec) -> [Motion LoRA](docs/features.md#motion-lora) | [V3](docs/features.md#v3) | [SDXL](docs/features.md#sdxl) ]
+- [Features](docs/features.md) -> [Img2Vid](docs/features.md#img2vid) | [Prompt Travel](docs/features.md#prompt-travel) | [ControlNet V2V](docs/features.md#controlnet-v2v) | [ [Model Spec](docs/features.md#model-spec) -> [Motion LoRA](docs/features.md#motion-lora) | [V3](docs/features.md#v3) | [SDXL](docs/features.md#sdxl) | [AnimateLCM](docs/features.md#animatelcm) ]
 - [Performance](docs/performance.md) -> [ [Optimizations](docs/performance.md#optimizations) -> [Attention](docs/performance.md#attention) | [FP8](docs/performance.md#fp8) | [LCM](docs/performance.md#lcm) ] | [VRAM](docs/performance.md#vram) | [Batch Size](docs/performance.md#batch-size)
 - [Demo](docs/demo.md) -> [Basic Usage](docs/demo.md#basic-usage) | [Motion LoRA](docs/demo.md#motion-lora) | [Prompt Travel](docs/demo.md#prompt-travel) | [AnimateDiff V3](docs/demo.md#animatediff-v3) | [AnimateDiff XL](docs/demo.md#animatediff-xl) | [ControlNet V2V](docs/demo.md#controlnet-v2v)
 
@@ -109,6 +109,11 @@ There are a lot of amazing demo online. Here I provide a very simple demo. The d
 ### V3
 AnimateDiff V3 has identical state dict keys as V1 but slightly different inference logic (GroupNorm is not hacked for V3). You may optionally use the [adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) for V3, in the same way as you apply a LoRA. You MUST use [my link](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) instead of the [official link](https://huggingface.co/guoyww/animatediff/resolve/main/v3_sd15_adapter.ckpt?download=true). The official adapter won't work for A1111 due to state dict incompatibility.
 
+### AnimateLCM
+- You can download the motion module from [here](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/motion_module/mm_sd15_AnimateLCM.safetensors?download=true). The [original weights](https://huggingface.co/wangfuyun/AnimateLCM/resolve/main/AnimateLCM_sd15_t2v.ckpt?download=true) should also work, but I recommend using my safetensors fp16 version.
+- You should also download the Motion LoRA from [here](https://huggingface.co/wangfuyun/AnimateLCM/resolve/main/AnimateLCM_sd15_t2v_lora.safetensors?download=true) and use it like any other LoRA.
+- You should use the LCM sampler and a low CFG scale (typically 1-2).
+
 ### SDXL
 [AnimateDiff-XL](https://github.com/guoyww/AnimateDiff/tree/sdxl) and [HotShot-XL](https://github.com/hotshotco/Hotshot-XL) have an identical architecture to AnimateDiff-SD1.5. The only differences are
 - HotShot-XL is trained with 8 frames instead of 16, so it is recommended to set `Context batch size` to 8 for HotShot-XL.
@@ -15,6 +15,7 @@ class MotionModuleType(Enum):
     AnimateDiffV2 = "AnimateDiff V2, Yuwei Guo, Shanghai AI Lab"
     AnimateDiffV3 = "AnimateDiff V3, Yuwei Guo, Shanghai AI Lab"
     AnimateDiffXL = "AnimateDiff SDXL, Yuwei Guo, Shanghai AI Lab"
+    AnimateLCM = "AnimateLCM, Fu-Yun Wang, MMLab@CUHK"
     SparseCtrl = "SparseCtrl, Yuwei Guo, Shanghai AI Lab"
     HotShotXL = "HotShot-XL, John Mullan, Natural Synthetics Inc"
 
@@ -23,6 +24,8 @@ class MotionModuleType(Enum):
     def get_mm_type(state_dict: dict[str, torch.Tensor]):
         keys = list(state_dict.keys())
         if any(["mid_block" in k for k in keys]):
+            if not any(["pe" in k for k in keys]):
+                return MotionModuleType.AnimateLCM
             return MotionModuleType.AnimateDiffV2
         elif any(["down_blocks.3" in k for k in keys]):
             if 32 in next((state_dict[key] for key in state_dict if 'pe' in key), None).shape:
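The detection heuristic in this hunk tells AnimateLCM apart from AnimateDiff V2 by key names alone: both checkpoints contain `mid_block` entries, but AnimateLCM ships no positional-encoding (`pe`) keys. A standalone sketch of that logic, assuming nothing beyond the key names (the helper name and example keys are illustrative, not from the extension):

```python
# Sketch of the key-based detection added in this hunk. Plain string lists
# stand in for real state-dict keys; only the names matter to the heuristic.

def guess_mm_type(keys: list[str]) -> str:
    """Classify a motion module by its state-dict key names."""
    if any("mid_block" in k for k in keys):
        # AnimateLCM also has a mid_block, but no positional encodings.
        if not any("pe" in k for k in keys):
            return "AnimateLCM"
        return "AnimateDiffV2"
    return "other"

# AnimateDiff V2 checkpoints expose both mid_block weights and "pe" buffers.
v2_keys = ["mid_block.motion_modules.0.pe", "mid_block.motion_modules.0.proj.weight"]
# AnimateLCM keeps the mid_block but drops the "pe" entries.
lcm_keys = ["mid_block.motion_modules.0.proj.weight"]

print(guess_mm_type(v2_keys))   # AnimateDiffV2
print(guess_mm_type(lcm_keys))  # AnimateLCM
```

Because the check is pure substring matching, it is cheap and runs before any module is built, which is why the loader can pick the right architecture without metadata in the file.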
@@ -49,7 +52,7 @@ class MotionWrapper(nn.Module):
         self.mm_name = mm_name
         self.mm_type = mm_type
         self.mm_hash = mm_hash
-        max_len = 24 if self.enable_gn_hack() else 32
+        max_len = 64 if mm_type == MotionModuleType.AnimateLCM else (24 if self.enable_gn_hack() else 32)
         in_channels = (320, 640, 1280) if self.is_xl else (320, 640, 1280, 1280)
         self.down_blocks = nn.ModuleList([])
         self.up_blocks = nn.ModuleList([])
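The new `max_len` expression packs three cases into one nested conditional: 64 positions for AnimateLCM, 24 when the GroupNorm hack applies, 32 otherwise. Pulled out as a helper it is easier to read; this is a minimal restatement only, and the enum and function names here are invented for the sketch, not taken from the extension:

```python
# Illustrative restatement of the max_len selection in this hunk.
from enum import Enum, auto

class MMType(Enum):
    ANIMATEDIFF_V1 = auto()  # GroupNorm hack enabled -> 24-frame encodings
    ANIMATEDIFF_V2 = auto()
    ANIMATELCM = auto()      # wrapper allocates room for 64 positions

def max_position_len(mm_type: MMType, gn_hack: bool) -> int:
    """Mirror the nested conditional: AnimateLCM wins over the gn-hack branch."""
    if mm_type is MMType.ANIMATELCM:
        return 64
    return 24 if gn_hack else 32

print(max_position_len(MMType.ANIMATELCM, gn_hack=False))     # 64
print(max_position_len(MMType.ANIMATEDIFF_V1, gn_hack=True))  # 24
print(max_position_len(MMType.ANIMATEDIFF_V2, gn_hack=False)) # 32
```

Note the ordering matters: the AnimateLCM check must come first, since the inline expression evaluates it before ever consulting `enable_gn_hack()`.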
@@ -59,7 +62,7 @@ class MotionWrapper(nn.Module):
             else:
                 self.down_blocks.append(MotionModule(c, num_mm=2, max_len=max_len, operations=operations))
                 self.up_blocks.insert(0,MotionModule(c, num_mm=3, max_len=max_len, operations=operations))
-        if mm_type in [MotionModuleType.AnimateDiffV2]:
+        if self.is_v2:
             self.mid_block = MotionModule(1280, num_mm=1, max_len=max_len, operations=operations)
 
 
@@ -83,7 +86,7 @@ class MotionWrapper(nn.Module):
 
     @property
    def is_v2(self):
-        return self.mm_type == MotionModuleType.AnimateDiffV2
+        return self.mm_type in [MotionModuleType.AnimateDiffV2, MotionModuleType.AnimateLCM]
 
 
 class MotionModule(nn.Module):
@@ -49,7 +49,7 @@ class AnimateDiffMM:
         logger.info(f"Guessed {model_name} architecture: {model_type}")
         mm_config = dict(mm_name=model_name, mm_hash=model_hash, mm_type=model_type)
         self.mm = MotionWrapper(**mm_config)
-        self.mm.load_state_dict(mm_state_dict)
+        self.mm.load_state_dict(mm_state_dict, strict=not model_type==MotionModuleType.AnimateLCM)
         self.mm.to(device).eval()
         if not shared.cmd_opts.no_half:
             self.mm.half()
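The loader now passes `strict=False` only for AnimateLCM, because its checkpoint lacks keys (the positional encodings) that the freshly built `MotionWrapper` allocates. Note the precedence: `not model_type==MotionModuleType.AnimateLCM` parses as `not (model_type == ...)`, so `strict` is `False` exactly for AnimateLCM. A dependency-free toy of what strict vs. non-strict loading means (the helper below is hypothetical; the extension itself relies on `torch.nn.Module.load_state_dict`):

```python
# Toy model of strict vs non-strict state-dict loading, without torch.
# Hypothetical helper; shown only to illustrate why strict=False is needed
# when a checkpoint is missing keys the module expects.

def load_state_dict(model: dict, checkpoint: dict, strict: bool = True) -> list:
    """Copy checkpoint values into model; return the keys the checkpoint lacked."""
    missing = [k for k in model if k not in checkpoint]
    if strict and missing:
        raise RuntimeError(f"Missing keys in state_dict: {missing}")
    for k, v in checkpoint.items():
        if k in model:
            model[k] = v
    return missing

model = {"down.weight": 0.0, "down.pe": 0.0}  # wrapper allocates pe buffers
animatelcm_ckpt = {"down.weight": 1.0}        # checkpoint ships no pe keys

# strict=True would raise; strict=False loads what exists and reports the rest.
missing = load_state_dict(model, animatelcm_ckpt, strict=False)
print(missing)  # ['down.pe']
```

Keeping `strict=True` for every other module type preserves the old safety net: a genuinely corrupt or mismatched checkpoint still fails loudly instead of loading partially.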