Documentation for ControlNet (#441)

* doc

* Update features.md

* Update features.md
Chengsong Zhang 2024-03-03 01:30:02 -06:00 committed by GitHub
parent e3c58bac40
commit 95ae9905ed
2 changed files with 46 additions and 13 deletions


# AnimateDiff for Stable Diffusion WebUI
This extension aims to integrate [AnimateDiff](https://github.com/guoyww/AnimateDiff/) with [CLI](https://github.com/s9roll7/animatediff-cli-prompt-travel) into [AUTOMATIC1111 Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) with [ControlNet](https://github.com/Mikubill/sd-webui-controlnet), forming an easy-to-use AI video toolkit. You can generate GIFs in exactly the same way as generating images after enabling this extension.
This extension implements AnimateDiff in a different way. It inserts motion modules into UNet at runtime, so that you do not need to reload your model weights if you don't want to.
You might also be interested in another extension I created: [Segment Anything for Stable Diffusion WebUI](https://github.com/continue-revolution/sd-webui-segment-anything).
[Forge](https://github.com/lllyasviel/stable-diffusion-webui-forge) users should either check out the [forge/master](https://github.com/continue-revolution/sd-webui-animatediff/tree/forge/master) branch in this repository or use [sd-forge-animatediff](https://github.com/continue-revolution/sd-forge-animatediff). They will be kept in sync.
[TusiArt](https://tusiart.com/) (for users inside P.R.China mainland) and [TensorArt](https://tensor.art/) (for others) offer online services for this extension.
## Table of Contents
- Prerequisite: WebUI >= 1.8.0 & ControlNet >=1.1.441
- New feature: ControlNet inpaint / IP-Adapter prompt travel / SparseCtrl / ControlNet keyframe, see [ControlNet V2V](docs/features.md#controlnet-v2v)
- Minor: mm filter based on sd version (click refresh button if you switch between SD1.5 and SDXL) / display extension version in infotext
- Breaking change: You must use the Motion LoRA, Hotshot-XL and AnimateDiff V3 Motion Adapter from my [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main).
## Future Plan
Although [OpenAI Sora](https://openai.com/sora) is far better at following complex text prompts and generating complex scenes, we believe that OpenAI will NOT open source Sora or any other products they have released recently. My current plan is to continue developing this extension until an open-source video model is released with a strong ability to generate complex scenes, easy customization and a good ecosystem like SD1.5.
## Model Zoo
I am maintaining a [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main) to provide all official models in fp16 & safetensors format. I highly recommend using my links. You MUST use my links to download Motion LoRA, Hotshot-XL and the AnimateDiff V3 Motion Adapter. For all other models, you may still use the old links if you want:
- "Official" models by [@guoyww](https://github.com/guoyww): [Google Drive](https://drive.google.com/drive/folders/1EqLC65eR1-W-sGD0Im7fkED6c8GkiNFI) | [HuggingFace](https://huggingface.co/guoyww/animatediff/tree/main) | [CivitAI](https://civitai.com/models/108836)
- "Stabilized" community models by [@manshoety](https://huggingface.co/manshoety): [HuggingFace](https://huggingface.co/manshoety/AD_Stabilized_Motion/tree/main)
We thank all developers and community users who contribute to this repository:
- [@limbo0000](https://github.com/limbo0000) for responding to my questions about AnimateDiff
- [@neggles](https://github.com/neggles) and [@s9roll7](https://github.com/s9roll7) for developing [AnimateDiff CLI Prompt Travel](https://github.com/s9roll7/animatediff-cli-prompt-travel)
- [@zappityzap](https://github.com/zappityzap) for developing the majority of the [output features](https://github.com/continue-revolution/sd-webui-animatediff/blob/master/scripts/animatediff_output.py)
- [@lllyasviel](https://github.com/lllyasviel) for adding me as a collaborator of sd-webui-controlnet and offering technical support for Forge
- [@KohakuBlueleaf](https://github.com/KohakuBlueleaf) for helping with FP8 and LCM development
- [@TDS4874](https://github.com/TDS4874) and [@opparco](https://github.com/opparco) for resolving the grey issue, which significantly improves performance
- [@streamline](https://twitter.com/kaizirod) for providing ControlNet V2V dataset and workflow. His workflow is extremely amazing and definitely worth checking out.


## ControlNet V2V
> Sample configurations for ControlNet V2V, ControlNet inpaint, IP-Adapter prompt travel, SparseCtrl, ControlNet keyframe and img2img batch will be updated in a day.
You need to go to txt2img / img2img-batch and submit a source video or a path to frames. Each ControlNet unit will find its control images according to this priority:
1. ControlNet `Single Image` tab or `Batch Folder` tab. Simply uploading a control image or a path to a folder of control frames is enough.
1. Img2img Batch tab `Input directory`, if you are using img2img batch. If you submit a directory of control frames there, it will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to the ControlNet panel.
1. AnimateDiff `Video Path`. If you submit a path to frames through `Video Path`, it will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to the ControlNet panel.
1. AnimateDiff `Video Source`. If you upload a video through `Video Source`, it will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to the ControlNet panel.
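The priority order above amounts to a simple first-match fallback, which can be sketched as follows (function and parameter names are hypothetical illustrations, not the extension's actual API):

```python
from typing import Optional

def resolve_control_source(
    unit_input: Optional[str],       # ControlNet `Single Image` / `Batch Folder`
    i2i_batch_dir: Optional[str],    # img2img batch `Input directory`
    ad_video_path: Optional[str],    # AnimateDiff `Video Path`
    ad_video_source: Optional[str],  # AnimateDiff `Video Source`
) -> Optional[str]:
    """Return the first non-empty source in the documented priority order."""
    for source in (unit_input, i2i_batch_dir, ad_video_path, ad_video_source):
        if source:
            return source
    return None
```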
`Number of frames` will be capped to the minimum number of images among all **folders** you provide, unless the unit has a "keyframe" parameter.
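The capping rule can be sketched as below (a hypothetical helper; the counts stand in for the number of images found in each control folder):

```python
def cap_num_frames(requested: int, folder_frame_counts: list[int]) -> int:
    """Cap the requested `Number of frames` to the smallest image count
    among all control folders; with no folders, the request stands."""
    return min([requested, *folder_frame_counts])
```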
**SparseCtrl**: Sparse ControlNet is for video generation with keyframes. If you upload one image in the `Single Image` tab, it will guide the following frames to follow your first frame (a **probably** better way to do img2vid). If you upload a path in the `Batch Folder` tab, with a "keyframe" parameter on a new line (see below), it will attempt video frame interpolation. Note that I don't think this ControlNet has performance comparable to those trained by [@lllyasviel](https://github.com/lllyasviel). Use at your own risk.
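Conceptually, SparseCtrl conditions only the keyframes and marks them for the network with an extra mask channel, roughly as in this numpy sketch (based on the SparseCtrl paper's design, not this extension's exact tensors):

```python
import numpy as np

def build_sparse_condition(num_frames, keyframes, key_images, c=4, h=8, w=8):
    """Place each encoded keyframe image at its frame index, leave every
    other frame as zeros, and append a binary mask channel that tells the
    ControlNet which frames are actually conditioned."""
    cond = np.zeros((num_frames, c, h, w), dtype=np.float32)
    mask = np.zeros((num_frames, 1, h, w), dtype=np.float32)
    for idx, img in zip(keyframes, key_images):
        cond[idx] = img
        mask[idx] = 1.0
    return np.concatenate([cond, mask], axis=1)  # (num_frames, c + 1, h, w)
```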
Example ways to fill in the input parameters:
1. Specify different `Input Directory` and `Mask Directory` for different individual ControlNet Units.
![cn1](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/c1e0e04d-d2c7-4150-89e1-3befe2f3c750)
1. Fill in separate control inputs for different ControlNet units.
1. Control all frames with a single control input. Exception: SparseCtrl will only control the first frame in this way.
| IP-Adapter | Output |
| --- | --- |
| ![ipadapter-single](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/82ef7455-168a-40a5-95a7-e7b22cf86dc8) | ![ipadapter-single](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/2539c84f-8775-4697-a0ec-006c9fafef1c) |
1. Control each frame with a separate control input. You are encouraged to try multi-ControlNet.
| Canny | Output |
| --- | --- |
| ![controlnet-batch](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/71ed300d-5c3e-42d8-aed1-6d8d4c442941) | ![00005-1961300716](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/8e7d8f92-2816-47be-baad-8dd63e0cc1a1) |
1. ControlNet inpaint unit: you are encouraged to use my [Segment Anything](https://github.com/continue-revolution/sd-webui-segment-anything) extension to draw masks automatically / generate masks in batch.
- Specify a global image and draw a mask on it, or upload a mask. The white region is where changes will be applied.
- "mask" parameter for ControlNet inpaint in batch: type "ctrl + enter" to start a new line and fill in the "mask" parameter in the format `mask:/path/to/mask/frames/`.
| single image | batch |
| --- | --- |
| ![inpaint-single](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/c0804da5-b2fb-4669-bd09-fb9fb3f2782b) | ![inpaint-batch](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/db5e09d9-d192-4a38-b56c-402407232eb1) |
1. "keyframe" parameter.
- **IP-Adapter**: this parameter means "IP-Adapter prompt travel". See image below for explanation.
![ipadapter-keyframe](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/51a625cf-0ad5-4dfd-be71-644cc53764eb)
You will see terminal logs like:
```bash
ControlNet - INFO - AnimateDiff + ControlNet ip-adapter_clip_sd15 receive the following parameters:
ControlNet - INFO - batch control images: /home/conrevo/SD/dataset/upperbodydataset/mask/key-ipadapter/
ControlNet - INFO - batch control keyframe index: [0, 6, 12, 18]
```
```bash
ControlNet - INFO - IP-Adapter: control prompts will be traveled in the following way:
ControlNet - INFO - 0: /home/conrevo/SD/dataset/upperbodydataset/mask/key-ipadapter/anime_girl_head_1.png
ControlNet - INFO - 6: /home/conrevo/SD/dataset/upperbodydataset/mask/key-ipadapter/anime_girl_head_2.png
ControlNet - INFO - 12: /home/conrevo/SD/dataset/upperbodydataset/mask/key-ipadapter/anime_girl_head_3.png
ControlNet - INFO - 18: /home/conrevo/SD/dataset/upperbodydataset/mask/key-ipadapter/anime_girl_head_4.png
```
- **SparseCtrl**: this parameter means keyframe. SparseCtrl has its own special processing logic for keyframes. Specify this parameter in the same way as for IP-Adapter above.
- All other ControlNets: a blank control image is inserted for you, and the control latent for those frames will be all zeros. Specify this parameter in the same way as for IP-Adapter above.
1. Specify a global `Video path` and `Mask path` and leave the ControlNet Unit `Input Directory` blank.
- You can arbitrarily switch the ControlNet Unit tab to `Single Image` / `Batch Folder` / `Batch Upload` as long as you leave it blank.
- If you specify a global mask path, all ControlNet Units that you do not give a `Mask Directory` will use this path.
- Please provide only one of `Video source` and `Video path`; they cannot be applied at the same time.
![cn2](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/dc8d71df-60ea-4dd9-a040-b7bd35161587)
1. img2img batch. See the screenshot below.![i2i-batch](https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/58110cfe-ac57-4403-817b-82e9126b938a)
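Putting the batch parameters together: each extra line in a ControlNet batch textbox carries one parameter, and IP-Adapter prompt travel blends between the images at the given keyframe indices. A hedged sketch of both ideas (the `keyframe:0,6,12,18` line format and all function names are my assumptions for illustration, not the extension's exact parser or schedule):

```python
def parse_batch_params(text: str) -> dict:
    """First line is the control path; later lines may add `mask:` and
    `keyframe:` parameters."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    params = {"control": lines[0] if lines else None}
    for ln in lines[1:]:
        if ln.startswith("mask:"):
            params["mask"] = ln[len("mask:"):]
        elif ln.startswith("keyframe:"):
            params["keyframe"] = [int(i) for i in ln[len("keyframe:"):].split(",")]
    return params

def travel_blend(keyframes: list[int], num_frames: int) -> list[tuple[int, int, float]]:
    """For each frame, return (prev_key, next_key, weight), where weight is
    the linear blend toward the next keyframe's control image."""
    out = []
    for f in range(num_frames):
        prev = max((k for k in keyframes if k <= f), default=keyframes[0])
        nxt = min((k for k in keyframes if k > f), default=prev)
        w = 0.0 if nxt == prev else (f - prev) / (nxt - prev)
        out.append((prev, nxt, w))
    return out
```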
There are a lot of amazing demos online. Here I provide a very simple demo. The dataset is from [streamline](https://twitter.com/kaizirod), but the workflow is an arbitrary setup by me. You can find many more amazing examples (and potentially available workflows / infotexts) on Reddit, Twitter, YouTube and Bilibili. The easiest way to share a workflow created with my software is to share one output frame with its infotext.
| input | output |
| --- | --- |
| <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/ff066808-fc00-43e1-a2a6-b16e41dad603'> | <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/dc3d833b-a113-4278-9e48-5f2a8ee06704'> |
| <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/ff066808-fc00-43e1-a2a6-b16e41dad603'> | <img height='512px' src='https://github.com/continue-revolution/sd-webui-animatediff/assets/63914308/5aab1f9f-245d-45e9-ba71-1b902bc6ea40'> |
## Model Spec
[Download](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main/lora) and use them like any other LoRA you use (example: download Motion LoRA to `stable-diffusion-webui/models/Lora` and add `<lora:mm_sd15_v2_lora_PanLeft:0.8>` to your positive prompt). **Motion LoRAs can only be applied to V2 motion module**.
### V3
AnimateDiff V3 has identical state dict keys to V1 but slightly different inference logic (GroupNorm is not hacked for V3). You may optionally use the [adapter](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) for V3, in the same way as you apply a LoRA. You MUST use [my link](https://huggingface.co/conrevo/AnimateDiff-A1111/resolve/main/lora/mm_sd15_v3_adapter.safetensors?download=true) instead of the [official link](https://huggingface.co/guoyww/animatediff/resolve/main/v3_sd15_adapter.ckpt?download=true). The official adapter won't work for A1111 due to state dict incompatibility.
### SDXL
[AnimateDiff-XL](https://github.com/guoyww/AnimateDiff/tree/sdxl) and [HotShot-XL](https://github.com/hotshotco/Hotshot-XL) have an architecture identical to AnimateDiff-SD1.5. The only differences are