* restart to try infv2v

* infv2v works now, but cn+infv2v is not working

* finally works

* readme
pull/156/head
Chengsong Zhang 2023-09-29 23:01:17 -05:00 committed by GitHub
parent dee63eee74
commit 7e9f3830f5
5 changed files with 216 additions and 80 deletions


@@ -20,7 +20,7 @@ You might also be interested in another extension I created: [Segment Anything f
1. Go to txt2img if you want to try txt2gif and img2img if you want to try img2gif.
1. Choose an SD1.5 checkpoint, write prompts, and set configurations such as image width/height. If you want to generate multiple GIFs at once, please change the batch number instead of the batch size.
1. Enable the AnimateDiff extension, set up each parameter, then click `Generate`.
1. **Number of frames**[**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT 0!!!**] Choose whatever number you like.
1. **Number of frames** — Choose whatever number you like.
If you enter 0 (default):
- If you submit a video via `Video source` / enter a video path via `Video path` / enable ANY batch ControlNet, the number of frames will be the number of frames in the video (the shortest, if more than one video is submitted).
@@ -30,9 +30,9 @@ You might also be interested in another extension I created: [Segment Anything f
1. **FPS** — Frames per second, which is how many frames (images) are shown every second. If 16 frames are generated at 8 frames per second, your GIF's duration is 2 seconds. If you submit a source video, your FPS will be the same as the source video.
1. **Display loop number** — How many times the GIF is played. A value of `0` means the GIF never stops playing.
1. **Batch size** — How many frames will be passed into the motion module at once. The model is trained with 16 frames, so it'll give the best results when the number of frames is set to `16`. Choose [1, 24] for V1 motion modules and [1, 32] for V2 motion modules.
1. **Closed loop**[**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT False!!!**] If you enable this option and your number of frames is greater than your batch size, this extension will try to make the last frame the same as the first frame.
1. **Stride**[**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT 1!!!**] Max motion stride as a power of 2 (default: 1).
1. **Overlap**[**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT -1!!!**] Number of frames to overlap in context. If overlap is -1 (default): your overlap will be `Batch size` // 4.
1. **Closed loop** — If you enable this option and your number of frames is greater than your batch size, this extension will try to make the last frame the same as the first frame.
1. **Stride** — Max motion stride as a power of 2 (default: 1).
1. **Overlap** — Number of frames to overlap in context. If overlap is -1 (default): your overlap will be `Batch size` // 4.
1. **Save** — Format of the output. Choose at least one of "GIF"|"MP4"|"PNG". Check "TXT" if you want infotext, which will live in the same directory as the output GIF.
1. You can optimize GIF with `gifsicle` (`apt install gifsicle` required, read [#91](https://github.com/continue-revolution/sd-webui-animatediff/pull/91) for more information) and/or `palette` (read [#104](https://github.com/continue-revolution/sd-webui-animatediff/pull/104) for more information). Go to `Settings/AnimateDiff` to enable them.
1. **Reverse** — Append reversed frames to your output. See [#112](https://github.com/continue-revolution/sd-webui-animatediff/issues/112) for instructions.
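As a quick check of the numbers above, the FPS/duration relationship and the default overlap can be sketched as two tiny helpers (hypothetical names, not part of the extension):

```python
def gif_duration_seconds(num_frames: int, fps: int) -> float:
    """Duration of the output, per the FPS description above:
    16 frames at 8 frames per second play for 2 seconds."""
    return num_frames / fps

def default_overlap(batch_size: int) -> int:
    """An Overlap of -1 (the default) resolves to Batch size // 4."""
    return batch_size // 4
```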
@@ -97,9 +97,10 @@ Just like how you use ControlNet. Here is a sample. You will get a list of gener
- `2023/09/19`: [v1.5.1](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.5.1): support xformers, sdp, sub-quadratic attention optimization - VRAM usage decrease to 5.60GB with default setting. See [FAQ](#faq) 1st item for more information.
- `2023/09/22`: [v1.5.2](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.5.2): option to disable xformers at `Settings/AnimateDiff` [due to a bug in xformers](https://github.com/facebookresearch/xformers/issues/845), API support, option to enable GIF paletter optimization at `Settings/AnimateDiff` (credit to [@rkfg](https://github.com/rkfg)), gifsicle optimization move to `Settings/AnimateDiff`.
- `2023/09/25`: [v1.6.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.6.0): [motion LoRA](https://github.com/guoyww/AnimateDiff#features) supported. Download and use them like any other LoRA you use (example: download motion lora to `stable-diffusion-webui/models/Lora` and add `<lora:v2_lora_PanDown:0.8>` to your positive prompt). **Motion LoRA only supports V2 motion modules**.
- `2023/09/27`: [v1.7.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.7.0): [ControlNet](https://github.com/Mikubill/sd-webui-controlnet) supported. Please closely follow the instructions in [How to Use](#how-to-use), especially the explanation of `Video source` and `Video path` attributes. ControlNet is way more complex than what I can test and I ask you to test for me. Please submit an issue whenever you find a bug. [Demo and video instructions](#demo-and-video-instructions) are coming soon. Safetensors for some motion modules are also available now. See [model zoo](#motion-module-model-zoo). You may want to check `Do not append detectmap to output` in `Settings/ControlNet` to avoid having a series of control images in your output gallery. You should not change some attributes in your extension UI because they are for infinite v2v, see [WebUI](#webui) for what you should not change.
- `2023/09/27`: [v1.7.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.7.0): [ControlNet](https://github.com/Mikubill/sd-webui-controlnet) supported. Please closely follow the instructions in [How to Use](#how-to-use), especially the explanation of `Video source` and `Video path` attributes. ControlNet is way more complex than what I can test and I ask you to test for me. Please submit an issue whenever you find a bug. You may want to check `Do not append detectmap to output` in `Settings/ControlNet` to avoid having a series of control images in your output gallery. [Safetensors](#motion-module-model-zoo) for some motion modules are also available now.
- `2023/09/29`: [v1.8.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.8.0): Infinite generation (with/without ControlNet) supported. [Demo and video instructions](#demo-and-video-instructions) are coming soon.
Infinite V2V, Prompt Travel and other CLI features are currently work in progress inside [#121](https://github.com/continue-revolution/sd-webui-animatediff/pull/121). Stay tuned and they should be released within a week.
Prompt Travel and other CLI features are currently work in progress inside [#71](https://github.com/continue-revolution/sd-webui-animatediff/pull/71). Stay tuned and they should be released soon.
## FAQ
1. Q: How much VRAM do I need?
@@ -115,10 +116,6 @@ Infinite V2V, Prompt Travel and other CLI features are currently work in progres
A: You will have to wait for someone to train SDXL-specific motion modules, which will have a different model architecture. This extension essentially injects multiple motion modules into the SD1.5 UNet. It does not work for other variations of SD, such as SD2.1 or SDXL.
1. Q: Can I override the limitation of 24/32 frames per generation?
A: Not at this time, but it will become possible by incorporating [AnimateDiff CLI Prompt Travel](https://github.com/s9roll7/animatediff-cli-prompt-travel) in the near future. This is a huge amount of work and life is busy, so expect to wait a long time before this update arrives.
## Demo and Video Instructions


@@ -47,8 +47,8 @@ class AnimateDiffScript(scripts.Script):
motion_module.inject(p.sd_model, params.model)
self.lora_hacker = AnimateDiffLora(motion_module.mm.using_v2)
self.lora_hacker.hack()
# self.cfg_hacker = AnimateDiffInfV2V(p)
# self.cfg_hacker.hack(params)
self.cfg_hacker = AnimateDiffInfV2V(p)
self.cfg_hacker.hack(params)
self.cn_hacker = AnimateDiffControl(p)
self.cn_hacker.hack(params)
@@ -67,7 +67,7 @@ class AnimateDiffScript(scripts.Script):
if isinstance(params, dict): params = AnimateDiffProcess(**params)
if params.enable:
self.cn_hacker.restore()
# self.cfg_hacker.restore()
self.cfg_hacker.restore()
self.lora_hacker.restore()
motion_module.restore(p.sd_model)
AnimateDiffOutput().output(p, res, params)
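The hunks above install the LoRA, CFG, and ControlNet hooks in one order and restore them in the reverse order. A minimal sketch of that hack()/restore() pattern, using a hypothetical `MonkeyPatch` class rather than the extension's actual hacker classes:

```python
class MonkeyPatch:
    """Swap an attribute for a replacement, remembering the original
    so restore() can put it back."""
    def __init__(self, obj, name, replacement):
        self.obj, self.name, self.replacement = obj, name, replacement
        self.original = None

    def hack(self):
        # Save the current attribute, then install the replacement.
        self.original = getattr(self.obj, self.name)
        setattr(self.obj, self.name, self.replacement)

    def restore(self):
        # Put the saved original back.
        setattr(self.obj, self.name, self.original)
```

Hooks wrapping the same pipeline should be restored in the reverse order of installation (the ControlNet hacker is hacked last and restored first above), so each layer unwinds to exactly the state it saw when it was installed.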


@@ -97,8 +97,9 @@ class AnimateDiffControl:
params.video_length = video_length
if params.batch_size > video_length:
params.batch_size = video_length
# if params.video_length == 0: # TODO: support inf length
# params.video_length = video_length
if params.video_default:
params.video_length = video_length
p.batch_size = video_length
for unit in units:
if getattr(unit, 'input_mode', InputMode.SIMPLE) == InputMode.BATCH:
unit.batch_images = unit.batch_images[:params.video_length]
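The logic in this hunk can be condensed into a self-contained sketch (hypothetical `clamp_to_source` helper; `source_frames` stands for the frame count of the submitted video):

```python
def clamp_to_source(batch_size: int, source_frames: int, batch_images: list):
    """The video length follows the source video, the batch size is
    clamped to it, and a ControlNet unit's image list is truncated
    to the video length."""
    video_length = source_frames
    if batch_size > video_length:
        batch_size = video_length
    batch_images = batch_images[:video_length]
    return video_length, batch_size, batch_images
```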


@@ -1,6 +1,14 @@
from typing import List
import numpy as np
from modules.sd_samplers_cfg_denoiser import CFGDenoiser
from modules.prompt_parser import MulticondLearnedConditioning
import torch
from modules import prompt_parser, devices, sd_samplers_common, shared
from modules.shared import opts, state
from modules.script_callbacks import CFGDenoiserParams, cfg_denoiser_callback
from modules.script_callbacks import CFGDenoisedParams, cfg_denoised_callback
from modules.script_callbacks import AfterCFGCallbackParams, cfg_after_cfg_callback
from modules.sd_samplers_cfg_denoiser import CFGDenoiser, catenate_conds, subscript_cond, pad_cond
from scripts.animatediff_logger import logger_animatediff as logger
from scripts.animatediff_ui import AnimateDiffProcess
@@ -55,78 +63,205 @@ class AnimateDiffInfV2V:
video_length + pad + (0 if closed_loop else -overlap),
(batch_size * context_step - overlap),
):
batch_list = [e % video_length for e in range(j, j + batch_size * context_step, context_step)]
if not closed_loop and batch_list[-1] < batch_list[0]:
batch_list_end = batch_list[: video_length - batch_list[0]]
batch_list_front = batch_list[video_length - batch_list[0] :]
if len(batch_list_end) < len(batch_list_front):
batch_list_front_end = batch_list_front[-1]
for i in range(len(batch_list_end)):
batch_list_front.append(batch_list_front_end + i + 1)
yield batch_list_front
else:
batch_list_end_front = batch_list_end[0]
for i in range(len(batch_list_front)):
batch_list_end.insert(0, batch_list_end_front - i - 1)
yield batch_list_end
else:
yield batch_list
yield [e % video_length for e in range(j, j + batch_size * context_step, context_step)]
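The simplified scheduler above can be approximated with a self-contained sketch. `uniform_contexts` is a hypothetical name, and the `stride`/`context_step` and `pad` machinery from the real function (defined outside this hunk) is fixed to its simplest values:

```python
from typing import Iterator, List

def uniform_contexts(video_length: int, batch_size: int = 16,
                     overlap: int = 4,
                     closed_loop: bool = False) -> Iterator[List[int]]:
    """Yield overlapping windows of frame indices covering the video,
    wrapping modulo video_length (context_step fixed to 1 here)."""
    if video_length <= batch_size:
        # The whole video fits in one pass through the motion module.
        yield list(range(video_length))
        return
    step = batch_size - overlap
    for j in range(0, video_length + (0 if closed_loop else -overlap), step):
        yield [e % video_length for e in range(j, j + batch_size)]
```

With `video_length=32`, `batch_size=16`, `overlap=4`, this yields three windows whose union covers every frame, each sharing frames with its neighbor so results can blend across window boundaries.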
def hack(self, params: AnimateDiffProcess):
logger.info(f"Hacking CFGDenoiser forward function.")
self.cfg_original_forward = CFGDenoiser.forward
cfg_original_forward = self.cfg_original_forward
cn_script = self.cn_script
def mm_cfg_forward(self, x, sigma, uncond, cond, cond_scale, s_min_uncond, image_cond):
for context in AnimateDiffInfV2V.uniform(self.step, params.video_length, params.batch_size, params.stride, params.overlap, params.closed_loop):
# take control images for current context.
# controlllite is for sdxl and we do not support it. reserved here for future use if needed.
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if control.hint_cond.shape[0] > len(context):
control.hint_cond_backup = control.hint_cond
control.hint_cond = control.hint_cond[context]
if control.hr_hint_cond is not None and control.hr_hint_cond.shape[0] > len(context):
control.hr_hint_cond_backup = control.hr_hint_cond
control.hr_hint_cond = control.hr_hint_cond[context]
if control.control_model_type == ControlModelType.IPAdapter and control.control_model.image_emb.shape[0] > len(context):
control.control_model.image_emb_backup = control.control_model.image_emb
control.control_model.image_emb = control.control_model.image_emb[context]
control.control_model.uncond_image_emb_backup = control.control_model.uncond_image_emb
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb[context]
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.cond_image_backup = module.cond_image
# module.set_cond_image(module.cond_image[context])
# run original forward function for the current context
# TODO: what to do with cond?
x[context] = cfg_original_forward(
self, x[context], sigma[context], [uncond[i] for i in context],
MulticondLearnedConditioning(len(context), [cond.batch[i] for i in context]),
cond_scale, s_min_uncond, image_cond[context])
# restore control images for next context
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if control.hint_cond.shape[0] > len(context):
control.hint_cond = control.hint_cond_backup
if control.hr_hint_cond is not None and control.hr_hint_cond.shape[0] > len(context):
control.hr_hint_cond = control.hr_hint_cond_backup
if control.control_model_type == ControlModelType.IPAdapter and control.control_model.image_emb.shape[0] > len(context):
control.control_model.image_emb = control.control_model.image_emb_backup
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb_backup
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.set_cond_image(module.cond_image_backup)
def mm_cn_select(context: List[int]):
# take control images for current context.
# controlllite is for sdxl and we do not support it. reserved here for future use if needed.
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if control.hint_cond.shape[0] > len(context):
control.hint_cond_backup = control.hint_cond
control.hint_cond = control.hint_cond[context]
if control.hr_hint_cond is not None and control.hr_hint_cond.shape[0] > len(context):
control.hr_hint_cond_backup = control.hr_hint_cond
control.hr_hint_cond = control.hr_hint_cond[context]
if control.control_model_type == ControlModelType.IPAdapter and control.control_model.image_emb.shape[0] > len(context):
control.control_model.image_emb_backup = control.control_model.image_emb
control.control_model.image_emb = control.control_model.image_emb[context]
control.control_model.uncond_image_emb_backup = control.control_model.uncond_image_emb
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb[context]
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.cond_image_backup = module.cond_image
# module.set_cond_image(module.cond_image[context])
def mm_cn_restore(context: List[int]):
# restore control images for next context
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if getattr(control, "hint_cond_backup", None) is not None:
control.hint_cond_backup[context] = control.hint_cond
control.hint_cond = control.hint_cond_backup
if control.hr_hint_cond is not None and getattr(control, "hr_hint_cond_backup", None) is not None:
control.hr_hint_cond_backup[context] = control.hr_hint_cond
control.hr_hint_cond = control.hr_hint_cond_backup
if control.control_model_type == ControlModelType.IPAdapter and getattr(control.control_model, "image_emb_backup", None) is not None:
control.control_model.image_emb_backup[context] = control.control_model.image_emb
control.control_model.uncond_image_emb_backup[context] = control.control_model.uncond_image_emb
control.control_model.image_emb = control.control_model.image_emb_backup
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb_backup
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.set_cond_image(module.cond_image_backup)
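The select/restore pair above narrows every per-frame ControlNet tensor to the current context window and later writes the window back into the full tensor. A minimal sketch of that backup pattern, using a hypothetical `ControlState` class and NumPy arrays in place of torch tensors:

```python
import numpy as np

class ControlState:
    """Stand-in for a ControlNet control param holding one hint per frame."""
    def __init__(self, hint_cond: np.ndarray):
        self.hint_cond = hint_cond
        self.hint_cond_backup = None

def select_context(control: ControlState, context: list) -> None:
    # Keep a reference to the full per-frame tensor, then narrow it
    # to the rows of the current window (fancy indexing copies).
    if control.hint_cond.shape[0] > len(context):
        control.hint_cond_backup = control.hint_cond
        control.hint_cond = control.hint_cond[context]

def restore_context(control: ControlState, context: list) -> None:
    # Write the (possibly modified) window back, then restore the
    # full tensor for the next window.
    if control.hint_cond_backup is not None:
        control.hint_cond_backup[context] = control.hint_cond
        control.hint_cond = control.hint_cond_backup
```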
def mm_cfg_forward(self, x, sigma, uncond, cond, cond_scale, s_min_uncond, image_cond):
if state.interrupted or state.skipped:
raise sd_samplers_common.InterruptedException
if sd_samplers_common.apply_refiner(self):
cond = self.sampler.sampler_extra_args['cond']
uncond = self.sampler.sampler_extra_args['uncond']
# at self.image_cfg_scale == 1.0 produced results for edit model are the same as with normal sampling,
# so is_edit_model is set to False to support AND composition.
is_edit_model = shared.sd_model.cond_stage_key == "edit" and self.image_cfg_scale is not None and self.image_cfg_scale != 1.0
conds_list, tensor = prompt_parser.reconstruct_multicond_batch(cond, self.step)
uncond = prompt_parser.reconstruct_cond_batch(uncond, self.step)
assert not is_edit_model or all(len(conds) == 1 for conds in conds_list), "AND is not supported for InstructPix2Pix checkpoint (unless using Image CFG scale = 1.0)"
if self.mask_before_denoising and self.mask is not None:
x = self.init_latent * self.mask + self.nmask * x
batch_size = len(conds_list)
repeats = [len(conds_list[i]) for i in range(batch_size)]
if shared.sd_model.model.conditioning_key == "crossattn-adm":
image_uncond = torch.zeros_like(image_cond) # this should not be supported.
make_condition_dict = lambda c_crossattn, c_adm: {"c_crossattn": [c_crossattn], "c_adm": c_adm}
else:
image_uncond = image_cond
if isinstance(uncond, dict):
make_condition_dict = lambda c_crossattn, c_concat: {**c_crossattn, "c_concat": [c_concat]}
else:
make_condition_dict = lambda c_crossattn, c_concat: {"c_crossattn": [c_crossattn], "c_concat": [c_concat]}
if not is_edit_model:
x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x])
sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma])
image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond])
else:
x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x] + [x])
sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma] + [sigma])
image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond] + [torch.zeros_like(self.init_latent)])
denoiser_params = CFGDenoiserParams(x_in, image_cond_in, sigma_in, state.sampling_step, state.sampling_steps, tensor, uncond)
cfg_denoiser_callback(denoiser_params)
x_in = denoiser_params.x
image_cond_in = denoiser_params.image_cond
sigma_in = denoiser_params.sigma
tensor = denoiser_params.text_cond
uncond = denoiser_params.text_uncond
skip_uncond = False
# alternating uncond allows for higher thresholds without the quality loss normally expected from raising it
if self.step % 2 and s_min_uncond > 0 and sigma[0] < s_min_uncond and not is_edit_model:
skip_uncond = True
x_in = x_in[:-batch_size]
sigma_in = sigma_in[:-batch_size]
self.padded_cond_uncond = False
if shared.opts.pad_cond_uncond and tensor.shape[1] != uncond.shape[1]:
empty = shared.sd_model.cond_stage_model_empty_prompt
num_repeats = (tensor.shape[1] - uncond.shape[1]) // empty.shape[1]
if num_repeats < 0:
tensor = pad_cond(tensor, -num_repeats, empty)
self.padded_cond_uncond = True
elif num_repeats > 0:
uncond = pad_cond(uncond, num_repeats, empty)
self.padded_cond_uncond = True
if tensor.shape[1] == uncond.shape[1] or skip_uncond:
if is_edit_model:
cond_in = catenate_conds([tensor, uncond, uncond])
elif skip_uncond:
cond_in = tensor
else:
cond_in = catenate_conds([tensor, uncond])
if shared.opts.batch_cond_uncond: # only support this branch
# x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
x_out = torch.zeros_like(x_in, dtype=x_in.dtype, device=x_in.device)
for context in AnimateDiffInfV2V.uniform(self.step, params.video_length, params.batch_size, params.stride, params.overlap, params.closed_loop):
# run original forward function for the current context
_context = context + [c + params.video_length for c in context]
print(f"context: {_context}, shape: {x_in.shape}, {sigma_in.shape}, {cond_in.shape}, {image_cond_in.shape}")
mm_cn_select(_context)
x_out[_context] = self.inner_model(x_in[_context], sigma_in[_context], cond=make_condition_dict(cond_in[_context], image_cond_in[_context]))
mm_cn_restore(_context)
else:
x_out = torch.zeros_like(x_in)
for batch_offset in range(0, x_out.shape[0], batch_size):
a = batch_offset
b = a + batch_size
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(subscript_cond(cond_in, a, b), image_cond_in[a:b]))
else:
x_out = torch.zeros_like(x_in)
batch_size = batch_size*2 if shared.opts.batch_cond_uncond else batch_size
for batch_offset in range(0, tensor.shape[0], batch_size):
a = batch_offset
b = min(a + batch_size, tensor.shape[0])
if not is_edit_model:
c_crossattn = subscript_cond(tensor, a, b)
else:
c_crossattn = torch.cat([tensor[a:b]], uncond)
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
if not skip_uncond:
x_out[-uncond.shape[0]:] = self.inner_model(x_in[-uncond.shape[0]:], sigma_in[-uncond.shape[0]:], cond=make_condition_dict(uncond, image_cond_in[-uncond.shape[0]:]))
denoised_image_indexes = [x[0][0] for x in conds_list]
if skip_uncond:
fake_uncond = torch.cat([x_out[i:i+1] for i in denoised_image_indexes])
x_out = torch.cat([x_out, fake_uncond]) # we skipped uncond denoising, so we put cond-denoised image to where the uncond-denoised image should be
denoised_params = CFGDenoisedParams(x_out, state.sampling_step, state.sampling_steps, self.inner_model)
cfg_denoised_callback(denoised_params)
devices.test_for_nans(x_out, "unet")
if is_edit_model:
denoised = self.combine_denoised_for_edit_model(x_out, cond_scale)
elif skip_uncond:
denoised = self.combine_denoised(x_out, conds_list, uncond, 1.0)
else:
denoised = self.combine_denoised(x_out, conds_list, uncond, cond_scale)
if not self.mask_before_denoising and self.mask is not None:
denoised = self.init_latent * self.mask + self.nmask * denoised
self.sampler.last_latent = self.get_pred_x0(torch.cat([x_in[i:i + 1] for i in denoised_image_indexes]), torch.cat([x_out[i:i + 1] for i in denoised_image_indexes]), sigma)
if opts.live_preview_content == "Prompt":
preview = self.sampler.last_latent
elif opts.live_preview_content == "Negative prompt":
preview = self.get_pred_x0(x_in[-uncond.shape[0]:], x_out[-uncond.shape[0]:], sigma)
else:
preview = self.get_pred_x0(torch.cat([x_in[i:i+1] for i in denoised_image_indexes]), torch.cat([denoised[i:i+1] for i in denoised_image_indexes]), sigma)
sd_samplers_common.store_latent(preview)
after_cfg_callback_params = AfterCFGCallbackParams(denoised, state.sampling_step, state.sampling_steps)
cfg_after_cfg_callback(after_cfg_callback_params)
denoised = after_cfg_callback_params.x
self.step -= 1
self.step += 1
return x
return denoised
CFGDenoiser.forward = mm_cfg_forward
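Because `x_in` stacks the conditional batch ahead of the unconditional batch, each frame window over the cond half must be mirrored into the uncond half. A one-line sketch of the `_context` computation in the loop above (hypothetical function name):

```python
def paired_indices(context: list, video_length: int) -> list:
    """Rows 0..video_length-1 of the stacked batch hold cond frames and
    the next video_length rows hold uncond frames, so a window over the
    cond half also selects the matching uncond rows."""
    return context + [c + video_length for c in context]
```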


@@ -83,6 +83,9 @@ class AnimateDiffProcess:
p.batch_size = self.video_length
if self.video_length == 0:
self.video_length = p.batch_size
self.video_default = True
else:
self.video_default = False
if self.overlap == -1:
self.overlap = self.batch_size // 4
if "PNG" not in self.format:
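The default handling in this hunk can be sketched as a pure function (hypothetical `resolve_defaults` helper, not the extension's actual API):

```python
def resolve_defaults(video_length: int, batch_size: int, overlap: int):
    """A video length of 0 is recorded as 'default' so a source video
    can later override it with its own frame count; an overlap of -1
    resolves to batch_size // 4."""
    video_default = video_length == 0
    if video_default:
        video_length = batch_size
    if overlap == -1:
        overlap = batch_size // 4
    return video_length, overlap, video_default
```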