infv2v (#153)

* restart to try infv2v
* infv2v works now, but cn+infv2v is not working
* finally works
* readme

pull/156/head
parent dee63eee74
commit 7e9f3830f5

README.md (17 lines changed)
@@ -20,7 +20,7 @@ You might also be interested in another extension I created: [Segment Anything f
1. Go to txt2img if you want to try txt2gif and img2img if you want to try img2gif.
1. Choose an SD1.5 checkpoint, write prompts, set configurations such as image width/height. If you want to generate multiple GIFs at once, please change the batch number instead of the batch size.
1. Enable the AnimateDiff extension, set up each parameter, and click `Generate`.
1. **Number of frames** — [**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT 0!!!**] Choose whatever number you like.
1. **Number of frames** — Choose whatever number you like.

   If you enter 0 (default):
   - If you submit a video via `Video source` / enter a video path via `Video path` / enable ANY batch ControlNet, the number of frames will be the number of frames in the video (use the shortest if more than one video is submitted).
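The default frame-count rule above can be sketched as follows (a minimal illustration; `resolve_num_frames` is a hypothetical helper, not the extension's API):

```python
def resolve_num_frames(user_value, source_frame_counts, batch_size):
    """Resolve 'Number of frames' as described above.

    user_value: the number entered in the UI (0 means "use the default").
    source_frame_counts: frame counts of submitted source videos / batch
    ControlNet inputs (may be empty).
    batch_size: fallback when no source video is given.
    """
    if user_value != 0:
        return user_value
    if source_frame_counts:
        # More than one video submitted: use the shortest one.
        return min(source_frame_counts)
    return batch_size
```

For example, submitting two videos of 24 and 16 frames with the field left at 0 yields 16 frames.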
@@ -30,9 +30,9 @@ You might also be interested in another extension I created: [Segment Anything f
1. **FPS** — Frames per second, which is how many frames (images) are shown every second. If 16 frames are generated at 8 frames per second, your GIF's duration is 2 seconds. If you submit a source video, your FPS will be the same as the source video.
1. **Display loop number** — How many times the GIF is played. A value of `0` means the GIF never stops playing.
1. **Batch size** — How many frames will be passed into the motion module at once. The model is trained with 16 frames, so it will give the best results when the number of frames is set to `16`. Choose [1, 24] for V1 motion modules and [1, 32] for V2 motion modules.
1. **Closed loop** — [**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT False!!!**] If you enable this option and your number of frames is greater than your batch size, this extension will try to make the last frame the same as the first frame.
1. **Stride** — [**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT 1!!!**] Max motion stride as a power of 2 (default: 1).
1. **Overlap** — [**!!!Infinite V2V feature, DO NOT CHANGE, KEEP IT -1!!!**] Number of frames to overlap in context. If overlap is -1 (default): your overlap will be `Batch size` // 4.
1. **Closed loop** — If you enable this option and your number of frames is greater than your batch size, this extension will try to make the last frame the same as the first frame.
1. **Stride** — Max motion stride as a power of 2 (default: 1).
1. **Overlap** — Number of frames to overlap in context. If overlap is -1 (default), your overlap will be `Batch size` // 4.
1. **Save** — Format of the output. Choose at least one of "GIF"|"MP4"|"PNG". Check "TXT" if you want infotext, which will live in the same directory as the output GIF.
1. You can optimize GIF with `gifsicle` (`apt install gifsicle` required, read [#91](https://github.com/continue-revolution/sd-webui-animatediff/pull/91) for more information) and/or `palette` (read [#104](https://github.com/continue-revolution/sd-webui-animatediff/pull/104) for more information). Go to `Settings/AnimateDiff` to enable them.
1. **Reverse** — Append reversed frames to your output. See [#112](https://github.com/continue-revolution/sd-webui-animatediff/issues/112) for instructions.
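The FPS and Overlap arithmetic described above can be sketched as (hypothetical helper names, assuming integer division for the overlap default):

```python
def gif_duration_seconds(num_frames, fps):
    # 16 frames shown at 8 frames per second play for 2 seconds.
    return num_frames / fps

def effective_overlap(batch_size, overlap=-1):
    # An overlap of -1 (the default) means `Batch size` // 4.
    return batch_size // 4 if overlap == -1 else overlap
```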
@@ -97,9 +97,10 @@ Just like how you use ControlNet. Here is a sample. You will get a list of gener
- `2023/09/19`: [v1.5.1](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.5.1): support xformers, sdp, sub-quadratic attention optimization - VRAM usage decreases to 5.60GB with default settings. See [FAQ](#faq) 1st item for more information.
- `2023/09/22`: [v1.5.2](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.5.2): option to disable xformers at `Settings/AnimateDiff` [due to a bug in xformers](https://github.com/facebookresearch/xformers/issues/845), API support, option to enable GIF palette optimization at `Settings/AnimateDiff` (credit to [@rkfg](https://github.com/rkfg)), gifsicle optimization moved to `Settings/AnimateDiff`.
- `2023/09/25`: [v1.6.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.6.0): [motion LoRA](https://github.com/guoyww/AnimateDiff#features) supported. Download and use them like any other LoRA you use (example: download a motion LoRA to `stable-diffusion-webui/models/Lora` and add `<lora:v2_lora_PanDown:0.8>` to your positive prompt). **Motion LoRA only supports V2 motion modules**.
- `2023/09/27`: [v1.7.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.7.0): [ControlNet](https://github.com/Mikubill/sd-webui-controlnet) supported. Please closely follow the instructions in [How to Use](#how-to-use), especially the explanation of the `Video source` and `Video path` attributes. ControlNet is far more complex than what I can test alone, so I ask you to test it for me. Please submit an issue whenever you find a bug. [Demo and video instructions](#demo-and-video-instructions) are coming soon. Safetensors for some motion modules are also available now. See [model zoo](#motion-module-model-zoo). You may want to check `Do not append detectmap to output` in `Settings/ControlNet` to avoid having a series of control images in your output gallery. You should not change some attributes in your extension UI because they are for infinite v2v; see [WebUI](#webui) for what you should not change.
- `2023/09/27`: [v1.7.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.7.0): [ControlNet](https://github.com/Mikubill/sd-webui-controlnet) supported. Please closely follow the instructions in [How to Use](#how-to-use), especially the explanation of the `Video source` and `Video path` attributes. ControlNet is far more complex than what I can test alone, so I ask you to test it for me. Please submit an issue whenever you find a bug. You may want to check `Do not append detectmap to output` in `Settings/ControlNet` to avoid having a series of control images in your output gallery. [Safetensors](#motion-module-model-zoo) for some motion modules are also available now.
- `2023/09/29`: [v1.8.0](https://github.com/continue-revolution/sd-webui-animatediff/releases/tag/v1.8.0): Infinite generation (with/without ControlNet) supported. [Demo and video instructions](#demo-and-video-instructions) are coming soon.

Infinite V2V, Prompt Travel and other CLI features are currently a work in progress inside [#121](https://github.com/continue-revolution/sd-webui-animatediff/pull/121). Stay tuned; they should be released within a week.
Prompt Travel and other CLI features are currently a work in progress inside [#71](https://github.com/continue-revolution/sd-webui-animatediff/pull/71). Stay tuned; they should be released soon.

## FAQ

1. Q: How much VRAM do I need?
@@ -115,10 +116,6 @@ Infinite V2V, Prompt Travel and other CLI features are currently work in progres
   A: You will have to wait for someone to train SDXL-specific motion modules, which will have a different model architecture. This extension essentially injects multiple motion modules into the SD1.5 UNet. It does not work for other variations of SD, such as SD2.1 or SDXL.

1. Q: Can I override the limitation of 24/32 frames per generation?

   A: Not at this time, but it will be supported by integrating [AnimateDiff CLI Prompt Travel](https://github.com/s9roll7/animatediff-cli-prompt-travel) in the near future. This is a huge amount of work and life is busy, so expect to wait a long time for this update.


## Demo and Video Instructions
@@ -47,8 +47,8 @@ class AnimateDiffScript(scripts.Script):
motion_module.inject(p.sd_model, params.model)
self.lora_hacker = AnimateDiffLora(motion_module.mm.using_v2)
self.lora_hacker.hack()
# self.cfg_hacker = AnimateDiffInfV2V(p)
# self.cfg_hacker.hack(params)
self.cfg_hacker = AnimateDiffInfV2V(p)
self.cfg_hacker.hack(params)
self.cn_hacker = AnimateDiffControl(p)
self.cn_hacker.hack(params)
@@ -67,7 +67,7 @@ class AnimateDiffScript(scripts.Script):
if isinstance(params, dict): params = AnimateDiffProcess(**params)
if params.enable:
self.cn_hacker.restore()
# self.cfg_hacker.restore()
self.cfg_hacker.restore()
self.lora_hacker.restore()
motion_module.restore(p.sd_model)
AnimateDiffOutput().output(p, res, params)
@@ -97,8 +97,9 @@ class AnimateDiffControl:
params.video_length = video_length
if params.batch_size > video_length:
params.batch_size = video_length
# if params.video_length == 0: # TODO: support inf length
# params.video_length = video_length
if params.video_default:
params.video_length = video_length
p.batch_size = video_length
for unit in units:
if getattr(unit, 'input_mode', InputMode.SIMPLE) == InputMode.BATCH:
unit.batch_images = unit.batch_images[:params.video_length]
@@ -1,6 +1,14 @@
from typing import List

import numpy as np
from modules.sd_samplers_cfg_denoiser import CFGDenoiser
from modules.prompt_parser import MulticondLearnedConditioning
import torch

from modules import prompt_parser, devices, sd_samplers_common, shared
from modules.shared import opts, state
from modules.script_callbacks import CFGDenoiserParams, cfg_denoiser_callback
from modules.script_callbacks import CFGDenoisedParams, cfg_denoised_callback
from modules.script_callbacks import AfterCFGCallbackParams, cfg_after_cfg_callback
from modules.sd_samplers_cfg_denoiser import CFGDenoiser, catenate_conds, subscript_cond, pad_cond

from scripts.animatediff_logger import logger_animatediff as logger
from scripts.animatediff_ui import AnimateDiffProcess
@@ -55,78 +63,205 @@ class AnimateDiffInfV2V:
video_length + pad + (0 if closed_loop else -overlap),
(batch_size * context_step - overlap),
):
batch_list = [e % video_length for e in range(j, j + batch_size * context_step, context_step)]
if not closed_loop and batch_list[-1] < batch_list[0]:
batch_list_end = batch_list[: video_length - batch_list[0]]
batch_list_front = batch_list[video_length - batch_list[0] :]
if len(batch_list_end) < len(batch_list_front):
batch_list_front_end = batch_list_front[-1]
for i in range(len(batch_list_end)):
batch_list_front.append(batch_list_front_end + i + 1)
yield batch_list_front
else:
batch_list_end_front = batch_list_end[0]
for i in range(len(batch_list_front)):
batch_list_end.insert(0, batch_list_end_front - i - 1)
yield batch_list_end
else:
yield batch_list
yield [e % video_length for e in range(j, j + batch_size * context_step, context_step)]
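The replacement line above reduces context scheduling to a single modulo-wrapped window. A standalone sketch of that sliding-window idea (the loop bounds here are simplified assumptions, not the exact `uniform` signature):

```python
def sliding_contexts(video_length, batch_size, overlap, context_step=1):
    # Step so that consecutive windows share `overlap` frames; frame
    # indices wrap around the end of the video via modulo.
    step = batch_size * context_step - overlap
    for j in range(0, video_length - overlap, step):
        yield [e % video_length for e in range(j, j + batch_size * context_step, context_step)]
```

With `video_length=8`, `batch_size=4`, `overlap=1` this yields `[0, 1, 2, 3]`, `[3, 4, 5, 6]`, `[6, 7, 0, 1]`: each window overlaps its neighbor by one frame and the last wraps around.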

def hack(self, params: AnimateDiffProcess):
logger.info("Hacking CFGDenoiser forward function.")
self.cfg_original_forward = CFGDenoiser.forward
cfg_original_forward = self.cfg_original_forward
cn_script = self.cn_script

def mm_cfg_forward(self, x, sigma, uncond, cond, cond_scale, s_min_uncond, image_cond):
for context in AnimateDiffInfV2V.uniform(self.step, params.video_length, params.batch_size, params.stride, params.overlap, params.closed_loop):
# take control images for current context.
# controlllite is for sdxl and we do not support it. reserved here for future use if needed.
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if control.hint_cond.shape[0] > len(context):
control.hint_cond_backup = control.hint_cond
control.hint_cond = control.hint_cond[context]
if control.hr_hint_cond is not None and control.hr_hint_cond.shape[0] > len(context):
control.hr_hint_cond_backup = control.hr_hint_cond
control.hr_hint_cond = control.hr_hint_cond[context]
if control.control_model_type == ControlModelType.IPAdapter and control.control_model.image_emb.shape[0] > len(context):
control.control_model.image_emb_backup = control.control_model.image_emb
control.control_model.image_emb = control.control_model.image_emb[context]
control.control_model.uncond_image_emb_backup = control.control_model.uncond_image_emb
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb[context]
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.cond_image_backup = module.cond_image
# module.set_cond_image(module.cond_image[context])
# run original forward function for the current context
# TODO: what to do with cond?
x[context] = cfg_original_forward(
self, x[context], sigma[context], [uncond[i] for i in context],
MulticondLearnedConditioning(len(context), [cond.batch[i] for i in context]),
cond_scale, s_min_uncond, image_cond[context])
# restore control images for next context
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if control.hint_cond.shape[0] > len(context):
control.hint_cond = control.hint_cond_backup
if control.hr_hint_cond is not None and control.hr_hint_cond.shape[0] > len(context):
control.hr_hint_cond = control.hr_hint_cond_backup
if control.control_model_type == ControlModelType.IPAdapter and control.control_model.image_emb.shape[0] > len(context):
control.control_model.image_emb = control.control_model.image_emb_backup
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb_backup
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.set_cond_image(module.cond_image_backup)
def mm_cn_select(context: List[int]):
# take control images for current context.
# controlllite is for sdxl and we do not support it. reserved here for future use if needed.
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if control.hint_cond.shape[0] > len(context):
control.hint_cond_backup = control.hint_cond
control.hint_cond = control.hint_cond[context]
if control.hr_hint_cond is not None and control.hr_hint_cond.shape[0] > len(context):
control.hr_hint_cond_backup = control.hr_hint_cond
control.hr_hint_cond = control.hr_hint_cond[context]
if control.control_model_type == ControlModelType.IPAdapter and control.control_model.image_emb.shape[0] > len(context):
control.control_model.image_emb_backup = control.control_model.image_emb
control.control_model.image_emb = control.control_model.image_emb[context]
control.control_model.uncond_image_emb_backup = control.control_model.uncond_image_emb
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb[context]
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.cond_image_backup = module.cond_image
# module.set_cond_image(module.cond_image[context])

def mm_cn_restore(context: List[int]):
# restore control images for next context
if cn_script is not None and cn_script.latest_network is not None:
from scripts.hook import ControlModelType
for control in cn_script.latest_network.control_params:
if getattr(control, "hint_cond_backup", None) is not None:
control.hint_cond_backup[context] = control.hint_cond
control.hint_cond = control.hint_cond_backup
if control.hr_hint_cond is not None and getattr(control, "hr_hint_cond_backup", None) is not None:
control.hr_hint_cond_backup[context] = control.hr_hint_cond
control.hr_hint_cond = control.hr_hint_cond_backup
if control.control_model_type == ControlModelType.IPAdapter and getattr(control.control_model, "image_emb_backup", None) is not None:
control.control_model.image_emb_backup[context] = control.control_model.image_emb
control.control_model.uncond_image_emb_backup[context] = control.control_model.uncond_image_emb
control.control_model.image_emb = control.control_model.image_emb_backup
control.control_model.uncond_image_emb = control.control_model.uncond_image_emb_backup
# if control.control_model_type == ControlModelType.Controlllite:
# for module in control.control_model.modules.values():
# if module.cond_image.shape[0] > len(context):
# module.set_cond_image(module.cond_image_backup)
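`mm_cn_select`/`mm_cn_restore` follow a backup-slice-writeback pattern: back up the full sequence, narrow it to the context window, then write any changes back. A minimal sketch with plain Python lists standing in for tensors (illustrative names, not the extension's API):

```python
def select(store, context):
    # Keep a reference to the full sequence, then narrow to the window.
    store["backup"] = store["data"]
    store["data"] = [store["data"][i] for i in context]

def restore(store, context):
    # Write the (possibly modified) window back into the full sequence.
    for k, i in enumerate(context):
        store["backup"][i] = store["data"][k]
    store["data"] = store["backup"]
```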

def mm_cfg_forward(self, x, sigma, uncond, cond, cond_scale, s_min_uncond, image_cond):
if state.interrupted or state.skipped:
raise sd_samplers_common.InterruptedException

if sd_samplers_common.apply_refiner(self):
cond = self.sampler.sampler_extra_args['cond']
uncond = self.sampler.sampler_extra_args['uncond']

# at self.image_cfg_scale == 1.0 produced results for edit model are the same as with normal sampling,
# so is_edit_model is set to False to support AND composition.
is_edit_model = shared.sd_model.cond_stage_key == "edit" and self.image_cfg_scale is not None and self.image_cfg_scale != 1.0

conds_list, tensor = prompt_parser.reconstruct_multicond_batch(cond, self.step)
uncond = prompt_parser.reconstruct_cond_batch(uncond, self.step)

assert not is_edit_model or all(len(conds) == 1 for conds in conds_list), "AND is not supported for InstructPix2Pix checkpoint (unless using Image CFG scale = 1.0)"

if self.mask_before_denoising and self.mask is not None:
x = self.init_latent * self.mask + self.nmask * x

batch_size = len(conds_list)
repeats = [len(conds_list[i]) for i in range(batch_size)]

if shared.sd_model.model.conditioning_key == "crossattn-adm":
image_uncond = torch.zeros_like(image_cond) # this should not be supported.
make_condition_dict = lambda c_crossattn, c_adm: {"c_crossattn": [c_crossattn], "c_adm": c_adm}
else:
image_uncond = image_cond
if isinstance(uncond, dict):
make_condition_dict = lambda c_crossattn, c_concat: {**c_crossattn, "c_concat": [c_concat]}
else:
make_condition_dict = lambda c_crossattn, c_concat: {"c_crossattn": [c_crossattn], "c_concat": [c_concat]}

if not is_edit_model:
x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x])
sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma])
image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond])
else:
x_in = torch.cat([torch.stack([x[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [x] + [x])
sigma_in = torch.cat([torch.stack([sigma[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [sigma] + [sigma])
image_cond_in = torch.cat([torch.stack([image_cond[i] for _ in range(n)]) for i, n in enumerate(repeats)] + [image_uncond] + [torch.zeros_like(self.init_latent)])

denoiser_params = CFGDenoiserParams(x_in, image_cond_in, sigma_in, state.sampling_step, state.sampling_steps, tensor, uncond)
cfg_denoiser_callback(denoiser_params)
x_in = denoiser_params.x
image_cond_in = denoiser_params.image_cond
sigma_in = denoiser_params.sigma
tensor = denoiser_params.text_cond
uncond = denoiser_params.text_uncond
skip_uncond = False

# alternating uncond allows for higher thresholds without the quality loss normally expected from raising it
if self.step % 2 and s_min_uncond > 0 and sigma[0] < s_min_uncond and not is_edit_model:
skip_uncond = True
x_in = x_in[:-batch_size]
sigma_in = sigma_in[:-batch_size]

self.padded_cond_uncond = False
if shared.opts.pad_cond_uncond and tensor.shape[1] != uncond.shape[1]:
empty = shared.sd_model.cond_stage_model_empty_prompt
num_repeats = (tensor.shape[1] - uncond.shape[1]) // empty.shape[1]

if num_repeats < 0:
tensor = pad_cond(tensor, -num_repeats, empty)
self.padded_cond_uncond = True
elif num_repeats > 0:
uncond = pad_cond(uncond, num_repeats, empty)
self.padded_cond_uncond = True

if tensor.shape[1] == uncond.shape[1] or skip_uncond:
if is_edit_model:
cond_in = catenate_conds([tensor, uncond, uncond])
elif skip_uncond:
cond_in = tensor
else:
cond_in = catenate_conds([tensor, uncond])

if shared.opts.batch_cond_uncond: # only support this branch
# x_out = self.inner_model(x_in, sigma_in, cond=make_condition_dict(cond_in, image_cond_in))
x_out = torch.zeros_like(x_in, dtype=x_in.dtype, device=x_in.device)
for context in AnimateDiffInfV2V.uniform(self.step, params.video_length, params.batch_size, params.stride, params.overlap, params.closed_loop):
# run original forward function for the current context
_context = context + [c + params.video_length for c in context]
print(f"context: {_context}, shape: {x_in.shape}, {sigma_in.shape}, {cond_in.shape}, {image_cond_in.shape}")
mm_cn_select(_context)
x_out[_context] = self.inner_model(x_in[_context], sigma_in[_context], cond=make_condition_dict(cond_in[_context], image_cond_in[_context]))
mm_cn_restore(_context)
else:
x_out = torch.zeros_like(x_in)
for batch_offset in range(0, x_out.shape[0], batch_size):
a = batch_offset
b = a + batch_size
x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(subscript_cond(cond_in, a, b), image_cond_in[a:b]))
else:
x_out = torch.zeros_like(x_in)
batch_size = batch_size*2 if shared.opts.batch_cond_uncond else batch_size
for batch_offset in range(0, tensor.shape[0], batch_size):
a = batch_offset
b = min(a + batch_size, tensor.shape[0])

if not is_edit_model:
c_crossattn = subscript_cond(tensor, a, b)
else:
c_crossattn = torch.cat([tensor[a:b], uncond])

x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))

if not skip_uncond:
x_out[-uncond.shape[0]:] = self.inner_model(x_in[-uncond.shape[0]:], sigma_in[-uncond.shape[0]:], cond=make_condition_dict(uncond, image_cond_in[-uncond.shape[0]:]))

denoised_image_indexes = [x[0][0] for x in conds_list]
if skip_uncond:
fake_uncond = torch.cat([x_out[i:i+1] for i in denoised_image_indexes])
x_out = torch.cat([x_out, fake_uncond]) # we skipped uncond denoising, so we put cond-denoised image to where the uncond-denoised image should be

denoised_params = CFGDenoisedParams(x_out, state.sampling_step, state.sampling_steps, self.inner_model)
cfg_denoised_callback(denoised_params)

devices.test_for_nans(x_out, "unet")

if is_edit_model:
denoised = self.combine_denoised_for_edit_model(x_out, cond_scale)
elif skip_uncond:
denoised = self.combine_denoised(x_out, conds_list, uncond, 1.0)
else:
denoised = self.combine_denoised(x_out, conds_list, uncond, cond_scale)

if not self.mask_before_denoising and self.mask is not None:
denoised = self.init_latent * self.mask + self.nmask * denoised

self.sampler.last_latent = self.get_pred_x0(torch.cat([x_in[i:i + 1] for i in denoised_image_indexes]), torch.cat([x_out[i:i + 1] for i in denoised_image_indexes]), sigma)

if opts.live_preview_content == "Prompt":
preview = self.sampler.last_latent
elif opts.live_preview_content == "Negative prompt":
preview = self.get_pred_x0(x_in[-uncond.shape[0]:], x_out[-uncond.shape[0]:], sigma)
else:
preview = self.get_pred_x0(torch.cat([x_in[i:i+1] for i in denoised_image_indexes]), torch.cat([denoised[i:i+1] for i in denoised_image_indexes]), sigma)

sd_samplers_common.store_latent(preview)

after_cfg_callback_params = AfterCFGCallbackParams(denoised, state.sampling_step, state.sampling_steps)
cfg_after_cfg_callback(after_cfg_callback_params)
denoised = after_cfg_callback_params.x

self.step -= 1
self.step += 1
return x
return denoised

CFGDenoiser.forward = mm_cfg_forward
@@ -83,6 +83,9 @@ class AnimateDiffProcess:
p.batch_size = self.video_length
if self.video_length == 0:
self.video_length = p.batch_size
self.video_default = True
else:
self.video_default = False
if self.overlap == -1:
self.overlap = self.batch_size // 4
if "PNG" not in self.format: