* draft for freeinit

* update for v1.x

* Update README.md

* Update features.md

* Update how-to-use.md

* Update README.md

* Update README.md

---------

Co-authored-by: Chengsong Zhang <continuerevolution@gmail.com>
Thiswinex 2024-03-10 17:27:58 +08:00 committed by GitHub
parent 12a503b8b7
commit a390500002
7 changed files with 414 additions and 5 deletions


@@ -17,7 +17,9 @@ You might also be interested in another extension I created: [Segment Anything f
## Update
- [v2.0.0-a](https://github.com/continue-revolution/sd-webui-animatediff/tree/v2.0.0-a) in `03/02/2024`: The whole extension has been reworked to make it easier to maintain.
- Prerequisite: WebUI >= 1.8.0 & ControlNet >=1.1.441
- New feature: ControlNet inpaint / IP-Adapter prompt travel / SparseCtrl / ControlNet keyframe, see [ControlNet V2V](docs/features.md#controlnet-v2v)
- New feature:
- ControlNet inpaint / IP-Adapter prompt travel / SparseCtrl / ControlNet keyframe, see [ControlNet V2V](docs/features.md#controlnet-v2v)
- FreeInit, see [FreeInit](docs/features.md#FreeInit)
- Minor: mm filter based on sd version (click refresh button if you switch between SD1.5 and SDXL) / display extension version in infotext
- Breaking change: You must use Motion LoRA, Hotshot-XL, AnimateDiff V3 Motion Adapter from my [huggingface repo](https://huggingface.co/conrevo/AnimateDiff-A1111/tree/main).
@@ -54,6 +56,7 @@ We thank all developers and community users who contribute to this repository in
- [@limbo0000](https://github.com/limbo0000) for responding to my questions about AnimateDiff
- [@neggles](https://github.com/neggles) and [@s9roll7](https://github.com/s9roll7) for developing [AnimateDiff CLI Prompt Travel](https://github.com/s9roll7/animatediff-cli-prompt-travel)
- [@zappityzap](https://github.com/zappityzap) for developing the majority of the [output features](https://github.com/continue-revolution/sd-webui-animatediff/blob/master/scripts/animatediff_output.py)
- [@thiswinex](https://github.com/thiswinex) for developing FreeInit
- [@lllyasviel](https://github.com/lllyasviel) for adding me as a collaborator of sd-webui-controlnet and offering technical support for Forge
- [@KohakuBlueleaf](https://github.com/KohakuBlueleaf) for helping with FP8 and LCM development
- [@TDS4874](https://github.com/TDS4874) and [@opparco](https://github.com/opparco) for resolving the grey issue which significantly improves performance


@@ -30,6 +30,16 @@ The last line is tail prompt, which is optional. You can write no/single/multipl
smile
```
## FreeInit
FreeInit trades additional inference time for more coherent and temporally consistent video frames.
The default parameters provide satisfactory results for most use cases. Increasing the number of iterations can yield better outcomes, but it also prolongs the processing time. If your video contains more intense or rapid motions, consider switching the filter to Gaussian. For a detailed explanation of each parameter, please refer to the documentation in the [original repository](https://github.com/TianxingWu/FreeInit).
| without FreeInit | with FreeInit (default params) |
| --- | --- |
| ![00003-1234](https://github.com/thiswinex/sd-webui-animatediff/assets/29111172/631e1f4e-5c7e-44b8-bffb-e9f3e95ee304) | ![00002-1234](https://github.com/thiswinex/sd-webui-animatediff/assets/29111172/f4ba7132-7daf-4e26-86cc-766353e79fec) |
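Under the hood, each FreeInit iteration re-diffuses the previous sampling result, keeps its low-frequency layout, and replaces the high frequencies with fresh noise before sampling again. A rough, runnable sketch of that loop (`sample`, `diffuse_to_T`, and `low_pass` are hypothetical stand-ins for the real sampler and filter, not the extension's actual API):

```python
import torch

def freeinit_loop(x, num_iters, sample, diffuse_to_T, low_pass):
    """Sketch of the FreeInit outer loop around the sampler."""
    for i in range(num_iters):
        if i > 0:
            z_T = diffuse_to_T(x)           # re-diffuse the last result to t=T
            z_rand = torch.randn_like(z_T)  # fresh noise for high frequencies
            # keep low frequencies of z_T, take high frequencies of z_rand
            x = low_pass(z_T) + (z_rand - low_pass(z_rand))
        x = sample(x)                       # run the normal sampling loop
    return x
```

Each extra iteration repeats a full sampling pass, which is why inference time grows roughly linearly with the iteration count.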
## ControlNet V2V
You need to go to txt2img / img2img-batch and submit source video or path to frames. Each ControlNet will find control images according to this priority:


@@ -83,5 +83,9 @@ It is quite similar to the way you use ControlNet. API will return a video in ba
1. **Interp X** — Replace each input frame with X interpolated output frames. [#128](https://github.com/continue-revolution/sd-webui-animatediff/pull/128).
1. **Video source** — [Optional] Video source file for [ControlNet V2V](features.md#controlnet-v2v). You MUST enable ControlNet. It will be the source control for ALL ControlNet units that you enable without submitting a single control image to `Single Image` tab or a path to `Batch Folder` tab in ControlNet panel. You can of course submit one control image via `Single Image` tab or an input directory via `Batch Folder` tab, which will override this video source input and work as usual.
1. **Video path** — [Optional] Folder for source frames for [ControlNet V2V](features.md#controlnet-v2v), but higher priority than `Video source`. You MUST enable ControlNet. It will be the source control for ALL ControlNet units that you enable without submitting a control image or a path to ControlNet. You can of course submit one control image via `Single Image` tab or an input directory via `Batch Folder` tab, which will override this video path input and work as usual.
1. **FreeInit** — [Optional] Use FreeInit to improve the temporal consistency of your videos.
    1. The default parameters provide satisfactory results for most use cases.
    1. Use the "Gaussian" filter when your motion is intense.
    1. See the [original FreeInit repository](https://github.com/TianxingWu/FreeInit) for more parameter settings.
See [ControlNet V2V](features.md#controlnet-v2v) for an example parameter fill-in and more explanation.
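For API use, the FreeInit fields can presumably be supplied alongside the other AnimateDiff parameters. A hypothetical `txt2img` payload sketch — the dict-style `args` form and the exact endpoint are assumptions, and the field names mirror the extension's parameter names:

```python
import json

# Hypothetical payload sketch for /sdapi/v1/txt2img -- verify the exact
# args format against the extension's API documentation before use.
payload = {
    "prompt": "1girl, walking",
    "alwayson_scripts": {
        "AnimateDiff": {
            "args": [{
                "enable": True,
                "video_length": 16,
                "freeinit_enable": True,            # turn FreeInit on
                "freeinit_filter": "butterworth",   # "gaussian" for intense motion
                "freeinit_ds": 0.25,
                "freeinit_dt": 0.25,
                "freeinit_iters": 3,
            }]
        }
    },
}
print(json.dumps(payload, indent=2))
```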


@@ -19,6 +19,7 @@ from scripts.animatediff_settings import on_ui_settings
from scripts.animatediff_infotext import update_infotext, infotext_pasted
from scripts.animatediff_utils import get_animatediff_arg
from scripts.animatediff_i2ibatch import * # this is necessary for CN to find the function
from scripts.animatediff_freeinit import AnimateDiffFreeInit
script_dir = scripts.basedir()
motion_module.set_script_dir(script_dir)
@@ -64,6 +65,9 @@ class AnimateDiffScript(scripts.Script):
params.set_p(p)
params.prompt_scheduler = AnimateDiffPromptSchedule(p, params)
update_infotext(p, params)
if params.freeinit_enable:
self.freeinit_hacker = AnimateDiffFreeInit(params)
self.freeinit_hacker.hack(p, params)
self.hacked = True
elif self.hacked:
motion_module.restore(p.sd_model)


@@ -0,0 +1,322 @@
import torch
import torch.fft as fft
import math
import os
import re
import sys
from modules import sd_models, shared, sd_samplers, devices
from modules.sd_samplers_common import InterruptedException
from modules.paths import extensions_builtin_dir
from modules.processing import StableDiffusionProcessing, opt_C, opt_f, StableDiffusionProcessingTxt2Img, StableDiffusionProcessingImg2Img, decode_latent_batch
from types import MethodType
from scripts.animatediff_logger import logger_animatediff as logger
from scripts.animatediff_ui import AnimateDiffProcess
def ddim_add_noise(
original_samples: torch.FloatTensor,
noise: torch.FloatTensor,
timesteps: torch.IntTensor,
) -> torch.FloatTensor:
alphas_cumprod = shared.sd_model.alphas_cumprod
# Make sure alphas_cumprod and timestep have same device and dtype as original_samples
alphas_cumprod = alphas_cumprod.to(device=original_samples.device, dtype=original_samples.dtype)
timesteps = timesteps.to(original_samples.device)
sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
sqrt_alpha_prod = sqrt_alpha_prod.flatten()
while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)
sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):
sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)
noisy_samples = sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
return noisy_samples
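# ddim_add_noise above implements the standard forward-diffusion identity
# z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps. A self-contained toy
# check of the same math (a synthetic abar schedule stands in for
# shared.sd_model.alphas_cumprod; toy_add_noise is a local stand-in, not
# the function above):
import torch

def toy_add_noise(original, noise, timesteps, alphas_cumprod):
    abar = alphas_cumprod[timesteps].flatten()
    while abar.dim() < original.dim():
        abar = abar.unsqueeze(-1)
    return abar.sqrt() * original + (1 - abar).sqrt() * noise

abar_schedule = torch.linspace(1.0, 0.0, 1000)  # toy: abar_0 = 1, abar_999 = 0
x0 = torch.randn(16, 4, 8, 8)
eps = torch.randn_like(x0)
assert torch.allclose(toy_add_noise(x0, eps, torch.tensor(0), abar_schedule), x0)    # t=0: no noise
assert torch.allclose(toy_add_noise(x0, eps, torch.tensor(999), abar_schedule), eps) # t=T: pure noise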
class AnimateDiffFreeInit:
def __init__(self, params):
self.num_iters = params.freeinit_iters
self.method = params.freeinit_filter
self.d_s = params.freeinit_ds
self.d_t = params.freeinit_dt
@torch.no_grad()
def init_filter(self, video_length, height, width, filter_params):
# initialize frequency filter for noise reinitialization
batch_size = 1
filter_shape = [
batch_size,
opt_C,
video_length,
height // opt_f,
width // opt_f
]
self.freq_filter = get_freq_filter(filter_shape, device=devices.device, params=filter_params)
def hack(self, p: StableDiffusionProcessing, params: AnimateDiffProcess):
# init filter
filter_params = {
'method': self.method,
'd_s': self.d_s,
'd_t': self.d_t,
}
self.init_filter(params.video_length, p.height, p.width, filter_params)
def sample_t2i(self, conditioning, unconditional_conditioning, seeds, subseeds, subseed_strength, prompts):
self.sampler = sd_samplers.create_sampler(self.sampler_name, self.sd_model)
# hack so that the total progress bar works (in an ugly way)
setattr(self.sampler, 'freeinit_num_iters', self.num_freeinit_iters)
setattr(self.sampler, 'freeinit_num_iter', 0)
def callback_hack(self, d):
step = d['i'] // self.freeinit_num_iters + self.freeinit_num_iter * (shared.state.sampling_steps // self.freeinit_num_iters)
if self.stop_at is not None and step > self.stop_at:
raise InterruptedException
shared.state.sampling_step = step
if d['i'] % self.freeinit_num_iters == 0:
shared.total_tqdm.update()
self.sampler.callback_state = MethodType(callback_hack, self.sampler)
# Sampling with FreeInit
x = self.rng.next()
x_dtype = x.dtype
for iter in range(self.num_freeinit_iters):
self.sampler.freeinit_num_iter = iter
if iter == 0:
initial_x = x.detach().clone()
else:
# z_0
diffuse_timesteps = torch.tensor(1000 - 1)
z_T = ddim_add_noise(x, initial_x, diffuse_timesteps) # [16, 4, 64, 64]
# z_T
# 2. create random noise z_rand for high-frequency
z_T = z_T.permute(1, 0, 2, 3)[None, ...] # [bs, 4, 16, 64, 64]
#z_rand = torch.randn(z_T.shape, device=devices.device)
z_rand = initial_x.detach().clone().permute(1, 0, 2, 3)[None, ...]
# 3. Noise Reinitialization
x = freq_mix_3d(z_T.to(dtype=torch.float32), z_rand, LPF=self.freq_filter)
x = x[0].permute(1, 0, 2, 3)
x = x.to(x_dtype)
x = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
devices.torch_gc()
samples = x
del x
if not self.enable_hr:
return samples
devices.torch_gc()
if self.latent_scale_mode is None:
decoded_samples = torch.stack(decode_latent_batch(self.sd_model, samples, target_device=devices.cpu, check_for_nans=True)).to(dtype=torch.float32)
else:
decoded_samples = None
with sd_models.SkipWritingToConfig():
sd_models.reload_model_weights(info=self.hr_checkpoint_info)
return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
def sample_i2i(self, conditioning, unconditional_conditioning, seeds, subseeds, subseed_strength, prompts):
x = self.rng.next()
x_dtype = x.dtype
if self.initial_noise_multiplier != 1.0:
self.extra_generation_params["Noise multiplier"] = self.initial_noise_multiplier
x *= self.initial_noise_multiplier
for iter in range(self.num_freeinit_iters):
if iter == 0:
initial_x = x.detach().clone()
else:
# z_0
diffuse_timesteps = torch.tensor(1000 - 1)
z_T = ddim_add_noise(x, initial_x, diffuse_timesteps) # [16, 4, 64, 64]
# z_T
# 2. create random noise z_rand for high-frequency
z_T = z_T.permute(1, 0, 2, 3)[None, ...] # [bs, 4, 16, 64, 64]
#z_rand = torch.randn(z_T.shape, device=devices.device)
z_rand = initial_x.detach().clone().permute(1, 0, 2, 3)[None, ...]
# 3. Noise Reinitialization
x = freq_mix_3d(z_T.to(dtype=torch.float32), z_rand, LPF=self.freq_filter)
x = x[0].permute(1, 0, 2, 3)
x = x.to(x_dtype)
x = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
samples = x
if self.mask is not None:
samples = samples * self.nmask + self.init_latent * self.mask
del x
devices.torch_gc()
return samples
if isinstance(p, StableDiffusionProcessingTxt2Img):
p.sample = MethodType(sample_t2i, p)
elif isinstance(p, StableDiffusionProcessingImg2Img):
p.sample = MethodType(sample_i2i, p)
else:
raise NotImplementedError
setattr(p, 'freq_filter', self.freq_filter)
setattr(p, 'num_freeinit_iters', self.num_iters)
def freq_mix_3d(x, noise, LPF):
"""
Noise reinitialization.
Args:
x: diffused latent
noise: randomly sampled noise
LPF: low pass filter
"""
# FFT
x_freq = fft.fftn(x, dim=(-3, -2, -1))
x_freq = fft.fftshift(x_freq, dim=(-3, -2, -1))
noise_freq = fft.fftn(noise, dim=(-3, -2, -1))
noise_freq = fft.fftshift(noise_freq, dim=(-3, -2, -1))
# frequency mix
HPF = 1 - LPF
x_freq_low = x_freq * LPF
noise_freq_high = noise_freq * HPF
x_freq_mixed = x_freq_low + noise_freq_high # mix in freq domain
# IFFT
x_freq_mixed = fft.ifftshift(x_freq_mixed, dim=(-3, -2, -1))
x_mixed = fft.ifftn(x_freq_mixed, dim=(-3, -2, -1)).real
return x_mixed
def get_freq_filter(shape, device, params: dict):
"""
Form the frequency filter for noise reinitialization.
Args:
shape: shape of latent (B, C, T, H, W)
params: filter parameters
"""
if params['method'] == "gaussian":
return gaussian_low_pass_filter(shape=shape, d_s=params['d_s'], d_t=params['d_t']).to(device)
elif params['method'] == "ideal":
return ideal_low_pass_filter(shape=shape, d_s=params['d_s'], d_t=params['d_t']).to(device)
elif params['method'] == "box":
return box_low_pass_filter(shape=shape, d_s=params['d_s'], d_t=params['d_t']).to(device)
elif params['method'] == "butterworth":
return butterworth_low_pass_filter(shape=shape, n=4, d_s=params['d_s'], d_t=params['d_t']).to(device)
else:
raise NotImplementedError
def gaussian_low_pass_filter(shape, d_s=0.25, d_t=0.25):
"""
Compute the gaussian low pass filter mask.
Args:
shape: shape of the filter (volume)
d_s: normalized stop frequency for spatial dimensions (0.0-1.0)
d_t: normalized stop frequency for temporal dimension (0.0-1.0)
"""
T, H, W = shape[-3], shape[-2], shape[-1]
mask = torch.zeros(shape)
if d_s==0 or d_t==0:
return mask
for t in range(T):
for h in range(H):
for w in range(W):
d_square = (((d_s/d_t)*(2*t/T-1))**2 + (2*h/H-1)**2 + (2*w/W-1)**2)
mask[..., t,h,w] = math.exp(-1/(2*d_s**2) * d_square)
return mask
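# Note: the triple Python loop above is O(T*H*W); the same mask can be built
# with broadcasting, which is much faster for large volumes. A functionally
# equivalent sketch (not used by the extension):
import torch

def gaussian_low_pass_filter_vectorized(shape, d_s=0.25, d_t=0.25):
    T, H, W = shape[-3], shape[-2], shape[-1]
    if d_s == 0 or d_t == 0:
        return torch.zeros(shape)
    t = torch.arange(T).view(T, 1, 1)
    h = torch.arange(H).view(1, H, 1)
    w = torch.arange(W).view(1, 1, W)
    # same d_square formula as the loop version, computed for all (t, h, w) at once
    d_square = ((d_s / d_t) * (2 * t / T - 1)) ** 2 \
        + (2 * h / H - 1) ** 2 + (2 * w / W - 1) ** 2
    return torch.exp(-1 / (2 * d_s ** 2) * d_square).expand(shape)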
def butterworth_low_pass_filter(shape, n=4, d_s=0.25, d_t=0.25):
"""
Compute the butterworth low pass filter mask.
Args:
shape: shape of the filter (volume)
n: order of the filter, larger n ~ ideal, smaller n ~ gaussian
d_s: normalized stop frequency for spatial dimensions (0.0-1.0)
d_t: normalized stop frequency for temporal dimension (0.0-1.0)
"""
T, H, W = shape[-3], shape[-2], shape[-1]
mask = torch.zeros(shape)
if d_s==0 or d_t==0:
return mask
for t in range(T):
for h in range(H):
for w in range(W):
d_square = (((d_s/d_t)*(2*t/T-1))**2 + (2*h/H-1)**2 + (2*w/W-1)**2)
mask[..., t,h,w] = 1 / (1 + (d_square / d_s**2)**n)
return mask
def ideal_low_pass_filter(shape, d_s=0.25, d_t=0.25):
"""
Compute the ideal low pass filter mask.
Args:
shape: shape of the filter (volume)
d_s: normalized stop frequency for spatial dimensions (0.0-1.0)
d_t: normalized stop frequency for temporal dimension (0.0-1.0)
"""
T, H, W = shape[-3], shape[-2], shape[-1]
mask = torch.zeros(shape)
if d_s==0 or d_t==0:
return mask
for t in range(T):
for h in range(H):
for w in range(W):
d_square = (((d_s/d_t)*(2*t/T-1))**2 + (2*h/H-1)**2 + (2*w/W-1)**2)
mask[..., t,h,w] = 1 if d_square <= d_s*2 else 0
return mask
def box_low_pass_filter(shape, d_s=0.25, d_t=0.25):
"""
Compute the box low pass filter mask (an approximation of the ideal filter).
Args:
shape: shape of the filter (volume)
d_s: normalized stop frequency for spatial dimensions (0.0-1.0)
d_t: normalized stop frequency for temporal dimension (0.0-1.0)
"""
T, H, W = shape[-3], shape[-2], shape[-1]
mask = torch.zeros(shape)
if d_s==0 or d_t==0:
return mask
threshold_s = round(int(H // 2) * d_s)
threshold_t = round(T // 2 * d_t)
cframe, crow, ccol = T // 2, H // 2, W //2
mask[..., cframe - threshold_t:cframe + threshold_t, crow - threshold_s:crow + threshold_s, ccol - threshold_s:ccol + threshold_s] = 1.0
return mask
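# A self-contained toy run of the reinitialization step on random tensors,
# using a box-style mask (same slicing as box_low_pass_filter) and the same
# FFT mix as freq_mix_3d; shapes and values here are illustrative only:
import torch
import torch.fft as fft

shape = [1, 4, 8, 8, 8]                    # toy [B, C, T, H, W] latent volume
z_T = torch.randn(shape)                   # stand-in for the re-diffused latent
z_rand = torch.randn(shape)                # fresh high-frequency noise
T, H, W = shape[-3], shape[-2], shape[-1]
LPF = torch.zeros(T, H, W)                 # box mask: 1 in the centered cube
LPF[T // 2 - 1:T // 2 + 1, H // 2 - 1:H // 2 + 1, W // 2 - 1:W // 2 + 1] = 1.0

# keep low frequencies of z_T, take high frequencies of z_rand
x_freq = fft.fftshift(fft.fftn(z_T, dim=(-3, -2, -1)), dim=(-3, -2, -1))
n_freq = fft.fftshift(fft.fftn(z_rand, dim=(-3, -2, -1)), dim=(-3, -2, -1))
mixed_freq = x_freq * LPF + n_freq * (1 - LPF)
mixed = fft.ifftn(fft.ifftshift(mixed_freq, dim=(-3, -2, -1)), dim=(-3, -2, -1)).real
assert mixed.shape == torch.Size(shape)    # real-valued, same shape as the inputs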


@@ -156,9 +156,10 @@ class AnimateDiffInfV2V:
mm_cn_restore(_context)
return x_out
logger.info("inner model forward hooked")
cfg_params.denoiser.inner_model.original_forward = cfg_params.denoiser.inner_model.forward
cfg_params.denoiser.inner_model.forward = MethodType(mm_sd_forward, cfg_params.denoiser.inner_model)
if getattr(cfg_params.denoiser.inner_model, 'original_forward', None) is None:
logger.info("inner model forward hooked")
cfg_params.denoiser.inner_model.original_forward = cfg_params.denoiser.inner_model.forward
cfg_params.denoiser.inner_model.forward = MethodType(mm_sd_forward, cfg_params.denoiser.inner_model)
cfg_params.text_cond = ad_params.text_cond
ad_params.step = cfg_params.denoiser.step


@@ -46,6 +46,11 @@ class AnimateDiffProcess:
video_source=None,
video_path='',
mask_path='',
freeinit_enable=False,
freeinit_filter="butterworth",
freeinit_ds=0.25,
freeinit_dt=0.25,
freeinit_iters=3,
latent_power=1,
latent_scale=32,
last_frame=None,
@@ -68,6 +73,11 @@ class AnimateDiffProcess:
self.video_source = video_source
self.video_path = video_path
self.mask_path = mask_path
self.freeinit_enable = freeinit_enable
self.freeinit_filter = freeinit_filter
self.freeinit_ds = freeinit_ds
self.freeinit_dt = freeinit_dt
self.freeinit_iters = freeinit_iters
self.latent_power = latent_power
self.latent_scale = latent_scale
self.last_frame = last_frame
@@ -82,7 +92,7 @@ class AnimateDiffProcess:
def get_list(self, is_img2img: bool):
return list(vars(self).values())[:(20 if is_img2img else 15)]
return list(vars(self).values())[:(25 if is_img2img else 20)]
def get_dict(self, is_img2img: bool):
@@ -97,6 +107,7 @@ class AnimateDiffProcess:
"overlap": self.overlap,
"interp": self.interp,
"interp_x": self.interp_x,
"freeinit_enable": self.freeinit_enable,
}
if self.request_id:
infotext['request_id'] = self.request_id
@@ -233,6 +244,14 @@
self.params = AnimateDiffProcess()
AnimateDiffUiGroup.animatediff_ui_group.append(self)
# Free-init
self.filter_type_list = [
"butterworth",
"gaussian",
"box",
"ideal"
]
def get_model_list(self):
model_dir = motion_module.get_model_dir()
@@ -350,6 +369,52 @@
value=self.params.interp_x, label="Interp X", precision=0,
elem_id=f"{elemid_prefix}interp-x"
)
with gr.Accordion("FreeInit Params", open=False):
gr.Markdown(
"""
Adjust these parameters to control the smoothness of the generated video.
"""
)
self.params.freeinit_enable = gr.Checkbox(
value=self.params.freeinit_enable,
label="Enable FreeInit",
elem_id=f"{elemid_prefix}freeinit-enable"
)
self.params.freeinit_filter = gr.Dropdown(
value=self.params.freeinit_filter,
label="Filter Type",
info="Defaults to Butterworth. To fix large inconsistencies, consider using Gaussian.",
choices=self.filter_type_list,
interactive=True,
elem_id=f"{elemid_prefix}freeinit-filter"
)
self.params.freeinit_ds = gr.Slider(
value=self.params.freeinit_ds,
minimum=0,
maximum=1,
step=0.125,
label="d_s",
info="Stop frequency for spatial dimensions (0.0-1.0)",
elem_id=f"{elemid_prefix}freeinit-ds"
)
self.params.freeinit_dt = gr.Slider(
value=self.params.freeinit_dt,
minimum=0,
maximum=1,
step=0.125,
label="d_t",
info="Stop frequency for temporal dimension (0.0-1.0)",
elem_id=f"{elemid_prefix}freeinit-dt"
)
self.params.freeinit_iters = gr.Slider(
value=self.params.freeinit_iters,
minimum=2,
maximum=5,
step=1,
label="FreeInit Iterations",
info="Larger values lead to smoother results but longer inference time.",
elem_id=f"{elemid_prefix}freeinit-iters",
)
self.params.video_source = gr.Video(
value=self.params.video_source,
label="Video source",