mirror of https://github.com/vladmandic/automatic

add wan asymmetric vae upscaler (pull/4322/head)

Signed-off-by: Vladimir Mandic <mandic00@live.com>

parent df4571588b
commit bc775f0530
.pre-commit-config.yaml

@@ -1,15 +1,6 @@
-# To use:
-#
-# pre-commit run -a
-#
-# Or:
-#
-# pre-commit install # (runs every time you commit in git)
-#
-# To update this file:
-#
-# pre-commit autoupdate
+# To use: pre-commit run -a
+# Or: pre-commit install # (runs every time you commit in git)
+# To update this file: pre-commit autoupdate
 # See https://github.com/pre-commit/pre-commit

 ci:
@@ -19,7 +10,7 @@ ci:
 repos:
   # Standard hooks
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v5.0.0
+    rev: v6.0.0
     hooks:
       - id: check-added-large-files
       - id: check-case-conflict
@@ -35,6 +26,7 @@ repos:
       - id: check-json
       - id: check-toml
       - id: check-xml
       - id: debug-statements
       - id: end-of-file-fixer
       - id: mixed-line-ending
+      - id: check-executables-have-shebangs
CHANGELOG.md

@@ -5,9 +5,11 @@
 ### Highlights for 2025-10-28

 - Reorganization of **Reference Models** into *Base, Quantized, Distilled and Community* sections for easier navigation
-- New models: **HunyuanImage 2.1** capable of generating 2K images natively, **Pony 7** based on AuraFlow architecture and **Kandinsky 5** 10s video models
+- New models: **HunyuanImage 2.1** capable of generating 2K images natively, **Pony 7** based on AuraFlow architecture,
+  **Kandinsky 5** 10s video models, **Krea Realtime** autoregressive variant of WAN-2.1
 - New **offline mode** to use previously downloaded models without internet connection
 - New SOTA model loader using **Run:ai streamer**
 - Optimizations to **WAN-2.2** given its popularity plus addition of native **VAE Upscaler** and optimized **pre-quantized** variants
 - Updates to `rocm` and `xpu` backends
 - Fixes, fixes, fixes... too many to list here!

@@ -29,11 +31,14 @@
   second series of models in *Kandinsky5* series is T2V model optimized for 10sec videos and uses Qwen2.5 text encoder
 - [Pony 7](https://huggingface.co/purplesmartai/pony-v7-base)
   Pony 7 steps in a different direction from previous Pony models and is based on AuraFlow architecture and UMT5 encoder
-- **Models Auxiliary**
-  - add **Qwen 3-VL** VLM for interrogate and prompt enhance, thanks @CalamitousFelicitousness
-  - add **Apple DepthPro** controlnet processor, thanks @nolbert82
-  - add **LibreFlux** segmentation controlnet for FLUX.1
+- **Models Auxiliary**
+  - [Qwen 3-VL](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) VLM for interrogate and prompt enhance, thanks @CalamitousFelicitousness
+    this includes *2B, 4B and 8B* variants
+  - [WAN Asymmetric Upscale](https://huggingface.co/spacepxl/Wan2.1-VAE-upscale2x)
+    available as general purpose upscaler that can be used during standard workflow or process tab
+    available as VAE for compatible video models: *WAN-2.x-14B, SkyReels-v2* models
+  - [Apple DepthPro](https://huggingface.co/apple/DepthPro) controlnet processor, thanks @nolbert82
+  - [LibreFlux controlnet](https://huggingface.co/neuralvfx/LibreFlux-ControlNet) segmentation controlnet for FLUX.1
 - **Features**
   - **offline mode**: enable in *settings -> huggingface*
     enables fully offline mode where previously downloaded models can be used as-is

@@ -75,6 +80,7 @@
 - fix `wan-2.2-14b-vace` single-stage execution
 - fix `wan-2.2-5b` tiled vae decode
 - fix `controlnet` loading with quantization
+- video use pre-quantized text-encoder if selected model is pre-quantized
 - handle sparse `controlnet` models
 - catch `xet` warnings
 - validate pipelines on import
TODO.md

@@ -4,15 +4,17 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma

 ## Future Candidates

-- [Kanvas](https://github.com/vladmandic/kanvas)
-- Transformers unified cache handler
-- Remote TE
+- Core: New inpaint/outpaint interface
+  [Kanvas](https://github.com/vladmandic/kanvas)
+- Core: Create executable for SD.Next
+- Feature: Transformers unified cache handler
+- Remote Text-Encoder support
 - Refactor: [Modular pipelines and guiders](https://github.com/huggingface/diffusers/issues/11915)
-- Refactor: Sampler options
+- Refactor: move sampler options from settings to config
 - Refactor: [GGUF](https://huggingface.co/docs/diffusers/main/en/quantization/gguf)
 - Feature: LoRA add OMI format support for SD35/FLUX.1
-- Video Core: API
-- Video LTX: TeaCache and others, API, Conditioning preprocess
+- Video tab: add full API support
+- Control tab: add overrides handling

 ### Under Consideration

@@ -26,13 +28,19 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma
 - [Dream0 guidance](https://huggingface.co/ByteDance/DreamO)
 - [ByteDance OneReward](https://github.com/bytedance/OneReward)
 - [ByteDance USO](https://github.com/bytedance/USO)
+- [Video Inpaint Pipeline](https://github.com/huggingface/diffusers/pull/12506)
 - Remove: `CodeFormer`
 - Remove: `GFPGAN`
 - ModernUI: Lite vs Expert mode
 - Engine: TensorRT acceleration

-### New models
+### New models / Pipelines

 - [Krea Realtime Video](https://huggingface.co/krea/krea-realtime-video)
+- [Wan-2.2 Animate](https://github.com/huggingface/diffusers/pull/12526)
+- [Wan-2.2 S2V](https://github.com/huggingface/diffusers/pull/12258)
+- [LongCat-Video](https://huggingface.co/meituan-longcat/LongCat-Video)
+- [MUG-V 10B](https://huggingface.co/MUG-V/MUG-V-inference)
+- [Chroma1 Radiance](https://huggingface.co/lodestones/Chroma1-Radiance)
 - [Ovi](https://github.com/character-ai/Ovi)
 - [Bytedance Lynx](https://github.com/bytedance/lynx)
@@ -692,10 +692,7 @@ class DCSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
         rhos_c = torch.linalg.solve(R, b)

         if self.predict_x0:
-            try:
-                x_t_ = sigma_t / sigma_s0 * x - alpha_t * h_phi_1 * m0
-            except Exception as e:
-                import pdb; pdb.set_trace()
-                x_t_ = sigma_t / sigma_s0 * x - alpha_t * h_phi_1 * m0
+            x_t_ = sigma_t / sigma_s0 * x - alpha_t * h_phi_1 * m0
             if D1s is not None:
                 corr_res = torch.einsum("k,bkc...->bc...", rhos_c[:-1], D1s)
             else:
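The restored update line computes `x_t_ = (sigma_t / sigma_s0) * x - alpha_t * h_phi_1 * m0`. With toy numbers (purely illustrative, not taken from the scheduler's actual sigma/alpha schedule) the arithmetic looks like:

```python
# toy values, illustrative only -- not from any real noise schedule
sigma_t, sigma_s0 = 0.5, 1.0   # noise levels at the target and source steps
alpha_t, h_phi_1 = 0.8, 0.25   # signal coefficient and phi-function term
x, m0 = 2.0, 1.0               # current sample and first model output

x_t_ = sigma_t / sigma_s0 * x - alpha_t * h_phi_1 * m0
print(x_t_)  # 0.5*2.0 - 0.8*0.25*1.0 = 0.8
```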
@@ -7,6 +7,14 @@ from modules import shared, sd_models, devices, timer, errors
 debug = shared.log.trace if os.environ.get('SD_VIDEO_DEBUG', None) is not None else lambda *args, **kwargs: None


+def hijack_vae_upscale(*args, **kwargs):
+    import torch.nn.functional as F
+    tensor = shared.sd_model.vae.orig_decode(*args, **kwargs)[0]
+    tensor = F.pixel_shuffle(tensor.movedim(2, 1), upscale_factor=2).movedim(1, 2)  # vae returns 16-channel latents; pixel shuffle rearranges them into 4-channel images
+    tensor = tensor.unsqueeze(0)  # add batch dimension
+    return tensor
+
+
 def hijack_vae_decode(*args, **kwargs):
     jobid = shared.state.begin('VAE Decode')
     t0 = time.time()

@@ -16,7 +24,10 @@ def hijack_vae_decode(*args, **kwargs):
     sd_models.move_model(shared.sd_model.vae, devices.device)
     if torch.is_tensor(args[0]):
         latents = args[0].to(device=devices.device, dtype=shared.sd_model.vae.dtype)  # upcast to vae dtype
-        res = shared.sd_model.vae.orig_decode(latents, *args[1:], **kwargs)
+        if hasattr(shared.sd_model.vae, '_asymmetric_upscale_vae'):
+            res = hijack_vae_upscale(latents, *args[1:], **kwargs)
+        else:
+            res = shared.sd_model.vae.orig_decode(latents, *args[1:], **kwargs)
         t1 = time.time()
         shared.log.debug(f'Decode: vae={shared.sd_model.vae.__class__.__name__} slicing={getattr(shared.sd_model.vae, "use_slicing", None)} tiling={getattr(shared.sd_model.vae, "use_tiling", None)} latents={list(latents.shape)}:{latents.device} dtype={latents.dtype} time={t1-t0:.3f}')
     else:
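The pixel-shuffle step in `hijack_vae_upscale` rearranges a `(C·r², H, W)` tensor into `(C, H·r, W·r)` by moving blocks of channels into spatial positions. A dependency-free sketch of that index mapping on nested lists (a toy stand-in for `torch.nn.functional.pixel_shuffle`, not the torch implementation itself):

```python
def pixel_shuffle(x, r):
    """Rearrange nested list of shape [C*r*r][H][W] into [C][H*r][W*r]."""
    C2, H, W = len(x), len(x[0]), len(x[0][0])
    assert C2 % (r * r) == 0, 'channel count must be divisible by r*r'
    C = C2 // (r * r)
    out = [[[0] * (W * r) for _ in range(H * r)] for _ in range(C)]
    for c2 in range(C2):
        c, rem = divmod(c2, r * r)   # which output channel this block feeds
        dy, dx = divmod(rem, r)      # sub-pixel offset within each r x r cell
        for h in range(H):
            for w in range(W):
                out[c][h * r + dy][w * r + dx] = x[c2][h][w]
    return out

# 4 channels of 1x1 collapse into 1 channel of 2x2
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

This is why the hijack calls `movedim(2, 1)` first: `pixel_shuffle` expects channels in the `[..., C, H, W]` position, while the video VAE output carries frames there.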
@@ -617,7 +617,10 @@ class SDNQQuantizer(DiffusersQuantizer, HfQuantizer):

     def _process_model_after_weight_loading(self, model, **kwargs):  # pylint: disable=unused-argument
         if shared.opts.diffusers_offload_mode != "none":
-            model = model.to(devices.cpu)
+            try:
+                model = model.to(device=devices.cpu)
+            except Exception:
+                model = model.to_empty(device=devices.cpu)
         devices.torch_gc(force=True, reason="sdnq")
         return model
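The try/except added above covers models whose weights were never materialized (e.g. still on the meta device): a plain `.to()` raises there, while `.to_empty()` allocates fresh storage on the target device instead of copying. A toy sketch of the fallback pattern — `MetaModel` here is a hypothetical stand-in, not a torch API:

```python
class MetaModel:
    # hypothetical stand-in for a model holding storage-less (meta) weights
    def to(self, device=None):
        raise NotImplementedError('Cannot copy out of meta tensor; use to_empty()')

    def to_empty(self, device=None):
        # re-allocate parameters on the target device instead of copying
        self.device = device
        return self


def move_after_loading(model, device='cpu'):
    # mirrors the fallback in _process_model_after_weight_loading above
    try:
        return model.to(device=device)
    except Exception:
        return model.to_empty(device=device)


print(move_after_loading(MetaModel()).device)  # cpu
```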
@@ -112,16 +112,17 @@ class UpscalerAsymmetricVAE(Upscaler):
         import torchvision.transforms.functional as F
         import diffusers
         from modules import shared, devices
-        if self.vae is None or selected_model != self.selected:
+        if self.vae is None or (selected_model != self.selected):
             if 'v1' in selected_model:
                 repo_id = 'Heasterian/AsymmetricAutoencoderKLUpscaler'
             else:
                 repo_id = 'Heasterian/AsymmetricAutoencoderKLUpscaler_v2'
             self.vae = diffusers.AsymmetricAutoencoderKL.from_pretrained(repo_id, cache_dir=shared.opts.hfcache_dir)
-            shared.log.debug(f'Upscaler load: vae="{repo_id}"')
             self.vae.requires_grad_(False)
             self.vae = self.vae.to(device=devices.device, dtype=devices.dtype)
             self.vae.eval()
             self.selected = selected_model
+            shared.log.debug(f'Upscaler load: selected="{self.selected}" vae="{repo_id}"')
         img = img.resize((8 * (img.width // 8), 8 * (img.height // 8)), resample=Image.Resampling.LANCZOS).convert('RGB')
         tensor = (F.pil_to_tensor(img).unsqueeze(0) / 255.0).to(device=devices.device, dtype=devices.dtype)
+        self.vae = self.vae.to(device=devices.device)

@@ -131,6 +132,54 @@ class UpscalerAsymmetricVAE(Upscaler):
         return upscaled


+class UpscalerWanUpscale(Upscaler):
+    def __init__(self, dirname=None):  # pylint: disable=unused-argument
+        super().__init__(False)
+        self.name = "WAN Upscale"
+        self.vae_encode = None
+        self.vae_decode = None
+        self.selected = None
+        self.scalers = [
+            UpscalerData("WAN Asymmetric Upscale", None, self),
+        ]
+
+    def do_upscale(self, img: Image, selected_model=None):
+        if selected_model is None:
+            return img
+        import torchvision.transforms.functional as F
+        import torch.nn.functional as FN
+        import diffusers
+        from modules import shared, devices
+        if (self.vae_encode is None) or (self.vae_decode is None) or (selected_model != self.selected):
+            repo_encode = 'Qwen/Qwen-Image-Edit-2509'
+            subfolder_encode = 'vae'
+            self.vae_encode = diffusers.AutoencoderKLWan.from_pretrained(repo_encode, subfolder=subfolder_encode, cache_dir=shared.opts.hfcache_dir)
+            self.vae_encode.requires_grad_(False)
+            self.vae_encode = self.vae_encode.to(device=devices.device, dtype=devices.dtype)
+            self.vae_encode.eval()
+            repo_decode = 'spacepxl/Wan2.1-VAE-upscale2x'
+            subfolder_decode = "diffusers/Wan2.1_VAE_upscale2x_imageonly_real_v1"
+            self.vae_decode = diffusers.AutoencoderKLWan.from_pretrained(repo_decode, subfolder=subfolder_decode, cache_dir=shared.opts.hfcache_dir)
+            self.vae_decode.requires_grad_(False)
+            self.vae_decode = self.vae_decode.to(device=devices.device, dtype=devices.dtype)
+            self.vae_decode.eval()
+            self.selected = selected_model
+            shared.log.debug(f'Upscaler load: selected="{self.selected}" encode="{repo_encode}" decode="{repo_decode}"')
+
+        self.vae_encode = self.vae_encode.to(device=devices.device)
+        tensor = (F.pil_to_tensor(img).unsqueeze(0).unsqueeze(2) / 255.0).to(device=devices.device, dtype=devices.dtype)
+        tensor = self.vae_encode.encode(tensor).latent_dist.mode()
+        self.vae_encode.to(device=devices.cpu)
+
+        self.vae_decode = self.vae_decode.to(device=devices.device)
+        tensor = self.vae_decode.decode(tensor).sample
+        tensor = FN.pixel_shuffle(tensor.movedim(2, 1), upscale_factor=2).movedim(1, 2)  # pixel shuffle needs [..., C, H, W] format
+        self.vae_decode.to(device=devices.cpu)
+
+        upscaled = F.to_pil_image(tensor.squeeze().clamp(0.0, 1.0).float().cpu())
+        return upscaled
+
+
 class UpscalerDCC(Upscaler):
     def __init__(self, dirname=None):  # pylint: disable=unused-argument
         super().__init__(False)
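Before encoding, the upscaler snaps the image to dimensions divisible by 8 via `8 * (img.width // 8)`, since VAEs of this kind typically downsample spatially by a factor of 8. The rounding is plain floor arithmetic:

```python
def snap_down(value: int, multiple: int = 8) -> int:
    # round down to the nearest multiple, as in 8 * (img.width // 8)
    return multiple * (value // multiple)

print(snap_down(1023), snap_down(1024), snap_down(7))  # 1016 1024 0
```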
@@ -281,6 +281,17 @@ try:
               te_cls=getattr(transformers, 'UMT5EncoderModel', None),
               dit_cls=getattr(diffusers, 'SkyReelsV2Transformer3DModel', None)),
     ],
+    """
+    'Krea': [
+        Model(name='Krea Realtime WAN-2.1 14B T2V',
+              url='https://huggingface.co/krea/krea-realtime-video',
+              repo='krea/krea-realtime-video',
+              repo_cls=getattr(diffusers, 'WanPipeline', None),
+              te='Wan-AI/Wan2.1-T2V-14B-Diffusers',
+              te_cls=getattr(transformers, 'UMT5EncoderModel', None),
+              dit_cls=getattr(diffusers, 'WanTransformer3DModel', None)),
+    ],
+    """
     'Mochi Video': [
         Model(name='None'),
         Model(name='Mochi 1 T2V',
@@ -43,7 +43,10 @@ def load_model(selected: models_def.Model):
     selected.te_folder = ''
     selected.te_revision = None
     if selected.te_cls.__name__ == 'UMT5EncoderModel' and shared.opts.te_shared_t5:
-        selected.te = 'Wan-AI/Wan2.2-TI2V-5B-Diffusers'
+        if 'SDNQ' in selected.name:
+            selected.te = 'Disty0/Wan2.2-T2V-A14B-SDNQ-uint4-svd-r32'
+        else:
+            selected.te = 'Wan-AI/Wan2.2-TI2V-5B-Diffusers'
         selected.te_folder = 'text_encoder'
         selected.te_revision = None
     if selected.te_cls.__name__ == 'LlamaModel' and shared.opts.te_shared_t5:

@@ -154,3 +157,28 @@ def load_model(selected: models_def.Model):
     shared.log.debug(f'Video hijacks: decode={decode} text={text} image={image} slicing={slicing} tiling={tiling} framewise={framewise}')
     shared.state.end(jobid)
     return msg
+
+
+def load_upscale_vae():
+    if not hasattr(shared.sd_model, 'vae'):
+        return
+    if hasattr(shared.sd_model.vae, '_asymmetric_upscale_vae'):
+        return  # already loaded
+    cls = shared.sd_model.vae.__class__.__name__
+    if cls != 'AutoencoderKLWan':
+        shared.log.warning('Video decode: upscale VAE unsupported')
+        return
+
+    import diffusers
+    repo_id = 'spacepxl/Wan2.1-VAE-upscale2x'
+    subfolder = "diffusers/Wan2.1_VAE_upscale2x_imageonly_real_v1"
+    vae_decode = diffusers.AutoencoderKLWan.from_pretrained(repo_id, subfolder=subfolder, cache_dir=shared.opts.hfcache_dir)
+    vae_decode.requires_grad_(False)
+    vae_decode = vae_decode.to(device=devices.device, dtype=devices.dtype)
+    vae_decode.eval()
+    shared.log.debug(f'Decode: load={repo_id}')
+    shared.sd_model.orig_vae = shared.sd_model.vae
+    shared.sd_model.vae = vae_decode
+    shared.sd_model.vae._asymmetric_upscale_vae = True  # pylint: disable=protected-access
+    sd_hijack_vae.init_hijack(shared.sd_model)
+    sd_models.apply_balanced_offload(shared.sd_model, force=True)  # reapply offload
@@ -113,6 +113,11 @@ def generate(*args, **kwargs):
     video_overrides.set_overrides(p, selected)
     debug(f'Video: task_args={p.task_args}')

+    if p.vae_type == 'Upscale':
+        video_load.load_upscale_vae()
+    elif hasattr(shared.sd_model, 'orig_vae'):
+        shared.sd_model.vae = shared.sd_model.orig_vae
+
     # run processing
     shared.state.disable_preview = True
     shared.log.debug(f'Video: cls={shared.sd_model.__class__.__name__} width={p.width} height={p.height} frames={p.frames} steps={p.steps}')
@@ -141,7 +141,7 @@ def create_ui(prompt, negative, styles, overrides, init_image, init_strength, la
                 guidance_true = gr.Slider(label='True guidance', minimum=-1.0, maximum=14.0, step=0.1, value=-1.0, elem_id="video_guidance_true")
             with gr.Accordion(open=False, label="Decode", elem_id='video_decode_accordion'):
                 with gr.Row():
-                    vae_type = gr.Dropdown(label='VAE decode', choices=['Default', 'Tiny', 'Remote'], value='Default', elem_id="video_vae_type")
+                    vae_type = gr.Dropdown(label='VAE decode', choices=['Default', 'Tiny', 'Remote', 'Upscale'], value='Default', elem_id="video_vae_type")
                     vae_tile_frames = gr.Slider(label='Tile frames', minimum=1, maximum=64, step=1, value=16, elem_id="video_vae_tile_frames")

     vlm_enhance, vlm_model, vlm_system_prompt = ui_video_vlm.create_ui(prompt_element=prompt, image_element=init_image)