update video models

Signed-off-by: vladmandic <mandic00@live.com>
pull/4456/head
vladmandic 2025-12-09 09:22:28 +01:00
parent 1c2a81ee2d
commit f91af19094
4 changed files with 95 additions and 29 deletions


@@ -1,22 +1,23 @@
# Change Log for SD.Next
## Update for 2025-12-09
### TBD
Merge commit: `f903a36d9`
### Highlights for 2025-12-09
New native [kanvas](https://vladmandic.github.io/sdnext-docs/Kanvas/) module for image manipulation that fully replaces the *img2img*, *inpaint* and *outpaint* controls, plus a massive update to **Captioning/VQA** models and features
New generation **Flux.2** large image model, the new **Z-Image** model that is generating a lot of buzz, a first cloud model with **Google's Nano Banana** *2.5 Flash and 3.0 Pro*, and the new **Photoroom PRX** model
Also new are **HunyuanVideo 1.5** and **Kandinsky 5 Pro** video models, plus a lot of internal improvements and fixes
![Screenshot](https://github.com/user-attachments/assets/54b25586-b611-4d70-a28f-ee3360944034)
[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)
### Details for 2025-12-09
- **Models**
- [Black Forest Labs FLUX.2 Dev](https://bfl.ai/blog/flux-2) and prequantized variation [SDNQ-SVD-Uint4](https://huggingface.co/Disty0/FLUX.2-dev-SDNQ-uint4-svd-r32)
**FLUX.2-Dev** is a brand new model from BFL that uses a large 32B DiT together with Mistral 24B as the text encoder
@@ -33,6 +34,11 @@ New generation of **Flux.2** large image model, new **Z-Image** model that is cr
*note*: set the `GOOGLE_API_KEY` environment variable with your key to use this model
- [Photoroom PRX 1024 Beta](https://huggingface.co/Photoroom/prx-1024-t2i-beta)
PRX (Photoroom Experimental) is a small 1.3B-parameter t2i model trained entirely from scratch; it uses the T5-Gemma text encoder
- [HunyuanVideo 1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) in T2V and I2V variants, each in standard and distilled versions and at both 720p and 480p resolutions
**HunyuanVideo 1.5** improves upon the previous 1.0 version with better quality and higher-resolution outputs; it uses the Qwen2.5-VL text encoder
distilled variants provide faster generation with slightly reduced quality
- [Kandinsky 5.0 Pro Video](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers) in T2V and I2V variants
a larger and more powerful 19B version of the previously released 2B Lite models
- **Kanvas**: new module for native canvas-based image manipulation
kanvas is a full replacement for *img2img, inpaint and outpaint* controls
see [docs](https://vladmandic.github.io/sdnext-docs/Kanvas/) for details
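The HunyuanVideo 1.5 Diffusers repos registered below follow a regular naming scheme: resolution, task, and an optional `_distilled` suffix. A minimal helper to compose a repo id could look like this (the function name is hypothetical, not part of SD.Next, and whether a given combination actually exists upstream should be checked against the registry):

```python
def hunyuan15_repo(task: str, resolution: int, distilled: bool = False) -> str:
    """Compose a HunyuanVideo 1.5 Diffusers repo id.

    task: 't2v' or 'i2v'; resolution: 720 or 480.
    Note: not every combination is published upstream.
    """
    if task not in ('t2v', 'i2v'):
        raise ValueError(f'unknown task: {task}')
    if resolution not in (720, 480):
        raise ValueError(f'unsupported resolution: {resolution}')
    suffix = '_distilled' if distilled else ''
    return f'hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-{resolution}p_{task}{suffix}'

# hunyuan15_repo('i2v', 480, distilled=True)
# → 'hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v_distilled'
```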


@@ -4,12 +4,9 @@
- <https://github.com/users/vladmandic/projects>
## Internal
- Reimplement llama remover for kanvas
- Deploy: Create executable for SD.Next
- Feature: Integrate natural language image search
[ImageDB](https://github.com/vladmandic/imagedb)
@@ -35,13 +32,13 @@
- [SmoothCache](https://github.com/huggingface/diffusers/issues/11135)
- [STG](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#spatiotemporal-skip-guidance)
- [Video Inpaint Pipeline](https://github.com/huggingface/diffusers/pull/12506)
- [Sonic Inpaint](https://github.com/ubc-vision/sonic)
### New models / Pipelines
TODO: *Prioritize*!
- [Kandinsky 5 Pro and Lite](https://github.com/huggingface/diffusers/pull/12664)
- [HunyuanVideo-1.5](https://github.com/huggingface/diffusers/pull/12696)
- [NewBie Image Exp0.1](https://github.com/huggingface/diffusers/pull/12803)
- [Sana-I2V](https://github.com/huggingface/diffusers/pull/12634#issuecomment-3540534268)
- [Bria FIBO](https://huggingface.co/briaai/FIBO)
- [Bytedance Lynx](https://github.com/bytedance/lynx)


@@ -121,6 +121,8 @@ def guess_by_name(fn, current_guess):
new_guess = 'Kandinsky 2.2'
elif 'kandinsky-3' in fn.lower():
new_guess = 'Kandinsky 3.0'
elif 'kandinsky-5.0' in fn.lower():
new_guess = 'Kandinsky 5.0'
elif 'hunyuanimage3' in fn.lower() or 'hunyuanimage-3' in fn.lower():
new_guess = 'HunyuanImage3'
elif 'hunyuanimage' in fn.lower():
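The hunk above adds one branch to a substring-matching chain, where ordering matters: a generic pattern like `hunyuanimage` would also match filenames containing `hunyuanimage3`, so the specific pattern must be tested first. A standalone sketch of the idea (the pattern table and return values here are an illustrative subset, not SD.Next's actual list):

```python
def guess_by_name(fn: str, current_guess: str) -> str:
    """Guess a model family from a checkpoint filename.

    Patterns are ordered most-specific first: 'hunyuanimage3' must be
    tested before the generic 'hunyuanimage' or it would never match.
    """
    name = fn.lower()
    patterns = [  # (substring, guess) - illustrative subset
        ('kandinsky-3', 'Kandinsky 3.0'),
        ('kandinsky-5.0', 'Kandinsky 5.0'),
        ('hunyuanimage3', 'HunyuanImage3'),
        ('hunyuanimage-3', 'HunyuanImage3'),
        ('hunyuanimage', 'HunyuanImage'),
    ]
    for substring, guess in patterns:
        if substring in name:
            return guess
    return current_guess  # fall back to the caller's existing guess
```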


@@ -35,21 +35,70 @@ try:
'None': [],
'Hunyuan Video': [
Model(name='None'),
Model(name='Hunyuan Video 1.5 T2V 720p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 720p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_i2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 720p Distilled',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_i2v_distilled',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 T2V 480p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 T2V 480p Distilled',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v_distilled',
repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 480p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 480p Distilled',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v_distilled',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.0 T2V',
url='https://huggingface.co/tencent/HunyuanVideo',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
repo_cls=getattr(diffusers, 'HunyuanVideoPipeline', None),
te_cls=getattr(transformers, 'LlamaModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='Hunyuan Video 1.0 I2V', # https://github.com/huggingface/diffusers/pull/10983
url='https://huggingface.co/tencent/HunyuanVideo-I2V',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo-I2V',
repo_cls=getattr(diffusers, 'HunyuanVideoImageToVideoPipeline', None),
te_cls=getattr(transformers, 'LlavaForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='SkyReels Hunyuan 1.0 T2V', # https://github.com/huggingface/diffusers/pull/10837
url='https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-T2V',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
@@ -58,7 +107,7 @@ try:
dit='Skywork/SkyReels-V1-Hunyuan-T2V',
dit_folder=None,
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='SkyReels Hunyuan 1.0 I2V', # https://github.com/huggingface/diffusers/pull/10837
url='https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-I2V',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
@@ -67,7 +116,7 @@ try:
dit='Skywork/SkyReels-V1-Hunyuan-I2V',
dit_folder=None,
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='Fast Hunyuan 1.0 T2V', # https://github.com/hao-ai-lab/FastVideo/blob/8a77cf22c9b9e7f931f42bc4b35d21fd91d24e45/fastvideo/models/hunyuan/inference.py#L213
url='https://huggingface.co/FastVideo/FastHunyuan',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
@@ -383,41 +432,53 @@ try:
],
'Kandinsky': [
Model(name='Kandinsky 5.0 Lite 5s SFT T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 5s CFG-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 5s Steps-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 10s SFT T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 10s CFG-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 10s Steps-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Pro 5s SFT T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Pro 5s SFT I2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5I2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
],
}
t1 = time.time()
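Every entry in the registry above resolves its pipeline, text-encoder and transformer classes via `getattr(module, name, None)` rather than direct attribute access, so the model table can still be built on diffusers/transformers versions that predate a given model; presumably such an entry is then treated as unavailable rather than crashing the import. A minimal illustration of the pattern (the helper name and the simulated module are made up for this sketch):

```python
import types

def optional_cls(module, name: str):
    """Resolve a class that may not exist in the installed library version.

    Returns None instead of raising AttributeError, so a registry built
    at import time survives older library versions that lack the class.
    """
    return getattr(module, name, None)

# simulate an older library that lacks a newer pipeline class
older_diffusers = types.SimpleNamespace(HunyuanVideoPipeline=object)
assert optional_cls(older_diffusers, 'HunyuanVideoPipeline') is object
assert optional_cls(older_diffusers, 'HunyuanVideo15Pipeline') is None
```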