update video models

Signed-off-by: vladmandic <mandic00@live.com>
pull/4456/head
vladmandic 2025-12-09 09:22:28 +01:00
parent 1c2a81ee2d
commit f91af19094
4 changed files with 95 additions and 29 deletions


@@ -1,22 +1,23 @@
# Change Log for SD.Next
## Update for 2025-12-09
### TBD
Merge commit: `f903a36d9`
### Highlights for 2025-12-09
New native [kanvas](https://vladmandic.github.io/sdnext-docs/Kanvas/) module for image manipulation that fully replaces the *img2img*, *inpaint* and *outpaint* controls, plus a massive update to **Captioning/VQA** models and features
New generation **Flux.2** large image model, the new **Z-Image** model that is generating a lot of buzz, a first cloud model with **Google's Nano Banana** *2.5 Flash and 3.0 Pro*, and the new **Photoroom PRX** model
Also new are **HunyuanVideo 1.5** and **Kandinsky 5 Pro** video models, plus a lot of internal improvements and fixes
![Screenshot](https://github.com/user-attachments/assets/54b25586-b611-4d70-a28f-ee3360944034)
[ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)
### Details for 2025-12-09
- **Models**
- [Black Forest Labs FLUX.2 Dev](https://bfl.ai/blog/flux-2) and prequantized variation [SDNQ-SVD-Uint4](https://huggingface.co/Disty0/FLUX.2-dev-SDNQ-uint4-svd-r32)
**FLUX.2-Dev** is a brand new model from BFL that uses a large 32B DiT together with Mistral 24B as the text encoder
@@ -33,6 +34,11 @@ New generation of **Flux.2** large image model, new **Z-Image** model that is cr
*note*: set the `GOOGLE_API_KEY` environment variable with your key to use this model
- [Photoroom PRX 1024 Beta](https://huggingface.co/Photoroom/prx-1024-t2i-beta)
PRX (Photoroom Experimental) is a small 1.3B-parameter t2i model trained entirely from scratch; it uses the T5-Gemma text encoder
- [HunyuanVideo 1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) in T2V and I2V variants, each in standard and distilled versions and at both 720p and 480p resolutions
**HunyuanVideo 1.5** improves upon the previous 1.0 version with better quality and higher-resolution outputs; it uses the Qwen2.5-VL text encoder
distilled variants provide faster generation with slightly reduced quality
- [Kandinsky 5.0 Pro Video](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers) in T2V and I2V variants
a larger and more powerful 19B version of the previously released 2B Lite models
- **Kanvas**: new module for native canvas-based image manipulation
kanvas is a full replacement for *img2img, inpaint and outpaint* controls
see [docs](https://vladmandic.github.io/sdnext-docs/Kanvas/) for details
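The HunyuanVideo 1.5 Diffusers repos registered below follow a regular naming scheme: resolution, task, and an optional `_distilled` suffix. A minimal helper to compose a repo id could look like this (the function name is hypothetical, not part of SD.Next, and whether a given combination actually exists upstream should be checked against the registry):

```python
def hunyuan15_repo(task: str, resolution: int, distilled: bool = False) -> str:
    """Compose a HunyuanVideo 1.5 Diffusers repo id.

    task: 't2v' or 'i2v'; resolution: 720 or 480.
    Note: not every combination is published upstream.
    """
    if task not in ('t2v', 'i2v'):
        raise ValueError(f'unknown task: {task}')
    if resolution not in (720, 480):
        raise ValueError(f'unsupported resolution: {resolution}')
    suffix = '_distilled' if distilled else ''
    return f'hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-{resolution}p_{task}{suffix}'

# hunyuan15_repo('i2v', 480, distilled=True)
# → 'hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v_distilled'
```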


@@ -4,12 +4,9 @@
- <https://github.com/users/vladmandic/projects>
## Internal
- Reimplement llama remover for kanvas
- Deploy: Create executable for SD.Next
- Feature: Integrate natural language image search
[ImageDB](https://github.com/vladmandic/imagedb)
@@ -35,13 +32,13 @@
- [SmoothCache](https://github.com/huggingface/diffusers/issues/11135)
- [STG](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#spatiotemporal-skip-guidance)
- [Video Inpaint Pipeline](https://github.com/huggingface/diffusers/pull/12506)
- [Sonic Inpaint](https://github.com/ubc-vision/sonic)
### New models / Pipelines
TODO: *Prioritize*!
- [Kandinsky 5 Pro and Lite](https://github.com/huggingface/diffusers/pull/12664)
- [HunyuanVideo-1.5](https://github.com/huggingface/diffusers/pull/12696)
- [NewBie Image Exp0.1](https://github.com/huggingface/diffusers/pull/12803)
- [Sana-I2V](https://github.com/huggingface/diffusers/pull/12634#issuecomment-3540534268)
- [Bria FIBO](https://huggingface.co/briaai/FIBO)
- [Bytedance Lynx](https://github.com/bytedance/lynx)


@@ -121,6 +121,8 @@ def guess_by_name(fn, current_guess):
new_guess = 'Kandinsky 2.2'
elif 'kandinsky-3' in fn.lower():
new_guess = 'Kandinsky 3.0'
elif 'kandinsky-5.0' in fn.lower():
new_guess = 'Kandinsky 5.0'
elif 'hunyuanimage3' in fn.lower() or 'hunyuanimage-3' in fn.lower():
new_guess = 'HunyuanImage3'
elif 'hunyuanimage' in fn.lower():
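The hunk above adds one branch to a substring-matching chain, where ordering matters: a generic pattern like `hunyuanimage` would also match filenames containing `hunyuanimage3`, so the specific pattern must be tested first. A standalone sketch of the idea (the pattern table and return values here are an illustrative subset, not SD.Next's actual list):

```python
def guess_by_name(fn: str, current_guess: str) -> str:
    """Guess a model family from a checkpoint filename.

    Patterns are ordered most-specific first: 'hunyuanimage3' must be
    tested before the generic 'hunyuanimage' or it would never match.
    """
    name = fn.lower()
    patterns = [  # (substring, guess) - illustrative subset
        ('kandinsky-3', 'Kandinsky 3.0'),
        ('kandinsky-5.0', 'Kandinsky 5.0'),
        ('hunyuanimage3', 'HunyuanImage3'),
        ('hunyuanimage-3', 'HunyuanImage3'),
        ('hunyuanimage', 'HunyuanImage'),
    ]
    for substring, guess in patterns:
        if substring in name:
            return guess
    return current_guess  # fall back to the caller's existing guess
```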


@@ -35,21 +35,70 @@ try:
'None': [],
'Hunyuan Video': [
Model(name='None'),
Model(name='Hunyuan Video 1.5 T2V 720p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 720p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_i2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 720p Distilled',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_i2v_distilled',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 T2V 480p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 T2V 480p Distilled',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v_distilled',
repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 480p',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.5 I2V 480p Distilled',
url='https://huggingface.co/tencent/HunyuanVideo-1.5',
vae_remote=False,
repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v_distilled',
repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
Model(name='Hunyuan Video 1.0 T2V',
url='https://huggingface.co/tencent/HunyuanVideo',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
repo_cls=getattr(diffusers, 'HunyuanVideoPipeline', None),
te_cls=getattr(transformers, 'LlamaModel', None),
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='Hunyuan Video 1.0 I2V', # https://github.com/huggingface/diffusers/pull/10983
url='https://huggingface.co/tencent/HunyuanVideo-I2V',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo-I2V',
repo_cls=getattr(diffusers, 'HunyuanVideoImageToVideoPipeline', None),
te_cls=getattr(transformers, 'LlavaForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='SkyReels Hunyuan 1.0 T2V', # https://github.com/huggingface/diffusers/pull/10837
url='https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-T2V',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
@@ -58,7 +107,7 @@ try:
dit='Skywork/SkyReels-V1-Hunyuan-T2V',
dit_folder=None,
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='SkyReels Hunyuan 1.0 I2V', # https://github.com/huggingface/diffusers/pull/10837
url='https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-I2V',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
@@ -67,7 +116,7 @@ try:
dit='Skywork/SkyReels-V1-Hunyuan-I2V',
dit_folder=None,
dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
Model(name='Fast Hunyuan 1.0 T2V', # https://github.com/hao-ai-lab/FastVideo/blob/8a77cf22c9b9e7f931f42bc4b35d21fd91d24e45/fastvideo/models/hunyuan/inference.py#L213
url='https://huggingface.co/FastVideo/FastHunyuan',
vae_remote=True,
repo='hunyuanvideo-community/HunyuanVideo',
@@ -383,41 +432,53 @@ try:
],
'Kandinsky': [
Model(name='Kandinsky 5.0 Lite 5s SFT T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 5s CFG-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 5s Steps-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 10s SFT T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 10s CFG-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Lite 10s Steps-distilled T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Pro 5s SFT T2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
Model(name='Kandinsky 5.0 Pro 5s SFT I2V',
url='https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers',
repo='kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers',
repo_cls=getattr(diffusers, 'Kandinsky5I2VPipeline', None),
te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
],
}
t1 = time.time()
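Every entry in the registry above resolves its pipeline, text-encoder and transformer classes via `getattr(module, name, None)` rather than direct attribute access, so the model table can still be built on diffusers/transformers versions that predate a given model; presumably such an entry is then treated as unavailable rather than crashing the import. A minimal illustration of the pattern (the helper name and the simulated module are made up for this sketch):

```python
import types

def optional_cls(module, name: str):
    """Resolve a class that may not exist in the installed library version.

    Returns None instead of raising AttributeError, so a registry built
    at import time survives older library versions that lack the class.
    """
    return getattr(module, name, None)

# simulate an older library that lacks a newer pipeline class
older_diffusers = types.SimpleNamespace(HunyuanVideoPipeline=object)
assert optional_cls(older_diffusers, 'HunyuanVideoPipeline') is object
assert optional_cls(older_diffusers, 'HunyuanVideo15Pipeline') is None
```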