diff --git a/CHANGELOG.md b/CHANGELOG.md
index 51c92eec1..0aba17d79 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,22 +1,23 @@
 # Change Log for SD.Next

-## Update for 2025-12-08
+## Update for 2025-12-09

 ### TBD

 Merge commit: `f903a36d9`

-### Highlights for 2025-12-08
+### Highlights for 2025-12-09

-New native [kanvas](https://vladmandic.github.io/sdnext-docs/Kanvas/) module for image manipulation that fully replaces *img2img*, *inpaint* and *outpaint* controls, massive update to **Captioning/VQA** models and features
-New generation of **Flux.2** large image model, new **Z-Image** model that is creating a lot of buzz and a first cloud model with **Google's Nano Banana** *2.5 Flash and 3.0 Pro* plus new **Photoroom PRX** model
+New native [kanvas](https://vladmandic.github.io/sdnext-docs/Kanvas/) module for image manipulation that fully replaces the *img2img*, *inpaint* and *outpaint* controls, plus a massive update to **Captioning/VQA** models and features
+New generation of the large **Flux.2** image model, the new **Z-Image** model that is creating a lot of buzz, a first cloud model with **Google's Nano Banana** *2.5 Flash and 3.0 Pro*, and the new **Photoroom PRX** model
+Also new are the **HunyuanVideo 1.5** and **Kandinsky 5 Pro** video models, plus a lot of internal improvements and fixes

 ![Screenshot](https://github.com/user-attachments/assets/54b25586-b611-4d70-a28f-ee3360944034)

 [ReadMe](https://github.com/vladmandic/automatic/blob/master/README.md) | [ChangeLog](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md) | [Docs](https://vladmandic.github.io/sdnext-docs/) | [WiKi](https://github.com/vladmandic/automatic/wiki) | [Discord](https://discord.com/invite/sd-next-federal-batch-inspectors-1101998836328697867) | [Sponsor](https://github.com/sponsors/vladmandic)

-### Details for 2025-12-08
-
+### Details for 2025-12-09
+
 - **Models**
   - [Black Forest Labs FLUX.2 Dev](https://bfl.ai/blog/flux-2) and prequantized variation
     [SDNQ-SVD-Uint4](https://huggingface.co/Disty0/FLUX.2-dev-SDNQ-uint4-svd-r32)
     **FLUX.2-Dev** is a brand new model from BFL and uses large 32B DiT together with Mistral 24B as text encoder
@@ -33,6 +34,11 @@ New generation of **Flux.2** large image model, new **Z-Image** model that is cr
     *note*: need to set `GOOGLE_API_KEY` environment variable with your key to use this model
   - [Photoroom PRX 1024 Beta](https://huggingface.co/Photoroom/prx-1024-t2i-beta)
     PRX (Photoroom Experimental) is a small 1.3B parameter t2i model trained entirely from scratch, it uses T5-Gemma text-encoder
+  - [HunyuanVideo 1.5](https://huggingface.co/tencent/HunyuanVideo-1.5) in T2V and I2V variants, both standard and distilled, at both 720p and 480p resolutions
+    **HunyuanVideo 1.5** improves upon the previous 1.0 version with better quality and higher-resolution outputs, it uses Qwen2.5-VL text-encoder
+    distilled variants provide faster generation with slightly reduced quality
+  - [Kandinsky 5.0 Pro Video](https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers) in T2V and I2V variants
+    a larger and more powerful 19B version of the previously released 2B Lite models
 - **Kanvas**: new module for native canvas-based image manipulation
   kanvas is a full replacement for *img2img, inpaint and outpaint* controls
   see [docs](https://vladmandic.github.io/sdnext-docs/Kanvas/) for details
diff --git a/TODO.md b/TODO.md
index 04ec94d77..df73a2409 100644
--- a/TODO.md
+++ b/TODO.md
@@ -4,12 +4,9 @@

-
-## Kanvas
-
-- Reimplement llama remover
-
 ## Internal

+- Reimplement llama remover for kanvas
 - Deploy: Create executable for SD.Next
 - Feature: Integrate natural language image search [ImageDB](https://github.com/vladmandic/imagedb)
@@ -35,13 +32,13 @@

 - [SmoothCache](https://github.com/huggingface/diffusers/issues/11135)
 - [STG](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#spatiotemporal-skip-guidance)
 - [Video Inpaint Pipeline](https://github.com/huggingface/diffusers/pull/12506)
+- [Sonic Inpaint](https://github.com/ubc-vision/sonic)

 ### New models / Pipelines

 TODO: *Prioritize*!

-- [Kandinsky 5 Pro and Lite](https://github.com/huggingface/diffusers/pull/12664)
-- [HunyuanVideo-1.5](https://github.com/huggingface/diffusers/pull/12696)
+- [NewBie Image Exp0.1](https://github.com/huggingface/diffusers/pull/12803)
 - [Sana-I2V](https://github.com/huggingface/diffusers/pull/12634#issuecomment-3540534268)
 - [Bria FIBO](https://huggingface.co/briaai/FIBO)
 - [Bytedance Lynx](https://github.com/bytedance/lynx)
diff --git a/modules/sd_detect.py b/modules/sd_detect.py
index 76048fd86..f093e9585 100644
--- a/modules/sd_detect.py
+++ b/modules/sd_detect.py
@@ -121,6 +121,8 @@ def guess_by_name(fn, current_guess):
         new_guess = 'Kandinsky 2.2'
     elif 'kandinsky-3' in fn.lower():
         new_guess = 'Kandinsky 3.0'
+    elif 'kandinsky-5.0' in fn.lower():
+        new_guess = 'Kandinsky 5.0'
     elif 'hunyuanimage3' in fn.lower() or 'hunyuanimage-3' in fn.lower():
         new_guess = 'HunyuanImage3'
     elif 'hunyuanimage' in fn.lower():
diff --git a/modules/video_models/models_def.py b/modules/video_models/models_def.py
index bcb62bcf2..f2e2f21c9 100644
--- a/modules/video_models/models_def.py
+++ b/modules/video_models/models_def.py
@@ -35,21 +35,70 @@ try:
         'None': [],
         'Hunyuan Video': [
             Model(name='None'),
-            Model(name='Hunyuan Video T2V',
+            Model(name='Hunyuan Video 1.5 T2V 720p',
+                url='https://huggingface.co/tencent/HunyuanVideo-1.5',
+                vae_remote=False,
+                repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v',
+                repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
+                dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
+            Model(name='Hunyuan Video 1.5 I2V 720p',
+                url='https://huggingface.co/tencent/HunyuanVideo-1.5',
+                vae_remote=False,
+                repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_i2v',
+                repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
+                dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
+            Model(name='Hunyuan Video 1.5 I2V 720p Distilled',
+                url='https://huggingface.co/tencent/HunyuanVideo-1.5',
+                vae_remote=False,
+                repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_i2v_distilled',
+                repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
+                dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
+            Model(name='Hunyuan Video 1.5 T2V 480p',
+                url='https://huggingface.co/tencent/HunyuanVideo-1.5',
+                vae_remote=False,
+                repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v',
+                repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
+                dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
+            Model(name='Hunyuan Video 1.5 T2V 480p Distilled',
+                url='https://huggingface.co/tencent/HunyuanVideo-1.5',
+                vae_remote=False,
+                repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v_distilled',
+                repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
+                dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
+            Model(name='Hunyuan Video 1.5 I2V 480p',
+                url='https://huggingface.co/tencent/HunyuanVideo-1.5',
+                vae_remote=False,
+                repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v',
+                repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
+                dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
+            Model(name='Hunyuan Video 1.5 I2V 480p Distilled',
+                url='https://huggingface.co/tencent/HunyuanVideo-1.5',
+                vae_remote=False,
+                repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v_distilled',
+                repo_cls=getattr(diffusers, 'HunyuanVideo15ImageToVideoPipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLTextModel', None),
+                dit_cls=getattr(diffusers, 'HunyuanVideo15Transformer3DModel', None)),
+            Model(name='Hunyuan Video 1.0 T2V',
                 url='https://huggingface.co/tencent/HunyuanVideo',
                 vae_remote=True,
                 repo='hunyuanvideo-community/HunyuanVideo',
                 repo_cls=getattr(diffusers, 'HunyuanVideoPipeline', None),
                 te_cls=getattr(transformers, 'LlamaModel', None),
                 dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
-            Model(name='Hunyuan Video I2V', # https://github.com/huggingface/diffusers/pull/10983
+            Model(name='Hunyuan Video 1.0 I2V', # https://github.com/huggingface/diffusers/pull/10983
                 url='https://huggingface.co/tencent/HunyuanVideo-I2V',
                 vae_remote=True,
                 repo='hunyuanvideo-community/HunyuanVideo-I2V',
                 repo_cls=getattr(diffusers, 'HunyuanVideoImageToVideoPipeline', None),
                 te_cls=getattr(transformers, 'LlavaForConditionalGeneration', None),
                 dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
-            Model(name='SkyReels Hunyuan T2V', # https://github.com/huggingface/diffusers/pull/10837
+            Model(name='SkyReels Hunyuan 1.0 T2V', # https://github.com/huggingface/diffusers/pull/10837
                 url='https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-T2V',
                 vae_remote=True,
                 repo='hunyuanvideo-community/HunyuanVideo',
@@ -58,7 +107,7 @@ try:
                 dit='Skywork/SkyReels-V1-Hunyuan-T2V',
                 dit_folder=None,
                 dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
-            Model(name='SkyReels Hunyuan I2V', # https://github.com/huggingface/diffusers/pull/10837
+            Model(name='SkyReels Hunyuan 1.0 I2V', # https://github.com/huggingface/diffusers/pull/10837
                 url='https://huggingface.co/Skywork/SkyReels-V1-Hunyuan-I2V',
                 vae_remote=True,
                 repo='hunyuanvideo-community/HunyuanVideo',
@@ -67,7 +116,7 @@ try:
                 dit='Skywork/SkyReels-V1-Hunyuan-I2V',
                 dit_folder=None,
                 dit_cls=getattr(diffusers, 'HunyuanVideoTransformer3DModel', None)),
-            Model(name='Fast Hunyuan T2V', # https://github.com/hao-ai-lab/FastVideo/blob/8a77cf22c9b9e7f931f42bc4b35d21fd91d24e45/fastvideo/models/hunyuan/inference.py#L213
+            Model(name='Fast Hunyuan 1.0 T2V', # https://github.com/hao-ai-lab/FastVideo/blob/8a77cf22c9b9e7f931f42bc4b35d21fd91d24e45/fastvideo/models/hunyuan/inference.py#L213
                 url='https://huggingface.co/FastVideo/FastHunyuan',
                 vae_remote=True,
                 repo='hunyuanvideo-community/HunyuanVideo',
@@ -383,41 +432,53 @@ try:
         ],
         'Kandinsky': [
             Model(name='Kandinsky 5.0 Lite 5s SFT T2V',
-                url='https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
-                repo='ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers',
                 repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
                 te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
                 dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
             Model(name='Kandinsky 5.0 Lite 5s CFG-distilled T2V',
-                url='https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
-                repo='ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers',
                 repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
                 te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
                 dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
             Model(name='Kandinsky 5.0 Lite 5s Steps-distilled T2V',
-                url='https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
-                repo='ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers',
                 repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
                 te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
                 dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
             Model(name='Kandinsky 5.0 Lite 10s SFT T2V',
-                url='https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
-                repo='ai-forever/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-T2V-Lite-sft-10s-Diffusers',
                 repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
                 te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
                 dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
             Model(name='Kandinsky 5.0 Lite 10s CFG-distilled T2V',
-                url='https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
-                repo='ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-T2V-Lite-nocfg-10s-Diffusers',
                 repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
                 te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
                 dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
             Model(name='Kandinsky 5.0 Lite 10s Steps-distilled T2V',
-                url='https://huggingface.co/ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
-                repo='ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-T2V-Lite-distilled16steps-10s-Diffusers',
                 repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
                 te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
                 dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
+            Model(name='Kandinsky 5.0 Pro 5s SFT T2V',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers',
+                repo_cls=getattr(diffusers, 'Kandinsky5T2VPipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
+                dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
+            Model(name='Kandinsky 5.0 Pro 5s SFT I2V',
+                url='https://huggingface.co/kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers',
+                repo='kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers',
+                repo_cls=getattr(diffusers, 'Kandinsky5I2VPipeline', None),
+                te_cls=getattr(transformers, 'Qwen2_5_VLForConditionalGeneration', None),
+                dit_cls=getattr(diffusers, 'Kandinsky5Transformer3DModel', None)),
         ],
     }
     t1 = time.time()
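The `sd_detect.py` change above extends an ordered `elif` chain, where more specific substrings (like `kandinsky-5.0` or `hunyuanimage3`) must be matched before generic ones (`hunyuanimage`). A minimal standalone sketch of that heuristic (not the actual SD.Next implementation; the pattern table here is illustrative and covers only the families touched by this diff):

```python
def guess_by_name(fn: str, current_guess: str) -> str:
    """Guess the model family from a checkpoint filename or repo id.

    Patterns are ordered most-specific first, mirroring the elif chain in
    modules/sd_detect.py: 'hunyuanimage3' must be tested before the generic
    'hunyuanimage', and 'kandinsky-5.0' gets its own branch.
    """
    name = fn.lower()
    patterns = [
        ('kandinsky-2', 'Kandinsky 2.2'),
        ('kandinsky-3', 'Kandinsky 3.0'),
        ('kandinsky-5.0', 'Kandinsky 5.0'),
        ('hunyuanimage3', 'HunyuanImage3'),
        ('hunyuanimage-3', 'HunyuanImage3'),
        ('hunyuanimage', 'HunyuanImage'),
    ]
    for substring, guess in patterns:
        if substring in name:
            return guess
    return current_guess  # keep the caller's guess when nothing matches
```

For example, `guess_by_name('kandinskylab/Kandinsky-5.0-T2V-Pro-sft-5s-Diffusers', '')` now resolves to `'Kandinsky 5.0'`, while an unrecognized name leaves the caller's current guess untouched.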
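The `models_def.py` entries resolve every pipeline, text-encoder, and transformer class with `getattr(module, 'ClassName', None)`, so an installed diffusers build that predates a class (e.g. `HunyuanVideo15Pipeline`) yields `None` instead of raising at import time. A minimal sketch of that fallback pattern, using a stand-in namespace instead of the real `diffusers` module and a simplified hypothetical `Model` dataclass:

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional

# Stand-in for the real diffusers module: only the 1.0 pipeline class exists,
# as if an older diffusers build without HunyuanVideo 1.5 support is installed.
diffusers = SimpleNamespace(HunyuanVideoPipeline=object)

@dataclass
class Model:
    name: str
    repo: str = ''
    repo_cls: Optional[type] = None  # None when the class is unavailable

models = [
    Model(name='Hunyuan Video 1.0 T2V',
          repo='hunyuanvideo-community/HunyuanVideo',
          repo_cls=getattr(diffusers, 'HunyuanVideoPipeline', None)),
    Model(name='Hunyuan Video 1.5 T2V 720p',
          repo='hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v',
          repo_cls=getattr(diffusers, 'HunyuanVideo15Pipeline', None)),  # resolves to None here
]

# Entries whose pipeline class could not be resolved can be skipped or flagged
available = [m.name for m in models if m.repo_cls is not None]
```

With this stand-in, only `'Hunyuan Video 1.0 T2V'` survives the availability filter; the design lets one registry describe models across diffusers versions without per-version import guards.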