TODO

Release

  • Implement unload_auxiliary_models
  • Release Launcher
  • Release Enso
  • Update ROCm
  • Tips Color Grading
  • Regen Localization

Internal

  • Integrate: Depth3D
  • Feature: Color grading in processing
  • Feature: RIFE update
  • Feature: RIFE in processing
  • Feature: SeedVR2 in processing
  • Feature: Add video models to Reference
  • Deploy: Lite vs Expert mode
  • Engine: mmgp
  • Engine: TensorRT acceleration
  • Feature: Auto handle scheduler prediction_type
  • Feature: Cache models in memory
  • Validate: Control tab add overrides handling
  • Feature: Integrate natural language image search ImageDB
  • Feature: Multi-user support
  • Feature: Settings profile manager
  • Feature: Video tab add full API support
  • Refactor: Unify huggingface and diffusers model folders
  • Refactor: GGUF
  • Refactor: move sampler options from settings to config
  • Reimplement llama remover for Kanvas, pending its end-to-end review

On Hold

  • Feature: LoRA add OMI format support for SD35/FLUX.1
  • Feature: Remote Text-Encoder support

Modular

Pending finalization of the modular pipelines implementation and development of a compatibility layer

New models / Pipelines

TODO: Investigate which models are diffusers-compatible and prioritize!

Image-Base

  • Chroma Zeta: Image and video generator for creative effects and professional filters
  • Chroma Radiance: Pixel-space model eliminating VAE artifacts for high visual fidelity
  • Liquid: Unified vision-language auto-regressive generation paradigm
  • Lumina-DiMOO: Foundational multi-modal generation and understanding via discrete diffusion
  • NVIDIA Cosmos-Predict-2.5: Physics-aware world foundation model for consistent scene prediction

Image-Edit

  • Meituan LongCat-Image-Edit-Turbo: 6B instruction-following image editing with high visual consistency
  • VIBE Image-Edit: (Sana+Qwen-VL) Fast visual instruction-based image editing framework
  • LucyEdit: Instruction-guided video editing while preserving motion and identity
  • Step1X-Edit: Multimodal image editing decoding MLLM tokens via DiT
  • OneReward: Reinforcement-learning-grounded generative reward model for image editing
  • ByteDance DreamO: image customization framework for IP adaptation and virtual try-on

Video

Other/Unsorted

  • DiffusionForcing: Full-sequence diffusion with autoregressive next-token prediction
  • Self-Forcing: Framework for improving temporal consistency in long-horizon video generation
  • SEVA: Stable Virtual Camera for novel view synthesis and 3D-consistent video
  • ByteDance USO: Unified Style-Subject Optimized framework for personalized image generation
  • ByteDance Lynx: State-of-the-art high-fidelity personalized video generation based on DiT
  • LanDiff: Coarse-to-fine text-to-video integrating Language and Diffusion Models
  • Video Inpaint Pipeline: Unified inpainting pipeline implementation within Diffusers library
  • Sonic Inpaint: Audio-driven portrait animation system focused on global audio perception
  • Make-It-Count: CountGen method for precise numerical control of objects via object identity features
  • ControlNeXt: Lightweight architecture for efficient controllable image and video generation
  • MS-Diffusion: Layout-guided multi-subject image personalization framework
  • UniRef: Unified model for segmentation tasks designed as foundation model plug-in
  • FlashFace: High-fidelity human image customization and face swapping framework
  • ReNO: Reward-based Noise Optimization to improve text-to-image quality during inference

Not Planned

Code TODO

npm run todo

  • fc: autodetect distilled based on model
  • fc: autodetect tensor format based on model
  • hypertile: vae breaks when using non-standard sizes
  • install: switch to pytorch source when it becomes available
  • loader: load recipe
  • loader: save recipe
  • lora: add other quantization types
  • lora: add t5 key support for sd35/f1
  • lora: maybe force immediate quantization
  • model load: force-reloading entire model, as loading only the transformer leads to massive memory usage
  • model load: implement model in-memory caching
  • modernui: monkey-patch for missing tabs.select event
  • modules/lora/lora_extract.py:188:9: W0511: TODO: lora: support pre-quantized flux
  • modules/modular_guiders.py:65:58: W0511: TODO: guiders
  • processing: remove duplicate mask params
  • resize image: enable full VAE mode for resize-latent