TODO

Release

  • Implement unload_auxiliary_models
  • Release Launcher
  • Release Enso
  • Update ROCm
  • Tips Color Grading
  • Regen Localization

Internal

  • Integrate: Depth3D
  • Feature: Color grading in processing
  • Feature: RIFE update
  • Feature: RIFE in processing
  • Feature: SeedVR2 in processing
  • Feature: Add video models to Reference
  • Deploy: Lite vs Expert mode
  • Engine: mmgp
  • Engine: TensorRT acceleration
  • Feature: Auto handle scheduler prediction_type
  • Feature: Cache models in memory
  • Validate: Control tab add overrides handling
  • Feature: Integrate natural language image search ImageDB
  • Feature: Multi-user support
  • Feature: Settings profile manager
  • Feature: Video tab add full API support
  • Refactor: Unify huggingface and diffusers model folders
  • Refactor: GGUF
  • Refactor: move sampler options from settings to config
  • Reimplement llama remover for Kanvas, pending its end-to-end review

On Hold

  • Feature: LoRA add OMI format support for SD35/FLUX.1
  • Feature: Remote Text-Encoder support

Modular

Pending finalization of the modular pipelines implementation and development of a compatibility layer

New models / Pipelines

TODO: Investigate which models are diffusers-compatible and prioritize!

Image-Base

  • Chroma Zeta: Image and video generator for creative effects and professional filters
  • Chroma Radiance: Pixel-space model eliminating VAE artifacts for high visual fidelity
  • Liquid: Unified vision-language auto-regressive generation paradigm
  • Lumina-DiMOO: Foundational multi-modal generation and understanding via discrete diffusion
  • NVIDIA Cosmos-Predict-2.5: Physics-aware world foundation model for consistent scene prediction

Image-Edit

  • Meituan LongCat-Image-Edit-Turbo: 6B instruction-following image editing with high visual consistency
  • VIBE Image-Edit: (Sana+Qwen-VL) Fast visual instruction-based image editing framework
  • LucyEdit: Instruction-guided video editing while preserving motion and identity
  • Step1X-Edit: Multimodal image editing decoding MLLM tokens via DiT
  • OneReward: Reinforcement-learning-grounded generative reward model for image editing
  • ByteDance DreamO: image customization framework for IP adaptation and virtual try-on

Video

Other/Unsorted

  • DiffusionForcing: Full-sequence diffusion with autoregressive next-token prediction
  • Self-Forcing: Framework for improving temporal consistency in long-horizon video generation
  • SEVA: Stable Virtual Camera for novel view synthesis and 3D-consistent video
  • ByteDance USO: Unified Style-Subject Optimized framework for personalized image generation
  • ByteDance Lynx: State-of-the-art high-fidelity personalized video generation based on DiT
  • LanDiff: Coarse-to-fine text-to-video integrating Language and Diffusion Models
  • Video Inpaint Pipeline: Unified inpainting pipeline implementation within Diffusers library
  • Sonic Inpaint: Audio-driven portrait animation system focused on global audio perception
  • Make-It-Count: CountGen method for precise numerical control of objects via object identity features
  • ControlNeXt: Lightweight architecture for efficient controllable image and video generation
  • MS-Diffusion: Layout-guided multi-subject image personalization framework
  • UniRef: Unified model for segmentation tasks designed as foundation model plug-in
  • FlashFace: High-fidelity human image customization and face swapping framework
  • ReNO: Reward-based Noise Optimization to improve text-to-image quality during inference

Not Planned

Code TODO

npm run todo

  • fc: autodetect distilled based on model
  • fc: autodetect tensor format based on model
  • hypertile: vae breaks when using non-standard sizes
  • install: switch to pytorch source when it becomes available
  • loader: load recipe
  • loader: save recipe
  • lora: add other quantization types
  • lora: add t5 key support for sd35/f1
  • lora: maybe force immediate quantization
  • model load: force-reloading entire model, as loading only the transformer leads to massive memory usage
  • model load: implement model in-memory caching
  • modernui: monkey-patch for missing tabs.select event
  • modules/lora/lora_extract.py:188:9: W0511: TODO: lora: support pre-quantized flux
  • modules/modular_guiders.py:65:58: W0511: TODO: guiders
  • processing: remove duplicate mask params
  • resize image: enable full VAE mode for resize-latent