TODO

Internal

  • Update: transformers==5.0.0, owner @CalamitousFelicitousness
  • Deploy: Create executable for SD.Next
  • Deploy: Lite vs Expert mode
  • Engine: mmgp
  • Engine: sharpfin instead of torchvision
  • Engine: TensorRT acceleration
  • Feature: Auto handle scheduler prediction_type
  • Feature: Cache models in memory
  • Feature: Control tab add overrides handling
  • Feature: Integrate natural language image search ImageDB
  • Feature: LoRA add OMI format support for SD35/FLUX.1, on-hold
  • Feature: Multi-user support
  • Feature: Remote Text-Encoder support, sidelined for the moment
  • Feature: Settings profile manager
  • Feature: Video tab add full API support
  • Refactor: Unify huggingface and diffusers model folders
  • Refactor: Move nunchaku models to reference instead of internal decision, owner @CalamitousFelicitousness
  • Refactor: GGUF
  • Refactor: move sampler options from settings to config
  • Refactor: remove CodeFormer, owner @CalamitousFelicitousness
  • Refactor: remove GFPGAN, owner @CalamitousFelicitousness
  • Reimplement llama remover for Kanvas, pending end-to-end review of Kanvas

Modular

Pending finalization of the modular pipelines implementation and development of a compatibility layer

New models / Pipelines

TODO: Investigate which models are diffusers-compatible and prioritize!

Image-Base

  • Chroma Zeta: Image and video generator for creative effects and professional filters
  • Chroma Radiance: Pixel-space model eliminating VAE artifacts for high visual fidelity
  • Liquid: Unified vision-language auto-regressive generation paradigm
  • Lumina-DiMOO: Foundational multi-modal generation and understanding via discrete diffusion
  • NVIDIA Cosmos-Predict-2.5: Physics-aware world foundation model for consistent scene prediction

Image-Edit

  • Meituan LongCat-Image-Edit-Turbo: 6B instruction-following image editing with high visual consistency
  • VIBE Image-Edit (Sana+Qwen-VL): Fast visual instruction-based image editing framework
  • LucyEdit: Instruction-guided video editing while preserving motion and identity
  • Step1X-Edit: Multimodal image editing decoding MLLM tokens via DiT
  • OneReward: Reinforcement-learning-grounded generative reward model for image editing
  • ByteDance DreamO: Image customization framework for IP adaptation and virtual try-on

Video

Other/Unsorted

  • DiffusionForcing: Full-sequence diffusion with autoregressive next-token prediction
  • Self-Forcing: Framework for improving temporal consistency in long-horizon video generation
  • SEVA: Stable Virtual Camera for novel view synthesis and 3D-consistent video
  • ByteDance USO: Unified Style-Subject Optimized framework for personalized image generation
  • ByteDance Lynx: State-of-the-art high-fidelity personalized video generation based on DiT
  • LanDiff: Coarse-to-fine text-to-video integrating Language and Diffusion Models
  • Video Inpaint Pipeline: Unified inpainting pipeline implementation within Diffusers library
  • Sonic Inpaint: Audio-driven portrait animation system focused on global audio perception
  • Make-It-Count: CountGen method for precise numerical control of objects via object identity features
  • ControlNeXt: Lightweight architecture for efficient controllable image and video generation
  • MS-Diffusion: Layout-guided multi-subject image personalization framework
  • UniRef: Unified model for segmentation tasks designed as foundation model plug-in
  • FlashFace: High-fidelity human image customization and face swapping framework
  • ReNO: Reward-based Noise Optimization to improve text-to-image quality during inference

Not Planned

Migration

Asyncio

rmtree

  • onerror is deprecated in Python 3.12 and replaced with onexc, which receives the exception instance instead of a sys.exc_info() triple
    def excRemoveReadonly(func, path, exc: BaseException):
        import stat
        shared.log.debug(f'Exception during cleanup: {func} {path} {type(exc).__name__}')
        if func in (os.rmdir, os.remove, os.unlink) and isinstance(exc, PermissionError):
            shared.log.debug(f'Retrying cleanup: {path}')
            os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # make the path writable before retrying
            func(path)
    # ...
    try:
        shutil.rmtree(found.path, ignore_errors=False, onexc=excRemoveReadonly)
    except OSError as e:
        shared.log.error(f'Cleanup failed: {found.path} {e}')
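While the codebase still supports interpreters older than 3.12, the migration can branch on the Python version so neither callback triggers a DeprecationWarning. A minimal sketch; `rmtree_compat` and `_remove_readonly` are hypothetical names, and the handler mirrors the chmod-and-retry logic above without the project's logging:

```python
import shutil
import sys

def _remove_readonly(func, path, exc):
    """Shared cleanup handler: make a read-only path writable and retry once."""
    import os
    import stat
    if func in (os.rmdir, os.remove, os.unlink) and isinstance(exc, PermissionError):
        os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)
        func(path)

def rmtree_compat(path):
    if sys.version_info >= (3, 12):
        # 3.12+: onexc passes the exception instance directly
        shutil.rmtree(path, onexc=_remove_readonly)
    else:
        # pre-3.12: onerror passes a sys.exc_info() triple; unwrap the instance
        shutil.rmtree(path, onerror=lambda func, p, excinfo: _remove_readonly(func, p, excinfo[1]))
```

Once the minimum supported Python is 3.12, the shim collapses to the plain `onexc=` call.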

Code TODO

npm run todo

  • fc: autodetect distilled based on model
  • fc: autodetect tensor format based on model
  • hypertile: vae breaks when using non-standard sizes
  • install: switch to pytorch source when it becomes available
  • loader: load recipe
  • loader: save recipe
  • lora: add other quantization types
  • lora: add t5 key support for sd35/f1
  • lora: maybe force immediate quantization
  • model load: force-reloading entire model as loading transformers only leads to massive memory usage
  • model load: implement model in-memory caching
  • modernui: monkey-patch for missing tabs.select event
  • modules/lora/lora_extract.py:188:9: W0511: TODO: lora: support pre-quantized flux
  • modules/modular_guiders.py:65:58: W0511: TODO: guiders
  • processing: remove duplicate mask params
  • resize image: enable full VAE mode for resize-latent