TODO

Internal

  • Feature: Gallery thumb-size, quick delete/download/info, disable delete on Reference
  • Feature: Diffusers TextKVCache
  • Feature: Chat-based interface
  • Feature: Add cloud providers
  • Feature: Multi-image inputs
  • Feature: RIFE update
  • Feature: RIFE in processing
  • Feature: SeedVR2 in processing
  • Feature: Add video models to Reference
  • Feature: Add https://huggingface.co/briaai/RMBG-2.0 to REMBG
  • Deploy: Lite vs Expert mode
  • Engine: TensorRT acceleration
  • Feature: Auto handle scheduler prediction_type
  • Feature: Cache models in memory
  • Feature: JSON image metadata
  • Feature: Integrate natural language image search ImageDB
  • Feature: Multi-user support
  • Feature: Settings profile manager
  • Feature: Video tab add full API support
  • Validate: Control tab add overrides handling
  • Refactor: Unify huggingface and diffusers model folders
  • Refactor: GGUF
  • Feature: Reimplement LaMa remover for Kanvas
  • Integrate: Depth3D

OnHold

  • Feature: LoRA add OMI format support for SD35/FLUX.1, on-hold
  • Feature: Remote Text-Encoder support, sidelined for the moment

Modular

Pending finalization of the modular pipelines implementation and development of a compatibility layer

New models / Pipelines

TODO: Investigate which models are diffusers-compatible and prioritize!

Image

Video

Other/Unsorted

  • ByteDance DreamO
    • Unified image customization framework combining face identity preservation, virtual try-on, style transfer, etc.
    • Created: 2025-05 | Updated: 2025-08 | Stars: 1,700
  • ControlNeXt
    • Lightweight controllable generation framework for images and videos (SD1.5, SDXL, SVD) that uses up to 90% fewer trainable parameters than ControlNet
    • Created: 2024-08 | Updated: 2024-08 | Stars: 1,600
  • ByteDance USO
    • Unified model for both style-transfer and subject-driven image generation from one or two reference images
    • Created: 2025-08 | Updated: 2025-09 | Stars: 1,200
  • TwinFlow
    • Distillation technique that converts large image generation models into 1-2 step generators without requiring a separate teacher model
    • Created: 2025-12 | Updated: 2026-02 | Stars: 506
  • FlashFace
    • Zero-shot face personalization method that generates images of a specific person from one or a few reference photos
    • Created: 2024-03 | Updated: 2024-05 | Stars: 436
  • DiffSynth-Engine
    • Alternative to the diffusers library that unlocks some DiffSynth-specific capabilities
    • Created: 2024-05 | Updated: 2026-03 | Stars: 393
  • MS-Diffusion
    • Multi-subject image personalization framework that uses layout guidance to place multiple reference subjects in a single generated image without identity confusion
    • Created: 2024-04 | Updated: 2025-07 | Stars: 309
  • RamTorch
    • Alternative memory management and offloading library
    • Created: 2025-09 | Updated: 2026-04 | Stars: 266
  • UniRef
    • Unified segmentation model that handles referring image segmentation and few-shot segmentation
    • Created: 2023-04 | Updated: 2025-04 | Stars: 238
  • FreeFuse
    • Training-free method to combine multiple subject LoRAs in one image generation without conflicts, by automatically routing each LoRA's influence to its target spatial region.
    • Created: 2026-01 | Updated: 2026-03 | Stars: 178
  • mmgp
    • Alternative memory management and offloading library
    • Created: 2024-03 | Updated: 2026-02 | Stars: 175
  • ReNO
    • Inference-time technique that improves one-step text-to-image models by iteratively optimizing the initial noise using reward model signals, boosting prompt accuracy in 20-50 seconds
    • Created: 2024-06 | Updated: 2025-09 | Stars: 166
  • RegionE
    • Speeds up instruction-based image editing by skipping redundant computation in image regions that are not being changed.
    • Created: 2025-10 | Updated: 2026-02 | Stars: 98
  • Make-It-Count
    • Method that reliably generates the exact number of objects requested by tracking instance identities during denoising
    • Created: 2024-04 | Updated: 2025-04 | Stars: 96
  • FaceClip
    • Identity-preserving image generation model that jointly encodes a face and a text prompt into a shared embedding to produce portraits matching both the subject's appearance and the scene description
    • Created: 2025-04 | Updated: 2025-04 | Likes: 88
  • T5Gemma Adapter
    • Experiment that replaces the SDXL text encoder with a T5Gemma LLM via a trained adapter for richer prompt understanding
    • Created: 2025-07 | Updated: 2025-10 | Stars: 67
  • Sonic Inpaint
    • Image inpainting method that optimizes for better masked-region filling
    • Created: 2025-11 | Updated: 2026-01 | Stars: 23
  • SEVA
    • Model that generates novel-view images of a scene from a single input photo.
    • Created: 2025-04 | Updated: 2025-06 | Stars: N/A (draft PR)
  • Bria FIBO RMBG
    • Background removal model trained on Bria FIBO dataset
    • Created: 2025-08 | Updated: 2025-09 | Stars: N/A (private model)

Code TODO

npm run todo

installer.py:642:15: W0511: TODO rocm: switch to pytorch source when it becomes available (fixme)
modules/transformer_cache.py:29:61: W0511: TODO fc: autodetect tensor format based on model (fixme)
modules/transformer_cache.py:30:50: W0511: TODO fc: autodetect distilled based on model (fixme)
modules/processing_class.py:404:32: W0511: TODO processing: remove duplicate mask params (fixme)
modules/sd_samplers_diffusers.py:355:31: W0511: TODO enso-required (fixme)
modules/sd_models.py:1356:5: W0511: TODO model load: implement model in-memory caching (fixme)
modules/ui_models_load.py:257:5: W0511: TODO loader: load receipe (fixme)
modules/ui_models_load.py:264:5: W0511: TODO loader: save receipe (fixme)
modules/sd_hijack_hypertile.py:123:17: W0511: TODO hypertile: vae breaks when using non-standard sizes (fixme)
modules/sd_unet.py:77:39: W0511: TODO model load: force-reloading entire model as loading transformers only leads to massive memory usage (fixme)
modules/modular_guiders.py:66:51: W0511: TODO: guiders (fixme)