TODO

Internal

  • Update: transformers==5.0.0, owner @CalamitousFelicitousness
  • Deploy: Create executable for SD.Next
  • Deploy: Lite vs Expert mode
  • Engine: mmgp
  • Engine: sharpfin instead of torchvision
  • Engine: TensorRT acceleration
  • Feature: Auto handle scheduler prediction_type
  • Feature: Cache models in memory
  • Feature: Control tab add overrides handling
  • Feature: Integrate natural language image search ImageDB
  • Feature: LoRA add OMI format support for SD35/FLUX.1, on-hold
  • Feature: Multi-user support
  • Feature: Remote Text-Encoder support, sidelined for the moment
  • Feature: Settings profile manager
  • Feature: Video tab add full API support
  • Refactor: Unify huggingface and diffusers model folders
  • Refactor: Move nunchaku models to reference instead of internal decision, owner @CalamitousFelicitousness
  • Refactor: GGUF
  • Refactor: move sampler options from settings to config
  • Refactor: remove CodeFormer, owner @CalamitousFelicitousness
  • Refactor: remove GFPGAN, owner @CalamitousFelicitousness
  • Reimplement llama remover for Kanvas, pending end-to-end review of Kanvas

Modular

Pending finalization of the modular pipelines implementation and development of a compatibility layer

New models / Pipelines

TODO: Investigate which models are diffusers-compatible and prioritize!

Image-Base

  • Chroma Zeta: Image and video generator for creative effects and professional filters
  • Chroma Radiance: Pixel-space model eliminating VAE artifacts for high visual fidelity
  • Liquid: Unified vision-language auto-regressive generation paradigm
  • Lumina-DiMOO: Foundational multi-modal generation and understanding via discrete diffusion
  • NVIDIA Cosmos-Predict-2.5: Physics-aware world foundation model for consistent scene prediction

Image-Edit

  • Meituan LongCat-Image-Edit-Turbo: 6B instruction-following image editing with high visual consistency
  • VIBE Image-Edit (Sana+Qwen-VL): Fast visual instruction-based image editing framework
  • LucyEdit: Instruction-guided video editing while preserving motion and identity
  • Step1X-Edit: Multimodal image editing decoding MLLM tokens via DiT
  • OneReward: Reinforcement-learning-grounded generative reward model for image editing
  • ByteDance DreamO: Image customization framework for IP adaptation and virtual try-on

Video

Other/Unsorted

  • DiffusionForcing: Full-sequence diffusion with autoregressive next-token prediction
  • Self-Forcing: Framework for improving temporal consistency in long-horizon video generation
  • SEVA: Stable Virtual Camera for novel view synthesis and 3D-consistent video
  • ByteDance USO: Unified Style-Subject Optimized framework for personalized image generation
  • ByteDance Lynx: State-of-the-art high-fidelity personalized video generation based on DiT
  • LanDiff: Coarse-to-fine text-to-video integrating Language and Diffusion Models
  • Video Inpaint Pipeline: Unified inpainting pipeline implementation within Diffusers library
  • Sonic Inpaint: Audio-driven portrait animation system focused on global audio perception
  • Make-It-Count: CountGen method for precise numerical control of objects via object identity features
  • ControlNeXt: Lightweight architecture for efficient controllable image and video generation
  • MS-Diffusion: Layout-guided multi-subject image personalization framework
  • UniRef: Unified model for segmentation tasks designed as foundation model plug-in
  • FlashFace: High-fidelity human image customization and face swapping framework
  • ReNO: Reward-based Noise Optimization to improve text-to-image quality during inference

Not Planned

Migration

Asyncio

rmtree

  • onerror is deprecated in Python 3.12 and replaced with onexc, which receives the exception instance instead of a sys.exc_info() triple
    def excRemoveReadonly(func, path, exc: BaseException):
        import stat
        shared.log.debug(f'Exception during cleanup: {func} {path} {type(exc).__name__}')
        if func in (os.rmdir, os.remove, os.unlink) and isinstance(exc, PermissionError):
            shared.log.debug(f'Retrying cleanup: {path}')
            os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # make the path writable before retrying
            func(path)
    # ...
    try:
        shutil.rmtree(found.path, ignore_errors=False, onexc=excRemoveReadonly)
    except OSError as e:
        shared.log.error(f'Cleanup failed: {found.path} {e}')
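While the codebase still supports interpreters older than 3.12, the migration can branch on the Python version so neither callback triggers a DeprecationWarning. A minimal sketch; `rmtree_compat` and `_remove_readonly` are hypothetical names, and the handler mirrors the chmod-and-retry logic above without the project's logging:

```python
import shutil
import sys

def _remove_readonly(func, path, exc):
    """Shared cleanup handler: make a read-only path writable and retry once."""
    import os
    import stat
    if func in (os.rmdir, os.remove, os.unlink) and isinstance(exc, PermissionError):
        os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)
        func(path)

def rmtree_compat(path):
    if sys.version_info >= (3, 12):
        # 3.12+: onexc passes the exception instance directly
        shutil.rmtree(path, onexc=_remove_readonly)
    else:
        # pre-3.12: onerror passes a sys.exc_info() triple; unwrap the instance
        shutil.rmtree(path, onerror=lambda func, p, excinfo: _remove_readonly(func, p, excinfo[1]))
```

Once the minimum supported Python is 3.12, the shim collapses to the plain `onexc=` call.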

Code TODO

npm run todo

  • fc: autodetect distilled based on model
  • fc: autodetect tensor format based on model
  • hypertile: vae breaks when using non-standard sizes
  • install: switch to pytorch source when it becomes available
  • loader: load recipe
  • loader: save recipe
  • lora: add other quantization types
  • lora: add t5 key support for sd35/f1
  • lora: maybe force immediate quantization
  • model load: force-reloading entire model as loading transformers only leads to massive memory usage
  • model load: implement model in-memory caching
  • modernui: monkey-patch for missing tabs.select event
  • modules/lora/lora_extract.py:188:9: W0511: TODO: lora: support pre-quantized flux
  • modules/modular_guiders.py:65:58: W0511: TODO: guiders
  • processing: remove duplicate mask params
  • resize image: enable full VAE mode for resize-latent