mirror of https://github.com/vladmandic/automatic
# TODO

## Internal
- Update: transformers==5.0.0, owner @CalamitousFelicitousness
- Deploy: Create executable for SD.Next
- Deploy: Lite vs Expert mode
- Engine: mmgp
- Engine: TensorRT acceleration
- Feature: Auto handle scheduler prediction_type
- Feature: Cache models in memory
- Validate: Control tab add overrides handling
- Feature: Integrate natural language image search ImageDB
- Feature: Multi-user support
- Feature: Settings profile manager
- Feature: Video tab add full API support
- Refactor: Unify huggingface and diffusers model folders
- Refactor: GGUF
- Refactor: move sampler options from settings to config
- Reimplement llamaremover for Kanvas, pending end-to-end review of Kanvas
## On-Hold
- Feature: LoRA add OMI format support for SD35/FLUX.1, on-hold
- Feature: Remote Text-Encoder support, sidelined for the moment
## Modular

Pending finalization of the modular pipelines implementation and development of a compatibility layer
- Switch to modular pipelines
- Feature: Transformers unified cache handler
- Refactor: Modular pipelines and guiders
- MagCache
- SmoothCache
- STG
## New models / Pipelines

TODO: Investigate which models are diffusers-compatible and prioritize!

### Image-Base
- Chroma Zeta: Image and video generator for creative effects and professional filters
- Chroma Radiance: Pixel-space model eliminating VAE artifacts for high visual fidelity
- Liquid: Unified vision-language auto-regressive generation paradigm
- Lumina-DiMOO: Foundational multi-modal generation and understanding via discrete diffusion
- nVidia Cosmos-Predict-2.5: Physics-aware world foundation model for consistent scene prediction
### Image-Edit

- Meituan LongCat-Image-Edit-Turbo: 6B instruction-following image editing with high visual consistency
- VIBE Image-Edit: (Sana+Qwen-VL) Fast visual instruction-based image editing framework
- LucyEdit: Instruction-guided video editing while preserving motion and identity
- Step1X-Edit: Multimodal image editing decoding MLLM tokens via DiT
- OneReward: Reinforcement learning grounded generative reward model for image editing
- ByteDance DreamO: Image customization framework for IP adaptation and virtual try-on
### Video

- OpenMOSS MOVA: Unified foundation model for synchronized high-fidelity video and audio
- Wan family (Wan2.1 / Wan2.2 variants): MoE-based foundational tools for cinematic T2V/I2V/TI2V; example: Wan2.1-T2V-14B-CausVid; distill / step-distill examples: Wan2.1-StepDistill-CfgDistill
- Krea Realtime Video: (Wan2.1) Distilled real-time video diffusion using self-forcing techniques
- MAGI-1 (autoregressive video): Autoregressive video generation allowing infinite-length generation and timeline control
- MUG-V 10B (video generation): Large-scale DiT-based video generation system trained via flow-matching
- Ovi (audio/video generation): (Wan2.2) Speech-to-video with synchronized sound effects and music
- HunyuanVideo-Avatar / HunyuanCustom: (HunyuanVideo) MM-DiT based dynamic emotion-controllable dialogue generation
- Sana Image→Video (Sana-I2V): (Sana) Compact Linear DiT framework for efficient high-resolution video
- Wan-2.2 S2V (diffusers PR): (Wan2.2) Audio-driven cinematic speech-to-video generation
- LongCat-Video: Unified framework for minutes-long coherent video generation via Block Sparse Attention
- LTXVideo / LTXVideo LongMulti (diffusers PR): Real-time DiT-based generation with production-ready camera controls
- DiffSynth-Studio (ModelScope): (Wan2.2) Comprehensive training and quantization tools for Wan video models
- Phantom (Phantom HuMo): Human-centric video generation framework focused on subject ID consistency
- CausVid-Plus / WAN-CausVid-Plus: (Wan2.1) Causal diffusion for high-quality temporally consistent long videos
- Wan2GP (workflow/GUI for Wan): (Wan) Web-based UI focused on running complex video models on GPU-poor setups
- LivePortrait: Efficient portrait animation system with high stitching and retargeting control
- Magi (SandAI): High-quality autoregressive video generation framework
- Ming (inclusionAI): Unified multimodal model for processing text, audio, image, and video
### Other/Unsorted
- DiffusionForcing: Full-sequence diffusion with autoregressive next-token prediction
- Self-Forcing: Framework for improving temporal consistency in long-horizon video generation
- SEVA: Stable Virtual Camera for novel view synthesis and 3D-consistent video
- ByteDance USO: Unified Style-Subject Optimized framework for personalized image generation
- ByteDance Lynx: State-of-the-art high-fidelity personalized video generation based on DiT
- LanDiff: Coarse-to-fine text-to-video integrating Language and Diffusion Models
- Video Inpaint Pipeline: Unified inpainting pipeline implementation within Diffusers library
- Sonic Inpaint: Audio-driven portrait animation system focused on global audio perception
- Make-It-Count: CountGen method for precise numerical control of objects via object identity features
- ControlNeXt: Lightweight architecture for efficient controllable image and video generation
- MS-Diffusion: Layout-guided multi-subject image personalization framework
- UniRef: Unified model for segmentation tasks designed as foundation model plug-in
- FlashFace: High-fidelity human image customization and face swapping framework
- ReNO: Reward-based Noise Optimization to improve text-to-image quality during inference
## Not Planned

- Bria FIBO: Fully JSON-based
- Bria FiboEdit: Fully JSON-based
- LoRAdapter: Not recently updated
- SD3 UltraEdit: Based on SD3
- PowerPaint: Based on SD15
- FreeCustom: Based on SD15
- AnyDoor: Based on SD21
- AnyText2: Based on SD15
- DragonDiffusion: Based on SD15
- DenseDiffusion: Based on SD15
- IC-Light: Based on SD15
## Migration

### Asyncio

- Policy system is deprecated and will be removed in Python 3.16
- Python 3.14 removals: asyncio policy
  https://docs.python.org/3.14/library/asyncio-policy.html
- Affected files: webui.py, cli/sdapi.py
- Migration: asyncio.run / asyncio.Runner
### rmtree

`onerror` is deprecated and replaced with `onexc` in Python 3.12:

```python
def excRemoveReadonly(func, path, exc: BaseException):
    import stat
    shared.log.debug(f'Exception during cleanup: {func} {path} {type(exc).__name__}')
    if func in (os.rmdir, os.remove, os.unlink) and isinstance(exc, PermissionError):
        shared.log.debug(f'Retrying cleanup: {path}')
        os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)  # drop read-only bits
        func(path)  # retry the failed operation

# ...
try:
    shutil.rmtree(found.path, ignore_errors=False, onexc=excRemoveReadonly)
```
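The practical difference is only in the third callback argument: a hedged sketch, where `onexc` (Python 3.12+) receives the exception instance directly while the legacy `onerror` received a `sys.exc_info()`-style tuple. The directory path below is hypothetical and deliberately nonexistent:

```python
# onexc vs onerror: same role, different third argument.
# The missing path is hypothetical; rmtree routes the resulting
# FileNotFoundError to the handler instead of raising it.
import shutil
import sys

seen = []

def on_exc(func, path, exc):          # 3.12+: exc is the exception instance
    seen.append(type(exc).__name__)

def on_error(func, path, exc_info):   # legacy: exc_info is a (type, value, tb) tuple
    seen.append(exc_info[0].__name__)

missing = '/tmp/sdnext-nonexistent-example'  # hypothetical, does not exist
if sys.version_info >= (3, 12):
    shutil.rmtree(missing, onexc=on_exc)
else:
    shutil.rmtree(missing, onerror=on_error)  # deprecated spelling, same effect
```

This is why the migration in the snippet above is mostly mechanical: the handler body stays the same, only the tuple unpacking goes away.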
## Code TODO

`npm run todo`
- fc: autodetect distilled based on model
- fc: autodetect tensor format based on model
- hypertile: vae breaks when using non-standard sizes
- install: switch to pytorch source when it becomes available
- loader: load recipe
- loader: save recipe
- lora: add other quantization types
- lora: add t5 key support for sd35/f1
- lora: maybe force immediate quantization
- model load: force-reloading entire model as loading transformers only leads to massive memory usage
- model load: implement model in-memory caching
- modernui: monkey-patch for missing tabs.select event
- modules/lora/lora_extract.py:188:9: W0511: TODO: lora: support pre-quantized flux
- modules/modular_guiders.py:65:58: W0511: TODO: guiders
- processing: remove duplicate mask params
- resize image: enable full VAE mode for resize-latent