mirror of https://github.com/vladmandic/automatic
TODO
Internal
Project board: https://github.com/users/vladmandic/projects
- Update: transformers==5.0.0
- Deploy: Create executable for SD.Next
- Deploy: Lite vs Expert mode
- Engine: mmgp
- Engine: sharpfin instead of torchvision
- Engine: TensorRT acceleration
- Feature: Auto handle scheduler prediction_type
- Feature: Cache models in memory
- Feature: Control tab: add overrides handling
- Feature: Integrate natural language image search ImageDB
- Feature: LoRA add OMI format support for SD35/FLUX.1
- Feature: Multi-user support
- Feature: Remote Text-Encoder support
- Feature: Settings profile manager
- Feature: Video tab: add full API support
- Refactor: Unify huggingface and diffusers model folders
- Refactor: Move nunchaku models to reference instead of internal decision
- Refactor: GGUF
- Refactor: move sampler options to settings to config
- Refactor: remove CodeFormer
- Refactor: remove GFPGAN
- Reimplement llama remover for Kanvas
Modular
- Switch to modular pipelines
- Feature: Transformers unified cache handler
- Refactor: Modular pipelines and guiders
New models / Pipelines
TODO: Investigate which models are diffusers-compatible and prioritize!
Text-to-Image
Image-to-Image
- Meituan LongCat-Image-Edit-Turbo
- VIBE Image-Edit
- Bria FiboEdit (diffusers)
- LucyEdit (diffusers PR)
- SD3 UltraEdit
- Step1X-Edit
- OneReward (mask-guided / object removal)
Text-to-Video
- OpenMOSS MOVA
- Wan family (Wan2.1 / Wan2.2 variants)
  - example: Wan2.1-T2V-14B-CausVid
  - distill / step-distill examples: Wan2.1-StepDistill-CfgDistill
- Krea Realtime Video
- MAGI-1 (autoregressive video)
- MUG-V 10B (video generation)
- Ovi (audio/video generation)
Image-to-Video
- MUG-V 10B
- HunyuanVideo-Avatar / HunyuanCustom
- Sana Image→Video (Sana-I2V)
- Wan-2.2 S2V (diffusers PR)
Video Editing / Long-Video / Animation Tooling
- LongCat-Video
- LTXVideo / LTXVideo LongMulti (diffusers PR)
- DiffSynth-Studio (ModelScope)
- Phantom (Phantom HuMo)
- CausVid-Plus / WAN-CausVid-Plus
- Wan2GP (workflow/GUI for Wan)
Multimodal
- Cosmos-Predict-2.5 (NVIDIA)
- Liquid (unified multimodal generator)
- Lumina-DiMOO
- Ming (inclusionAI)
- Magi (SandAI)
- DreamO (ByteDance) — image customization framework
Other/Unsorted
- DiffusionForcing
- Self-Forcing
- SEVA
- ByteDance USO
- ByteDance Lynx
- LanDiff
- MagCache
- SmoothCache
- STG
- Video Inpaint Pipeline
- Sonic Inpaint
- BoxDiff
- Make-It-Count
- FreeCustom
- ControlNeXt
- MS-Diffusion
- UniRef
- AnyDoor
- AnyText2
- DragonDiffusion
- DenseDiffusion
- FlashFace
- PowerPaint
- IC-Light
- ReNO
- LoRAdapter
- LivePortrait
Migration
Asyncio
- Policy system is deprecated and will be removed in Python 3.16
rmtree
- onerror deprecated and replaced with onexc in Python 3.12
def excRemoveReadonly(func, path, exc: BaseException):
    import stat
    shared.log.debug(f'Exception during cleanup: {func} {path} {type(exc).__name__}')
    if func in (os.rmdir, os.remove, os.unlink) and isinstance(exc, PermissionError):
        shared.log.debug(f'Retrying cleanup: {path}')
        os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO)
        func(path)

# ...
try:
    shutil.rmtree(found.path, ignore_errors=False, onexc=excRemoveReadonly)
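
While interpreters older than 3.12 still need to be supported, the same handler can be wired through either parameter. The following is a minimal, self-contained sketch rather than the project's code: `remove_tree`, `_retry_writable` and `_retry_writable_legacy` are hypothetical names, and unlike the handler above this one re-raises anything it cannot fix. It relies on the documented difference that `onexc` receives the exception instance, while `onerror` receives a `(type, value, traceback)` tuple.

```python
import os
import shutil
import stat
import sys
import tempfile


def _retry_writable(func, path, exc):
    # onexc-style handler: make the path writable and retry the failed call.
    if func in (os.rmdir, os.remove, os.unlink) and isinstance(exc, PermissionError):
        os.chmod(path, stat.S_IRWXU)
        func(path)
    else:
        raise exc


def _retry_writable_legacy(func, path, exc_info):
    # onerror-style adapter: unpack the exception from the exc_info tuple.
    _retry_writable(func, path, exc_info[1])


def remove_tree(path):
    # onexc exists only on Python 3.12+; fall back to onerror on older versions.
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_retry_writable)
    else:
        shutil.rmtree(path, onerror=_retry_writable_legacy)


# Usage: build a small tree containing a read-only file, then remove it.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
target = os.path.join(root, 'sub', 'readonly.txt')
with open(target, 'w') as f:
    f.write('data')
os.chmod(target, stat.S_IREAD)
remove_tree(root)
```

Whether the handler actually fires depends on the platform: on Windows a read-only file blocks deletion, while on POSIX systems deletion is governed by the permissions of the parent directory.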
Code TODO
npm run todo
- fc: autodetect distilled based on model
- fc: autodetect tensor format based on model
- hypertile: vae breaks when using non-standard sizes
- install: switch to pytorch source when it becomes available
- loader: load recipe
- loader: save recipe
- lora: add other quantization types
- lora: add t5 key support for sd35/f1
- lora: maybe force immediate quantization
- model load: force-reloading entire model as loading transformers only leads to massive memory usage
- model load: implement model in-memory caching
- modernui: monkey-patch for missing tabs.select event
- modules/lora/lora_extract.py:188:9: W0511: TODO: lora: support pre-quantized flux
- modules/modular_guiders.py:65:58: W0511: TODO: guiders
- processing: remove duplicate mask params
- resize image: enable full VAE mode for resize-latent