Implement the Lightricks two-stage recipe (diffusers PR #13217) for the
LTX-2.x Dev family: Stage 1 at half-res with full four-way guidance,
2x latent upsample, Stage 2 with distilled LoRA + scheduler swap + identity
guidance on STAGE_2_DISTILLED_SIGMA_VALUES.
Extend to both LTX-2.0 and LTX-2.3 Dev via per-family distilled-LoRA
repos carried on the caps; Distilled variants follow the same flow
without the LoRA swap. When the user enables Refine without Upsample on
any Dev variant with a known LoRA, auto-couple Refine with a fixed 2x
upsample.
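The Stage 1 / upsample / Stage 2 planning and the Refine auto-coupling can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the code in this change: `Caps`, `TwoStagePlan` and `plan_two_stage` are hypothetical names, only `distilled` and `stage2_dev_lora_repo` are modeled, and it assumes Stage 1 runs at full resolution when no upsample is active. STAGE_2_DISTILLED_SIGMA_VALUES and the guidance setup are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Caps:
    # Hypothetical subset of the per-variant caps record.
    distilled: bool
    stage2_dev_lora_repo: Optional[str] = None


@dataclass(frozen=True)
class TwoStagePlan:
    stage1_size: Tuple[int, int]  # (width, height) of the Stage 1 pass
    upsample_factor: int          # latent upsample between the two stages
    swap_distilled_lora: bool     # load the distilled LoRA for Stage 2
    refine: bool                  # whether Stage 2 runs at all


def plan_two_stage(caps: Caps, width: int, height: int,
                   refine: bool, upsample: bool) -> TwoStagePlan:
    # Dev variants with a known distilled-LoRA repo get the LoRA swap;
    # Distilled variants run the same flow without it.
    has_dev_lora = not caps.distilled and bool(caps.stage2_dev_lora_repo)
    # Auto-couple: Refine without Upsample on such a Dev variant gets a
    # fixed 2x latent upsample.
    if refine and not upsample and has_dev_lora:
        upsample = True
    factor = 2 if upsample else 1
    # Stage 1 renders at 1/factor of the target so the upsample lands on it.
    stage1 = (width // factor, height // factor)
    return TwoStagePlan(stage1, factor, refine and has_dev_lora, refine)
```

For example, a Dev variant asked for 1024x768 with Refine on and Upsample off plans a 512x384 Stage 1, a 2x upsample and a LoRA swap; a Distilled variant with the same inputs keeps a single-resolution flow.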
- caps: add is_ltx_2_3, use_cross_timestep, default_dynamic_shift,
stage2_dev_lora_repo, supports_canonical_stage2, modality_default_scale,
guidance_rescale_default; realign LTX-2.x defaults to canonical
cfg=3.0 / steps=30; wire the per-variant STG block and four-way guidance
for non-distilled 2.x
- process: canonical Stage 1/Stage 2 helpers, scheduler + opts snapshot
under try/finally, per-family upsampler repo, audio latents threaded
from Stage 1 into Stage 2, use_cross_timestep gated per caps
- overrides: skip the redundant unsharded LTX-2.3 connectors blob and
share LTX2TextConnectors weights across 2.3 variants when te_shared_t5
is set
- load: Gemma3 shared-TE path for LTX-2.3; gate the
use_dynamic_shifting=False override to 0.9.x only so LTX-2.x keeps its
canonical token-count dynamic shift
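The scheduler + opts snapshot under try/finally from the process bullet follows a common save/swap/restore pattern. A minimal sketch, assuming a pipeline object with a `scheduler` attribute and an opts object with plain attributes; `run_stage2`, `make_scheduler` and the `sampler_shift` opt name are hypothetical and used only for illustration.

```python
from types import SimpleNamespace


def run_stage2(pipe, opts, make_scheduler, opt_overrides, call):
    """Run `call(pipe)` with a swapped scheduler and overridden opts.

    Hypothetical helper: Stage 1 state is snapshotted up front and restored
    in `finally`, so a failing Stage 2 cannot leak its scheduler or opts.
    """
    saved_scheduler = pipe.scheduler
    saved_opts = {key: getattr(opts, key) for key in opt_overrides}
    try:
        pipe.scheduler = make_scheduler(saved_scheduler)
        for key, value in opt_overrides.items():
            setattr(opts, key, value)
        return call(pipe)
    finally:
        pipe.scheduler = saved_scheduler
        for key, value in saved_opts.items():
            setattr(opts, key, value)


# Usage with stand-in objects (sampler_shift is an illustrative opt name).
pipe = SimpleNamespace(scheduler="stage1-scheduler")
opts = SimpleNamespace(sampler_shift=3.0)
seen = {}


def fake_stage2(p):
    # Record what Stage 2 actually saw while the swap was active.
    seen["scheduler"] = p.scheduler
    seen["shift"] = opts.sampler_shift
    return "stage2-latents"


result = run_stage2(pipe, opts, lambda old: "distilled-scheduler",
                    {"sampler_shift": 1.0}, fake_stage2)
```

Snapshotting before the `try` means the restore in `finally` always has the original values, even if building the new scheduler raises.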
Rework the LTX Video tab so one UI handles every registered variant
(0.9.0 through 2.3, Dev/Distilled/SDNQ-4Bit, T2V/I2V/Condition).
Per-variant behavior is driven by a single capability lookup rather than
by substring matching on model names scattered across the backend.
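A single capability lookup keyed on the registered pipeline class, rather than on the model name, could look roughly like this. It is a sketch under stated assumptions: `MODELS_DEF`, `Registered`, `CLASS_TRAITS`, `LTXCaps` and `get_caps` are hypothetical stand-ins, the model names are invented for illustration, `repo_cls` is represented by its class-name string, and the traits table is abbreviated to the pipeline classes named in this change.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical, abbreviated traits table keyed by pipeline class name:
# class name -> (family, is_i2v, supports_multi_condition).
CLASS_TRAITS: Dict[str, Tuple[str, bool, bool]] = {
    "LTXConditionPipeline": ("0.9.x", False, True),
    "LTX2ImageToVideoPipeline": ("2.x", True, False),
    "LTX2ConditionPipeline": ("2.x", False, True),
}


@dataclass(frozen=True)
class Registered:
    # Stand-in for a models_def entry; repo_cls is kept as its class name.
    repo_cls: str
    distilled: bool = False


# Hypothetical model names, for illustration only.
MODELS_DEF: Dict[str, Registered] = {
    "LTXVideo 0.9.x Condition": Registered("LTXConditionPipeline"),
    "LTX-2.x Dev I2V": Registered("LTX2ImageToVideoPipeline"),
    "LTX-2.x Distilled Condition": Registered("LTX2ConditionPipeline", distilled=True),
}


@dataclass(frozen=True)
class LTXCaps:
    family: str
    is_i2v: bool
    distilled: bool
    supports_multi_condition: bool
    supports_stg: bool
    default_cfg: Optional[float]  # None: keep the existing default
    default_steps: Optional[int]


def get_caps(model_name: str) -> LTXCaps:
    entry = MODELS_DEF[model_name]
    family, is_i2v, multi_cond = CLASS_TRAITS[entry.repo_cls]
    dev_2x = family == "2.x" and not entry.distilled
    return LTXCaps(
        family=family,
        is_i2v=is_i2v,
        distilled=entry.distilled,
        supports_multi_condition=multi_cond,
        # In this sketch, STG is enabled only for non-distilled 2.x.
        supports_stg=dev_2x,
        # Canonical 2.x Dev defaults; other defaults are not reproduced here.
        default_cfg=3.0 if dev_2x else None,
        default_steps=30 if dev_2x else None,
    )
```

The UI and backend would then branch on fields such as `caps.is_i2v` or `caps.supports_multi_condition` instead of re-parsing the model name in each place.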
- modules/ltx/ltx_capabilities.py: new module computing family, is_i2v,
distilled, supports_input_media, supports_multi_condition,
supports_image_cond_noise_scale, supports_decode_timestep,
supports_stg, supports_audio, supports_frame_rate_kwarg, and the
default CFG / steps / sampler_shift for a given model name by reading
its registered repo_cls in models_def.
- modules/ltx/ltx_ui.py: capability-gated UI. Selecting a model rewires
accordion visibility, slider interactivity, and defaults via a single
model.change handler. New controls: a dedicated image input slot inside
the LTX tab (replacing the disconnected shared init_image for I2V), a
condition strength slider, and CFG / sampler shift / dynamic shift
sliders that were previously unreachable. The input media accordion is
restructured so the image slot is always visible, while the video /
gallery prefix tabs appear only on Condition pipelines.
- modules/ltx/ltx_process.py: route the base pass through
processing.process_images(p) so LTX inherits standard scheduler
wiring, extra_networks activation, VAE handling, and error plumbing
from StableDiffusionProcessingVideo. The multi-pass latent path
(upsample / refine) stays on direct pipeline calls for latent
re-entry. Refine noise control gets family-specific kwargs:
denoise_strength for 0.9.x LTXConditionPipeline, noise_scale for all
2.x pipelines; the prior strength= injection crashed on 2.x and only
affected conditioning intensity on 0.9.x. Call torch_gc at every stage
boundary (base -> upsample -> refine -> VAE decode) so the CUDA
allocator cache does not carry the prior pass's allocations into the
next stage. Remove the TypeError fallback that silently passed
raw latents to save_video when VAE decode returned None on OOM;
those errors now surface cleanly.
- modules/ltx/ltx_util.py: get_conditions gains a family parameter and
builds LTX2VideoCondition (frames, index, strength) for 2.x or
LTXVideoCondition (image, video, frame_index, strength) for 0.9.x.
get_bucket floors dimensions to a multiple of
max(32, vae_spatial_compression_ratio), since LTX pipelines validate
divisibility by 32 regardless of family.
- modules/video_models/video_overrides.py: extend the I2V generator
reset to cover LTX2ImageToVideoPipeline and both Condition classes.
Keep the strength= kwarg injection gated to the 0.9.x
LTXConditionPipeline; LTX2ConditionPipeline.__call__ does not
accept it (per-condition strength lives on the LTX2VideoCondition
dataclass instead).
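The family-specific refine noise kwargs and the get_bucket flooring rule reduce to two small functions. A minimal sketch: `refine_noise_kwargs` and `floor_to_bucket` are hypothetical names, pipelines are identified by class-name strings, and only the flooring step of get_bucket is shown (not its bucket selection).

```python
from typing import Dict, Tuple


def refine_noise_kwargs(family: str, pipeline_cls: str, amount: float) -> Dict[str, float]:
    """Family-specific refine noise control (hypothetical helper)."""
    if family == "2.x":
        # All 2.x pipelines take noise_scale; strength= is not accepted there.
        return {"noise_scale": amount}
    if pipeline_cls == "LTXConditionPipeline":
        return {"denoise_strength": amount}
    # Other 0.9.x pipelines: no refine noise kwarg in this sketch.
    return {}


def floor_to_bucket(width: int, height: int,
                    vae_spatial_compression_ratio: int) -> Tuple[int, int]:
    # LTX pipelines check divisibility by 32 in every family, so never
    # step below 32 even if the VAE compresses by less.
    step = max(32, vae_spatial_compression_ratio)
    return (width // step) * step, (height // step) * step
```

For instance, 1000x700 floors to 992x672 with a compression ratio of 8 (step 32) and to 960x640 with a ratio of 64.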