Commit Graph

2161 Commits (master)

Author SHA1 Message Date
Jukka Seppänen 084e08c6e2
Disable sageattention for SAM3 (#13529)
Causes NaNs
2026-04-23 11:14:42 -07:00
Jukka Seppänen 749d5b4e8d
feat: SAM (segment anything) 3.1 support (CORE-34) (#13408) 2026-04-23 00:07:43 -04:00
rattus ec4b1659ab
ModelPatcherDynamic: force cast stray weights on comfy layers (#13487)
The mixed_precision ops can have input_scale parameters that are used
in tensor math but aren't a weight or bias, so they don't get proper VRAM
management. Treat these as force-castable parameters like the non-comfy
weights; random params are already buffers.
2026-04-22 18:13:38 -04:00
blepping 9949c19c63
Derive InterruptProcessingException from BaseException (#13523) 2026-04-22 18:08:19 -04:00
Octopus cc6f9500a1
fix: use Parameter assignment for Stable_Zero123 cc_projection weights (fixes #13492) (#13518)
On Windows with aimdo enabled, disable_weight_init.Linear uses lazy
initialization that sets weight and bias to None to avoid unnecessary
memory allocation. This caused a crash when copy_() was called on the
None weight attribute in Stable_Zero123.__init__.

Replace copy_() with direct torch.nn.Parameter assignment, which works
correctly on both Windows (aimdo enabled) and other platforms.
2026-04-22 15:05:43 -07:00
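The Stable_Zero123 fix above can be illustrated with a minimal sketch, assuming a plain `nn.Linear` stands in for the lazily initialized `cc_projection` layer (the real `disable_weight_init` machinery differs):

```python
import torch
import torch.nn as nn

# Sketch: lazy initialization leaves the parameter unallocated (None),
# mimicking disable_weight_init-style deferred weight allocation.
lin = nn.Linear(4, 4, bias=False)
lin.weight = None

w = torch.randn(4, 4)

# copy_() on the None weight attribute crashes, as described in the fix:
try:
    lin.weight.copy_(w)
except AttributeError:
    crashed = True

# Direct Parameter assignment works regardless of prior state:
lin.weight = nn.Parameter(w)
assert torch.equal(lin.weight, w)
```

Direct assignment re-registers the parameter on the module, so it does not care whether a tensor was allocated beforehand.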
Jukka Seppänen eb22225387
Support standalone LTXV audio VAEs (#13499) 2026-04-21 10:46:37 -07:00
comfyanonymous ad94d47221
Make the ltx audio vae more native. (#13486) 2026-04-21 11:02:42 -04:00
comfyanonymous 3d816db07f
Some optimizations to make Ernie inference a bit faster. (#13472) 2026-04-18 23:02:29 -04:00
Jukka Seppänen b9dedea57d
feat: SUPIR model support (CORE-17) (#13250) 2026-04-18 23:02:01 -04:00
Bedovyy b41ab53b6f
Use `ErnieTEModel_` not `ErnieTEModel`. (#13431) 2026-04-16 10:11:58 -04:00
Jun Yamog 1de83f91c3
Fix OOM regression in _apply() for quantized models during inference (#13372)
Skip unnecessary clone of inference-mode tensors when already inside
torch.inference_mode(), matching the existing guard in set_attr_param.
The unconditional clone introduced in 20561aa9 caused transient VRAM
doubling during model movement for FP8/quantized models.
2026-04-15 02:10:36 -07:00
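The guard described above can be sketched as follows (hypothetical helper name; the actual code path in `_apply()` and `set_attr_param` differs):

```python
import torch

def maybe_clone(t: torch.Tensor) -> torch.Tensor:
    """Skip the clone when already inside torch.inference_mode(),
    avoiding a transient second copy (VRAM doubling) during model moves."""
    if torch.is_inference_mode_enabled():
        return t  # already an inference-mode context; no copy needed
    return t.clone()

x = torch.ones(3)
with torch.inference_mode():
    y = maybe_clone(x)  # returns x unchanged
z = maybe_clone(x)      # cloned outside inference mode
```

`torch.is_inference_mode_enabled()` is the standard way to detect the active inference-mode context without touching the tensor itself.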
comfyanonymous cb0bbde402
Fix ernie on devices that don't support fp64. (#13414) 2026-04-14 22:54:47 -04:00
comfyanonymous 722bc73319
Make text generation work with ministral model. (#13395)
Needs a template before it works properly.
2026-04-13 20:43:57 -04:00
comfyanonymous 402ff1cdb7
Fix issue with ernie image. (#13393) 2026-04-13 16:38:42 -04:00
comfyanonymous c2657d5fb9
Fix typo. (#13382) 2026-04-12 23:37:13 -04:00
comfyanonymous 31283d2892
Implement Ernie Image model. (#13369) 2026-04-11 22:29:31 -04:00
comfyanonymous 55ebd287ee
Add a supports_fp64 function. (#13368) 2026-04-11 21:06:36 -04:00
Jukka Seppänen a134423890
SDPose: resize input always (#13349) 2026-04-10 11:26:55 -10:00
huemin b615af1c65
Add support for small flux.2 decoder (#13314) 2026-04-07 03:44:18 -04:00
comfyanonymous 40862c0776
Support Ace Step 1.5 XL model. (#13317) 2026-04-07 03:13:47 -04:00
comfyanonymous 0c63b4f6e3
Remove dead code. (#13251) 2026-04-01 20:22:06 -04:00
comfyanonymous e2ddf28d78
Fix some fp8 scaled checkpoints no longer working. (#13239) 2026-03-31 14:27:17 -07:00
rattus 8d723d2caa
Fix/tweak pinned memory accounting (#13221)
* mm: Lower windows pin threshold

Some workflows make more extraneous use of shared GPU memory than is
accounted for in the 5% pin headroom. Lower this for safety.

* mm: Remove pin count clearing threshold.

TOTAL_PINNED_MEMORY is shared between the legacy and aimdo pinning
systems, but this catch-all assumed only the legacy system exists.
Remove the catch-all, as the PINNED_MEMORY buffer is already coherent.
2026-03-29 16:43:24 -07:00
Jukka Seppänen a500f1edac
CORE-13 feat: Support RT-DETRv4 detection model (#12748) 2026-03-28 23:34:10 -04:00
comfyanonymous 3f77450ef1
Fix #13214 (#13216) 2026-03-28 22:35:59 -04:00
rattus b353a7c863
Integrate RAM cache with model RAM management (#13173) 2026-03-27 21:34:16 -04:00
comfyanonymous 3a56201da5
Allow flux conditioning without a pooled output. (#13198) 2026-03-27 20:36:26 -04:00
Jukka Seppänen b0fd65e884
fix: regression in text generate with LTXAV model (#13170) 2026-03-26 09:55:05 -07:00
comfyanonymous 2a1f402601
Make Qwen 8B work with TextGenerate node. (#13160) 2026-03-25 23:21:44 -04:00
Jukka Seppänen 404d7b9978
feat: Support Qwen3.5 text generation models (#12771) 2026-03-25 22:48:28 -04:00
Kohaku-Blueleaf 5ebb0c2e0b
FP8 bwd training (#13121) 2026-03-24 20:39:04 -04:00
Jukka Seppänen e87858e974
feat: LTX2: Support reference audio (ID-LoRA) (#13111) 2026-03-23 18:22:24 -04:00
Talmaj d49420b3c7
LongCat-Image edit (#13003) 2026-03-21 23:51:05 -04:00
rattus 25b6d1d629
wan: vae: Fix light/color change (#13101)
There was an issue where the resample split was too early and dropped one
of the rolling convolutions a frame early. This is most noticeable as a
lighting/color change between pixel frames 5->6 (latent 2->3), or as a
lighting change between the first and last frame in an FLF wan flow.
2026-03-21 18:44:35 -04:00
comfyanonymous 11c15d8832
Fix fp16 intermediates giving different results. (#13100) 2026-03-21 17:53:25 -04:00
comfyanonymous b5d32e6ad2
Fix sampling issue with fp16 intermediates. (#13099) 2026-03-21 17:47:42 -04:00
Jedrzej Kosinski 87cda1fc25
Move inline comfy.context_windows imports to top-level in model_base.py (#13083)
The recent PR that added resize_cond_for_context_window methods to
model classes used inline 'import comfy.context_windows' in each
method body. This moves that import to the top-level import section,
replacing 4 duplicate inline imports with a single top-level one.
2026-03-20 20:03:42 -04:00
drozbay 589228e671
Add slice_cond and per-model context window cond resizing (#12645)
* Add slice_cond and per-model context window cond resizing

* Fix cond_value.size() call in context window cond resizing

* Expose additional advanced inputs for ContextWindowsManualNode

Necessary for WanAnimate context windows workflow, which needs cond_retain_index_list = 0 to work properly with its reference input.

---------
2026-03-19 20:42:42 -07:00
rattus f49856af57
ltx: vae: Fix missing init variable (#13074)
Forgot to push this amendment. Previous test results apply to this.
2026-03-19 22:34:58 -04:00
rattus 82b868a45a
Fix VRAM leak in tiler fallback in video VAEs (#13073)
* sd: soft_empty_cache on tiler fallback

This doesn't cost much and produces the expected VRAM reduction in
resource monitors when you fall back to the tiler.

* wan: vae: Don't recurse in local fns (move run_up)

Moved Decoder3d's recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays collection of tensors, which in turn delays
VRAM release before the tiled fallback.

* ltx: vae: Don't recurse in local fns (move run_up)

Moved the recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays collection of tensors, which in turn delays
VRAM release before the tiled fallback.
2026-03-19 22:30:27 -04:00
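The closure-cycle refactor in the two commits above can be illustrated with a toy decoder (hypothetical names and trivial blocks; the real `run_up` walks conv blocks over tensors). A recursive function defined inside `forward` captures itself in its own closure cell, forming a reference cycle that keeps captured tensors alive until the cyclic GC runs; a bound method has no such cell:

```python
class Decoder:
    # Recursive helper as a plain method rather than a closure inside
    # forward(): no closure cell refers back to the function object,
    # so intermediate results become collectable as soon as the call
    # frames unwind, instead of waiting for the cyclic GC.
    def run_up(self, blocks, x):
        if not blocks:
            return x
        return self.run_up(blocks[1:], blocks[0](x))

    def forward(self, x):
        blocks = [lambda v: v + 1, lambda v: v * 2]
        return self.run_up(blocks, x)

out = Decoder().forward(3)  # (3 + 1) * 2
```

This matters here because delayed collection of large activation tensors delays VRAM release right when the tiled fallback needs that memory.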
comfyanonymous 8458ae2686
Revert "fix: run text encoders on MPS GPU instead of CPU for Apple Silicon (#…" (#13070)
This reverts commit b941913f1d.
2026-03-19 15:27:55 -04:00
Jukka Seppänen fd0261d2bc
Reduce tiled decode peak memory (#13050) 2026-03-19 13:29:34 -04:00
rattus ab14541ef7
memory: Add more exclusion criteria to pinned read (#13067) 2026-03-19 10:03:20 -07:00
rattus fabed694a2
ltx: vae: implement chunked encoder + CPU IO chunking (Big VRAM reductions) (#13062)
* ltx: vae: add cache state to downsample block

* ltx: vae: Add time stride awareness to causal_conv_3d

* ltx: vae: Automate truncation for encoder

Other VAEs just truncate without error. Do the same.

* sd/ltx: Make chunked_io a flag in its own right

Taking this bi-directional, so make it a purpose-named flag.

* ltx: vae: implement chunked encoder + CPU IO chunking

People are doing things with big frame counts in LTX, including V2V
flows. Implement the time-chunked encoder to keep VRAM down, using the
converse of the new CPU pre-allocation technique, where the chunks
are brought in from the CPU just-in-time.

* ltx: vae-encode: round chunk sizes more strictly

Only powers of 2 and multiples of 8 are valid due to cache slicing.
2026-03-19 09:58:47 -07:00
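A hypothetical sketch of the stricter chunk-size rounding described in the last bullet (the actual interaction with cache slicing in the LTX VAE may differ):

```python
def round_chunk(n: int, minimum: int = 8) -> int:
    """Round a requested chunk size down to the nearest power of two
    that is also a multiple of 8 (i.e. 8, 16, 32, ...), as required
    by the cache-slicing constraint."""
    p = minimum
    while p * 2 <= n:
        p *= 2
    return p

round_chunk(100)  # largest valid chunk not exceeding 100 is 64
```

Rounding down rather than up keeps the chunk within the caller's requested budget, which is the safer direction when the goal is VRAM reduction.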
comfyanonymous f6b869d7d3
fp16 intermediates don't work for some text enc models. (#13056) 2026-03-18 19:42:28 -04:00
comfyanonymous 56ff88f951
Fix regression. (#13053) 2026-03-18 18:35:25 -04:00
Jukka Seppänen 9fff091f35
Further Reduce LTX VAE decode peak RAM usage (#13052) 2026-03-18 18:32:26 -04:00
comfyanonymous dcd659590f
Make more intermediate values follow the intermediate dtype. (#13051) 2026-03-18 18:14:18 -04:00
Anton Bukov b941913f1d
fix: run text encoders on MPS GPU instead of CPU for Apple Silicon (#12809)
On Apple Silicon, `vram_state` is set to `VRAMState.SHARED` because
CPU and GPU share unified memory. However, `text_encoder_device()`
only checked for `HIGH_VRAM` and `NORMAL_VRAM`, causing all text
encoders to fall back to CPU on MPS devices.

Adding `VRAMState.SHARED` to the condition allows non-quantized text
encoders (e.g. bf16 Gemma 3 12B) to run on the MPS GPU, providing
significant speedup for text encoding and prompt generation.

Note: quantized models (fp4/fp8) that use float8_e4m3fn internally
will still fall back to CPU via the `supports_cast()` check in
`CLIP.__init__()`, since MPS does not support fp8 dtypes.
2026-03-17 21:21:32 -04:00
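The condition change this (since reverted) commit describes can be sketched as follows (simplified; the real `VRAMState` enum and `text_encoder_device()` live in comfy.model_management and carry more cases):

```python
from enum import Enum

class VRAMState(Enum):
    HIGH_VRAM = 1
    NORMAL_VRAM = 2
    SHARED = 3    # unified memory, e.g. Apple Silicon MPS
    LOW_VRAM = 4

def text_encoder_device(vram_state, gpu, cpu):
    # The fix adds SHARED to the accepted states so unified-memory
    # devices run text encoders on the GPU instead of falling back to CPU.
    if vram_state in (VRAMState.HIGH_VRAM, VRAMState.NORMAL_VRAM,
                      VRAMState.SHARED):
        return gpu
    return cpu

text_encoder_device(VRAMState.SHARED, "mps", "cpu")  # selects the GPU
```

Quantized fp8 encoders would still land on CPU via a separate `supports_cast()` check, as the commit body notes.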
rattus cad24ce262
cascade: remove dead weight init code (#13026)
This weight init process is fully shadowed by the weight load and
doesn't work in dynamic_vram, where the weight allocation is deferred.
2026-03-17 20:59:10 -04:00