Commit Graph

4970 Commits (c6e15cb36601b8d4f1b131df423359d049d31b4a)

Author SHA1 Message Date
comfyanonymous c6e15cb366 Revert "Fix some fp8 scaled checkpoints no longer working. (#13239)"
This reverts commit 4504ee718a.
2026-04-03 03:09:58 -04:00
comfyanonymous 6aba6bd435 ComfyUI version 0.18.4 2026-04-03 03:06:35 -04:00
Daxiong (Lin) dedec11116 Update template to 0.9.43 (#13265) 2026-04-03 03:04:35 -04:00
Alexander Piskun b818eacb0b feat(api-nodes): new Partner nodes for Wan2.7 (#13264)
Signed-off-by: bigcat88 <bigcat88@icloud.com>
2026-04-03 03:03:57 -04:00
comfyanonymous 4504ee718a Fix some fp8 scaled checkpoints no longer working. (#13239) 2026-04-03 03:03:50 -04:00
rattus 94ab69f609 Fix/tweak pinned memory accounting (#13221)
* mm: Lower windows pin threshold

Some workflows make more extraneous use of shared GPU memory than is
accounted for in the 5% pin headroom. Lower this for safety.

* mm: Remove pin count clearing threshold.

TOTAL_PINNED_MEMORY is shared between the legacy and aimdo pinning
systems; however, this catch-all assumes only the legacy system exists.
Remove the catch-all, as the PINNED_MEMORY buffer is already coherent.
2026-04-03 03:03:42 -04:00
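
A minimal sketch of the headroom idea this commit describes, with hypothetical names and the 5% figure taken from the message (this is not ComfyUI's actual accounting code):

    PIN_HEADROOM_FRACTION = 0.05  # 5% headroom per the message; the commit effectively reserves more on Windows

    def can_pin(total_pinned: int, request: int, system_ram: int) -> bool:
        """Allow a new pinned allocation only while the headroom is preserved."""
        budget = int(system_ram * (1.0 - PIN_HEADROOM_FRACTION))
        return total_pinned + request <= budget
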
ComfyUI Wiki 173e1aa2df chore: update workflow templates to v0.9.38 (#13176) 2026-03-26 16:02:18 -04:00
Alexander Piskun bbc8977597 feat(api-nodes): added new Topaz model (#13175)
Signed-off-by: bigcat88 <bigcat88@icloud.com>
2026-03-26 16:01:16 -04:00
comfyanonymous a0ae3f3bd4 ComfyUI v0.18.2 2026-03-24 17:41:52 -04:00
Alexander Piskun cc9273e655 feat(api-nodes): update xAI Grok nodes (#13140) 2026-03-24 17:39:48 -04:00
comfyanonymous a6e967a391 Update templates package version. (#13141) 2026-03-24 17:38:24 -04:00
comfyanonymous ebf6b52e32 ComfyUI v0.18.1 2026-03-21 22:32:16 -04:00
rattus 25b6d1d629 wan: vae: Fix light/color change (#13101)
There was an issue where the resample split was too early and dropped one
of the rolling convolutions a frame early. This is most noticeable as a
lighting/color change between pixel frames 5->6 (latent 2->3), or as a
lighting change between the first and last frame in an FLF wan flow.
2026-03-21 18:44:35 -04:00
comfyanonymous 11c15d8832 Fix fp16 intermediates giving different results. (#13100) 2026-03-21 17:53:25 -04:00
comfyanonymous b5d32e6ad2 Fix sampling issue with fp16 intermediates. (#13099) 2026-03-21 17:47:42 -04:00
comfyanonymous a11f68dd3b Fix canny node not working with fp16. (#13085) 2026-03-20 23:15:50 -04:00
comfyanonymous dc719cde9c ComfyUI version 0.18.0 2026-03-20 20:09:15 -04:00
Jedrzej Kosinski 87cda1fc25 Move inline comfy.context_windows imports to top-level in model_base.py (#13083)
The recent PR that added resize_cond_for_context_window methods to
model classes used inline 'import comfy.context_windows' in each
method body. This moves that import to the top-level import section,
replacing 4 duplicate inline imports with a single top-level one.
2026-03-20 20:03:42 -04:00
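
A schematic sketch of the pattern this commit changes (resize_cond is a hypothetical stand-in, not a real helper; the actual methods live in model_base.py):

    # Before: each method body re-imported the module inline (4 duplicates).
    class ModelBefore:
        def resize_cond_for_context_window(self, cond, window):
            import comfy.context_windows  # inline import, repeated per method
            return comfy.context_windows.resize_cond(cond, window)  # hypothetical helper

    # After: a single top-level import shared by all methods.
    import comfy.context_windows

    class ModelAfter:
        def resize_cond_for_context_window(self, cond, window):
            return comfy.context_windows.resize_cond(cond, window)  # hypothetical helper
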
comfyanonymous 45d5c83a30 Make EmptyImage node follow intermediate device/dtype. (#13079) 2026-03-20 16:08:26 -04:00
Alexander Piskun c646d211be feat(api-nodes): add Quiver SVG nodes (#13047) 2026-03-20 12:23:16 -07:00
drozbay 589228e671 Add slice_cond and per-model context window cond resizing (#12645)
* Add slice_cond and per-model context window cond resizing

* Fix cond_value.size() call in context window cond resizing

* Expose additional advanced inputs for ContextWindowsManualNode

Necessary for the WanAnimate context windows workflow, which needs cond_retain_index_list = 0 to work properly with its reference input.

---------
2026-03-19 20:42:42 -07:00
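
A minimal sketch of the cond-slicing idea, assuming a [batch, frames, ...] layout and a hypothetical helper name (the real per-model resize_cond_for_context_window logic is more involved):

    import torch

    def slice_cond_to_window(cond: torch.Tensor, start: int, end: int,
                             dim: int = 1) -> torch.Tensor:
        """Crop a conditioning tensor to a context window along `dim`."""
        if cond.size(dim) <= end - start:
            return cond  # already fits the window; nothing to slice
        return cond.narrow(dim, start, end - start)

    # e.g. keep latent frames 4..8 of a [1, 16, 4096] conditioning tensor
    window_cond = slice_cond_to_window(torch.randn(1, 16, 4096), 4, 8)
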
Alexander Piskun e4455fd43a [API Nodes] mark seedream-3-0-t2i and seedance-1-0-lite models as deprecated (#13060)
* chore(api-nodes): mark seedream-3-0-t2i and seedance-1-0-lite models as deprecated

* fix(api-nodes): fixed old regression in the ByteDanceImageReference node

---------

Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-03-19 20:05:01 -07:00
rattus f49856af57 ltx: vae: Fix missing init variable (#13074)
Forgot to push this amendment. Previous test results apply to this.
2026-03-19 22:34:58 -04:00
rattus 82b868a45a Fix VRAM leak in tiler fallback in video VAEs (#13073)
* sd: soft_empty_cache on tiler fallback

This doesn't cost a lot and creates the expected VRAM reduction in
resource monitors when you fall back to the tiler.

* wan: vae: Don't recurse in local fns (move run_up)

Moved Decoder3d's recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays collection of tensors, which in turn delays
VRAM release before the tiled fallback.

* ltx: vae: Don't recurse in local fns (move run_up)

Move the recursive run_up out of forward into a class
method to avoid nested closure self-reference cycles. This avoids
cyclic garbage that delays collection of tensors, which in turn delays
VRAM release before the tiled fallback.
2026-03-19 22:30:27 -04:00
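
The closure cycle being avoided, in a minimal sketch (a simplified stand-in, not the actual Decoder3d code): a recursive local function references itself through its enclosing cell, so the function object and the cell form a reference cycle that keeps captured values alive until the cycle collector runs.

    class Decoder:
        # Before: run_up closes over itself (function -> cell -> function),
        # creating cyclic garbage that delays freeing of captured tensors.
        def forward_before(self, x, depth=2):
            def run_up(t, d):
                if d == 0:
                    return t
                return run_up(t * 2, d - 1)  # self-reference via closure cell
            return run_up(x, depth)

        # After: recursion through a bound method; no closure cycle, so
        # intermediates are released as soon as each call returns.
        def _run_up(self, t, d):
            if d == 0:
                return t
            return self._run_up(t * 2, d - 1)

        def forward_after(self, x, depth=2):
            return self._run_up(x, depth)
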
comfyanonymous 8458ae2686 Revert "fix: run text encoders on MPS GPU instead of CPU for Apple Silicon (#…" (#13070)
This reverts commit b941913f1d.
2026-03-19 15:27:55 -04:00
Jukka Seppänen fd0261d2bc Reduce tiled decode peak memory (#13050) 2026-03-19 13:29:34 -04:00
rattus ab14541ef7 memory: Add more exclusion criteria to pinned read (#13067) 2026-03-19 10:03:20 -07:00
rattus 6589562ae3 ltx: vae: implement chunked encoder + CPU IO chunking (Big VRAM reductions) (#13062)
* ltx: vae: add cache state to downsample block

* ltx: vae: Add time stride awareness to causal_conv_3d

* ltx: vae: Automate truncation for encoder

Other VAEs just truncate without error. Do the same.

* sd/ltx: Make chunked_io a flag in its own right

Taking this bidirectional, so make it a flag named for its purpose.

* ltx: vae: implement chunked encoder + CPU IO chunking

People are doing things with big frame counts in LTX, including V2V
flows. Implement the time-chunked encoder to keep the VRAM down, with
the converse of the new CPU pre-allocation technique, where the chunks
are brought over from the CPU just-in-time.

* ltx: vae-encode: round chunk sizes more strictly

Only powers of 2 and multiples of 8 are valid due to cache slicing.
2026-03-19 10:01:12 -07:00
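
A hedged sketch of the time-chunked encode with just-in-time CPU-to-GPU transfer (the real encoder is stateful and carries a rolling cache across chunks; `encode` here is a stand-in):

    import torch

    @torch.no_grad()
    def encode_time_chunked(encode, frames_cpu, chunk=8, device="cuda"):
        """frames_cpu: [B, C, T, H, W] kept on the CPU. Each chunk is moved
        to the GPU just-in-time and its latent returned to the CPU, so peak
        VRAM scales with the chunk size rather than the full frame count."""
        outs = []
        for t0 in range(0, frames_cpu.shape[2], chunk):
            piece = frames_cpu[:, :, t0:t0 + chunk].to(device, non_blocking=True)
            outs.append(encode(piece).cpu())
            del piece  # drop the GPU copy before the next chunk
        return torch.cat(outs, dim=2)
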
rattus fabed694a2 ltx: vae: implement chunked encoder + CPU IO chunking (Big VRAM reductions) (#13062)
* ltx: vae: add cache state to downsample block

* ltx: vae: Add time stride awareness to causal_conv_3d

* ltx: vae: Automate truncation for encoder

Other VAEs just truncate without error. Do the same.

* sd/ltx: Make chunked_io a flag in its own right

Taking this bidirectional, so make it a flag named for its purpose.

* ltx: vae: implement chunked encoder + CPU IO chunking

People are doing things with big frame counts in LTX, including V2V
flows. Implement the time-chunked encoder to keep the VRAM down, with
the converse of the new CPU pre-allocation technique, where the chunks
are brought over from the CPU just-in-time.

* ltx: vae-encode: round chunk sizes more strictly

Only powers of 2 and multiples of 8 are valid due to cache slicing.
2026-03-19 09:58:47 -07:00
comfyanonymous f6b869d7d3 fp16 intermediates don't work for some text enc models. (#13056) 2026-03-18 19:42:28 -04:00
comfyanonymous 56ff88f951 Fix regression. (#13053) 2026-03-18 18:35:25 -04:00
Jukka Seppänen 9fff091f35 Further Reduce LTX VAE decode peak RAM usage (#13052) 2026-03-18 18:32:26 -04:00
comfyanonymous dcd659590f Make more intermediate values follow the intermediate dtype. (#13051) 2026-03-18 18:14:18 -04:00
Alexander Brown b67ed2a45f Update comfyui-frontend-package version to 1.41.21 (#13035) 2026-03-18 16:36:39 -04:00
Alexander Piskun 06957022d4 fix(api-nodes): add support for "thought_image" in Nano Banana 2 and corrected price badges (#13038) 2026-03-18 10:21:58 -07:00
Anton Bukov b941913f1d fix: run text encoders on MPS GPU instead of CPU for Apple Silicon (#12809)
On Apple Silicon, `vram_state` is set to `VRAMState.SHARED` because
CPU and GPU share unified memory. However, `text_encoder_device()`
only checked for `HIGH_VRAM` and `NORMAL_VRAM`, causing all text
encoders to fall back to CPU on MPS devices.

Adding `VRAMState.SHARED` to the condition allows non-quantized text
encoders (e.g. bf16 Gemma 3 12B) to run on the MPS GPU, providing
significant speedup for text encoding and prompt generation.

Note: quantized models (fp4/fp8) that use float8_e4m3fn internally
will still fall back to CPU via the `supports_cast()` check in
`CLIP.__init__()`, since MPS does not support fp8 dtypes.
2026-03-17 21:21:32 -04:00
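
The condition described above, as a minimal sketch (the real text_encoder_device() in comfy/model_management.py has more cases than this):

    from enum import Enum

    class VRAMState(Enum):
        SHARED = 0       # unified memory, e.g. Apple Silicon / MPS
        NORMAL_VRAM = 1
        HIGH_VRAM = 2

    def text_encoder_device(vram_state, gpu_dev, cpu_dev):
        # Before the fix only HIGH_VRAM and NORMAL_VRAM reached the GPU;
        # adding SHARED routes unified-memory systems to the GPU as well.
        if vram_state in (VRAMState.HIGH_VRAM, VRAMState.NORMAL_VRAM, VRAMState.SHARED):
            return gpu_dev
        return cpu_dev
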
rattus cad24ce262 cascade: remove dead weight init code (#13026)
This weight init process is fully shadowed by the weight load and
doesn't work in dynamic_vram, where the weight allocation is deferred.
2026-03-17 20:59:10 -04:00
comfyanonymous 68d542cc06 Fix case where pixel space VAE could cause issues. (#13030) 2026-03-17 20:46:22 -04:00
Jukka Seppänen 735a0465e5 Inplace VAE output processing to reduce peak RAM consumption. (#13028) 2026-03-17 20:20:49 -04:00
Dr.Lt.Data 8b9d039f26 bump manager version to 4.1b6 (#13022) 2026-03-17 18:17:03 -04:00
rattus 035414ede4 Reduce WAN VAE VRAM, Save use cases for OOM/Tiler (#13014)
* wan: vae: encoder: Add feature cache layer that corks singles

If a downsample only gives you a single frame, save it to the feature
cache and return nothing to the top level. This increases the
efficiency of cacheability, but also prepares support for going two
by two rather than four by four on the frames.

* wan: remove all concatenation with the feature cache

The loopers are now responsible for ensuring that non-final frames are
processed at least two-by-two, eliminating the need for this cat case.

* wan: vae: recurse and chunk for 2+2 frames on decode

Avoid having to clone off slices of 4 frame chunks and reduce the size
of the big 6 frame convolutions down to 4. Save the VRAMs.

* wan: encode frames 2x2.

Reduce VRAM usage greatly by encoding frames 2 at a time rather than
4.

* wan: vae: remove cloning

The loopers now control the chunking such that there are never more than
2 frames, so just cache these slices directly and avoid the clone
allocations completely.

* wan: vae: free consumer caller tensors on recursion

* wan: vae: restyle a little to match LTX
2026-03-17 17:34:39 -04:00
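
A hedged sketch of the two-by-two loop with a rolling feature cache (`step` stands in for one pass of the causal encoder; the real code threads per-layer caches):

    import torch

    @torch.no_grad()
    def encode_two_by_two(step, frames):
        """frames: [B, C, T, H, W]. Processing 2 frames per pass instead of
        4 roughly halves the live convolution inputs, and so peak VRAM."""
        cache, outs = None, []
        for t0 in range(0, frames.shape[2], 2):
            out, cache = step(frames[:, :, t0:t0 + 2], cache)
            if out is not None:  # a lone trailing frame may stay "corked" in the cache
                outs.append(out)
        return torch.cat(outs, dim=2)
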
rattus 1a157e1f97 Reduce LTX VAE VRAM usage and save use cases from OOMs/Tiler (#13013)
* ltx: vae: scale the chunk size with the user's VRAM

Scale this linearly down for users with low VRAM.

* ltx: vae: free non-chunking recursive intermediates

* ltx: vae: cleanup some intermediates

The conv layer can be the VRAM peak and it does a torch.cat, so clean up
the pieces of the cat. Also clear out the cache ASAP as each layer detects
its end, because this VAE surges in VRAM at the end due to the end padding
increasing the size of the final frame convolutions off-the-books to
the chunker. So if all the earlier layers free up their cache, it can
offset that surge.

It's a fragmentation nightmare, and the chance of it having to re-cache
the pyt allocator is very high, but you won't OOM.
2026-03-17 17:32:43 -04:00
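
A sketch of the scaling-plus-rounding heuristic suggested by this commit and the encoder commit above; the numbers and names are illustrative, not the repo's actual values:

    def pick_chunk_size(free_vram, bytes_per_frame, max_chunk=64, min_chunk=8):
        """Scale the time-chunk linearly with free VRAM, then round down to a
        power of two >= 8 (automatically a multiple of 8), matching the
        cache-slicing constraint noted in the encoder commit."""
        budget = free_vram // max(bytes_per_frame, 1)
        chunk = max(min(budget, max_chunk), min_chunk)
        p = 1
        while p * 2 <= chunk:
            p *= 2  # largest power of two not exceeding chunk
        return max(p, min_chunk)
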
Christian Byrne ed7c2c6579 Mark weight_dtype as advanced input in Load Diffusion Model node (#12769)
Mark the weight_dtype parameter in UNETLoader (Load Diffusion Model) as
an advanced input to reduce UI complexity for new users. The parameter
is now hidden behind an expandable Advanced section, matching the
pattern used for other advanced inputs like device, tile_size, and
overlap.

Amp-Thread-ID: https://ampcode.com/threads/T-019cbaf1-d3c0-718e-a325-318baba86dec
2026-03-17 07:24:00 -07:00
ComfyUI Wiki 379fbd1a82 chore: update workflow templates to v0.9.26 (#13012) 2026-03-16 21:53:18 -07:00
Paulo Muggler Moreira 8cc746a864 fix: disable SageAttention for Hunyuan3D v2.1 DiT (#12772) 2026-03-16 22:27:27 -04:00
Christian Byrne 9a870b5102 fix: atomic writes for userdata to prevent data loss on crash (#12987)
Write to a temp file in the same directory, then os.replace() onto the
target path. If the process crashes mid-write, the original file is
left intact instead of being truncated to zero bytes.

Fixes #11298
2026-03-16 21:56:35 -04:00
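
The write-then-rename pattern the commit describes, as a self-contained sketch (not the actual userdata code):

    import os
    import tempfile

    def atomic_write(path: str, data: bytes) -> None:
        """Crash-safe save: a failure mid-write leaves the old file intact,
        because os.replace() swaps the finished temp file in atomically."""
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ensure bytes hit disk before the rename
            os.replace(tmp, path)  # atomic on the same filesystem
        except BaseException:
            os.unlink(tmp)  # clean up the temp file on any failure
            raise
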
comfyanonymous ca17fc8355 Fix potential issue. (#13009) 2026-03-16 21:38:40 -04:00
Kohaku-Blueleaf 20561aa919 [Trainer] FP4, 8, 16 training by native dtype support and quant linear autograd function (#12681) 2026-03-16 21:31:50 -04:00
comfyanonymous 7a16e8aa4e Add --enable-dynamic-vram option to force enable it. (#13002) 2026-03-16 16:50:13 -04:00
blepping b202f842af Skip running model finalizers at exit (#12994) 2026-03-16 16:00:42 -04:00