Commit Graph

49 Commits (6e3c6087d4f4e24d8eedd8d5c6e17c9e88eb91ff)

Author SHA1 Message Date
rattus 535c16ce6e
Widen OOM_EXCEPTION to AcceleratorError form (#12835)
Pytorch only filters for OOMs in its own allocators however there are
paths that can OOM on allocators made outside the pytorch allocators.
These manifest as an AllocatorError as pytorch does not have universal
error translation to its OOM type on exception. Handle it. A log I have
for this also shows a double report of the error async, so call the
async discarder to cleanup and make these OOMs look like OOMs.
2026-03-10 00:41:02 -04:00
comfyanonymous 88e6370527
Remove workaround for old pytorch. (#12480) 2026-02-15 20:43:53 -05:00
rattus 0fd1b78736
Reduce LTX2 VAE VRAM consumption (#12028)
* causal_video_ae: Remove attention ResNet

This attention_head_dim argument does not exist on this constructor so
this is dead code. Remove as generic attention mid VAE conflicts with
temporal roll.

* ltx-vae: consoldate causal/non-causal code paths

* ltx-vae: add cache rolling adder

* ltx-vae: use cached adder for resnet

* ltx-vae: Implement rolling VAE

Implement a temporal rolling VAE for the LTX2 VAE.

Usually when doing temporal rolling VAEs you can just chunk on time relying
on causality and cache behind you as you go. The LTX VAE is however
non-causal.

So go whole hog and implement per layer run ahead and backpressure between
the decoder layers using recursive state beween the layers.

Operations are ammended with temporal_cache_state{} which they can use to
hold any state then need for partial execution. Convolutions cache their
inputs behind the up to N-1 frames, and skip connections need to cache the
mismatch between convolution input and output that happens due to missing
future (non-causal) input.

Each call to run_up() processes a layer accross a range on input that
may or may not be complete. It goes depth first to process as much as
possible to try and digest frames to the final output ASAP. If layers run
out of input due to convolution losses, they simply return without action
effectively applying back-pressure to the earlier layers. As the earlier
layers do more work and caller deeper, the partial states are reconciled
and output continues to digest depth first as much as possible.

Chunking is done using a size quota rather than a fixed frame length and
any layer can initiate chunking, and multiple layers can chunk at different
granulatiries. This remove the old limitation of always having to process
1 latent frame to entirety and having to hold 8 full decoded frames as
the VRAM peak.
2026-01-22 16:54:18 -05:00
comfyanonymous 65cfcf5b1b
New Year ruff cleanup. (#11595) 2026-01-01 22:06:14 -05:00
rattus 73f5649196
Implement temporal rolling VAE (Major VRAM reductions in Hunyuan and Kandinsky) (#10995)
* hunyuan upsampler: rework imports

Remove the transitive import of VideoConv3d and Resnet and takes these
from actual implementation source.

* model: remove unused give_pre_end

According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).

This semantic is difficult to implement temporal roll VAE for (and would
defeat the purpose). Rather than implement the complex if, just delete
the unused feature.

(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe33 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe33 Initial commit.
HEAD is now at 9d8a8179 Enable async offloading by default on Nvidia. (#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

* move refiner VAE temporal roller to core

Move the carrying conv op to the common VAE code and give it a better
name. Roll the carry implementation logic for Resnet into the base
class and scrap the Hunyuan specific subclass.

* model: Add temporal roll to main VAE decoder

If there are no attention layers, its a standard resnet and VideoConv3d
is asked for, substitute in the temporal rolloing VAE algorithm. This
reduces VAE usage by the temporal dimension (can be huge VRAM savings).

* model: Add temporal roll to main VAE encoder

If there are no attention layers, its a standard resnet and VideoConv3d
is asked for, substitute in the temporal rolling VAE algorithm. This
reduces VAE usage by the temporal dimension (can be huge VRAM savings).
2025-12-02 22:49:29 -05:00
rattus 277237ccc1
attention: use flag based OOM fallback (#11038)
Exception ref all local variables for the lifetime of exception
context. Just set a flag and then if to dump the exception before
falling back.
2025-12-02 17:24:19 -05:00
comfyanonymous 33bd9ed9cb
Implement hunyuan image refiner model. (#9817) 2025-09-12 00:43:20 -04:00
comfyanonymous b288fb0db8
Small refactor of some vae code. (#9787) 2025-09-09 18:09:56 -04:00
comfyanonymous 9df8792d4b
Make last PR not crash comfy on old pytorch. (#9324) 2025-08-13 15:12:41 -04:00
contentis 3da5a07510
SDPA backend priority (#9299) 2025-08-13 14:53:27 -04:00
chaObserv 61b08d4ba6
Replace manual x * sigmoid(x) with torch silu in VAE nonlinearity (#9057) 2025-07-30 19:25:56 -04:00
comfyanonymous 1cd6cd6080 Disable pytorch attention in VAE for AMD. 2025-02-14 05:42:14 -05:00
comfyanonymous 96e2a45193 Remove useless code. 2025-01-23 05:56:23 -05:00
comfyanonymous 008761166f Optimize first attention block in cosmos VAE. 2025-01-15 21:48:46 -05:00
comfyanonymous 4c5c4ddeda Fix regression in VAE code on old pytorch versions. 2024-12-18 03:08:28 -05:00
comfyanonymous bda1482a27 Basic Hunyuan Video model support. 2024-12-16 19:35:40 -05:00
Chenlei Hu d9d7f3c619
Lint all unused variables (#5989)
* Enable F841

* Autofix

* Remove all unused variable assignment
2024-12-12 17:59:16 -05:00
Chenlei Hu 0fd4e6c778
Lint unused import (#5973)
* Lint unused import

* nit

* Remove unused imports

* revert fix_torch import

* nit
2024-12-09 15:24:39 -05:00
comfyanonymous 98f828fad9 Remove unnecessary code. 2024-05-18 09:36:44 -04:00
comfyanonymous 2a813c3b09 Switch some more prints to logging. 2024-03-11 16:34:58 -04:00
comfyanonymous 261bcbb0d9 A few missing comfy ops in the VAE. 2023-12-22 04:05:42 -05:00
comfyanonymous 77755ab8db Refactor comfy.ops
comfy.ops -> comfy.ops.disable_weight_init

This should make it more clear what they actually do.

Some unused code has also been removed.
2023-12-11 23:27:13 -05:00
comfyanonymous d44a2de49f Make VAE code closer to sgm. 2023-10-17 15:18:51 -04:00
comfyanonymous 23680a9155 Refactor the attention stuff in the VAE. 2023-10-17 03:19:29 -04:00
comfyanonymous 88733c997f pytorch_attention_enabled can now return True when xformers is enabled. 2023-10-11 21:30:57 -04:00
comfyanonymous 1a4bd9e9a6 Refactor the attention functions.
There's no reason for the whole CrossAttention object to be repeated when
only the operation in the middle changes.
2023-10-11 20:38:48 -04:00
comfyanonymous 1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 2023-09-04 00:58:18 -04:00
comfyanonymous bed116a1f9 Remove optimization that caused border. 2023-08-29 11:21:36 -04:00
comfyanonymous 1c794a2161 Fallback to slice attention if xformers doesn't support the operation. 2023-08-27 22:24:42 -04:00
comfyanonymous d935ba50c4 Make --bf16-vae work on torch 2.0 2023-08-27 21:33:53 -04:00
comfyanonymous 95d796fc85 Faster VAE loading. 2023-07-29 16:28:30 -04:00
comfyanonymous fa28d7334b Remove useless code. 2023-06-23 12:35:26 -04:00
comfyanonymous b8636a44aa Make scaled_dot_product switch to sliced attention on OOM. 2023-05-20 16:01:02 -04:00
comfyanonymous 797c4e8d3b Simplify and improve some vae attention code. 2023-05-20 15:07:21 -04:00
comfyanonymous bae4fb4a9d Fix imports. 2023-05-04 18:10:29 -04:00
comfyanonymous 73c3e11e83 Fix model_management import so it doesn't get executed twice. 2023-04-15 19:04:33 -04:00
comfyanonymous e46b1c3034 Disable xformers in VAE when xformers == 0.0.18 2023-04-04 22:22:02 -04:00
comfyanonymous 3ed4a4e4e6 Try again with vae tiled decoding if regular fails because of OOM. 2023-03-22 14:49:00 -04:00
comfyanonymous c692509c2b Try to improve VAEEncode memory usage a bit. 2023-03-22 02:45:18 -04:00
comfyanonymous 83f23f82b8 Add pytorch attention support to VAE. 2023-03-13 12:45:54 -04:00
comfyanonymous a256a2abde --disable-xformers should not even try to import xformers. 2023-03-13 11:36:48 -04:00
comfyanonymous 0f3ba7482f Xformers is now properly disabled when --cpu used.
Added --windows-standalone-build option, currently it only opens
makes the code open up comfyui in the browser.
2023-03-12 15:44:16 -04:00
comfyanonymous 1de86851b1 Try to fix memory issue. 2023-03-11 15:15:13 -05:00
comfyanonymous cc8baf1080 Make VAE use common function to get free memory. 2023-03-05 14:20:07 -05:00
comfyanonymous 509c7dfc6d Use real softmax in split op to fix issue with some images. 2023-02-10 03:13:49 -05:00
comfyanonymous 773cdabfce Same thing but for the other places where it's used. 2023-02-09 12:43:29 -05:00
comfyanonymous e8c499ddd4 Split optimization for VAE attention block. 2023-02-08 22:04:20 -05:00
comfyanonymous 5b4e312749 Use inplace operations for less OOM issues. 2023-02-08 22:04:13 -05:00
comfyanonymous 220afe3310 Initial commit. 2023-01-16 22:37:14 -05:00