ComfyUI

Commit Graph

Author	SHA1	Message	Date
rattus	535c16ce6e	Widen OOM_EXCEPTION to AcceleratorError form (#12835 ) PyTorch only filters for OOMs in its own allocators; however, there are paths that can OOM on allocators made outside the PyTorch allocators. These manifest as an AcceleratorError, as PyTorch does not have universal error translation to its OOM type on exception. Handle it. A log I have for this also shows a double report of the error async, so call the async discarder to clean up and make these OOMs look like OOMs.	2026-03-10 00:41:02 -04:00
comfyanonymous	88e6370527	Remove workaround for old pytorch. (#12480 )	2026-02-15 20:43:53 -05:00
rattus	0fd1b78736	Reduce LTX2 VAE VRAM consumption (#12028 ) * causal_video_ae: Remove attention ResNet This attention_head_dim argument does not exist on this constructor so this is dead code. Remove as generic attention mid VAE conflicts with temporal roll. * ltx-vae: consolidate causal/non-causal code paths * ltx-vae: add cache rolling adder * ltx-vae: use cached adder for resnet * ltx-vae: Implement rolling VAE Implement a temporal rolling VAE for the LTX2 VAE. Usually when doing temporal rolling VAEs you can just chunk on time, relying on causality, and cache behind you as you go. The LTX VAE is however non-causal. So go whole hog and implement per-layer run-ahead and backpressure between the decoder layers using recursive state between the layers. Operations are amended with temporal_cache_state{} which they can use to hold any state they need for partial execution. Convolutions cache up to N-1 frames of input behind them, and skip connections need to cache the mismatch between convolution input and output that happens due to missing future (non-causal) input. Each call to run_up() processes a layer across a range of input that may or may not be complete. It goes depth first to process as much as possible to try and digest frames to the final output ASAP. If layers run out of input due to convolution losses, they simply return without action, effectively applying back-pressure to the earlier layers. As the earlier layers do more work and call deeper, the partial states are reconciled and output continues to digest depth first as much as possible. Chunking is done using a size quota rather than a fixed frame length, any layer can initiate chunking, and multiple layers can chunk at different granularities. This removes the old limitation of always having to process 1 latent frame to entirety and having to hold 8 full decoded frames as the VRAM peak.	2026-01-22 16:54:18 -05:00
comfyanonymous	65cfcf5b1b	New Year ruff cleanup. (#11595 )	2026-01-01 22:06:14 -05:00
rattus	73f5649196	Implement temporal rolling VAE (Major VRAM reductions in Hunyuan and Kandinsky) (#10995 ) * hunyuan upsampler: rework imports Remove the transitive import of VideoConv3d and Resnet and take these from the actual implementation source. * model: remove unused give_pre_end According to git grep, this is not used now, and was not used in the initial commit that introduced it (see below). This semantic is difficult to implement temporal roll VAE for (and would defeat the purpose). Rather than implement the complex if, just delete the unused feature. (venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline `220afe33` (HEAD) Initial commit. (venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False, comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end: (venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master Previous HEAD position was `220afe33` Initial commit. HEAD is now at `9d8a8179` Enable async offloading by default on Nvidia. (#10953) (venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre comfy/ldm/modules/diffusionmodules/model.py: resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False, comfy/ldm/modules/diffusionmodules/model.py: self.give_pre_end = give_pre_end comfy/ldm/modules/diffusionmodules/model.py: if self.give_pre_end: * move refiner VAE temporal roller to core Move the carrying conv op to the common VAE code and give it a better name. Roll the carry implementation logic for Resnet into the base class and scrap the Hunyuan specific subclass. * model: Add temporal roll to main VAE decoder If there are no attention layers, it's a standard resnet and VideoConv3d is asked for, substitute in the temporal rolling VAE algorithm. This reduces VAE usage by the temporal dimension (can be huge VRAM savings). * model: Add temporal roll to main VAE encoder If there are no attention layers, it's a standard resnet and VideoConv3d is asked for, substitute in the temporal rolling VAE algorithm. This reduces VAE usage by the temporal dimension (can be huge VRAM savings).	2025-12-02 22:49:29 -05:00
rattus	277237ccc1	attention: use flag based OOM fallback (#11038 ) An exception references all local variables for the lifetime of the exception context. Just set a flag in the handler, then check it afterwards so the exception is dropped before falling back.	2025-12-02 17:24:19 -05:00
comfyanonymous	33bd9ed9cb	Implement hunyuan image refiner model. (#9817 )	2025-09-12 00:43:20 -04:00
comfyanonymous	b288fb0db8	Small refactor of some vae code. (#9787 )	2025-09-09 18:09:56 -04:00
comfyanonymous	9df8792d4b	Make last PR not crash comfy on old pytorch. (#9324 )	2025-08-13 15:12:41 -04:00
contentis	3da5a07510	SDPA backend priority (#9299 )	2025-08-13 14:53:27 -04:00
chaObserv	61b08d4ba6	Replace manual x * sigmoid(x) with torch silu in VAE nonlinearity (#9057 )	2025-07-30 19:25:56 -04:00
comfyanonymous	1cd6cd6080	Disable pytorch attention in VAE for AMD.	2025-02-14 05:42:14 -05:00
comfyanonymous	96e2a45193	Remove useless code.	2025-01-23 05:56:23 -05:00
comfyanonymous	008761166f	Optimize first attention block in cosmos VAE.	2025-01-15 21:48:46 -05:00
comfyanonymous	4c5c4ddeda	Fix regression in VAE code on old pytorch versions.	2024-12-18 03:08:28 -05:00
comfyanonymous	bda1482a27	Basic Hunyuan Video model support.	2024-12-16 19:35:40 -05:00
Chenlei Hu	d9d7f3c619	Lint all unused variables (#5989 ) * Enable F841 * Autofix * Remove all unused variable assignment	2024-12-12 17:59:16 -05:00
Chenlei Hu	0fd4e6c778	Lint unused import (#5973 ) * Lint unused import * nit * Remove unused imports * revert fix_torch import * nit	2024-12-09 15:24:39 -05:00
comfyanonymous	98f828fad9	Remove unnecessary code.	2024-05-18 09:36:44 -04:00
comfyanonymous	2a813c3b09	Switch some more prints to logging.	2024-03-11 16:34:58 -04:00
comfyanonymous	261bcbb0d9	A few missing comfy ops in the VAE.	2023-12-22 04:05:42 -05:00
comfyanonymous	77755ab8db	Refactor comfy.ops comfy.ops -> comfy.ops.disable_weight_init This should make it more clear what they actually do. Some unused code has also been removed.	2023-12-11 23:27:13 -05:00
comfyanonymous	d44a2de49f	Make VAE code closer to sgm.	2023-10-17 15:18:51 -04:00
comfyanonymous	23680a9155	Refactor the attention stuff in the VAE.	2023-10-17 03:19:29 -04:00
comfyanonymous	88733c997f	pytorch_attention_enabled can now return True when xformers is enabled.	2023-10-11 21:30:57 -04:00
comfyanonymous	1a4bd9e9a6	Refactor the attention functions. There's no reason for the whole CrossAttention object to be repeated when only the operation in the middle changes.	2023-10-11 20:38:48 -04:00
comfyanonymous	1938f5c5fe	Add a force argument to soft_empty_cache to force a cache empty.	2023-09-04 00:58:18 -04:00
comfyanonymous	bed116a1f9	Remove optimization that caused border.	2023-08-29 11:21:36 -04:00
comfyanonymous	1c794a2161	Fallback to slice attention if xformers doesn't support the operation.	2023-08-27 22:24:42 -04:00
comfyanonymous	d935ba50c4	Make --bf16-vae work on torch 2.0	2023-08-27 21:33:53 -04:00
comfyanonymous	95d796fc85	Faster VAE loading.	2023-07-29 16:28:30 -04:00
comfyanonymous	fa28d7334b	Remove useless code.	2023-06-23 12:35:26 -04:00
comfyanonymous	b8636a44aa	Make scaled_dot_product switch to sliced attention on OOM.	2023-05-20 16:01:02 -04:00
comfyanonymous	797c4e8d3b	Simplify and improve some vae attention code.	2023-05-20 15:07:21 -04:00
comfyanonymous	bae4fb4a9d	Fix imports.	2023-05-04 18:10:29 -04:00
comfyanonymous	73c3e11e83	Fix model_management import so it doesn't get executed twice.	2023-04-15 19:04:33 -04:00
comfyanonymous	e46b1c3034	Disable xformers in VAE when xformers == 0.0.18	2023-04-04 22:22:02 -04:00
comfyanonymous	3ed4a4e4e6	Try again with vae tiled decoding if regular fails because of OOM.	2023-03-22 14:49:00 -04:00
comfyanonymous	c692509c2b	Try to improve VAEEncode memory usage a bit.	2023-03-22 02:45:18 -04:00
comfyanonymous	83f23f82b8	Add pytorch attention support to VAE.	2023-03-13 12:45:54 -04:00
comfyanonymous	a256a2abde	--disable-xformers should not even try to import xformers.	2023-03-13 11:36:48 -04:00
comfyanonymous	0f3ba7482f	Xformers is now properly disabled when --cpu is used. Added --windows-standalone-build option, currently it only makes the code open up comfyui in the browser.	2023-03-12 15:44:16 -04:00
comfyanonymous	1de86851b1	Try to fix memory issue.	2023-03-11 15:15:13 -05:00
comfyanonymous	cc8baf1080	Make VAE use common function to get free memory.	2023-03-05 14:20:07 -05:00
comfyanonymous	509c7dfc6d	Use real softmax in split op to fix issue with some images.	2023-02-10 03:13:49 -05:00
comfyanonymous	773cdabfce	Same thing but for the other places where it's used.	2023-02-09 12:43:29 -05:00
comfyanonymous	e8c499ddd4	Split optimization for VAE attention block.	2023-02-08 22:04:20 -05:00
comfyanonymous	5b4e312749	Use inplace operations for less OOM issues.	2023-02-08 22:04:13 -05:00
comfyanonymous	220afe3310	Initial commit.	2023-01-16 22:37:14 -05:00

49 Commits (6e3c6087d4f4e24d8eedd8d5c6e17c9e88eb91ff)
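
The flag-based OOM fallback described in commit 277237ccc1 can be illustrated with a minimal sketch. This is not the ComfyUI code: `FakeOOM`, `BigBuffer`, `fast_path` and both fallback functions are hypothetical stand-ins, and the behaviour shown assumes CPython's reference counting. The idea is that while an `except` block is running, the in-flight exception keeps the failed call's frame (and its locals) alive, so running the fallback after the handler has exited lets that memory be released first.

```python
import gc
import weakref


class FakeOOM(RuntimeError):
    """Hypothetical stand-in for an accelerator out-of-memory error."""


class BigBuffer:
    """Hypothetical stand-in for a large intermediate tensor."""


live = []  # weak references to buffers allocated by fast_path


def fast_path():
    scratch = BigBuffer()              # local owned by this frame
    live.append(weakref.ref(scratch))
    raise FakeOOM("out of memory")


def scratch_is_alive():
    """Report whether the most recent fast_path buffer still exists."""
    gc.collect()
    return live[-1]() is not None


def fallback_inside_except():
    try:
        return fast_path()
    except FakeOOM:
        # The exception being handled still references fast_path's frame
        # through its traceback, so `scratch` cannot be freed yet.
        return scratch_is_alive()


def fallback_with_flag():
    oom = False
    try:
        return fast_path()
    except FakeOOM:
        oom = True                     # only record that the OOM happened
    if oom:
        # The handler has exited, so the exception and the frames it
        # referenced are released before the fallback runs.
        return scratch_is_alive()


print("buffer alive during in-handler fallback:", fallback_inside_except())
print("buffer alive during flag-based fallback:", fallback_with_flag())
```

In a real attention path the fallback would be a slower, lower-memory implementation rather than a liveness check; the check is used here only to make the difference between the two structures observable.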