ComfyUI

Commit Graph

Author	SHA1	Message	Date
rattus	5f41584e96	Disable dynamic_vram when weight hooks applied (#12653 ) * sd: add support for clip model reconstruction * nodes: SetClipHooks: Demote the dynamic model patcher * mp: Make dynamic_disable more robust The backup needs to not be cloned. In addition add a delegate object to ModelPatcherDynamic so that non-cloning code can do ModelPatcherDynamic demotion * sampler_helpers: Demote to non-dynamic model patcher when hooking * code rabbit review comments	2026-02-28 16:50:18 -05:00
Jukka Seppänen	1f6744162f	feat: Support SCAIL WanVideo model (#12614 )	2026-02-28 16:49:12 -05:00
fappaz	95e1059661	fix(ace15): handle missing lm_metadata in memory estimation during checkpoint export #12669 (#12686 )	2026-02-28 01:18:40 -05:00
Talmaj	ac4412d0fa	Native LongCat-Image implementation (#12597 )	2026-02-27 23:04:34 -05:00
rattus	e721e24136	ops: implement lora requanting for non QuantizedTensor fp8 (#12668 ) Allow non QuantizedTensor layer to set want_requant to get the post lora calculation stochastic cast down to the original input dtype. This is then used by the legacy fp8 Linear implementation to set the compute_dtype to the preferred lora dtype but then want_requant it back down to fp8. This fixes the issue when --fast fp8_matrix_mult is combined with --fast dynamic_vram while doing a lora on an fp8_ non QT model.	2026-02-27 19:05:51 -05:00
Reiner "Tiles" Prokein	25ec3d96a3	Class WanVAE, def encode, feat_map is using self.decoder instead of self.encoder (#12682 )	2026-02-27 19:03:45 -05:00
vickytsang	35e9fce775	Enable Pytorch Attention for gfx950 (#12641 )	2026-02-26 20:16:12 -05:00
Jukka Seppänen	c7f7d52b68	feat: Support SDPose-OOD (#12661 )	2026-02-26 19:59:05 -05:00
fappaz	b233dbe0bc	feat(ace-step): add ACE-Step 1.5 lycoris key alias mapping for LoKR #12638 (#12665 )	2026-02-26 18:19:19 -05:00
comfyanonymous	8a4d85c708	Cleanups to the last PR. (#12646 )	2026-02-26 01:30:31 -05:00
Tavi Halperin	a4522017c5	feat: per-guide attention strength control in self-attention (#12518 ) Implements per-guide attention attenuation via log-space additive bias in self-attention. Each guide reference tracks its own strength and optional spatial mask in conditioning metadata (guide_attention_entries).	2026-02-26 01:25:23 -05:00
Jukka Seppänen	907e5dcbbf	initial FlowRVS support (#12637 )	2026-02-25 23:38:46 -05:00
comfyanonymous	7253531670	Fix ltxav te mem estimation. (#12643 )	2026-02-25 23:13:47 -05:00
comfyanonymous	e14b04478c	Fix LTXAV text enc min length. (#12640 ) Should have been 1024 instead of 512	2026-02-25 22:36:02 -05:00
rattus	4f5b7dbf1f	Fix Aimdo fallback on probe to not use zero-copy sft (#12634 ) * utils: dont use comfy sft loader in aimdo fallback This was going to the raw command line switch and should respect main.py probe of whether aimdo actually loaded successfully. * ops: dont use deferred linear load in Aimdo fallback Avoid changes of behaviour on --fast dynamic_vram when aimdo doesnt work.	2026-02-25 16:49:48 -05:00
rattus	3ebe1ac22e	Disable dynamic_vram when using torch compiler (#12612 ) * mp: attach re-construction arguments to model patcher When making a model-patcher from a unet or ckpt, attach a callable function that can be called to replay the model construction. This can be used to deep clone model patcher WRT the actual model. Originally written by Kosinkadink `f4b99bc623` * mp: Add disable_dynamic clone argument Add a clone argument that lets a caller clone a ModelPatcher but disable dynamic to demote the clone to regular MP. This is useful for legacy features where dynamic_vram support is missing or TBD. * torch_compile: disable dynamic_vram This is a bigger feature. Disable for the interim to preserve functionality.	2026-02-24 19:13:46 -05:00
comfyanonymous	599f9c5010	Don't crash right away if op is uninitialized. (#12615 )	2026-02-24 12:28:25 -05:00
comfyanonymous	84aba95e03	Temporarily unbreak some LTXAV workflows to give people time to migrate. (#12605 )	2026-02-24 00:50:03 -05:00
comfyanonymous	caa43d2395	Fix issue loading fp8 ltxav checkpoints. (#12582 )	2026-02-22 16:00:02 -05:00
comfyanonymous	07ca6852e8	Fix dtype issue in embeddings connector. (#12570 )	2026-02-22 03:18:20 -05:00
comfyanonymous	f266b8d352	Move LTXAV av embedding connectors to diffusion model. (#12569 )	2026-02-21 22:29:58 -05:00
rattus	0bfb936ab4	comfy-aimdo 0.2 - Improved pytorch allocator integration (#12557 ) Integrate comfy-aimdo 0.2 which takes a different approach to installing the memory allocator hook. Instead of using the complicated and buggy pytorch MemPool+CUDAPluggableAllocator, cuda is directly hooked making the process much more transparent to both comfy and pytorch. As far as pytorch knows, aimdo doesnt exist anymore, and just operates behind the scenes. Remove all the mempool setup stuff for dynamic_vram and bump the comfy-aimdo version. Remove the allocator object from memory_management and demote its use as an enablement check to a boolean flag. Comfy-aimdo 0.2 also supports the pytorch cuda async allocator, so remove the dynamic_vram based force disablement of cuda_malloc and just go back to the old settings of allocators based on command line input.	2026-02-21 10:52:57 -08:00
Terry Jia	f394af8d0f	feat: add gradient-slider display mode for FLOAT inputs (#12536 ) * feat: add gradient-slider display mode for FLOAT inputs * fix: use precise type annotation list[list[float]] for gradient_stops Amp-Thread-ID: https://ampcode.com/threads/T-019c7eea-be2b-72ce-a51f-838376f9b7a7 --------- Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com> Co-authored-by: bymyself <cbyrne@comfy.org>	2026-02-20 22:52:32 -08:00
comfyanonymous	5f2117528a	Force min length 1 when tokenizing for text generation. (#12538 )	2026-02-19 22:57:44 -05:00
comfyanonymous	0301ccf745	Small cleanup and try to get qwen 3 work with the text gen. (#12537 )	2026-02-19 22:42:28 -05:00
Jukka Seppänen	6d11cc7354	feat: Add basic text generation support with native models, initially supporting Gemma3 (#12392 )	2026-02-18 20:49:43 -05:00
rattus	58dcc97dcf	ops: limit return of requants (#12506 ) This check was far too broad and the dtype is not a reliable indicator of wanting the requant (as QT returns the compute dtype as the dtype). So explicitly plumb whether fp8mm wants the requant or not.	2026-02-17 15:32:27 -05:00
chaObserv	44f8598521	Fix anima LLM adapter forward when manual cast (#12504 )	2026-02-17 07:56:44 -08:00
comfyanonymous	c39653163d	Fix anima preprocess text embeds not using right inference dtype. (#12501 )	2026-02-17 00:29:20 -05:00
comfyanonymous	18927538a1	Implement NAG on all the models based on the Flux code. (#12500 ) Use the Normalized Attention Guidance node. Flux, Flux2, Klein, Chroma, Chroma radiance, Hunyuan Video, etc..	2026-02-16 23:30:34 -05:00
comfyanonymous	4454fab7f0	Remove code to support RMSNorm on old pytorch. (#12499 )	2026-02-16 20:09:24 -05:00
comfyanonymous	88e6370527	Remove workaround for old pytorch. (#12480 )	2026-02-15 20:43:53 -05:00
rattus	c0370044cd	MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash) (#12446 ) * lora: add weight shape calculations. This lets the loader know if a lora will change the shape of a weight so it can take appropriate action. * MPDynamic: force load flux img_in weight This weight is a bit special, in that the lora changes its geometry. This is rather unique, not handled by existing estimate and doesn't work for either offloading or dynamic_vram. Fix for dynamic_vram as a special case. Ideally we can fully precalculate these lora geometry changes at load time, but just get these models working first.	2026-02-15 20:30:09 -05:00
comfyanonymous	e1ede29d82	Remove unsafe pickle loading code that was used on pytorch older than 2.4 (#12473 ) ComfyUI hasn't started on pytorch 2.4 since last month.	2026-02-14 22:53:52 -05:00
krigeta	dc9822b7df	Add working Qwen 2512 ControlNet (Fun ControlNet) support (#12359 )	2026-02-13 22:23:52 -05:00
comfyanonymous	712efb466b	Add left padding to LTXAV text encoder. (#12456 )	2026-02-13 21:56:54 -05:00
comfyanonymous	726af73867	Fix some custom nodes. (#12455 )	2026-02-13 20:21:10 -05:00
comfyanonymous	831351a29e	Support generating attention masks for left padded text encoders. (#12454 )	2026-02-13 20:15:23 -05:00
comfyanonymous	e1add563f9	Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432 )	2026-02-13 15:35:13 -05:00
rattus	8902907d7a	dynamic_vram: Training fixes (#12442 )	2026-02-13 15:29:37 -05:00
rattus	ae79e33345	llama: use a more efficient rope implementation (#12434 ) Get rid of the cat and unary negation and inplace add-cmul the two halves of the rope. Precompute -sin once at the start of the model rather than every transformer block. This is slightly faster on both GPU and CPU bound setups.	2026-02-12 19:56:42 -05:00
rattus	117e214354	ModelPatcherDynamic: force load non leaf weights (#12433 ) The current behaviour of the default ModelPatcher is to .to a model only if it's fully loaded, which is how random non-leaf weights get loaded in non-LowVRAM conditions. This however means they never get loaded in dynamic_vram. In the dynamic_vram case, force load them to the GPU.	2026-02-12 19:51:50 -05:00
askmyteapot	e5ae670a40	Update ace15.py to allow min_p sampling (#12373 )	2026-02-11 20:28:48 -05:00
rattus	3fe61cedda	model_patcher: guard against none model_dtype (#12410 ) Handle the case where the _model_dtype exists but is none with the intended fallback.	2026-02-11 14:54:02 -05:00
rattus	2a4328d639	ace15: Use dynamic_vram friendly trange (#12409 ) Factor out the ksampler trange and use it in ACE LLM to prevent the silent stall at 0 and rate distortion due to first-step model load.	2026-02-11 14:53:42 -05:00
rattus	d297a749a2	dynamic_vram: Fix windows Aimdo crash + Fix LLM performance (#12408 ) * model_management: lazy-cache aimdo_tensor These tensors constructed from aimdo-allocations are CPU expensive to make on the pytorch side. Add a cache version that will be valid with signature match to fast path past whatever torch is doing. * dynamic_vram: Minimize fast path CPU work Move as much as possible inside the not resident if block and cache the formed weight and bias rather than the flat intermediates. In extreme layer weight rates this adds up.	2026-02-11 14:50:16 -05:00
comfyanonymous	76a7fa96db	Make built in lora training work on anima. (#12402 )	2026-02-10 22:04:32 -05:00
Kohaku-Blueleaf	cdcf4119b3	[Trainer] training with proper offloading (#12189 ) * Fix bypass dtype/device moving * Force offloading mode for training * training context var * offloading implementation in training node * fix wrong input type * Support bypass load lora model, correct adapter/offloading handling	2026-02-10 21:45:19 -05:00
rattus	123a7874a9	ops: Fix vanilla-fp8 loaded lora quality (#12390 ) This was missing the stochastic rounding required for fp8 downcast to be consistent with model_patcher.patch_weight_to_device. Missed in testing as I spend too much time with quantized tensors and overlooked the simpler ones.	2026-02-10 13:38:28 -05:00
rattus	f719f9c062	sd: delay VAE dtype archive until after override (#12388 ) VAEs have host specific dtype logic that should override the dynamic _model_dtype. Defer the archiving of model dtypes until after.	2026-02-10 13:37:46 -05:00
rattus	fe053ba5eb	mp: dont deep-clone objects from model_options (#12382 ) If there are non-trivial python objects nested in the model_options, this causes all sorts of issues. Traverse lists and dicts so clones can safely override settings and BYO objects but stop there on the deepclone.	2026-02-10 13:37:17 -05:00
comfyanonymous	a4be04c5d7	Ace step prompts match now. (#12376 )	2026-02-09 19:45:56 -05:00
blepping	baf8c87455	Improvements to ACE-Steps 1.5 text encoding (part 2) (#12350 )	2026-02-09 19:41:49 -05:00
rattus	62315fbb15	Dynamic VRAM fixes - Ace 1.5 performance + a VRAM leak (#12368 ) * revert threaded model loader change This change was only needed to get around the pytorch 2.7 mempool bugs, and should have been reverted along with #12260. This fixes a different memory leak where pytorch gets confused about cache emptying. * load non comfy weights * MPDynamic: Pre-generate the tensors for vbars Apparently this is an expensive operation that slows down things. * bump to aimdo 1.8 New features: watermark limit feature logging enhancements -O2 build on linux	2026-02-09 16:16:08 -05:00
comfyanonymous	f350a84261	Disable prompt weights for ltxv2. (#12354 )	2026-02-07 19:16:28 -05:00
comfyanonymous	17e7df43d1	Pad ace step 1.5 ref audio if not long enough. (#12341 )	2026-02-07 00:02:11 -05:00
comfyanonymous	039955c527	Some fixes to previous pr. (#12339 )	2026-02-06 20:14:52 -05:00
tdrussell	6a26328842	Support fp16 for Cosmos-Predict2 and Anima (#12249 )	2026-02-06 20:12:15 -05:00
comfyanonymous	204e65b8dc	Fix bug with last pr (#12338 )	2026-02-06 19:48:20 -05:00
asagi4	a831c19b70	Fix return_word_ids=True with Anima tokenizer (#12328 )	2026-02-06 19:38:04 -05:00
comfyanonymous	eba6c940fd	Make ace step 1.5 base model work properly with default workflow. (#12337 )	2026-02-06 19:14:56 -05:00
comfyanonymous	c2d7f07dbf	Fix issue when using disable_unet_model_creation (#12315 )	2026-02-05 19:24:09 -05:00
comfyanonymous	458292fef0	Fix some lowvram stuff with ace step 1.5 (#12312 )	2026-02-05 19:15:04 -05:00
comfyanonymous	6555dc65b8	Make ace step 1.5 work without the llm. (#12311 )	2026-02-05 16:43:45 -05:00
comfyanonymous	35183543e0	Add VAE tiled decode node for audio. (#12299 )	2026-02-05 01:12:04 -05:00
blepping	a246cc02b2	Improvements to ACE-Steps 1.5 text encoding (#12283 )	2026-02-05 00:17:37 -05:00
comfyanonymous	a50c32d63f	Disable sage attention on ace step 1.5 (#12297 )	2026-02-04 22:15:30 -05:00
comfyanonymous	6125b80979	Add llm sampling options and make reference audio work on ace step 1.5 (#12295 )	2026-02-04 21:29:22 -05:00
comfyanonymous	c8fcbd66ee	Try to fix ace text encoder slowness on some configs. (#12290 )	2026-02-04 19:37:05 -05:00
comfyanonymous	26dd7eb421	Fix ace step nan issue on some hardware/pytorch configs. (#12289 )	2026-02-04 18:25:06 -05:00
rattus	ef73070ea4	mp: Fix checkpoint saving (#12268 ) Fix regression in the recent model saving refactor. Pass the non unet pieces down the layers so that checkpoints are complete.	2026-02-04 02:08:45 -05:00
rattus	d30c609f5a	utils: safetensors: dont slice data on torch level (#12266 ) Torch has alignment enforcement when viewing with data type changes but only relative to itself. Do all tensor constructions straight off the memory-view individually so pytorch doesnt see an alignment problem. This is needed for handling misaligned safetensors weights, which are reasonably common in third party models. This limits usage of this safetensors loader to GPU compute only as CPU kernels are very likely to bus error. But it works for dynamic_vram, where we really dont want to take a deep copy and we always use GPU copy_ which disentangles the misalignment.	2026-02-04 01:48:47 -05:00
comfyanonymous	a31681564d	Fix crash with ace step 1.5 (#12264 )	2026-02-04 00:03:21 -05:00
rattus	855849c658	mm: Remove Aimdo exemption for empty_cache (#12260 ) Its more important to get the torch caching allocator GC up and running than supporting the pyt2.7 bug. Switch it on. Defeature dynamic_vram + pyt2.7.	2026-02-03 21:39:19 -05:00
comfyanonymous	fe2511468d	Support the 4B ace step 1.5 lm model. (#12257 ) Can be used as an alternative to the 1.7B	2026-02-03 19:01:38 -05:00
comfyanonymous	b8315e66cb	Fix tiled vae for ace step 1.5 (#12253 )	2026-02-03 14:40:45 -05:00
comfyanonymous	ab1050bec3	Support ace step 1.5 base model loras. (#12252 )	2026-02-03 13:54:23 -05:00
comfyanonymous	85fc35e8fa	Fix mac issue. (#12250 )	2026-02-03 12:19:39 -05:00
comfyanonymous	223364743c	llama: cast logits as a comfy-weight (#12248 ) This is using a different layers weight with .to(). Change it to use the ops caster if the original layer is a comfy weight so that it picks up dynamic_vram and async_offload functionality in full. Co-authored-by: Rattus <rattus128@gmail.com>	2026-02-03 11:31:36 -05:00
comfyanonymous	affe881354	Fix some issues with mac. (#12247 )	2026-02-03 11:07:04 -05:00
comfyanonymous	f5030e26fd	Add progress bar to ace step. (#12242 )	2026-02-03 04:09:30 -05:00
comfyanonymous	3c1a1a2df8	Basic support for the ace step 1.5 model. (#12237 )	2026-02-03 00:06:18 -05:00
comfyanonymous	c05a08ae66	Add back function. (#12234 )	2026-02-02 19:52:07 -05:00
rattus	de9ada6a41	Dynamic VRAM unloading fix (#12227 ) * mp: fix full dynamic unloading This was not unloading dynamic models when requesting a full unload via the unpatch() code path. This was ok if your workflow was all dynamic models but fails with big VRAM leaks if you need to fully unload something for a regular ModelPatcher. It also fixes the "unload models" button. * mm: load models outside of Aimdo Mempool In dynamic_vram mode, escape the Aimdo mempool and load into the regular mempool. Use a dummy thread to do it.	2026-02-02 17:35:20 -05:00
rattus	37f711d4a1	mm: Fix cast buffers with intel offloading (#12229 ) Intel has offloading support but there were some nvidia calls in the new cast buffer stuff.	2026-02-02 17:34:46 -05:00
comfyanonymous	dd86b15521	Enable embeddings for some qwen 3 models. (#12218 )	2026-02-02 03:51:09 -05:00
comfyanonymous	021ba20719	Fix issue with parameters on root model object. (#12216 )	2026-02-01 20:12:52 -05:00
rattus	2b5da3b72e	dynamic_vram: silence pytorch buffer warning (#12210 ) This is log clutter and concerning to users. Its a false alarm.	2026-02-01 20:09:55 -05:00
rattus	794d05bdb1	dynamic_vram: respect argument cast dtypes in non-comfy weights (#12209 ) This function has a dtype argument that allows the caller to set the dtype in the cast. TIL Some models override this on weight casts, which means its the highest priority. Priority scheme is: argument > model dtype > state dict dtype	2026-02-01 20:09:21 -05:00
rattus	361b9a82a3	fix pinning with model defined dtype (#12208 ) pinned memory was converted back to pinning the CPU side weight without any changes. Fix the pinner to use the CPU weight and not the model defined geometry. This will either save RAM or stop buffer overruns when the types mismatch. Fix the model defined weight caster to use the [ s.weight, s.bias ] interpretation, as xfer_dest might be the flattened pin now. Fix the detection of needing to cast to not be conditional on !pin.	2026-02-01 08:42:32 -08:00
comfyanonymous	667a1b8878	Fix some custom nodes breaking. (#12203 )	2026-02-01 01:55:18 -05:00
rattus	f8acd9c402	Reduce RAM usage, fix VRAM OOMs, and fix Windows shared memory spilling with adaptive model loading (#11845 )	2026-02-01 01:01:11 -05:00
comfyanonymous	873de5f37a	KV cache implementation for using llama models for text generation. (#12195 )	2026-01-31 21:11:11 -05:00
comfyanonymous	b8f848bfe3	Fix model not working with any res. (#12186 )	2026-01-31 00:12:48 -05:00
comfyanonymous	c9b633d84f	Add missing spatial downscale ratios. (#12146 )	2026-01-28 20:52:51 -05:00
guill	dcff27fe3f	Add support for dev-only nodes. (#12106 ) When a node is declared as dev-only, it doesn't show in the default UI unless the dev mode is enabled in the settings. The intention is to allow nodes related to unit testing to be included in ComfyUI distributions without confusing the average user.	2026-01-27 13:03:29 -08:00
rattus	6516ab335d	wan-vae: Switch off feature cache for single frame (#12090 ) The code throughout is None safe to just skip the feature cache saving step if none. Set it none in single frame use so qwen doesn't burn VRAM on the unused cache.	2026-01-26 19:40:19 -05:00
comfyanonymous	2129e7d278	Fix mistral 3 tokenizer code failing on latest transformers version and other breakage. (#12095 ) * Fix mistral 3 tokenizer code failing on latest transformers version. * Add requests to the requirements	2026-01-26 11:39:00 -05:00
Kohaku-Blueleaf	a97c98068f	[Weight-adapter/Trainer] Bypass forward mode in Weight adapter system (#11958 ) * Add API of bypass forward module * bypass implementation * add bypass fwd into nodes list/trainer	2026-01-24 22:56:22 -05:00
comfyanonymous	635406e283	Only enable fp16 on z image models that actually support it. (#12065 )	2026-01-24 22:32:28 -05:00
comfyanonymous	aef4e13588	Make empty latent node work with other models. (#12062 )	2026-01-24 19:23:20 -05:00
rattus	4e6a1b66a9	speed up and reduce VRAM of QWEN VAE and WAN (less so) (#12036 ) * ops: introduce autopad for conv3d This works around pytorch missing ability to causal pad as part of the kernel and avoids massive weight duplications for padding. * wan-vae: rework causal padding This currently uses F.pad which takes a full deep copy and is liable to be the VRAM peak. Instead, kick spatial padding back to the op and consolidate the temporal padding with the cat for the cache. * wan-vae: implement zero pad fast path The WAN VAE is also QWEN where it is used single-image. These convolutions are however zero padded 3d convolutions, which means the VAE is actually just 2D down the last element of the conv weight in the temporal dimension. Fast path this, to avoid adding zeros that then just evaporate in convolution math but cost computation.	2026-01-23 19:56:14 -05:00
comfyanonymous	9cf299a9f9	Make regular empty latent node work properly on flux 2 variants. (#12050 )	2026-01-23 19:50:48 -05:00
ComfyUI Wiki	e89b22993a	Support ModelScope-Trainer/DiffSynth LoRA format for Flux.2 Klein models (#12042 )	2026-01-23 15:27:49 -05:00
Jukka Seppänen	55bd606e92	LTX2: Refactor forward function for better VRAM efficiency and fix spatial inpainting (#12046 ) * Disable timestep embed compression when inpainting Spatial inpainting not compatible with the compression * Reduce crossattn peak VRAM * LTX2: Refactor forward function for better VRAM efficiency	2026-01-23 15:26:38 -05:00
Omri Marom	d7f3241bf6	qwen_image: propagate attention mask. (#11966 )	2026-01-22 20:02:31 -05:00
comfyanonymous	09a2e67151	Support loading flux 2 klein checkpoints saved with SaveCheckpoint. (#12033 )	2026-01-22 18:20:48 -05:00
rattus	0fd1b78736	Reduce LTX2 VAE VRAM consumption (#12028 ) * causal_video_ae: Remove attention ResNet This attention_head_dim argument does not exist on this constructor so this is dead code. Remove as generic attention mid VAE conflicts with temporal roll. * ltx-vae: consolidate causal/non-causal code paths * ltx-vae: add cache rolling adder * ltx-vae: use cached adder for resnet * ltx-vae: Implement rolling VAE Implement a temporal rolling VAE for the LTX2 VAE. Usually when doing temporal rolling VAEs you can just chunk on time relying on causality and cache behind you as you go. The LTX VAE is however non-causal. So go whole hog and implement per layer run ahead and backpressure between the decoder layers using recursive state between the layers. Operations are amended with temporal_cache_state{} which they can use to hold any state they need for partial execution. Convolutions cache up to N-1 frames of their inputs behind them, and skip connections need to cache the mismatch between convolution input and output that happens due to missing future (non-causal) input. Each call to run_up() processes a layer across a range of input that may or may not be complete. It goes depth first to process as much as possible to try and digest frames to the final output ASAP. If layers run out of input due to convolution losses, they simply return without action, effectively applying back-pressure to the earlier layers. As the earlier layers do more work and call deeper, the partial states are reconciled and output continues to digest depth first as much as possible. Chunking is done using a size quota rather than a fixed frame length and any layer can initiate chunking, and multiple layers can chunk at different granularities. This removes the old limitation of always having to process 1 latent frame to entirety and having to hold 8 full decoded frames as the VRAM peak.	2026-01-22 16:54:18 -05:00
Jukka Seppänen	16b9aabd52	Support Multi/InfiniteTalk (#10179 ) * re-init * Update model_multitalk.py * whitespace... * Update model_multitalk.py * remove print * this is redundant * remove import * Restore preview functionality * Move block_idx to transformer_options * Remove LoopingSamplerCustomAdvanced * Remove looping functionality, keep extension functionality * Update model_multitalk.py * Handle ref_attn_mask with separate patch to avoid having to always return q and k from self_attn * Chunk attention map calculation for multiple speakers to reduce peak VRAM usage * Update model_multitalk.py * Add ModelPatch type back * Fix for latest upstream * Use DynamicCombo for cleaner node Basically just so that single_speaker mode hides mask inputs and 2nd audio input * Update nodes_wan.py	2026-01-21 23:09:48 -05:00
Jukka Seppänen	245f6139b6	More targeted embedding_connector loading for LTX2 text encoder (#11992 ) Reduces errors	2026-01-21 23:05:06 -05:00
Jukka Seppänen	3365ad18a5	Support LTX2 tiny vae (taeltx_2) (#11929 )	2026-01-21 23:03:51 -05:00
comfyanonymous	abe2ec26a6	Support the Anima model. (#12012 )	2026-01-21 19:44:28 -05:00
Markury	0fc15700be	Add LyCoris LoKr MLP layer support for Flux2 (#11997 )	2026-01-20 23:18:33 -05:00
comfyanonymous	e755268e7b	Config for Qwen 3 0.6B model. (#11998 )	2026-01-20 23:08:31 -05:00
Mylo	c4a14df9a3	Dynamically detect chroma radiance patch size (#11991 )	2026-01-20 18:46:11 -05:00
Ivan Zorin	965d0ed509	fix: remove normalization of audio in LTX Mel spectrogram creation (#11990 ) For LTX Audio VAE, remove normalization of audio during MEL spectrogram creation. This aligns inference with training and prevents loud audio from being attenuated.	2026-01-20 18:44:28 -05:00
comfyanonymous	8ccc0c94fa	Make omni stuff work on regular z image for easier testing. (#11985 )	2026-01-20 00:32:00 -05:00
comfyanonymous	2108167f9f	Support zimage omni base model. (#11979 )	2026-01-19 23:17:38 -05:00
comfyanonymous	70c91b8248	Fix #11963 (#11982 )	2026-01-19 22:32:40 -05:00
rkfg	0da5a0fe58	Convert mono audio to fake stereo for LTXV VAE encoding (#11965 )	2026-01-19 22:12:02 -05:00
comfyanonymous	e0eacb0688	Simpler way to implement the #11980 loras. (#11981 )	2026-01-19 22:00:36 -05:00
comfyanonymous	7ac999bf30	Add image sizes to clip vision outputs. (#11923 )	2026-01-16 23:02:28 -05:00
comfyanonymous	4c816d5c69	Adjust memory usage factor calculation for flux2 klein. (#11900 )	2026-01-15 20:06:40 -05:00
comfyanonymous	3b832231bb	Flux2 Klein support. (#11890 )	2026-01-15 10:33:15 -05:00
Jukka Seppänen	be518db5a7	Remove extraneous clip missing warnings when loading LTX2 embeddings_connector weights (#11874 )	2026-01-14 17:54:04 -05:00
rattus	80441eb15e	utils: fix lanczos grayscale upscaling (#11873 )	2026-01-14 17:53:16 -05:00
comfyanonymous	6165c38cb5	Optimize nvfp4 lora applying. (#11866 ) This changes results a bit but it also speeds up things a lot.	2026-01-14 00:49:38 -05:00
Silver	712cca36a1	feat: throttle ProgressBar updates to reduce WebSocket flooding (#11504 )	2026-01-13 22:41:44 -05:00
comfyanonymous	eff2b9d412	Optimize nvfp4 lora applying. (#11856 )	2026-01-13 19:37:19 -05:00
comfyanonymous	15b312de7a	Optimize nvfp4 lora applying. (#11854 )	2026-01-13 19:23:58 -05:00
comfyanonymous	1dcbd9efaf	Bump ltxav mem estimation a bit. (#11842 )	2026-01-13 01:42:07 -05:00
comfyanonymous	117e7a5853	Refactor to try to lower mem usage. (#11840 )	2026-01-12 21:01:52 -08:00
comfyanonymous	b3c0e4de57	Make loras work on nvfp4 models. (#11837 ) The initial applying is a bit slow but will probably be sped up in the future.	2026-01-12 22:33:54 -05:00
Jukka Seppänen	fd5c0755af	Reduce LTX2 VRAM use by more efficient timestep embed handling (#11829 )	2026-01-12 17:28:59 -05:00
comfyanonymous	c881a1d689	Support the siglip 2 naflex model as a clip vision model. (#11831 ) Not useful yet.	2026-01-12 17:05:54 -05:00
kelseyee	a3b5d4996a	Support ModelScope-Trainer DiffSynth lora for Z Image. (#11805 )	2026-01-12 15:38:46 -05:00
comfyanonymous	2f642d5d9b	Fix chroma fp8 te being treated as fp16. (#11795 )	2026-01-10 14:40:42 -08:00
comfyanonymous	cd912963f1	Fix issue with t5 text encoder in fp4. (#11794 )	2026-01-10 17:31:31 -05:00
DELUXA	6e4b1f9d00	pythorch_attn_by_def_on_gfx1200 (#11793 )	2026-01-10 16:51:05 -05:00
comfyanonymous	dc202a2e51	Properly save mixed ops. (#11772 )	2026-01-10 02:03:57 -05:00
comfyanonymous	bd0e6825e8	Be less strict when loading mixed ops weights. (#11769 )	2026-01-09 14:21:06 -05:00
Jedrzej Kosinski	1dc3da6314	Add most basic Asset support for models (#11315 ) * Brought over minimal elements from PR 10045 to reproduce seed_assets and register_assets_system without adding anything to the DB or server routes yet, for now making everything sync (can introduce async once everything is cleaned up and brought over) * Added db script to insert assets stuff, cleaned up some code; assets (models) now get added/rescanned * Added support for 5 http endpoints for assets * Replaced Optional with | None in schemas_in.py and schemas_out.py * Remove two routes that will not be relevant yet in this PR: HEAD /api/assets/hash/<hash> and PUT /api/assets/<id>/preview * Remove some functions the two deleted endpoints were using * Don't show assets scan message upon calling /object_info endpoint * removed unused import to satisfy ruff * Simplified hashing function type hint and _hash_file_obj * Satisfied ruff	2026-01-08 22:21:51 -05:00
comfyanonymous	1a20656448	Fix import issue. (#11746 )	2026-01-08 17:23:59 -05:00
comfyanonymous	0f11869d55	Better detection if AMD torch compiled with efficient attention. (#11745 )	2026-01-08 17:16:58 -05:00
comfyanonymous	50d6e1caf4	Tweak ltxv vae mem estimation. (#11722 )	2026-01-07 23:07:05 -05:00
comfyanonymous	21e8425087	Add warning for old pytorch. (#11718 )	2026-01-07 21:07:26 -05:00
rattus	b6c79a648a	ops: Fix offloading with FP8MM performance (#11697 ) This logic was checking comfy_cast_weights, and going straight to the forward_comfy_cast_weights implementation without attempting to downscale input to fp8 in the event comfy_cast_weights is set. The main reason comfy_cast_weights would be set would be for async offload, which is not a good reason to nix FP8MM. So instead, AND together the underlying exclusions for FP8MM which are: * having a weight_function (usually LowVramPatch) * force_cast_weights (compute dtype override) * the weight is not Quantized * the input is already quantized * the model or layer has MM explicitly disabled. If you get past all of those exclusions, quantize the input tensor. Then hand the new input, quantized or not, off to forward_comfy_cast_weights to handle it. If the weight is offloaded but input is quantized you will get an offloaded MM8.	2026-01-07 21:01:16 -05:00
comfyanonymous	25bc1b5b57	Add memory estimation function to ltxav text encoder. (#11716 )	2026-01-07 20:11:22 -05:00
comfyanonymous	3cd19e99c1	Increase ltxav mem estimation by a bit. (#11715 )	2026-01-07 20:04:56 -05:00
comfyanonymous	34751fe9f9	Lower ltxv text encoder vram use. (#11713 )	2026-01-07 19:12:15 -05:00

2161 Commits (master)