* wan: vae: encoder: Add feature cache layer that corks singles
If a downsample only gives you a single frame, save it to the feature
cache and return nothing to the top level. This improves cacheability,
and also prepares support for processing frames two by two rather than
four by four.
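A rough sketch of the corking idea (the cache object and tensor layout
below are illustrative, not the actual wan VAE code):

    import torch

    class FeatureCache:
        def __init__(self):
            self.frames = None  # a lone frame corked by the last downsample

    def cork_or_return(x: torch.Tensor, cache: FeatureCache):
        # x: output of a temporal downsample, shaped (B, C, T, H, W)
        if x.shape[2] == 1:
            cache.frames = x  # keep the single frame for the next chunk
            return None       # nothing is returned to the top level
        return x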
* wan: remove all concatenation with the feature cache
The loopers are now responsible for ensuring that non-final frames are
processed at least two-by-two, eliminating the need for this cat case.
* wan: vae: recurse and chunk for 2+2 frames on decode
Avoid having to clone off slices of 4-frame chunks and reduce the big
6-frame convolutions down to 4 frames. Saves VRAM.
* wan: encode frames 2x2.
Reduce VRAM usage greatly by encoding frames 2 at a time rather than
4.
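As an illustration of the two-at-a-time walk (the helper name and the
(B, C, T, H, W) layout are assumptions, not the repo's code):

    import torch

    def iter_frame_chunks(x: torch.Tensor, chunk: int = 2):
        # Yield small temporal chunks so each encoder call sees at most `chunk`
        # frames; continuity across chunks comes from the feature cache, not overlap.
        for t in range(0, x.shape[2], chunk):
            yield x[:, :, t:t + chunk]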
* wan: vae: remove cloning
The loopers now control the chunking such that there are never more
than 2 frames, so just cache these slices directly and avoid the clone
allocations completely.
* wan: vae: free consumer caller tensors on recursion
* wan: vae: restyle a little to match LTX
* ltx: vae: scale the chunk size with the user's VRAM
Scale the chunk size down linearly for users with low VRAM.
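A hedged sketch of the linear scaling (the base chunk size and reference
VRAM figure are illustrative, not the repo's values):

    def pick_chunk_size(free_vram: int, base_chunk: int = 8,
                        reference_vram: int = 12 * 1024**3, min_chunk: int = 1) -> int:
        # Scale the chunk size linearly with free VRAM relative to a reference card,
        # clamped so low-VRAM users still decode at least min_chunk frames at a time.
        scaled = int(base_chunk * free_vram / reference_vram)
        return max(min_chunk, min(base_chunk, scaled))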
* ltx: vae: free non-chunking recursive intermediates
* ltx: vae: cleanup some intermediates
The conv layer can be the VRAM peak and it does a torch.cat, so clean up
the pieces of the cat. Also clear out the cache as soon as each layer
detects its end, because this VAE surges in VRAM at the end: the end
padding increases the size of the final frame convolutions in a way that
is invisible to the chunker. If all the earlier layers free up their
caches, that offsets the surge.
It's a fragmentation nightmare, and the chance of the pytorch allocator
having to re-cache is very high, but you won't OOM.
If a subclass brings its own _load_from_state_dict and doesn't call
super(), the needed default init of these weights is missed, which can
lead to problems from uninitialized weights.
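For illustration, the safe pattern is to always delegate to the parent
(the subclass here is hypothetical):

    import torch.nn as nn

    class MyLinear(nn.Linear):
        def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                                  missing_keys, unexpected_keys, error_msgs):
            # ... custom key handling can go here ...
            # Always call super() so the default handling (including init of weights
            # that are absent from the state dict) still runs.
            super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                          missing_keys, unexpected_keys, error_msgs)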
This is an experimental WIP option that might not work in your workflow but
should lower memory usage if it does.
Currently only the VAE and the load image node will output in fp16 when
this option is turned on.
* Implement seek and read for pins
Sourcing pins from an mmap is bad because it's a CPU->CPU copy that
attempts to fully buffer the same data twice. Instead, use seek and
read, which avoids the mmap buffering while usually being a faster
read in the first place (avoiding mmap faulting etc.).
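A rough sketch of the seek+read path, assuming the caller already knows
the byte offset and length of a tensor in the file (the helper name and
signature are made up):

    import torch

    def read_bytes_into_buffer(path: str, offset: int, nbytes: int) -> torch.Tensor:
        # Allocate the destination once (this is where a pinned buffer would go),
        # then fill it straight from the file, bypassing any mmap of the same data.
        buf = torch.empty(nbytes, dtype=torch.uint8)
        with open(path, "rb", buffering=0) as f:
            f.seek(offset)
            f.readinto(memoryview(buf.numpy()))
        return buf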
* pinned_memory: Use Aimdo pinner
The aimdo pinner bypasses pytorch's CPU allocator, which can leak
Windows commit charge.
* ops: bypass init() of weight for embedding layer
This similarly consumes a large commit charge, especially for TEs. It
can cause a permanently leaked commit charge, which can destabilize
systems close to the commit ceiling and generally confuses the RAM
stats.
* model_patcher: implement pinned memory counter
Implement a pinned memory counter for better accounting of how much
memory the pins occupy.
* implement touch accounting
Implement accounting of touching mmapped tensors.
* mm+mp: add residency mmap getter
* utils: use the aimdo mmap to load sft files
* model_management: Implement tighter RAM pressure semantics
Implement a pressure release on entire MMAPs, as Windows performs
faster when mmaps are unloaded and model loads go into fully
unallocated RAM.
Make the concept of freeing for pins a completely separate concept.
Now that pins are loadable directly from the original file and don't
touch the mmap, tighten the freeing budget to just the currently loaded
model minus what you have left over. This still over-frees pins, but
it's a lot better than before.
So after the pins are freed with that algorithm, bounce entire MMAPs
to free RAM based on what the model needs, deducting any known
resident-in-mmap tensors from the free quota to keep it as tight as
possible.
* comfy-aimdo 0.2.11
* mm: Implement file_slice path for QT
* ruff
* ops: put meta-tensors in place to allow custom nodes to check geo
* fix: guard torch.AcceleratorError for compatibility with torch < 2.8.0
torch.AcceleratorError was introduced in PyTorch 2.8.0. Accessing it
directly raises AttributeError on older versions. Use a try/except
fallback at module load time, consistent with the existing pattern used
for OOM_EXCEPTION.
* fix: address review feedback for AcceleratorError compat
- Fall back to RuntimeError instead of type(None) for ACCELERATOR_ERROR,
consistent with OOM_EXCEPTION fallback pattern and valid for except clauses
- Add "out of memory" message introspection for RuntimeError fallback case
- Use RuntimeError directly in discard_cuda_async_error except clause
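A minimal sketch of the resulting guard, mirroring the OOM_EXCEPTION
pattern described above:

    import torch

    try:
        ACCELERATOR_ERROR = torch.AcceleratorError  # present from PyTorch 2.8.0
    except AttributeError:
        ACCELERATOR_ERROR = RuntimeError  # valid in except clauses on older torch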
---------
Pytorch only filters for OOMs in its own allocators; however, there are
paths that can OOM on allocators made outside the pytorch allocators.
These manifest as an AcceleratorError, as pytorch does not have universal
error translation to its OOM type on exception. Handle it. A log I have
for this also shows a double async report of the error, so call the
async discarder to clean up and make these OOMs look like OOMs.
Sync the compute stream before freeing the cast buffers. Without the
sync, use-after-free issues can occur when the cast stream frees the
buffer while the compute stream is behind enough to still need a casted
weight.
* mp: respect model_defined_dtypes in default caster
This is needed for parametrizations when the dtype changes between sd
and model.
* audio_encoders: archive model dtypes
Archive model dtypes to stop the state dict load from overriding the
dtypes defined by the core for compute etc.
* ops: don't unpin nothing
This was calling into aimdo in the None case (offloaded weight). What's
worse, aimdo syncs when unpinning an offloaded weight, as that is the
corner case of a weight getting evicted by its own use, which does
require a sync. But this was happening for every offloaded weight,
causing a slowdown.
* mp: fix get_free_memory policy
The ModelPatcherDynamic get_free_memory was deducting the model from
the free memory to try and estimate the conceptual free memory without
doing any offloading. This is kind of what the old memory_required
was estimating in the ModelPatcher load logic; however, in practical
reality, between over-estimates and padding, the loader usually
underloaded models enough that sampling could send CFG +/- through
together even when partially loaded.
So don't regress from the status quo and instead go all in on the
idea that offloading is less of an issue than debatching. Tell the
sampler it can use everything.
Define a threshold below which loading a weight takes priority. This
actually makes the offload consistent with non-dynamic, because what
happens is, when non-dynamic fills its to_load list, it will fill up
any left-over space that could not fit the large weights with small
weights and load them, even though they were lower priority. This
actually improves performance because the tiny weights don't cost any
VRAM and aren't worth the control overhead of the DMA etc.
* respect model dtype in non-comfy caster
* utils: factor out parent and name functionality of set_attr
* utils: implement set_attr_buffer for torch buffers
* ModelPatcherDynamic: Implement torch Buffer loading
If there is a buffer in a dynamic model, force load it.
* model_management: Remove non-comfy dynamic _v caster
* Force pre-load non-comfy weights to GPU in ModelPatcherDynamic
Non-comfy weights may expect to be pre-cast to the target
device without in-model casting. Previously they were allocated in
the vbar with _v which required the _v fault path in cast_to.
Instead, back up the original CPU weight and move it directly to GPU
at load time.
* draft zeta (z-image pixel space)
* revert gitignore
* model loaded and able to run however vector direction still wrong tho
* flip the vector direction to original again this time
* Move wrongly positioned Z image pixel space class
* inherit Radiance LatentFormat class
* Fix parameters in classes for Zeta x0 dino
* remove arbitrary nn.init instances
* Remove unused import of lru_cache
---------
Co-authored-by: silveroxides <ishimarukaito@gmail.com>
This was previously considering the pool of dynamic models as one giant
entity for the sake of smart memory, but that isn't really useful
or what a user would reasonably expect. Make Dynamic VRAM properly purge
its models just like the old --disable-smart-memory, but condition
the dynamic-for-dynamic bypass on smart memory.
Re-enable dynamic smart memory.
Multi-step samplers (eg. dpmpp_2s_ancestral) call the model at intermediate sigma values not present in the schedule. This caused set_step to crash with "No sample_sigmas matched current timestep" when context windows were enabled.
The fix is to keep self._step from the last exact match when a substep sigma is encountered, since substeps are still logically part of their parent step and should use the same context windows.
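A hedged sketch of the tolerant matching (the surrounding class is
illustrative; sample_sigmas and _step follow the names above):

    import torch

    class StepTracker:
        def __init__(self, sample_sigmas: torch.Tensor):
            self.sample_sigmas = sample_sigmas
            self._step = 0

        def set_step(self, sigma):
            matches = (self.sample_sigmas == sigma).nonzero()
            if len(matches) > 0:
                self._step = int(matches[0])  # exact schedule sigma: advance the step
            # otherwise this is a substep sigma from a multi-step sampler; keep the
            # previous _step so the parent step's context windows are reused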
Co-authored-by: ozbayb <17261091+ozbayb@users.noreply.github.com>
* sd: add support for clip model reconstruction
* nodes: SetClipHooks: Demote the dynamic model patcher
* mp: Make dynamic_disable more robust
The backup needs to not be cloned. In addition, add a delegate object
to ModelPatcherDynamic so that non-cloning code can do
ModelPatcherDynamic demotion.
* sampler_helpers: Demote to non-dynamic model patcher when hooking
* code rabbit review comments
Allow a non-QuantizedTensor layer to set want_requant to get the
post-lora calculation stochastically cast down to the original input dtype.
This is then used by the legacy fp8 Linear implementation to set the
compute_dtype to the preferred lora dtype but then want_requant it back
down to fp8.
This fixes the issue where --fast fp8_matrix_mult is combined with
--fast dynamic_vram while doing a lora on an fp8 non-QT model.
Implements per-guide attention attenuation via log-space additive bias
in self-attention. Each guide reference tracks its own strength and
optional spatial mask in conditioning metadata (guide_attention_entries).
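A rough sketch of the idea: log(strength) is added to the pre-softmax
logits for the keys belonging to each guide (the entry format and helper
below are assumptions, not the actual guide_attention_entries schema):

    import torch

    def build_guide_bias(num_keys: int, entries, device="cpu") -> torch.Tensor:
        # entries: iterable of (key_slice, strength, mask_or_None). Adding log(strength)
        # to the logits scales that guide's attention weights multiplicatively.
        bias = torch.zeros(num_keys, device=device)
        for key_slice, strength, mask in entries:
            b = torch.log(torch.clamp(torch.tensor(float(strength)), min=1e-6))
            bias[key_slice] += b if mask is None else b * mask
        return bias  # used as: logits = q @ k.transpose(-1, -2) * scale + bias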
* utils: don't use comfy sft loader in aimdo fallback
This was keying off the raw command line switch; it should respect
main.py's probe of whether aimdo actually loaded successfully.
* ops: don't use deferred linear load in Aimdo fallback
Avoid changes of behaviour on --fast dynamic_vram when aimdo doesn't work.
* mp: attach re-construction arguments to model patcher
When making a model-patcher from a unet or ckpt, attach a callable
function that can be called to replay the model construction. This
can be used to deep clone a model patcher with respect to the actual model.
Originally written by Kosinkadink
f4b99bc623
* mp: Add disable_dynamic clone argument
Add a clone argument that lets a caller clone a ModelPatcher but disable
dynamic to demote the clone to regular MP. This is useful for legacy
features where dynamic_vram support is missing or TBD.
* torch_compile: disable dynamic_vram
This is a bigger feature. Disable for the interim to preserve
functionality.
Integrate comfy-aimdo 0.2, which takes a different approach to
installing the memory allocator hook. Instead of using the complicated
and buggy pytorch MemPool+CUDAPluggableAllocator, cuda is hooked
directly, making the process much more transparent to both comfy and
pytorch. As far as pytorch knows, aimdo doesn't exist anymore and just
operates behind the scenes.
Remove all the mempool setup stuff for dynamic_vram and bump the
comfy-aimdo version. Remove the allocator object from memory_management
and demote its use as an enablement check to a boolean flag.
Comfy-aimdo 0.2 also supports the pytorch cuda async allocator, so
remove the dynamic_vram based force disablement of cuda_malloc and
just go back to the old allocator settings based on command line
input.
This check was far too broad, and the dtype is not a reliable indicator
of wanting the requant (as QT returns the compute dtype as the dtype).
So explicitly plumb through whether fp8mm wants the requant or not.
* lora: add weight shape calculations.
This lets the loader know if a lora will change the shape of a weight
so it can take appropriate action.
* MPDynamic: force load flux img_in weight
This weight is a bit special, in that the lora changes its geometry.
This is rather unique, not handled by the existing estimates, and
doesn't work for either offloading or dynamic_vram.
Fix for dynamic_vram as a special case. Ideally we can fully precalculate
these lora geometry changes at load time, but just get these models
working first.
Get rid of the cat and unary negation and inplace add-cmul the two
halves of the rope. Precompute -sin once at the start of the model
rather than every transformer block.
This is slightly faster on both GPU and CPU bound setups.
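A hedged sketch of the cat-free rotate-half form, with -sin precomputed
by the caller (function and argument names are illustrative):

    import torch

    def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor,
                   neg_sin: torch.Tensor) -> torch.Tensor:
        # Write each rotated half straight into views of the output, so there is
        # no torch.cat and no per-block negation of sin.
        d = x.shape[-1] // 2
        x1, x2 = x[..., :d], x[..., d:]
        out = torch.empty_like(x)
        o1, o2 = out[..., :d], out[..., d:]
        torch.mul(x1, cos, out=o1)
        o1.addcmul_(x2, neg_sin)  # o1 = x1*cos - x2*sin
        torch.mul(x2, cos, out=o2)
        o2.addcmul_(x1, sin)      # o2 = x2*cos + x1*sin
        return out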
The current behaviour of the default ModelPatcher is to .to a model
only if it's fully loaded, which is how random non-leaf weights get
loaded in non-LowVRAM conditions.
This however means they never get loaded in dynamic_vram. In the
dynamic_vram case, force load them to the GPU.
* model_management: lazy-cache aimdo_tensor
These tensors constructed from aimdo allocations are CPU expensive to
make on the pytorch side. Add a cached version that is valid on a
signature match, to fast-path past whatever torch is doing.
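Conceptually it is a small signature-keyed memo (the key fields and
builder callback below are guesses for illustration):

    _tensor_cache = {}

    def cached_tensor(ptr, shape, dtype, build):
        # Reuse the previously wrapped tensor when the allocation signature matches,
        # skipping the expensive torch-side construction; `build` makes it on a miss.
        key = (ptr, tuple(shape), dtype)
        t = _tensor_cache.get(key)
        if t is None:
            t = build()
            _tensor_cache[key] = t
        return t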
* dynamic_vram: Minimize fast path CPU work
Move as much as possible inside the not resident if block and cache
the formed weight and bias rather than the flat intermediates. At
extreme layer-weight rates this adds up.
* Fix bypass dtype/device moving
* Force offloading mode for training
* training context var
* offloading implementation in training node
* fix wrong input type
* Support bypass load lora model, correct adapter/offloading handling
This was missing the stochastic rounding required for fp8 downcast
to be consistent with model_patcher.patch_weight_to_device.
Missed in testing as I spent too much time with quantized tensors
and overlooked the simpler ones.