Commit Graph

233 Commits (0e0b607cfaabf59167e3ab2a44d491b4e4e1b08e)

Author SHA1 Message Date
Disty0 3e52009a4f SDNQ assert Triton for quantized matmul 2025-11-29 00:54:19 +03:00
Disty0 aaef4992c3 SDNQ fix svd + fp8 tw and fp16 mm 2025-11-28 22:31:09 +03:00
Disty0 a46f32b354 pull sdnq version from .common 2025-11-28 01:10:05 +03:00
Disty0 55cf627ac6 add version to sdnq 2025-11-28 00:45:24 +03:00
Disty0 368eb3103a cleanup 2025-11-27 18:40:15 +03:00
Disty0 73e4d1e379 Pass torch_dtype to sdnq loader 2025-11-27 18:37:35 +03:00
Disty0 7b2a8e3f87 cleanup 2025-11-27 18:26:14 +03:00
Disty0 ff4c254930 Auto handle tied weights with new transformers 2025-11-27 18:24:55 +03:00
CalamitousFelicitousness 9dd537072c Fix import path for SDNQ options and handle Qwen models in load_sdnq_model 2025-11-27 14:53:03 +00:00
Disty0 131c51918b SDNQ fix model_loader 2025-11-27 14:51:45 +03:00
Disty0 ed6f977218 SDNQ fix z_image matmul 2025-11-27 14:19:29 +03:00
Disty0 16c429711c update lumina and z_image keys 2025-11-26 23:22:44 +03:00
Disty0 679060bd00 SDNQ add lumina and z_image keys 2025-11-26 22:51:15 +03:00
Disty0 48b5d56ba4 Enable or disable quantized matmul on pre-quant models 2025-11-26 21:08:15 +03:00
Disty0 70b96daa63 cleanup 2025-11-25 23:02:01 +03:00
Disty0 da0df35106 fix typo 2025-11-25 21:58:53 +03:00
Disty0 da3c439059 SDNQ fix _tied_weights_keys is dict case 2025-11-25 19:37:46 +03:00
Disty0 aeb71d172e SDNQ add Flux2Transformer2DModel keys 2025-11-25 19:22:02 +03:00
vladmandic 9658a330b2 lint (Signed-off-by: vladmandic <mandic00@live.com>) 2025-11-23 13:29:03 -05:00
Disty0 41ef28bb78 SDNQ don't divide group_size 2025-11-22 16:44:13 +03:00
Disty0 25d05b1445 SDNQ catch all exceptions on triton import 2025-11-22 14:48:55 +03:00
Disty0 4e4f49b38d update sdnq loader 2025-11-22 03:45:27 +03:00
Disty0 b6e9332cfe SDNQ de-couple matmul dtype and add fp16 matmul 2025-11-22 02:16:20 +03:00
Disty0 5308630b3a SDNQ use dequantize_fp32 with uint16 + torch_dtype = fp16 2025-11-18 23:53:27 +03:00
Disty0 49cd85d388 SDNQ add training related changes 2025-11-18 22:46:14 +03:00
Disty0 3fbfae5963 cleanup 2025-11-18 02:37:10 +03:00
Disty0 1745ed53f8 Refactor SDNQDequantizer 2025-11-18 01:42:58 +03:00
Disty0 3a4d7795d8 SDNQ fix weights_dtype getting overwritten on post load quant 2025-11-14 16:51:10 +03:00
Disty0 6f33ec3357 SDNQ use the model quant params instead of user settings on Lora 2025-11-10 00:12:38 +03:00
Disty0 0e8429dbd8 Cleanup 2025-11-07 18:49:29 +03:00
Disty0 93f28f07ac Make SDNQ not dependent on quantization_config.json and fix invalid quantization_config getting attached to the model on load 2025-11-07 18:11:21 +03:00
Disty0 a4378a79e4 fix typo 2025-11-04 14:30:52 +03:00
Disty0 8ad53ed4b3 SDNQ update keys 2025-11-04 14:29:44 +03:00
Disty0 76d699dc09 SDNQ add common keys 2025-10-31 00:21:54 +03:00
Disty0 da3d183f96 add Emu3ForCausalLM keys 2025-10-30 23:44:05 +03:00
Disty0 b9435257c4 SDNQ add chrono keys 2025-10-30 23:33:38 +03:00
Vladimir Mandic d43091f1fa lint set minimum to py310 and update rules (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-29 11:28:09 -04:00
Disty0 7bcc5fa29c SDNQ add HunyuanImage3ForCausalMM keys 2025-10-29 13:36:31 +03:00
Disty0 6c937c2747 Fix transformers using all the ram 2025-10-29 13:09:03 +03:00
Vladimir Mandic bc775f0530 add wan asymmetric vae upscaler (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-28 13:55:46 -04:00
Disty0 e6af602c0d handle files = str case 2025-10-27 21:40:18 +03:00
Disty0 a830c0a7e0 cleanup 2025-10-27 21:32:52 +03:00
Vladimir Mandic 5ab9a5a15d add sota model loader: runai streamer (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-27 14:20:10 -04:00
Disty0 2104bf8bb0 sdnq add wan keys 2025-10-25 15:34:14 +03:00
Disty0 b627617d14 SDNQ fix enable matmul after load 2025-10-19 17:25:02 +03:00
Disty0 f05c29175e cleanup 2025-10-19 02:09:25 +03:00
Disty0 758b006104 cleanup 2025-10-19 02:00:16 +03:00
Disty0 ef72edf18f SDNQ improve svd and low bit matmul perf 2025-10-19 00:06:07 +03:00
Disty0 f12caf81f9 SDNQ skip bad layers on svd and fix svd with dequantize_fp32 2025-10-17 17:25:50 +03:00
Vladimir Mandic 4f336d3aab linting (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-16 19:39:05 -04:00
Disty0 2cf9938d97 SDNQ fix sdxl unet quant config not getting saved 2025-10-17 00:08:17 +03:00
Disty0 63aad89676 remove the unused state_dict arg 2025-10-16 16:29:23 +03:00
Vladimir Mandic 070edb20b0 update transformers and fix quant params (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-16 09:21:20 -04:00
Disty0 845869079d Fix sdnq unset config 2025-10-14 17:58:09 +03:00
Disty0 4aee524ddf SDNQ add NaDiT keys 2025-10-14 17:18:58 +03:00
Disty0 b601f0d402 SDNQ expose svd_steps and update module skip keys 2025-10-14 00:15:09 +03:00
Disty0 d4d24214b3 SDNQ use a better way of loading pre quants and cleanup 2025-10-13 14:06:13 +03:00
Vladimir Mandic 2e4e741d47 seedvt2 (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-12 15:35:08 -04:00
Disty0 a376f89fd6 Add type checking to SDNQConfig 2025-10-12 01:02:47 +03:00
Disty0 9206d9443e SDNQ add dequantize model 2025-10-12 00:00:53 +03:00
Disty0 9a8ba0fc90 SDNQ unset device specific configs on save 2025-10-11 19:24:09 +03:00
Disty0 f7286c90d5 SDNQ add native pre-quant loader support to from_pretrained 2025-10-11 16:19:11 +03:00
Disty0 6bc83bc296 Prevent accelerate from splitting Linear and Conv layers and causing device mismatch errors 2025-10-11 03:19:30 +03:00
Disty0 0f785880ee SDNQ fix a singular bias not getting offloaded 2025-10-11 02:38:49 +03:00
Disty0 c7aba8589b SDNQ fix Qwen loading 2025-10-11 00:05:09 +03:00
Disty0 2a3deaa064 Check T5 keys before override 2025-10-09 22:46:27 +03:00
Disty0 6995d8c3c6 SDNQ fix T5 loading 2025-10-09 22:42:20 +03:00
Disty0 612df3abbb cleanup 2025-10-09 20:09:34 +03:00
Disty0 a9de8ef152 cleanup 2025-10-09 19:58:57 +03:00
Disty0 e19fb2d833 SDNQ keep the quant configs inside the module subfolder, add dtype cast and don't send to GPU 2025-10-09 19:34:48 +03:00
Vladimir Mandic 70defe6d06 handle load shards (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-09 11:29:36 -04:00
Vladimir Mandic 6907fcd320 speedup prequant model load (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-08 13:47:36 -04:00
Disty0 35277a79d3 cleanup x3 2025-10-08 01:21:11 +03:00
Disty0 9c16e2234a cleanup 2025-10-08 01:18:12 +03:00
Disty0 25303bb182 cleanup 2025-10-08 01:16:25 +03:00
Disty0 bdcd07f713 Add add_module_skip_keys to pre-load quant too 2025-10-08 01:11:40 +03:00
Disty0 7fdf400e8b cleanup 2025-10-08 00:41:04 +03:00
Disty0 df03ea9ba8 SDNQ add sdnq_post_load_quant and update Qwen keys 2025-10-08 00:29:36 +03:00
Vladimir Mandic 962cb7115d infra for full-model load/save with quant (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-07 14:30:45 -04:00
Vladimir Mandic 7fdc880a73 sdnq patches (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-07 09:43:34 -04:00
Disty0 1cd7b6d63a fix upcast scale check 2025-10-07 01:27:54 +03:00
Disty0 aa0c10440f SDNQ make the loader not touch the model options by default 2025-10-07 00:15:23 +03:00
Disty0 5306376b2a improve contiguous mm performance 2025-10-06 19:05:46 +03:00
Disty0 be91bbff75 SDNQ add SVD support for Convs 2025-10-06 18:26:42 +03:00
Disty0 c931bf9efa SDNQ add dtype casting to loader 2025-10-06 17:44:52 +03:00
Disty0 5c042c5fb8 cleanup 2025-10-06 11:30:26 +03:00
Vladimir Mandic a315a004e9 linting (Signed-off-by: Vladimir Mandic <mandic00@live.com>) 2025-10-05 20:25:33 -04:00
Disty0 23f2deaa58 fix enable_quantized_matmul 2025-10-06 02:04:28 +03:00
Disty0 1f81a37e8e Set the default svd rank to 32 2025-10-06 01:27:29 +03:00
Disty0 ebb26ac123 SDNQ make load file name configurable 2025-10-06 01:04:00 +03:00
Disty0 0acb571472 SDNQ add load and save model funcs 2025-10-06 00:57:23 +03:00
Disty0 9e52d0c1fb SDNQ add SVDQuant quantization method 2025-10-05 22:50:30 +03:00
Disty0 428600613a SDNQ fix new transformers again 2025-10-05 15:30:15 +03:00
Disty0 a164f3e0c2 SDNQ Improve UINT3 and below quant speed 2025-10-05 03:12:05 +03:00
Disty0 f2e12a682f SDNQ remove use_contiguous_mm path in re_quant 2025-10-04 19:17:05 +03:00
Disty0 df142afe81 don't use triton mm for nvidia 2025-10-04 18:48:03 +03:00
Disty0 5c5d7d5a86 cleanup 2025-10-04 18:38:18 +03:00
Disty0 99113947bf SDNQ add RDNA2 INT8 support via Triton 2025-10-04 18:31:25 +03:00
Disty0 95a7da7e75 SDNQ use non-contiguous re-quantize 2025-10-03 18:54:58 +03:00
Disty0 54acf1760b Make SDNQ scales compatible with balanced offload 2025-10-03 18:13:55 +03:00