Commit Graph

49 Commits (3a65d561a70f60d2c67f607d2b00a944c7c427ed)

Author SHA1 Message Date
Disty0 d4e2cbb826 SDNQ fix torch.compile always being active 2025-12-08 18:15:08 +03:00
Disty0 064b64c76c cleanup 2025-12-08 01:14:19 +03:00
Disty0 6e05a12a49 SDNQ post process pre-quants after load 2025-12-08 01:08:53 +03:00
vladmandic 0ad40d2b8b lint
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-02 12:25:04 +01:00
Disty0 d9bc31e7da Cleanup 2025-11-29 01:46:04 +03:00
Disty0 01a0f6b356 Warn and disable quantized matmul if triton is not available 2025-11-29 01:34:54 +03:00
Disty0 55cf627ac6 add version to sdnq 2025-11-28 00:45:24 +03:00
Disty0 73e4d1e379 Pass torch_dtype to sdnq loader 2025-11-27 18:37:35 +03:00
Disty0 7b2a8e3f87 cleanup 2025-11-27 18:26:14 +03:00
Disty0 ff4c254930 Auto handle tied weights with new transformers 2025-11-27 18:24:55 +03:00
CalamitousFelicitousness 9dd537072c Fix import path for SDNQ options and handle Qwen models in load_sdnq_model 2025-11-27 14:53:03 +00:00
Disty0 131c51918b SDNQ fix model_ oader 2025-11-27 14:51:45 +03:00
Disty0 ed6f977218 SDNQ fix z_image matmul 2025-11-27 14:19:29 +03:00
Disty0 48b5d56ba4 Enable or disable quantized matmul on pre-quant models 2025-11-26 21:08:15 +03:00
Disty0 da0df35106 fix typo 2025-11-25 21:58:53 +03:00
Disty0 4e4f49b38d update sdnq loader 2025-11-22 03:45:27 +03:00
Disty0 b6e9332cfe SDNQ de-couple matmul dtype and add fp16 matmul 2025-11-22 02:16:20 +03:00
Disty0 1745ed53f8 Refactor SDNQDequantizer 2025-11-18 01:42:58 +03:00
Disty0 0e8429dbd8 Cleanup 2025-11-07 18:49:29 +03:00
Disty0 93f28f07ac Make SDNQ not depended on quantization_config.json and fix invalid quantization_config getting attached to the model on load 2025-11-07 18:11:21 +03:00
Vladimir Mandic 5ab9a5a15d add sota model loader: runai streamer
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-27 14:20:10 -04:00
Disty0 b627617d14 SDNQ fix enable matmul after load 2025-10-19 17:25:02 +03:00
Disty0 758b006104 cleanup 2025-10-19 02:00:16 +03:00
Disty0 ef72edf18f SDNQ improve svd and low bit matmul perf 2025-10-19 00:06:07 +03:00
Disty0 845869079d Fix sdnq unset config 2025-10-14 17:58:09 +03:00
Disty0 b601f0d402 SDNQ expose svd_steps and update module skip keys 2025-10-14 00:15:09 +03:00
Disty0 9a8ba0fc90 SDNQ unset device specific configs on save 2025-10-11 19:24:09 +03:00
Disty0 c7aba8589b SDNQ fix Qwen loading 2025-10-11 00:05:09 +03:00
Disty0 2a3deaa064 Check T5 keys before override 2025-10-09 22:46:27 +03:00
Disty0 6995d8c3c6 SDNQ fix T5 loading 2025-10-09 22:42:20 +03:00
Disty0 612df3abbb cleanup 2025-10-09 20:09:34 +03:00
Disty0 a9de8ef152 cleanup 2025-10-09 19:58:57 +03:00
Disty0 e19fb2d833 SDNQ keep the quant configs inside the module subfolder, add dtype cast and don't send to GPU 2025-10-09 19:34:48 +03:00
Vladimir Mandic 70defe6d06 handle load shards
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-09 11:29:36 -04:00
Vladimir Mandic 6907fcd320 speedup prequant model load
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-08 13:47:36 -04:00
Disty0 bdcd07f713 Add add_module_skip_keys to pre-load quant too 2025-10-08 01:11:40 +03:00
Disty0 7fdf400e8b cleanup 2025-10-08 00:41:04 +03:00
Disty0 df03ea9ba8 SDNQ add sdnq_post_load_quant and update Qwen keys 2025-10-08 00:29:36 +03:00
Vladimir Mandic 962cb7115d infra for full-model load/save with quant
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-07 14:30:45 -04:00
Vladimir Mandic 7fdc880a73 sdnq patches
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-07 09:43:34 -04:00
Disty0 1cd7b6d63a fix upcast scale check 2025-10-07 01:27:54 +03:00
Disty0 aa0c10440f SDNQ make the loader don't touch the model options by default 2025-10-07 00:15:23 +03:00
Disty0 c931bf9efa SDNQ add dtype casting to loader 2025-10-06 17:44:52 +03:00
Disty0 5c042c5fb8 cleanup 2025-10-06 11:30:26 +03:00
Vladimir Mandic a315a004e9 linting
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-05 20:25:33 -04:00
Disty0 23f2deaa58 fix enable_quantized_mamtul 2025-10-06 02:04:28 +03:00
Disty0 1f81a37e8e Set the default svd rank to 32 2025-10-06 01:27:29 +03:00
Disty0 ebb26ac123 SDNQ make load file name configurable 2025-10-06 01:04:00 +03:00
Disty0 0acb571472 SDNQ ass load and save model funcs 2025-10-06 00:57:23 +03:00