Commit Graph

26 Commits (4ecec822e0f75dafc19380832a44c8ade3c68a80)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Disty0 | f05c29175e | cleanup | 2025-10-19 02:09:25 +03:00 |
| Disty0 | ef72edf18f | SDNQ improve svd and low bit matmul perf | 2025-10-19 00:06:07 +03:00 |
| Disty0 | 9206d9443e | SDNQ add dequantize model | 2025-10-12 00:00:53 +03:00 |
| Disty0 | 5306376b2a | improve contiguous mm performance | 2025-10-06 19:05:46 +03:00 |
| Disty0 | be91bbff75 | SDNQ add SVD support for Convs | 2025-10-06 18:26:42 +03:00 |
| Disty0 | c931bf9efa | SDNQ add dtype casting to loader | 2025-10-06 17:44:52 +03:00 |
| Disty0 | 23f2deaa58 | fix enable_quantized_mamtul | 2025-10-06 02:04:28 +03:00 |
| Disty0 | 9e52d0c1fb | SDNQ add SVDQuant quantization method | 2025-10-05 22:50:30 +03:00 |
| Disty0 | f2e12a682f | SDNQ remove use_contiguous_mm path in re_quant | 2025-10-04 19:17:05 +03:00 |
| Disty0 | 99113947bf | SDNQ add RDNA2 INT8 support via Triton | 2025-10-04 18:31:25 +03:00 |
| Disty0 | 95a7da7e75 | SDNQ use non-contiguous re-quantize | 2025-10-03 18:54:58 +03:00 |
| Disty0 | 54acf1760b | Make SDNQ scales compatible with balanced offload | 2025-10-03 18:13:55 +03:00 |
| Disty0 | e6715ba8d3 | Cleanup SDNQ compile | 2025-09-19 19:29:36 +03:00 |
| Disty0 | a12edc1e90 | SDNQ use nan_to_num_ with fp8 quantization in case of zeros | 2025-09-15 20:22:39 +03:00 |
| Disty0 | 4ec8603f63 | SDNQ re-add bitpacking for uint1 | 2025-08-29 23:06:11 +03:00 |
| Disty0 | d49e954918 | SDNQ listen to dequantize_fp32 option with re_quantize | 2025-08-29 22:48:28 +03:00 |
| Disty0 | a8de3f7282 | SDNQ add quantized matmul support for all quantization types and group sizes | 2025-08-29 22:26:47 +03:00 |
| Disty0 | 8460be662c | SDNQ use inplace transpose and use view instead of reshape | 2025-08-17 05:07:55 +03:00 |
| Disty0 | dc7b25d387 | Cleanup SDNQ and add SDNQ_USE_TENSORWISE_FP8_MATMUL env var | 2025-08-11 14:50:17 +03:00 |
| Disty0 | 3f45c4e570 | Cleanup SDNQ and skip transpose on packed int8 matmul | 2025-08-10 19:31:34 +03:00 |
| Disty0 | c3d007b02c | SDNQ split forward.py into layers and cleanup | 2025-08-02 17:36:55 +03:00 |
| Disty0 | 25a4731a97 | SDNQ use static compile | 2025-07-20 16:25:57 +03:00 |
| Disty0 | 86cd272b96 | SDNQ fix Dora | 2025-06-18 16:24:42 +03:00 |
| Disty0 | 26800a1ef9 | Cleanup sdnq | 2025-06-17 02:05:13 +03:00 |
| Disty0 | d31df8c1eb | SDNQ fuse bias into dequantizer with matmul | 2025-06-14 22:10:10 +03:00 |
| Disty0 | 5e013fb154 | SDNQ optimize input quantization and use the word quantize instead of compress | 2025-06-12 12:06:57 +03:00 |
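Commit a12edc1e90 guards fp8 quantization against all-zero tensors with `nan_to_num_`: when every element is zero the scale is zero, and dividing by it produces NaNs. A minimal numpy sketch of the same zero-scale guard follows (int8 is used here since numpy has no fp8 dtype; the function name and per-row layout are illustrative, not SDNQ's actual code):

```python
import numpy as np

def quantize_int8_symmetric(w):
    """Per-row symmetric int8 quantization with a guard for all-zero rows.

    A row of zeros yields scale == 0, so w / scale produces NaN (0/0);
    nan_to_num maps those NaNs back to 0 instead of poisoning the output.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.round(w / scale)
    q = np.nan_to_num(q, nan=0.0, posinf=0.0, neginf=0.0)
    return q.astype(np.int8), scale

w = np.array([[0.5, -1.0, 0.25],
              [0.0,  0.0, 0.0]])   # second row triggers the guard
q, scale = quantize_int8_symmetric(w)
```

Without the `nan_to_num` call, the cast of NaN to int8 would produce garbage values for the zero row; with it, a zero row round-trips to exact zeros.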
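Commit 4ec8603f63 re-adds bitpacking for uint1 weights. The general idea — storing eight 0/1 values per byte instead of one value per byte — can be sketched with numpy's `packbits`/`unpackbits` (this is a generic illustration, not SDNQ's packing code, which presumably operates on torch tensors):

```python
import numpy as np

def pack_uint1(bits):
    """Pack an array of 0/1 values into bytes, 8 values per byte."""
    return np.packbits(bits.astype(np.uint8))

def unpack_uint1(packed, count):
    """Unpack bytes back to 0/1 values, trimming the padding bits."""
    return np.unpackbits(packed)[:count]

bits = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], dtype=np.uint8)
packed = pack_uint1(bits)                    # 10 bits fit in 2 bytes
restored = unpack_uint1(packed, bits.size)   # round-trips losslessly
```

The 8x storage reduction is why bitpacking is worth re-adding for uint1: an unpacked representation would spend a full byte on each 1-bit weight.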