Commit Graph

16 Commits (e4e863fd6d4167063413fb197fae420e488f3b9b)

Author SHA1 Message Date
Disty0 db59d2b507 SDNQ handle packed floats in fp mm 2025-12-27 16:29:18 +03:00
Disty0 b6e9332cfe SDNQ de-couple matmul dtype and add fp16 matmul 2025-11-22 02:16:20 +03:00
Disty0 c7aba8589b SDNQ fix Qwen loading 2025-10-11 00:05:09 +03:00
Disty0 be91bbff75 SDNQ add SVD support for Convs 2025-10-06 18:26:42 +03:00
Disty0 9e52d0c1fb SDNQ add SVDQuant quantization method 2025-10-05 22:50:30 +03:00
Disty0 54acf1760b Make SDNQ scales compatible with balanced offload 2025-10-03 18:13:55 +03:00
Disty0 6b67a9d0c4 SDNQ add check_mats to matmul 2025-09-30 01:58:13 +03:00
Disty0 e6715ba8d3 Cleanup SDNQ compile 2025-09-19 19:29:36 +03:00
Disty0 a12edc1e90 SDNQ use nan_to_num_ with fp8 quantization in case of zeros 2025-09-15 20:22:39 +03:00
Disty0 bbb345cf44 Fix bias dtype mismatch 2025-08-30 02:31:41 +03:00
Disty0 6c36433a14 SDNQ fix row-wise FP8 matmul with fp32 and fp16 inputs 2025-08-30 02:27:15 +03:00
Disty0 f324b7c0e5 SDNQ remove unnecessary .contiguous() 2025-08-21 02:21:05 +03:00
Disty0 8460be662c SDNQ use inplace transpose and use view instead of reshape 2025-08-17 05:07:55 +03:00
Disty0 dc7b25d387 Cleanup SDNQ and add SDNQ_USE_TENSORWISE_FP8_MATMUL env var 2025-08-11 14:50:17 +03:00
Disty0 22d86acda3 Make SDNQ MatMul listen to the dequantize fp32 setting 2025-08-09 01:10:07 +03:00
Disty0 c3d007b02c SDNQ split forward.py into layers and cleanup 2025-08-02 17:36:55 +03:00