Commit Graph

11 Commits (2e4e741d47c61730d34e79ae8dae4f7995f7b4a7)

Author SHA1 Message Date
Disty0 c7aba8589b SDNQ fix Qwen loading 2025-10-11 00:05:09 +03:00
Disty0 9e52d0c1fb SDNQ add SVDQuant quantization method 2025-10-05 22:50:30 +03:00
Disty0 54acf1760b Make SDNQ scales compatible with balanced offload 2025-10-03 18:13:55 +03:00
Disty0 6b67a9d0c4 SDNQ add check_mats to matmul 2025-09-30 01:58:13 +03:00
Disty0 e6715ba8d3 Cleanup SDNQ compile 2025-09-19 19:29:36 +03:00
Disty0 a12edc1e90 SDNQ use nan_to_num_ with fp8 quantization in case of zeros 2025-09-15 20:22:39 +03:00
Disty0 f324b7c0e5 SDNQ remove unnecessary .contiguous() 2025-08-21 02:21:05 +03:00
Disty0 8460be662c SDNQ use inplace transpose and use view instead of reshape 2025-08-17 05:07:55 +03:00
Disty0 dc7b25d387 Cleanup SDNQ and add SDNQ_USE_TENSORWISE_FP8_MATMUL env var 2025-08-11 14:50:17 +03:00
Disty0 22d86acda3 Make SDNQ MatMul listen to the dequantize fp32 setting 2025-08-09 01:10:07 +03:00
Disty0 c3d007b02c SDNQ split forward.py into layers and cleanup 2025-08-02 17:36:55 +03:00