29 Commits (4e4557d81c57128a9d847b19463b1ed67dd7c8d8)

Author SHA1 Message Date
Disty0 4e4557d81c NNCF set min matmul shape to 32 2025-05-13 18:50:23 +03:00
Disty0 b9ad55857d NNCF INT8 MatMul don't force FP32 with FP16 scales 2025-05-13 05:08:22 +03:00
Disty0 129c701b3d NNCF use torch.compile directly on int8_matmul instead of sub functions 2025-05-13 04:28:37 +03:00
Disty0 f1eefe97a4 NNCF use inplace ops 2025-05-13 03:49:30 +03:00
Disty0 f4e3a81a84 NNCF experimental direct INT8 MatMul support 2025-05-12 21:41:49 +03:00
Disty0 4eedeab9f8 NNCF use group size instead of number of groups and set default group size for int4 to 64 2025-05-11 20:38:01 +03:00
Disty0 03d05b6243 NNCF fix very large number of groups 2025-05-11 18:43:19 +03:00
Disty0 9cfdc3c079 Remove NNCF device hijack 2025-05-11 18:30:10 +03:00
Disty0 0673689d5b NNCF set the default group size to 128 for INT4 2025-05-11 08:45:27 +03:00
Disty0 4af570bfd4 Cleanup 2025-05-11 08:10:13 +03:00
Disty0 8d27d34969 NNCF don't clip the zero_point 2025-05-11 07:57:51 +03:00
Disty0 98308fa187 NNCF fix group size calculation 2025-05-11 07:08:21 +03:00
Disty0 020e1aa374 Cleanup 2025-05-11 06:56:36 +03:00
Disty0 03a6d7f9bf NNCF add number of quantization groups 2025-05-11 05:55:58 +03:00
Disty0 a1491a660c Cleanup 2025-05-09 23:36:50 +03:00
Disty0 1ee9832e05 NNCF silence the pytorch version warning 2025-05-09 23:16:55 +03:00
Disty0 a4d4462e2a NNCF add decompress using torch.compile option 2025-05-09 21:02:24 +03:00
Disty0 f3aa3b4574 NNCF remove T5 hijack from pre quant mode 2025-05-08 14:11:19 +03:00
Disty0 b6d2aa7fd8 NNCF more optimizations 2025-05-07 21:50:31 +03:00
Disty0 a57c7087b8 Make NNCF INT4 quant run 75% faster and don't force fp32 decompress 2025-05-07 20:34:07 +03:00
Disty0 43a5cfba79 NNCF use .to instead of .type 2025-04-25 16:16:08 +03:00
Disty0 778de4d295 NNCF do quant on GPU 2025-04-23 23:07:23 +03:00
Disty0 491ea6eea4 NNCF fix Conv layers remains in fp32 2025-04-23 19:04:43 +03:00
Disty0 74d4093e74 NNCF disable quant conv by default 2025-04-23 16:31:27 +03:00
Disty0 f1d8543cae NNCF lora support 2025-04-23 15:44:09 +03:00
Disty0 16935ec08b NNCF pre-load fix HiDream LLM quant 2025-04-23 02:52:46 +03:00
Disty0 bb0329f54f Update and refactor NNCF and add more quant options 2025-04-23 02:03:30 +03:00
Disty0 13c3f070e8 Cleanup 2025-04-22 04:51:25 +03:00
Disty0 2264d8087b Pre-load support for NNCF 2025-04-22 04:35:36 +03:00