Commit Graph

26 Commits (4ecec822e0f75dafc19380832a44c8ade3c68a80)

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Disty0 | f05c29175e | cleanup | 2025-10-19 02:09:25 +03:00 |
| Disty0 | ef72edf18f | SDNQ improve svd and low bit matmul perf | 2025-10-19 00:06:07 +03:00 |
| Disty0 | 9206d9443e | SDNQ add dequantize model | 2025-10-12 00:00:53 +03:00 |
| Disty0 | 5306376b2a | improve contiguous mm performance | 2025-10-06 19:05:46 +03:00 |
| Disty0 | be91bbff75 | SDNQ add SVD support for Convs | 2025-10-06 18:26:42 +03:00 |
| Disty0 | c931bf9efa | SDNQ add dtype casting to loader | 2025-10-06 17:44:52 +03:00 |
| Disty0 | 23f2deaa58 | fix enable_quantized_mamtul | 2025-10-06 02:04:28 +03:00 |
| Disty0 | 9e52d0c1fb | SDNQ add SVDQuant quantization method | 2025-10-05 22:50:30 +03:00 |
| Disty0 | f2e12a682f | SDNQ remove use_contiguous_mm path in re_quant | 2025-10-04 19:17:05 +03:00 |
| Disty0 | 99113947bf | SDNQ add RDNA2 INT8 support via Triton | 2025-10-04 18:31:25 +03:00 |
| Disty0 | 95a7da7e75 | SDNQ use non-contiguous re-quantize | 2025-10-03 18:54:58 +03:00 |
| Disty0 | 54acf1760b | Make SDNQ scales compatible with balanced offload | 2025-10-03 18:13:55 +03:00 |
| Disty0 | e6715ba8d3 | Cleanup SDNQ compile | 2025-09-19 19:29:36 +03:00 |
| Disty0 | a12edc1e90 | SDNQ use nan_to_num_ with fp8 quantization in case of zeros | 2025-09-15 20:22:39 +03:00 |
| Disty0 | 4ec8603f63 | SDNQ re-add bitpacking for uint1 | 2025-08-29 23:06:11 +03:00 |
| Disty0 | d49e954918 | SDNQ listen to dequantize_fp32 option with re_quantize | 2025-08-29 22:48:28 +03:00 |
| Disty0 | a8de3f7282 | SDNQ add quantized matmul support for all quantization types and group sizes | 2025-08-29 22:26:47 +03:00 |
| Disty0 | 8460be662c | SDNQ use inplace transpose and use view instead of reshape | 2025-08-17 05:07:55 +03:00 |
| Disty0 | dc7b25d387 | Cleanup SDNQ and add SDNQ_USE_TENSORWISE_FP8_MATMUL env var | 2025-08-11 14:50:17 +03:00 |
| Disty0 | 3f45c4e570 | Cleanup SDNQ and skip transpose on packed int8 matmul | 2025-08-10 19:31:34 +03:00 |
| Disty0 | c3d007b02c | SDNQ split forward.py into layers and cleanup | 2025-08-02 17:36:55 +03:00 |
| Disty0 | 25a4731a97 | SDNQ use static compile | 2025-07-20 16:25:57 +03:00 |
| Disty0 | 86cd272b96 | SDNQ fix Dora | 2025-06-18 16:24:42 +03:00 |
| Disty0 | 26800a1ef9 | Cleanup sdnq | 2025-06-17 02:05:13 +03:00 |
| Disty0 | d31df8c1eb | SDNQ fuse bias into dequantizer with matmul | 2025-06-14 22:10:10 +03:00 |
| Disty0 | 5e013fb154 | SDNQ optimize input quantization and use the word quantize instead of compress | 2025-06-12 12:06:57 +03:00 |
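Commit a12edc1e90 guards fp8 quantization against all-zero tensors with `nan_to_num_`: when every element is zero the scale is zero, and dividing by it produces NaNs. A minimal numpy sketch of the same zero-scale guard follows (int8 is used here since numpy has no fp8 dtype; the function name and per-row layout are illustrative, not SDNQ's actual code):

```python
import numpy as np

def quantize_int8_symmetric(w):
    """Per-row symmetric int8 quantization with a guard for all-zero rows.

    A row of zeros yields scale == 0, so w / scale produces NaN (0/0);
    nan_to_num maps those NaNs back to 0 instead of poisoning the output.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    with np.errstate(divide="ignore", invalid="ignore"):
        q = np.round(w / scale)
    q = np.nan_to_num(q, nan=0.0, posinf=0.0, neginf=0.0)
    return q.astype(np.int8), scale

w = np.array([[0.5, -1.0, 0.25],
              [0.0,  0.0, 0.0]])   # second row triggers the guard
q, scale = quantize_int8_symmetric(w)
```

Without the `nan_to_num` call, the cast of NaN to int8 would produce garbage values for the zero row; with it, a zero row round-trips to exact zeros.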
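Commit 4ec8603f63 re-adds bitpacking for uint1 weights. The general idea — storing eight 0/1 values per byte instead of one value per byte — can be sketched with numpy's `packbits`/`unpackbits` (this is a generic illustration, not SDNQ's packing code, which presumably operates on torch tensors):

```python
import numpy as np

def pack_uint1(bits):
    """Pack an array of 0/1 values into bytes, 8 values per byte."""
    return np.packbits(bits.astype(np.uint8))

def unpack_uint1(packed, count):
    """Unpack bytes back to 0/1 values, trimming the padding bits."""
    return np.unpackbits(packed)[:count]

bits = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], dtype=np.uint8)
packed = pack_uint1(bits)                    # 10 bits fit in 2 bytes
restored = unpack_uint1(packed, bits.size)   # round-trips losslessly
```

The 8x storage reduction is why bitpacking is worth re-adding for uint1: an unpacked representation would spend a full byte on each 1-bit weight.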