Commit Graph

281 Commits (e8a158f4f5e10742c316e9b308ce9241b2c98430)

Author SHA1 Message Date
vladmandic 32b8b082e2 cleanup logging
Signed-off-by: vladmandic <mandic00@live.com>
2026-01-16 10:36:02 +01:00
vladmandic 85332594fc triton test reduce verbosity
Signed-off-by: vladmandic <mandic00@live.com>
2026-01-10 10:32:13 +01:00
Seunghoon Lee 49965dfda8 get_hip_arch_name -> get_hip_agent, use amdhip64_7.dll served within rocm package 2026-01-03 21:00:36 +09:00
vladmandic b9c18452f2 unify hip get arch name
Signed-off-by: vladmandic <mandic00@live.com>
2026-01-03 08:22:19 +01:00
Vladimir Mandic 0b1e6d2d3c improve offloading
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-12-25 10:24:02 +00:00
vladmandic e2fb70d4a1 detailer draw segmentation overlays
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-17 10:03:17 +01:00
Vladimir Mandic 3a3c984411 Merge pull request #4388 from vladmandic/kanvas
merge kanvas to dev
2025-11-09 07:57:54 -05:00
Disty0 f4ee9c7052 Add Flex attention 2025-11-09 00:14:38 +03:00
Vladimir Mandic f491955991 Merge pull request #4383 from vladmandic/dev
refresh branch
2025-11-08 15:43:28 -05:00
Vladimir Mandic 69180202d3 kanvas integration
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-08 15:41:52 -05:00
Disty0 2bbbb684cc Rename CK Flash attention to just Flash attention 2025-11-08 23:24:40 +03:00
Disty0 a93715e0da Don't expose AMD Triton Flash Atten for non AMD 2025-11-08 23:20:55 +03:00
Vladimir Mandic 56026c4e61 refactor attention handling
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-08 10:55:41 -05:00
Vladimir Mandic 155ee7f84c fix sage-attention checks on sm86
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-08 08:47:21 -05:00
Vladimir Mandic 5ffbca9377 cleanup and update changelog
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-05 12:57:30 -05:00
CalamitousFelicitousness bdc477d252 Refactor GPU backend selection for sage attention
Removed hot path, now everything is defined at setup

Also, passing device to get_device_capability so that it works properly with multi-gpu setups.
2025-11-05 16:31:13 +00:00
CalamitousFelicitousness 4c791fb795 Remove model check logic for SA2 workaround 2025-11-05 10:53:07 +00:00
CalamitousFelicitousness 18676996d0 Sage Attention 2 + Triton workaround Qwen-Image
Workaround to prevent black images generated with Qwen-Image models when Sage Attention 2 is enabled with Triton as backend on devices with compute capability 8.0 and 8.6.

Simply switches back to Cuda backend for these models only.

Proof of concept, feel free to close if this is not appropriate.
2025-11-04 23:31:14 +00:00
Vladimir Mandic 780cd26587 triton test hide errors behind debug flag
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-04 07:26:21 -05:00
Vladimir Mandic 495cfd8632 fix cn
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-01 12:21:19 -04:00
Vladimir Mandic 58f218a560 add cudnn enable/disable override
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-11-01 11:33:39 -04:00
Vladimir Mandic 408b82ef08 cleanup
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-29 09:44:18 -04:00
Vladimir Mandic 46876060ab kandinsky 10s force flex attn
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-28 10:53:19 -04:00
vladmandic 4282762c8c fix late init
Signed-off-by: vladmandic <mandic00@live.com>
2025-10-26 18:57:51 -04:00
vladmandic 60ac82b191 add basic xpu gpu monitor
Signed-off-by: vladmandic <mandic00@live.com>
2025-10-26 18:55:54 -04:00
vladmandic 0271e0830c triton split check into early and full
Signed-off-by: vladmandic <mandic00@live.com>
2025-10-26 11:48:10 -04:00
Disty0 818b0c0821 Add basic triton test 2025-10-26 10:44:04 +03:00
Vladimir Mandic 4b95d72d45 video tab layout
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-18 14:07:52 -04:00
Vladimir Mandic 1ebd96fdc6 add kandinsky5-lite t2v
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-10-18 12:15:36 -04:00
Disty0 4a70e82b0c ROCm always use numpy on cholesky 2025-09-28 14:05:20 +03:00
Disty0 a47959b114 move ROCm Windows hijacks outside of torch install 2025-09-28 13:33:15 +03:00
Disty0 6766563510 Add info log for Building CK Flash attention 2025-09-10 19:25:07 +03:00
Disty0 bc7c89c070 add typing to sdpa hijacks 2025-09-10 04:43:57 +03:00
Disty0 c51552af90 Enable triton flash atten option for rocm linux too 2025-09-03 16:26:00 +03:00
Disty0 c42e0e0b37 Cleanup 2025-09-03 16:18:02 +03:00
Disty0 266c9c0d3d Move Zluda Triton flash atten hijack to Triton Flash attention option 2025-09-03 16:16:41 +03:00
Vladimir Mandic fa44521ea3 offload-never and offload-always per-module and new highvram profile
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-31 11:40:24 -04:00
Vladimir Mandic 04af23a3bc refactore pipeline apply/unapply optional components & features
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-26 20:04:07 -04:00
Disty0 ad716b118b fix enable_gqa with dyn atten 2025-07-26 01:49:17 +03:00
Vladimir Mandic a5b77b8ee2 remove dead code
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-07-05 16:47:25 -04:00
Disty0 c9f49720c5 Cleanup 2025-06-25 23:37:41 +03:00
Disty0 dd84fb541f Always set sdpa params 2025-06-11 21:43:48 +03:00
Disty0 7679028c1a Override CPU to use FP32 by default 2025-06-06 15:33:51 +03:00
chrismuzyn 299d189276 When using the openvino backend, do not look for an nvidia gpu. 2025-05-12 19:14:26 -04:00
Disty0 b0e5a6c4df Add devices.has_triton() and enable NNCF compile if triton is available 2025-05-09 22:24:36 +03:00
Disty0 dfebc909eb Disable cuDNN benchmark on ROCm and add cudnn_benchmark_limit option 2025-05-08 13:27:06 +03:00
Disty0 90f887ac4a Add dim checks to ck flash atten and fix dim check on dyn atten 2025-03-25 03:50:21 +03:00
Seunghoon Lee 0c890b50e0 proper zluda detection 2025-03-20 23:03:23 +09:00
Disty0 1e0f512ccb ROCm disable FP16 for gfx1102 2025-03-19 15:42:36 +03:00
Disty0 878cab085f Reverse the sdpa hijcak order 2025-02-14 19:56:39 +03:00