232 Commits (75ff932e792398cc4e9d1f90d6fcc49bac19c0b6)

Author SHA1 Message Date
Disty0 878cab085f Reverse the sdpa hijcak order 2025-02-14 19:56:39 +03:00
Disty0 f94196bcd1 Rename ROCm Flash atten hijack to CK Flash atten and enable AOTriton memory and flash atten by default 2025-02-13 22:01:06 +03:00
Vladimir Mandic 49712ab9e7 update requirements
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-02-13 12:03:08 -05:00
Vladimir Mandic e28b8cd920 add torch gc debug
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-01-31 13:12:59 -05:00
Disty0 d2fee97e24 Update changelog 2025-01-31 20:27:22 +03:00
Disty0 039746914f Add check for missing cuda and ipex params 2025-01-31 19:12:56 +03:00
Vladimir Mandic 3dcb70e8a2 device init logging
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-01-31 09:31:12 -05:00
Vladimir Mandic 1697fb1508 add tunable ops path
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-01-30 13:35:09 -05:00
Vladimir Mandic 0ea7840608 add tunable ops
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-01-30 13:08:49 -05:00
Disty0 a770b1c888 More correct Dynamic Atten SDPA implementation and deprecate IPEX Diffusers attention 2025-01-25 21:33:42 +03:00
Disty0 af35296a68 IPEX 4GB alloc detection and log driver version 2025-01-22 18:15:25 +03:00
Vladimir Mandic bb97e695da log cleanup
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-12-31 09:15:36 -05:00
Vladimir Mandic 910f5d0a73 lora direct on-demand apply/unapply
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-12-29 12:38:19 -05:00
Vladimir Mandic 20f2554cec add sd35-ipadapter and more balanced offload optimizations
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-12-20 10:22:42 -05:00
Vladimir Mandic e9f951b2c5 offload logging
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-12-11 14:20:01 -05:00
Vladimir Mandic 9a588d9c91 update balanced offload
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-12-11 12:06:03 -05:00
Vladimir Mandic 023b13b6cb balanced offload improvements
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-12-01 15:34:25 -05:00
Vladimir Mandic b7aff134a2 add low/high threshold to balanced offload
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-11-30 19:03:51 -05:00
Vladimir Mandic b74166f9cb detailer add augment setting
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-11-29 07:18:07 -05:00
Vladimir Mandic dbb9ba0890 cuda memory limits
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-10-26 07:51:40 -04:00
Seunghoon Lee a76893bd72 add cdna check 2024-10-26 14:09:50 +09:00
Seunghoon Lee 81bd236cc3 zluda&rocm bf16 test 2024-10-26 13:59:21 +09:00
Disty0 d459acfcca Cleanup 2024-10-24 20:08:49 +03:00
Disty0 3b916d5e48 Zluda guess the GPU arch with the device name 2024-10-24 18:53:17 +03:00
Disty0 801ebdd080 Treat Zluda as a different backend and auto disable BF16 for Zluda and ROCm on RDNA1-2 2024-10-24 15:06:39 +03:00
Vladimir Mandic 0587f0be0c gc logging
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-10-23 07:58:09 -04:00
Vladimir Mandic 64f363283f messages,stats,save
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-10-20 09:26:07 -04:00
Disty0 6c11002420 PyTorch 2.5 XPU support 2024-10-17 23:11:52 +03:00
Disty0 b14e8f9a5f Don't assume Cuda on devices.same_device() 2024-10-14 17:23:51 +03:00
Vladimir Mandic c2ab0b11c3 check te device
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-10-14 09:29:33 -04:00
Disty0 9c05124c33 Add devices.has_xpu() 2024-10-13 15:13:38 +03:00
Disty0 011d9c3348 Move device backed initialization to shared.py 2024-10-13 14:56:28 +03:00
Disty0 84f8ab4076 Fix IPEX 2024-10-13 14:24:55 +03:00
Vladimir Mandic 0c54c235cb add sageattention
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-10-12 15:43:34 -04:00
Disty0 2e2cb43406 Make SDPA hijacks chainable and add Sage Attention 2024-10-12 21:19:38 +03:00
Vladimir Mandic ea0dfebe2d better handle any quant lib requirements
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-10-12 13:36:16 -04:00
Vladimir Mandic 3bbcc33181 add detailer
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-10-07 09:32:56 -04:00
Vladimir Mandic c21a10b7c9 add bf16 override for directml
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-09-29 21:34:59 -04:00
Vladimir Mandic b31d02ba1d cleanup
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-09-29 20:28:01 -04:00
Vladimir Mandic 47755dce6b refactor devices
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-09-29 20:17:03 -04:00
Vladimir Mandic fe94edf781 set default cuda dtype to auto
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-09-29 14:16:14 -04:00
Vladimir Mandic 174add0c3a restore dtype after upcast complete
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2024-09-29 13:22:08 -04:00
Vladimir Mandic 92f2a2902f improve profiling 2024-09-23 11:07:24 -04:00
Vladimir Mandic fe93ad6929 refactor xyzgrid 2024-09-13 19:22:02 -04:00
Disty0 586e5384b5 Update IPEX to 2.3 on Linux 2024-09-10 19:13:42 +03:00
Vladimir Mandic b4df9a4de1 jumbo update with flux.1 refactor, see changelog for details 2024-09-01 22:56:15 -04:00
Vladimir Mandic 5ed58ac7cc end-to-end update flux, see changelog and wiki 2024-08-28 08:04:24 -04:00
Disty0 a3f26c9df0 Convert Dynamic Attention SDP to a global SDP option 2024-08-18 01:44:27 +03:00
Disty0 f2769c0449 ROCm flash atten fall back to sdpa with fp32 inputs 2024-07-23 01:13:55 +03:00
xedis 809444eb5c fix typo 2024-07-03 23:11:08 -07:00