Disty0
|
7679028c1a
|
Override CPU to use FP32 by default
|
2025-06-06 15:33:51 +03:00 |
chrismuzyn
|
299d189276
|
When using the openvino backend, do not look for an nvidia gpu.
|
2025-05-12 19:14:26 -04:00 |
Disty0
|
b0e5a6c4df
|
Add devices.has_triton() and enable NNCF compile if triton is available
|
2025-05-09 22:24:36 +03:00 |
Disty0
|
dfebc909eb
|
Disable cuDNN benchmark on ROCm and add cudnn_benchmark_limit option
|
2025-05-08 13:27:06 +03:00 |
Disty0
|
90f887ac4a
|
Add dim checks to ck flash atten and fix dim check on dyn atten
|
2025-03-25 03:50:21 +03:00 |
Seunghoon Lee
|
0c890b50e0
|
proper zluda detection
|
2025-03-20 23:03:23 +09:00 |
Disty0
|
1e0f512ccb
|
ROCm disable FP16 for gfx1102
|
2025-03-19 15:42:36 +03:00 |
Disty0
|
878cab085f
|
Reverse the sdpa hijack order
|
2025-02-14 19:56:39 +03:00 |
Disty0
|
f94196bcd1
|
Rename ROCm Flash atten hijack to CK Flash atten and enable AOTriton memory and flash atten by default
|
2025-02-13 22:01:06 +03:00 |
Vladimir Mandic
|
49712ab9e7
|
update requirements
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2025-02-13 12:03:08 -05:00 |
Vladimir Mandic
|
e28b8cd920
|
add torch gc debug
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2025-01-31 13:12:59 -05:00 |
Disty0
|
d2fee97e24
|
Update changelog
|
2025-01-31 20:27:22 +03:00 |
Disty0
|
039746914f
|
Add check for missing cuda and ipex params
|
2025-01-31 19:12:56 +03:00 |
Vladimir Mandic
|
3dcb70e8a2
|
device init logging
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2025-01-31 09:31:12 -05:00 |
Vladimir Mandic
|
1697fb1508
|
add tunable ops path
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2025-01-30 13:35:09 -05:00 |
Vladimir Mandic
|
0ea7840608
|
add tunable ops
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2025-01-30 13:08:49 -05:00 |
Disty0
|
a770b1c888
|
More correct Dynamic Atten SDPA implementation and deprecate IPEX Diffusers attention
|
2025-01-25 21:33:42 +03:00 |
Disty0
|
af35296a68
|
IPEX 4GB alloc detection and log driver version
|
2025-01-22 18:15:25 +03:00 |
Vladimir Mandic
|
bb97e695da
|
log cleanup
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-12-31 09:15:36 -05:00 |
Vladimir Mandic
|
910f5d0a73
|
lora direct on-demand apply/unapply
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-12-29 12:38:19 -05:00 |
Vladimir Mandic
|
20f2554cec
|
add sd35-ipadapter and more balanced offload optimizations
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-12-20 10:22:42 -05:00 |
Vladimir Mandic
|
e9f951b2c5
|
offload logging
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-12-11 14:20:01 -05:00 |
Vladimir Mandic
|
9a588d9c91
|
update balanced offload
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-12-11 12:06:03 -05:00 |
Vladimir Mandic
|
023b13b6cb
|
balanced offload improvements
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-12-01 15:34:25 -05:00 |
Vladimir Mandic
|
b7aff134a2
|
add low/high threshold to balanced offload
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-11-30 19:03:51 -05:00 |
Vladimir Mandic
|
b74166f9cb
|
detailer add augment setting
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-11-29 07:18:07 -05:00 |
Vladimir Mandic
|
dbb9ba0890
|
cuda memory limits
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-10-26 07:51:40 -04:00 |
Seunghoon Lee
|
a76893bd72
|
add cdna check
|
2024-10-26 14:09:50 +09:00 |
Seunghoon Lee
|
81bd236cc3
|
zluda&rocm bf16 test
|
2024-10-26 13:59:21 +09:00 |
Disty0
|
d459acfcca
|
Cleanup
|
2024-10-24 20:08:49 +03:00 |
Disty0
|
3b916d5e48
|
Zluda guess the GPU arch with the device name
|
2024-10-24 18:53:17 +03:00 |
Disty0
|
801ebdd080
|
Treat Zluda as a different backend and auto disable BF16 for Zluda and ROCm on RDNA1-2
|
2024-10-24 15:06:39 +03:00 |
Vladimir Mandic
|
0587f0be0c
|
gc logging
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-10-23 07:58:09 -04:00 |
Vladimir Mandic
|
64f363283f
|
messages,stats,save
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-10-20 09:26:07 -04:00 |
Disty0
|
6c11002420
|
PyTorch 2.5 XPU support
|
2024-10-17 23:11:52 +03:00 |
Disty0
|
b14e8f9a5f
|
Don't assume Cuda on devices.same_device()
|
2024-10-14 17:23:51 +03:00 |
Vladimir Mandic
|
c2ab0b11c3
|
check te device
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-10-14 09:29:33 -04:00 |
Disty0
|
9c05124c33
|
Add devices.has_xpu()
|
2024-10-13 15:13:38 +03:00 |
Disty0
|
011d9c3348
|
Move device backend initialization to shared.py
|
2024-10-13 14:56:28 +03:00 |
Disty0
|
84f8ab4076
|
Fix IPEX
|
2024-10-13 14:24:55 +03:00 |
Vladimir Mandic
|
0c54c235cb
|
add sageattention
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-10-12 15:43:34 -04:00 |
Disty0
|
2e2cb43406
|
Make SDPA hijacks chainable and add Sage Attention
|
2024-10-12 21:19:38 +03:00 |
Vladimir Mandic
|
ea0dfebe2d
|
better handle any quant lib requirements
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-10-12 13:36:16 -04:00 |
Vladimir Mandic
|
3bbcc33181
|
add detailer
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-10-07 09:32:56 -04:00 |
Vladimir Mandic
|
c21a10b7c9
|
add bf16 override for directml
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-09-29 21:34:59 -04:00 |
Vladimir Mandic
|
b31d02ba1d
|
cleanup
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-09-29 20:28:01 -04:00 |
Vladimir Mandic
|
47755dce6b
|
refactor devices
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-09-29 20:17:03 -04:00 |
Vladimir Mandic
|
fe94edf781
|
set default cuda dtype to auto
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-09-29 14:16:14 -04:00 |
Vladimir Mandic
|
174add0c3a
|
restore dtype after upcast complete
Signed-off-by: Vladimir Mandic <mandic00@live.com>
|
2024-09-29 13:22:08 -04:00 |
Vladimir Mandic
|
92f2a2902f
|
improve profiling
|
2024-09-23 11:07:24 -04:00 |
Vladimir Mandic
|
fe93ad6929
|
refactor xyzgrid
|
2024-09-13 19:22:02 -04:00 |
Disty0
|
586e5384b5
|
Update IPEX to 2.3 on Linux
|
2024-09-10 19:13:42 +03:00 |
Vladimir Mandic
|
b4df9a4de1
|
jumbo update with flux.1 refactor, see changelog for details
|
2024-09-01 22:56:15 -04:00 |
Vladimir Mandic
|
5ed58ac7cc
|
end-to-end update flux, see changelog and wiki
|
2024-08-28 08:04:24 -04:00 |
Disty0
|
a3f26c9df0
|
Convert Dynamic Attention SDP to a global SDP option
|
2024-08-18 01:44:27 +03:00 |
Disty0
|
f2769c0449
|
ROCm flash atten fall back to sdpa with fp32 inputs
|
2024-07-23 01:13:55 +03:00 |
xedis
|
809444eb5c
|
fix typo
|
2024-07-03 23:11:08 -07:00 |
Vladimir Mandic
|
16ab1a0af7
|
lint updates
|
2024-06-26 08:58:22 -04:00 |
Vladimir Mandic
|
b036c2fc3b
|
improve gc threshold
|
2024-06-21 12:57:17 -04:00 |
Disty0
|
092a326c09
|
Add torch_gc to state.nextjob, vae and upscale
|
2024-06-20 14:47:30 +03:00 |
Vladimir Mandic
|
a1f53add94
|
fix typos
|
2024-06-16 17:00:35 -04:00 |
Vladimir Mandic
|
6d6f1de295
|
additional python 3.12 compatibility
|
2024-06-08 14:14:48 -04:00 |
Vladimir Mandic
|
db9718eee6
|
add torch full deterministic mode
|
2024-06-07 09:26:51 -04:00 |
Vladimir Mandic
|
d63f35e298
|
add cudaMallocAsync
|
2024-05-16 18:02:55 -04:00 |
Vladimir Mandic
|
40d5fdfdfd
|
maybe fix slerp
|
2024-04-03 17:51:06 -04:00 |
Vladimir Mandic
|
834cb1b665
|
run fp16/bf16 test only once
|
2024-04-03 10:26:29 -04:00 |
Vladimir Mandic
|
9873178897
|
add extra_network_reference setting, refactor geninfo parser
|
2024-03-11 11:15:51 -04:00 |
Disty0
|
06149c4a41
|
ROCm add Flash Attention support
|
2024-03-10 00:11:50 +03:00 |
Vladimir Mandic
|
ee7517dfb8
|
expose sdp options
|
2024-02-19 08:29:24 -05:00 |
Vladimir Mandic
|
1b3028b667
|
minor update
|
2024-02-15 09:13:34 -05:00 |
Vladimir Mandic
|
d5a4f43f43
|
post release jumbo update
|
2024-02-08 12:10:32 -05:00 |
Disty0
|
0f829b2d04
|
Make OpenVINO compatible with IPEX venv
|
2024-01-08 03:00:18 +03:00 |
Disty0
|
8f70b7d08c
|
Add DISABLE_VENV_LIBS env variable
|
2024-01-08 02:04:55 +03:00 |
Vladimir Mandic
|
17b30a320e
|
enable batched taesd
|
2024-01-03 10:38:30 -05:00 |
Disty0
|
068f0a7d71
|
Return CPU device with OpenVINO on Mac
|
2024-01-02 20:20:36 +03:00 |
Vladimir Mandic
|
70bfe4ced8
|
enable gc on ram threshold
|
2023-12-31 08:15:22 -05:00 |
Vladimir Mandic
|
439542d3df
|
redesign profiler
|
2023-12-03 11:27:24 -05:00 |
Disty0
|
bd141bbfeb
|
IPEX decrease Torch GC Threshold to 80
|
2023-11-21 16:47:35 +03:00 |
Seunghoon Lee
|
36bef98cd5
|
Show device information log for DirectML.
|
2023-10-17 12:24:27 +09:00 |
Vladimir Mandic
|
2ec797472b
|
add hypertile
|
2023-10-06 16:10:56 -04:00 |
Disty0
|
dc31dcbc1c
|
Cleanup
|
2023-09-29 18:29:44 +03:00 |
Disty0
|
72a33d5247
|
Update device logging
|
2023-09-29 18:28:04 +03:00 |
Disty0
|
6184a8cb5c
|
IPEX and DML fix Cuda error
|
2023-09-28 20:51:11 +03:00 |
Disty0
|
21d53b6ac8
|
Cleanup
|
2023-09-28 19:53:52 +03:00 |
Disty0
|
7a3c1da954
|
Add OpenVINO device logging
|
2023-09-28 19:33:14 +03:00 |
Disty0
|
5edf481c8d
|
Add Torch GC threshold slider
|
2023-09-28 14:38:22 +03:00 |
Vladimir Mandic
|
0afcfe6097
|
logger early init
|
2023-09-23 23:44:34 -04:00 |
Disty0
|
550b7056ac
|
IPEX fix SDPA and reduce torch_gc force to 90%
|
2023-09-18 15:36:14 +03:00 |
Vladimir Mandic
|
484dae8dbd
|
upgrade diffusers
|
2023-09-14 09:38:17 -04:00 |
Vladimir Mandic
|
76c444fbc8
|
cleanup
|
2023-09-13 11:48:13 -04:00 |
Vladimir Mandic
|
f8fcb6f853
|
fix original hires non-latent
|
2023-09-10 18:30:20 -04:00 |
Vladimir Mandic
|
250d1bf2fb
|
update hints
|
2023-09-10 13:05:31 -04:00 |
Disty0
|
34ee67477e
|
Fix BF16 and FP32 logging
|
2023-09-08 23:49:49 +03:00 |
Vladimir Mandic
|
29d88cf557
|
cleanup logging
|
2023-09-08 13:29:33 -04:00 |
Vladimir Mandic
|
f36c1eb476
|
jumbo patch
|
2023-09-08 13:01:20 -04:00 |
Vladimir Mandic
|
8fd96d0f30
|
catch directml and ipex initialization errors
|
2023-09-07 07:27:54 -04:00 |
Vladimir Mandic
|
df65df3f36
|
minor fixes
|
2023-08-30 09:45:47 -04:00 |
Vladimir Mandic
|
48c0ce9b2b
|
fix model lookups
|
2023-08-27 08:01:29 +00:00 |
Vladimir Mandic
|
6a4d4ea5b7
|
update logging and model hashing
|
2023-08-22 18:28:09 +00:00 |
Disty0
|
f9718f068c
|
Separate OpenVINO from IPEX
|
2023-08-19 17:52:15 +03:00 |