Commit Graph

176 Commits (4e1f8a2b711784636e27e3db2d9d5eb7ee7170a4)

Author SHA1 Message Date
Vladimir Mandic e2b33b81d3 fix diffusers samplers 2023-07-15 22:40:03 -04:00
Disty0 f773c782fa ipex cleanup 2023-07-16 01:39:40 +03:00
Seunghoon Lee 0a52c44e73 DirectML rework & provide GPU memory usage (AMD only). 2023-07-15 18:55:38 +09:00
Disty0 14d1136fe7 Fix ipex memstats 2023-07-14 18:09:07 +03:00
Disty0 2a9133bfec IPEX rework 2023-07-14 17:33:24 +03:00
Disty0 c3a4293f22 Disable torch_gc for IPEX in WSL2 2023-07-12 13:02:42 +03:00
Disty0 2bce86a50a Replace empty_cache with torch_gc 2023-07-12 12:45:21 +03:00
Disty0 562ca33275 Fix Diffusers _conv_forward dtype error with IPEX 2023-07-12 02:03:45 +03:00
Vladimir Mandic db30f5faec update changelog 2023-07-08 14:22:51 -04:00
Vladimir Mandic 2a21196061 Merge branch 'master' into dev 2023-07-08 13:35:25 -04:00
Vladimir Mandic 89a7ea6a3f overal quality fixes 2023-07-08 09:49:41 -04:00
Disty0 205b516487 Fix diffusers_sdxl on ipex 2023-07-07 22:41:26 +03:00
Disty0 3bcca6f92b Patch torch.Generator again 2023-07-06 02:51:27 +03:00
Disty0 422c60c787 Patch torch.Generator 2023-07-05 20:49:39 +03:00
Disty0 99284ff020 Cleanup 2023-07-05 12:43:15 +03:00
Disty0 a62d9b0ca4 Cleanup 2023-07-05 12:39:34 +03:00
Nuullll 860bf8e2bf [IPEX] Support SDE samplers
This is a W/A since `torch.Generator()` API doesn't support `xpu`
backend at the moment. So replacing it with `torch.xpu.Generator()` API
provided by IPEX.
2023-07-05 15:48:58 +08:00
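The workaround described in the commit above can be sketched as a device-aware generator factory. This is a hypothetical helper, not the repository's actual code; the `xpu` branch assumes `intel_extension_for_pytorch` has been imported so that `torch.xpu` exists (checked via `hasattr` here):

```python
import torch

def create_generator(device: torch.device) -> torch.Generator:
    """Return a seedable RNG for the given device.

    torch.Generator() does not accept the `xpu` backend, so on XPU
    we fall back to torch.xpu.Generator() provided by IPEX
    (hypothetical availability check via hasattr).
    """
    if device.type == "xpu" and hasattr(torch, "xpu"):
        return torch.xpu.Generator(device)
    return torch.Generator(device)
```

A sampler can then call `create_generator(x.device).manual_seed(seed)` without caring which backend it is running on.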
Disty0 45d50bd106 Remove cpu=xpu with ipex 2023-07-05 00:12:07 +03:00
Disty0 966eed8dd9 Autodetect IPEX 2023-07-04 23:37:36 +03:00
Vladimir Mandic b216a35ddd update diffusers and extra networks 2023-07-04 09:28:48 -04:00
Vladimir Mandic 2a41bf1406 fix styles 2023-06-27 09:04:42 -04:00
Disty0 102503a3a4 Fix ControlNet and change to sub-quad on ipex 2023-06-27 15:17:13 +03:00
Disty0 618097dac2 GradScaler patch for IPEX 2023-06-15 01:19:35 +03:00
Vladimir Mandic cb307399dd jumbo merge 2023-06-13 11:59:56 -04:00
Disty0 c9e58c9604 Fix train for IPEX 2023-06-12 00:21:32 +03:00
Disty0 f63dd1c92e Fix torch.linalg.solve with IPEX & Diffusers UniPC 2023-06-10 22:01:09 +03:00
Disty0 3bef3e3eee Train patches for IPEX 2023-06-07 17:25:11 +03:00
Vladimir Mandic efbe364f7d js optimizations 2023-06-05 14:26:01 -04:00
Disty0 c52fb69dde Fix bf16 test 2023-06-05 20:49:18 +03:00
Vladimir Mandic c0a824d8c6 add extra networks to xyz 2023-06-05 10:32:08 -04:00
Disty0 8bef48e501 Fix GroupNorm.forward with IPEX 2023-06-04 12:22:56 +03:00
Disty0 4265692505 Fix GradScaler doesn't exist for XPU 2023-06-03 17:02:44 +03:00
Vladimir Mandic 1f988d1df6 cleanup 2023-06-02 19:39:44 -04:00
Vince Navarro c30eb90aff Remove stray print 2023-06-01 17:28:13 -04:00
Vince Navarro 523dbaf8dc Add XPU support for --device-id 2023-06-01 16:42:21 -04:00
Vladimir Mandic 9bf0b1ae1f allow experimental to override precision 2023-05-28 07:46:47 -04:00
Vladimir Mandic f8884bc051 fix hip detection 2023-05-25 09:13:57 -04:00
Vladimir Mandic 9e66d88e21 add mps defaults 2023-05-24 15:21:49 -04:00
Vladimir Mandic 684851ae34 set default optimizer 2023-05-24 13:50:01 -04:00
Vladimir Mandic 0acc7d3b86 fix redirector 2023-05-24 08:49:33 -04:00
Vladimir Mandic ea0780339a fixes 2023-05-21 08:17:36 -04:00
Vladimir Mandic 9033499e08 add manual seed 2023-05-19 08:34:43 -04:00
Vladimir Mandic 0ccda9bc8b jumbo patch 2023-05-17 14:15:55 -04:00
Vladimir Mandic 5250ba4be3 force no-half with directml 2023-05-16 21:20:36 -04:00
Vladimir Mandic 8350b93a5c add force latent sampler 2023-05-15 09:32:20 -04:00
Vladimir Mandic 5134471bc8 dml autocast 2023-05-14 13:24:59 -04:00
Vladimir Mandic 618a1703ae update cudnn benchmark setting 2023-05-14 12:28:37 -04:00
Vladimir Mandic 760f5fb89a add extra debug messages 2023-05-14 12:26:15 -04:00
Vladimir Mandic a652270999 fix 2023-05-13 12:26:00 -04:00
Vladimir Mandic a2923064a5 update cudnn 2023-05-13 11:52:31 -04:00
Vladimir Mandic d96ab6a1ae update directml 2023-05-13 11:21:11 -04:00
Vladimir Mandic a2485cf7ef update 2023-05-12 21:12:24 -04:00
Vladimir Mandic 1921504e64 enable dynamo compile 2023-05-12 15:58:00 -04:00
Vladimir Mandic daf90cb6b4 add performance note 2023-05-12 14:23:51 -04:00
Vladimir Mandic 62dda471a3 process images in threads 2023-05-12 14:21:26 -04:00
Vladimir Mandic 1943bfea88 use cudnn workaround 2023-05-11 22:24:12 -04:00
Vladimir Mandic e038bf1549 aggressive gc 2023-05-10 16:03:55 -04:00
Vladimir Mandic 41182009cb switch some cmdopts to opts 2023-05-08 09:27:50 -04:00
Vladimir Mandic 1360c6422a add fp16 test 2023-05-08 09:27:50 -04:00
Disty0 8171d57c36 Remove unnecessary IPEX imports 2023-05-04 02:34:34 +03:00
Vladimir Mandic 5d8c787a7b restart server redesign 2023-05-03 17:20:22 -04:00
Disty0 53f3567224 Use cmd_args parser instead of launch.py 2023-05-03 21:25:23 +03:00
Disty0 7577a09528 Add IPEX Optimizers and use XPU instead of CPU when using IPEX 2023-05-03 18:12:38 +03:00
Disty0 de8d0bef9f More patches and Import IPEX after Torch 2023-04-30 18:19:37 +03:00
Disty0 a720a670e8 More patches and less import shared 2023-04-30 16:01:17 +03:00
Disty0 b075d3c8fd Intel ARC Support 2023-04-30 15:13:56 +03:00
Seunghoon Lee d2d5011bd3 Implement memory estimation for AMDGPUs.
Stable.
2023-04-26 17:44:32 +09:00
Seunghoon Lee a49a8f8b46 First DirectML implementation.
Unstable and not tested.
2023-04-25 01:43:19 +09:00
Vladimir Mandic 61e9a1970c add exception around torch properties 2023-04-22 08:35:17 -04:00
Vladimir Mandic cf277e7326 fix dtype logic 2023-04-21 15:04:05 -04:00
Vladimir Mandic 57204b3d70 disable xformers/sdp if cannot be used 2023-04-21 11:32:19 -04:00
Vladimir Mandic 7939a1649d parse model preload 2023-04-20 23:19:25 -04:00
Vladimir Mandic 0e7144186d jump patch 2023-04-20 11:20:27 -04:00
Vladimir Mandic e14cba0771 add lycoris folder 2023-04-15 12:25:59 -04:00
Vladimir Mandic 81b8294e93 switch cmdflags to settings 2023-04-12 10:40:11 -04:00
brkirch 1b8af15f13 Refactor Mac specific code to a separate file
Move most Mac related code to a separate file, don't even load it unless web UI is run under macOS.
2023-02-01 14:05:56 -05:00
brkirch 2217331cd1 Refactor MPS fixes to CondFunc 2023-02-01 06:36:22 -05:00
brkirch 7738c057ce MPS fix is still needed :(
Apparently I did not test with large enough images to trigger the bug with torch.narrow on MPS
2023-02-01 05:23:58 -05:00
AUTOMATIC1111 fecb990deb Merge pull request #7309 from brkirch/fix-embeddings
Fix embeddings, upscalers, and refactor `--upcast-sampling`
2023-01-28 18:44:36 +03:00
brkirch f9edd578e9 Remove MPS fix no longer needed for PyTorch
The torch.narrow fix was required for nightly PyTorch builds for a while to prevent a hard crash, but newer nightly builds don't have this issue.
2023-01-28 04:16:27 -05:00
brkirch ada17dbd7c Refactor conditional casting, fix upscalers 2023-01-28 04:16:25 -05:00
AUTOMATIC 9beb794e0b clarify the option to disable NaN check. 2023-01-27 13:08:00 +03:00
AUTOMATIC d2ac95fa7b remove the need to place configs near models 2023-01-27 11:28:12 +03:00
brkirch e3b53fd295 Add UI setting for upcasting attention to float32
Adds "Upcast cross attention layer to float32" option in Stable Diffusion settings. This allows for generating images using SD 2.1 models without --no-half or xFormers.

In order to make upcasting cross attention layer optimizations possible it is necessary to indent several sections of code in sd_hijack_optimizations.py so that a context manager can be used to disable autocast. Also, even though Stable Diffusion (and Diffusers) only upcast q and k, unfortunately my findings were that most of the cross attention layer optimizations could not function unless v is upcast also.
2023-01-25 01:13:04 -05:00
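The upcasting described in the commit body above can be sketched as follows. This is a minimal illustration, not the actual `sd_hijack_optimizations.py` code: autocast is disabled via a context manager, q/k/v are all upcast to float32 (per the commit, upcasting only q and k is not enough), and the result is cast back to the incoming dtype:

```python
import torch

def attention_upcast(q, k, v):
    """Minimal sketch of 'upcast cross attention layer to float32'.

    q/k/v arrive in float16; the matmuls and softmax run in float32
    with autocast disabled, and the output is cast back down.
    """
    out_dtype = q.dtype
    with torch.autocast(device_type="cpu", enabled=False):
        q, k, v = q.float(), k.float(), v.float()
        scale = q.shape[-1] ** -0.5
        attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
        out = attn @ v
    return out.to(out_dtype)
```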
brkirch 84d9ce30cb Add option for float32 sampling with float16 UNet
This also handles type casting so that ROCm and MPS torch devices work correctly without --no-half. One cast is required for deepbooru in deepbooru_model.py, some explicit casting is required for img2img and inpainting. depth_model can't be converted to float16 or it won't work correctly on some systems (it's known to have issues on MPS) so in sd_models.py model.depth_model is removed for model.half().
2023-01-25 01:13:02 -05:00
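The casting boundary described above can be sketched roughly like this (a hypothetical wrapper, not the repository's code): the sampler keeps its latents in float32, inputs are cast down to the UNet's dtype for the forward pass, and the prediction is cast back up so the sampling math stays in float32.

```python
import torch

def call_unet_upcast(unet, x, timestep, model_dtype=torch.float16):
    """Sketch of float32 sampling with a float16 UNet: cast inputs
    down at the model boundary, cast the output back up."""
    out = unet(x.to(model_dtype), timestep.to(model_dtype))
    return out.to(x.dtype)
```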
AUTOMATIC1111 aa60fc6660 Merge pull request #6922 from brkirch/cumsum-fix
Improve cumsum fix for MPS
2023-01-19 13:18:34 +03:00
brkirch a255dac4f8 Fix cumsum for MPS in newer torch
The prior fix assumed that testing int16 was enough to determine if a fix is needed, but a recent fix for cumsum has int16 working but not bool.
2023-01-17 20:54:18 -05:00
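The probe described above can be re-created as a small feature check (hypothetical helper name). Per the commit, testing int16 alone is no longer sufficient — newer torch builds fixed cumsum for int16 but not for bool, so both dtypes must be probed:

```python
import torch

def cumsum_needs_fix(device: torch.device) -> bool:
    """Return True if cumsum gives wrong results for int16 or bool
    on the given device (the MPS failure mode described above)."""
    for dtype in (torch.int16, torch.bool):
        t = torch.ones(2, device=device, dtype=dtype)
        if int(t.cumsum(0)[-1]) != 2:
            return True
    return False
```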
AUTOMATIC c361b89026 disable the new NaN check for the CI 2023-01-17 11:05:01 +03:00
AUTOMATIC 9991967f40 Add a check and explanation for tensor with all NaNs. 2023-01-16 22:59:46 +03:00
brkirch 8111b5569d Add support for PyTorch nightly and local builds 2023-01-05 20:54:52 -05:00
brkirch 16b4509fa6 Add numpy fix for MPS on PyTorch 1.12.1
When saving training results with torch.save(), an exception is thrown:
"RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead."

So for MPS, check if Tensor.requires_grad and detach() if necessary.
2022-12-17 04:22:58 -05:00
AUTOMATIC b6e5edd746 add built-in extension system
add support for adding upscalers in extensions
move LDSR, ScuNET and SwinIR to built-in extensions
2022-12-03 18:06:33 +03:00
AUTOMATIC 46b0d230e7 add comment for #4407 and remove seemingly unnecessary cudnn.enabled 2022-12-03 16:01:23 +03:00
AUTOMATIC 2651267e3a fix #4407 breaking UI entirely for card other than ones related to the PR 2022-12-03 15:57:52 +03:00
AUTOMATIC1111 681c0003df Merge pull request #4407 from yoinked-h/patch-1
Fix issue with 16xx cards
2022-12-03 10:30:34 +03:00
brkirch 0fddb4a1c0 Rework MPS randn fix, add randn_like fix
torch.manual_seed() already sets a CPU generator, so there is no reason to create a CPU generator manually. torch.randn_like also needs a MPS fix for k-diffusion, but a torch hijack with randn_like already exists so it can also be used for that.
2022-11-30 10:33:42 -05:00
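The reworked fix described above can be sketched as a `randn_like` replacement (hypothetical name): since `torch.manual_seed()` already seeds the default CPU generator, sampling happens on the CPU and the result is moved to the tensor's device, keeping seeds reproducible on MPS:

```python
import torch

def randn_like_cpu(x: torch.Tensor) -> torch.Tensor:
    """Sample noise on the CPU generator (seeded by
    torch.manual_seed()) and move it to x's device."""
    return torch.randn(x.shape, dtype=x.dtype, device="cpu").to(x.device)
```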
AUTOMATIC1111 cc90dcc933 Merge pull request #4918 from brkirch/pytorch-fixes
Fixes for PyTorch 1.12.1 when using MPS
2022-11-27 13:47:01 +03:00
AUTOMATIC 5b2c316890 eliminate duplicated code from #5095 2022-11-27 13:08:54 +03:00
Matthew McGoogan c67c40f983 torch.cuda.empty_cache() defaults to cuda:0 device unless explicitly set otherwise first. Updating torch_gc() to use the device set by --device-id if specified to avoid OOM edge cases on multi-GPU systems. 2022-11-26 23:25:16 +00:00
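The edge case described in the commit above can be sketched like this (a hypothetical version of `torch_gc()`, not the repository's code): `empty_cache()` acts on the current CUDA device, which defaults to `cuda:0`, so the `--device-id` GPU is selected first on multi-GPU systems:

```python
import gc

import torch

def torch_gc(device_id=None):
    """Collect Python garbage, then empty the CUDA cache on the
    device given by --device-id (falling back to the current one)."""
    gc.collect()
    if torch.cuda.is_available():
        target = device_id if device_id is not None else torch.cuda.current_device()
        with torch.cuda.device(target):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
```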
brkirch e247b7400a Add fixes for PyTorch 1.12.1
Fix typo "MasOS" -> "macOS"

If MPS is available and PyTorch is an earlier version than 1.13:
* Monkey patch torch.Tensor.to to ensure all tensors sent to MPS are contiguous
* Monkey patch torch.nn.functional.layer_norm to ensure input tensor is contiguous (required for this program to work with MPS on unmodified PyTorch 1.12.1)
2022-11-21 02:07:19 -05:00
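The second monkey patch listed above can be sketched as follows (hypothetical wrapper name; the `torch.Tensor.to` patch follows the same pattern). The wrapper forces the input contiguous before calling the original `layer_norm`, which is what unmodified PyTorch 1.12.1 required on MPS:

```python
import torch

# Keep a reference to the original so the wrapper can delegate to it.
_orig_layer_norm = torch.nn.functional.layer_norm

def layer_norm_contiguous(input, *args, **kwargs):
    """layer_norm that first makes its input tensor contiguous."""
    return _orig_layer_norm(input.contiguous(), *args, **kwargs)

torch.nn.functional.layer_norm = layer_norm_contiguous
```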