mirror of https://github.com/vladmandic/automatic
add sdxl support
parent d8748fd7eb
commit 7e11ff2b34

DIFFUSERS.md
@ -1,4 +1,129 @@
# Diffusers WiP
# Additional Models

SD.Next includes *experimental* support for additional model pipelines.
This includes support for additional models such as:

- **Stable Diffusion XL**
- **Kandinsky**
- **Deep Floyd IF**
- **Shap-E**

Note that support is *experimental*: do not open [GitHub issues](https://github.com/vladmandic/automatic/issues) for these models;
instead, reach out on [Discord](https://discord.gg/WqMzTUDC) using the dedicated channels.

*This has been made possible by integration of the [huggingface diffusers](https://huggingface.co/docs/diffusers/index) library, with help from the huggingface team!*

## How to

- Install **SD.Next** as usual
- Start with
  `webui --backend diffusers`
- To go back to the standard execution pipeline, start with
  `webui --backend original`

## Integration

### Standard workflows

- **txt2img**
- **img2img**
- **process**

### Model Access

- For standard SD 1.5 and SD 2.1 models, you can use either
  standard *safetensors* models or *diffusers* models
- For additional models, you can use *diffusers* models only
- You can download diffusers models directly from the [Huggingface hub](https://huggingface.co/)
  or use the built-in model search & download in SD.Next: **UI -> Models -> Huggingface**
- Note that access to some models is gated,
  in which case you need to accept the model EULA and provide your huggingface token

### Extra Networks

- Lora networks
- Textual inversions (embeddings)

Note that Lora and TI are still model-specific, so you cannot use a Lora trained on SD 1.5 with SD-XL
(just like you couldn't use it with an SD 2.1 model) - it needs to be trained for a specific model

Support for SD-XL training is expected shortly

### Diffuser Settings

- UI -> Settings -> Diffuser Settings
  contains additional tunable parameters

### Samplers

- Samplers (schedulers) are pipeline-specific, so when running with the diffusers backend, you'll see a different list of samplers
- UI -> Settings -> Sampler Settings shows different configurable parameters depending on the backend
- Recommended sampler for diffusers is **DEIS**
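Since the sampler list is pipeline-specific, a requested sampler may not exist on the diffusers backend, in which case the code falls back to UniPC. A minimal sketch of that selection logic (the name-to-scheduler mapping here is illustrative, not the actual runtime list):

```python
# Hypothetical mapping from UI sampler names to diffusers scheduler classes;
# the real list is built at runtime and depends on the loaded pipeline.
SAMPLERS = {
    "DEIS": "DEISMultistepScheduler",
    "UniPC": "UniPCMultistepScheduler",
    "Euler a": "EulerAncestralDiscreteScheduler",
}

def resolve_sampler(name: str) -> str:
    # Fall back to UniPC when the requested sampler is not available,
    # mirroring the behavior of the diffusers code path.
    return SAMPLERS.get(name, SAMPLERS["UniPC"])
```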

### Other

- Updated **System Info** tab with additional information
- Support for `lowvram` and `medvram` modes
  Additional tunables are available in UI -> Settings -> Diffuser Settings
- Support for both default **SDP** and **xFormers** cross-optimizations
  Other cross-optimization methods are not available
- **Extra Networks UI** will show available diffusers models
- **CUDA model compile**
  UI -> Settings -> Compute Settings
  Requires a GPU with high VRAM
  Diffusers recommend `reduce overhead`, but other methods are available as well
  Fullgraph is possible (with sufficient VRAM) when using diffusers

## SD-XL Notes

- SD-XL is designed as a two-stage model
  You can run the SD-XL pipeline using just the `base` model, but for best results, load both the `base` and `refiner` models
  - `base`: Trained on images with a variety of aspect ratios; uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding
  - `refiner`: Trained to denoise small noise levels of high-quality data; uses the OpenCLIP model
- If you want to use the `refiner` model, it is advised to add `sd_model_refiner` to **quicksettings**
  in UI -> Settings -> User Interface
- The SD-XL model was trained on **1024px** images
  You can use it with smaller sizes, but you will likely get better results with SD 1.5 models at those sizes
- The SD-XL NSFW filter has been turned off
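The two-stage flow above works as follows: when a refiner is loaded, the base pass returns latents instead of a decoded image, and the refiner consumes that output as its input image. A minimal stand-in sketch with dummy callables (not the actual pipelines):

```python
def run_two_stage(base, refiner=None, prompt=""):
    # The base pass returns latents only when a refiner will consume them,
    # matching the output_type switch used in the diffusers code path;
    # the refiner then takes the base output as its input image.
    output_type = "np" if refiner is None else "latent"
    out = base(prompt=prompt, output_type=output_type)
    if refiner is not None:
        out = refiner(prompt=prompt, image=out)
    return out

# Dummy callables standing in for the real SD-XL base/refiner pipelines
def dummy_base(prompt, output_type):
    return f"{output_type}:{prompt}"

def dummy_refiner(prompt, image):
    return f"refined({image})"
```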

## Limitations

- Diffusers do not have per-step callbacks, so any functionality that relies on them will not be available
  This includes the trivial but very visible **progress bar**
- Any extension that requires access to model internals will likely not work with the diffusers backend
  This includes, for example, standard extensions such as `ControlNet`, `MultiDiffusion`, `LyCORIS`
- Second-pass workflows such as `hires fix` are not yet implemented (soon)
- Hypernetworks
- Explicit VAE usage (soon)

## Performance

Comparison of the original stable diffusion pipeline and the diffusers pipeline

| pipeline | performance it/s | memory cpu/gpu |
| --- | --- | --- |
| original | 7.99 / 7.93 / 8.83 / 9.14 / 9.2 | 6.7 / 7.2 |
| original medvram | 6.23 / 7.16 / 8.41 / 9.24 / 9.68 | 8.4 / 6.8 |
| original lowvram | 1.05 / 1.94 / 3.2 / 4.81 / 6.46 | 8.8 / 5.2 |
| diffusers | 9.0 / 7.4 / 8.2 / 8.4 / 7.0 | 4.3 / 9.0 |
| diffusers medvram | 7.5 / 6.7 / 7.5 / 7.8 / 7.2 | 6.6 / 8.2 |
| diffusers lowvram | 7.0 / 7.0 / 7.4 / 7.7 / 7.8 | 4.3 / 7.2 |
| diffusers with safetensors | 8.9 / 7.3 / 8.1 / 8.4 / 7.1 | 5.9 / 9.0 |

Notes:

- Performance is measured using a standard SD 1.5 model
- Performance is measured for `batch-size` 1, 2, 4, 8, 16
- Test environment:
  - nVidia RTX 3060 GPU
  - Torch 2.1-nightly with CUDA 12.1
  - Cross-optimization: SDP
- All else being equal, diffusers seem to:
  - Use slightly less RAM and more VRAM
  - Have highly efficient medvram/lowvram equivalents which don't lose much performance
  - Be faster on smaller batch sizes, slower on larger batch sizes

## TODO

Initial support merged into the `dev` branch

@ -13,61 +138,6 @@ lora support is not compatible with setting `Use LyCoris handler for all Lora ty

to update the repo, do not use the `--upgrade` flag; use a manual `git pull` instead

## Test

### Standard

goal is to test standard workflows (so not diffusers) to ensure there are no regressions
so diffusers code can be merged into `master` and we can continue with development there

- run with `webui --debug --backend original`

### Diffusers

what's implemented so far?

- new scheduler: deis
- simple model downloader for huggingface models: tabs -> models -> hf hub
- use huggingface models
- extra networks ui
- use safetensors models with diffusers backend
- lowvram and medvram equivalents for diffusers
- standard workflows:
  - txt2img, img2img, inpaint, outpaint, process
  - hires fix, restore faces, etc?
- textual inversion
  yes, this applies to standard embeddings, don't need ones from huggingface
- lora
  yes, this applies to standard loras, don't need ones from huggingface
  but it seems that diffusers lora support is somewhat limited, so quite a few loras may not work
  you should see which loras load without issues in the console log
- system info tab with updated information
- kandinsky model
  works for me

### Experimental

- cuda model compile
  in settings -> compute settings
  diffusers recommend `reduce overhead`, but other methods are available as well
  it seems that fullgraph is possible (with sufficient vram) when using diffusers
- deepfloyd
  in theory it should work, but it's a 20gb model so can't test it just yet
  note that access is gated, so you'll need to download using your huggingface credentials
  (you can still do it from the sdnext ui, just need an access token)

## Todo

- sdxl model
- no idea if sd21 works out-of-the-box
- hires fix?
- vae support

## Limitations

even if extensions are not supported, runtime errors are never nice
will need to handle in the code before we get out of alpha

- lycoris
  `lyco_patch_lora`
- controlnet

@ -76,28 +146,11 @@ will need to handle in the code before we get out of alpha

> sd_model.first_stage_model?.encoder?

- dynamic-thresholding
  > AttributeError: 'DiffusionSampler' object has no attribute 'model_wrap_cfg'
- no per-step callback

## Performance

| pipeline | performance it/s | memory cpu/gpu |
| --- | --- | --- |
| original | 7.99 / 7.93 / 8.83 / 9.14 / 9.2 | 6.7 / 7.2 |
| original medvram | 6.23 / 7.16 / 8.41 / 9.24 / 9.68 | 8.4 / 6.8 |
| original lowvram | 1.05 / 1.94 / 3.2 / 4.81 / 6.46 | 8.8 / 5.2 |
| diffusers | 9.0 / 7.4 / 8.2 / 8.4 / 7.0 | 4.3 / 9.0 |
| diffusers medvram | 7.5 / 6.7 / 7.5 / 7.8 / 7.2 | 6.6 / 8.2 |
| diffusers lowvram | 7.0 / 7.0 / 7.4 / 7.7 / 7.8 | 4.3 / 7.2 |
| diffusers with safetensors | 8.9 / 7.3 / 8.1 / 8.4 / 7.1 | 5.9 / 9.0 |

Notes:

- Performance is measured for `batch-size` 1, 2, 4, 8, 16
- Test environment:
  - nVidia RTX 3060 GPU
  - Torch 2.1-nightly with CUDA 12.1
  - Cross-optimization: SDP
- All else being equal, diffusers seem to:
  - Use slightly less RAM and more VRAM
  - Have highly efficient medvram/lowvram equivalents which don't lose much performance
  - Be faster on smaller batch sizes, slower on larger batch sizes
- diffusers pipelines in general have no per-step sampler callback; it's completely opaque inside the pipeline
  so i'm missing some very basic stuff like the progress bar in the ui or the ability to generate a live preview based on intermediate latents
- StableDiffusionXLPipeline does not implement `from_ckpt`
- StableDiffusionXLPipeline has a long delay after the tqdm progress bar finishes and before it returns an image; i assume it's the vae, but it's not a good user experience
- VAE:
  > vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
  > pipe = StableDiffusionPipeline.from_pretrained(model, vae=vae)

@ -283,6 +283,7 @@ def check_torch():
    log.debug(f'Torch overrides: cuda={args.use_cuda} rocm={args.use_rocm} ipex={args.use_ipex} diml={args.use_directml}')
    log.debug(f'Torch allowed: cuda={allow_cuda} rocm={allow_rocm} ipex={allow_ipex} diml={allow_directml}')
    torch_command = os.environ.get('TORCH_COMMAND', '')
    xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
    if torch_command != '':
        pass
    elif allow_cuda and (shutil.which('nvidia-smi') is not None or os.path.exists(os.path.join(os.environ.get('SystemRoot') or r'C:\Windows', 'System32', 'nvidia-smi.exe'))):

@ -294,11 +295,9 @@ def check_torch():
        os.environ.setdefault('HSA_OVERRIDE_GFX_VERSION', '10.3.0')
        os.environ.setdefault('PYTORCH_HIP_ALLOC_CONF', 'garbage_collection_threshold:0.8,max_split_size_mb:512')
        torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/rocm5.4.2')
        xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
    elif allow_ipex and args.use_ipex and shutil.which('sycl-ls') is not None:
        log.info('Intel OneAPI Toolkit detected')
        torch_command = os.environ.get('TORCH_COMMAND', 'torch==1.13.0a0 torchvision==0.14.1a0 intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu')
        xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
    else:
        machine = platform.machine()
        if sys.platform == 'darwin':

@ -306,13 +305,11 @@ def check_torch():
        elif allow_directml and args.use_directml and ('arm' not in machine and 'aarch' not in machine):
            log.info('Using DirectML Backend')
            torch_command = os.environ.get('TORCH_COMMAND', 'torch-directml')
            xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
            if 'torch' in torch_command and not args.version:
                install(torch_command, 'torch torchvision')
        else:
            log.info('Using CPU-only Torch')
            torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision')
            xformers_package = os.environ.get('XFORMERS_PACKAGE', 'none')
    if 'torch' in torch_command and not args.version:
        install(torch_command, 'torch torchvision')
    if args.skip_torch:

@ -1,10 +1,28 @@
/* generic html tags */
:root { --font: "Source Sans Pro", 'ui-sans-serif', 'system-ui', "Roboto", sans-serif; }
html { font-size: 16px; }
:root {
  --font: "Source Sans Pro", 'ui-sans-serif', 'system-ui', "Roboto", sans-serif;
  --font-size: 16px;
  --left-column: 490px;
  --highlight-color: #ce6400;
  --inactive-color: #4e1400;
  --background-color: #000000;
  --primary-50: #fff7ed;
  --primary-100: #ffedd5;
  --primary-200: #fed7aa;
  --primary-300: #fdba74;
  --primary-400: #fb923c;
  --primary-500: #f97316;
  --primary-600: #ea580c;
  --primary-700: #c2410c;
  --primary-800: #9a3412;
  --primary-900: #7c2d12;
  --primary-950: #6c2e12;
}
html { font-size: var(--font-size); }
body, button, input, select, textarea { font-family: var(--font);}
button { font-size: 1.2rem; }
img { background-color: black; }
input[type=range] { height: 18px; appearance: none; margin-top: 0; min-width: 160px; background-color: black; width: 100%; background: transparent; }
img { background-color: var(--background-color); }
input[type=range] { height: 18px; appearance: none; margin-top: 0; min-width: 160px; background-color: var(--background-color); width: 100%; background: transparent; }
input[type=range]::-webkit-slider-runnable-track { width: 100%; height: 18px; cursor: pointer; box-shadow: 2px 2px 3px #111111; background: #50555C; border-radius: 2px; border: 0px solid #222222; }
input[type=range]::-moz-range-track { width: 100%; height: 18px; cursor: pointer; box-shadow: 2px 2px 3px #111111; background: #50555C; border-radius: 2px; border: 0px solid #222222; }
input[type=range]::-webkit-slider-thumb { box-shadow: 2px 2px 3px #111111; border: 0px solid #000000; height: 18px; width: 40px; border-radius: 2px; background: var(--highlight-color); cursor: pointer; appearance: none; margin-top: 0px; }

@ -14,10 +32,6 @@ input[type=range]::-moz-range-thumb { box-shadow: 2px 2px 3px #111111; border: 0
::-webkit-scrollbar-thumb { background-color: var(--highlight-color); border-radius: 2px; border-width: 0; box-shadow: 2px 2px 3px #111111; }
div.form { border-width: 0; box-shadow: none; background: transparent; overflow: visible; gap: 0.5em; margin-bottom: 6px; }

/* gradio shadowroot */
.gradio-container { font-family: var(--font); --left-column: 490px; --highlight-color: #CE6400; --inactive-color: #4E1400; }

/* gradio style classes */
fieldset .gr-block.gr-box, label.block span { padding: 0; margin-top: -4px; }
.border-2 { border-width: 0; }

@ -27,11 +41,11 @@ fieldset .gr-block.gr-box, label.block span { padding: 0; margin-top: -4px; }
.gr-button { font-weight: normal; box-shadow: 2px 2px 3px #111111; font-size: 0.8rem; min-width: 32px; min-height: 32px; padding: 3px; margin: 3px; }
.gr-check-radio { background-color: var(--inactive-color); border-width: 0; border-radius: 2px; box-shadow: 2px 2px 3px #111111; }
.gr-check-radio:checked { background-color: var(--highlight-color); }
.gr-compact { background-color: black; }
.gr-compact { background-color: var(--background-color); }
.gr-form { border-width: 0; }
.gr-input { background-color: #333333 !important; padding: 4px; margin: 4px; }
.gr-input-label { color: lightyellow; border-width: 0; background: transparent; padding: 2px !important; }
.gr-panel { background-color: black; }
.gr-panel { background-color: var(--background-color); }
.eta-bar { display: none !important }
svg.feather.feather-image, .feather .feather-image { display: none }
.gap-2 { padding-top: 8px; }

@ -42,9 +56,9 @@ svg.feather.feather-image, .feather .feather-image { display: none }
.p-2 { padding: 0; }
.px-4 { padding-left: 1rem; padding-right: 1rem; }
.py-6 { padding-bottom: 0; }
.tabs { background-color: black; }
.tabs { background-color: var(--background-color); }
.block.token-counter span { background-color: #222 !important; box-shadow: 2px 2px 2px #111; border: none !important; font-size: 0.8rem; }
.tab-nav { zoom: 120%; margin-bottom: 10px; border-bottom: 2px solid #CE6400 !important; padding-bottom: 2px; }
.tab-nav { zoom: 120%; margin-bottom: 10px; border-bottom: 2px solid var(--highlight-color) !important; padding-bottom: 2px; }
.label-wrap { margin: 16px 0px 8px 0px; }
.gradio-slider input[type="number"] { width: 4.5em; font-size: 0.8rem; }
#tab_extensions table td, #tab_extensions table th { border: none; padding: 0.5em; }

@ -56,13 +70,13 @@ svg.feather.feather-image, .feather .feather-image { display: none }
.progressDiv .progress { border-radius: 0 !important; background: var(--highlight-color); line-height: 3rem; height: 48px; }
.gallery-item { box-shadow: none !important; }
.performance { color: #888; }
.extra-networks { border-left: 2px solid #CE6400 !important; padding-left: 4px; }
.extra-networks { border-left: 2px solid var(--highlight-color) !important; padding-left: 4px; }
.image-buttons { gap: 10px !important}

/* gradio elements overrides */
#div.gradio-container { overflow-x: hidden; }
#img2img_label_copy_to_img2img { font-weight: normal; }
#txt2img_prompt, #txt2img_neg_prompt, #img2img_prompt, #img2img_neg_prompt { background-color: black; box-shadow: 4px 4px 4px 0px #333333 !important; }
#txt2img_prompt, #txt2img_neg_prompt, #img2img_prompt, #img2img_neg_prompt { background-color: var(--background-color); box-shadow: 4px 4px 4px 0px #333333 !important; }
#txt2img_prompt > label > textarea, #txt2img_neg_prompt > label > textarea, #img2img_prompt > label > textarea, #img2img_neg_prompt > label > textarea { font-size: 1.2rem; }
#img2img_settings { min-width: calc(2 * var(--left-column)); max-width: calc(2 * var(--left-column)); background-color: #111111; padding-top: 16px; }
#interrogate, #deepbooru { margin: 0 0px 10px 0px; max-width: 80px; max-height: 80px; font-weight: normal; font-size: 0.95em; }

@ -82,7 +96,7 @@ svg.feather.feather-image, .feather .feather-image { display: none }
#extras_upscale { margin-top: 10px }
#txt2img_progress_row > div { min-width: var(--left-column); max-width: var(--left-column); }
#txt2img_results, #img2img_results, #extras_results { background-color: black; padding: 0; }
#txt2img_results, #img2img_results, #extras_results { background-color: var(--background-color); padding: 0; }
#txt2img_seed_row { padding: 0; margin-top: 8px; }
#txt2img_settings { min-width: var(--left-column); max-width: var(--left-column); background-color: #111111; padding-top: 16px; }
#txt2img_subseed_row { padding: 0; margin-top: 16px; }

@ -97,13 +111,13 @@ svg.feather.feather-image, .feather .feather-image { display: none }
/* based on gradio built-in dark theme */
:root, .light, .dark {
  --body-background-fill: black;
  --body-background-fill: var(--background-color);
  --body-text-color: var(--neutral-100);
  --color-accent-soft: var(--neutral-700);
  --background-fill-primary: #222222;
  --background-fill-secondary: none;
  --border-color-accent: black;
  --border-color-primary: black;
  --border-color-accent: var(--background-color);
  --border-color-primary: var(--background-color);
  --link-text-color-active: var(--secondary-500);
  --link-text-color: var(--secondary-500);
  --link-text-color-hover: var(--secondary-400);

@ -183,17 +197,6 @@ svg.feather.feather-image, .feather .feather-image { display: none }
  --button-secondary-border-color-hover: var(--button-secondary-border-color);
  --button-secondary-text-color: white;
  --button-secondary-text-color-hover: var(--button-secondary-text-color);
  --primary-50: #fff7ed;
  --primary-100: #ffedd5;
  --primary-200: #fed7aa;
  --primary-300: #fdba74;
  --primary-400: #fb923c;
  --primary-500: #f97316;
  --primary-600: #ea580c;
  --primary-700: #c2410c;
  --primary-800: #9a3412;
  --primary-900: #7c2d12;
  --primary-950: #6c2e12;
  --secondary-50: #eff6ff;
  --secondary-100: #dbeafe;
  --secondary-200: #bfdbfe;

@ -406,17 +406,15 @@ class Api:

    def interruptapi(self):
        shared.state.interrupt()
        return {}

    def unloadapi(self):
        unload_model_weights()
        unload_model_weights(op='model')
        unload_model_weights(op='refiner')
        return {}

    def reloadapi(self):
        reload_model_weights()
        return {}

    def skip(self):

@ -223,7 +223,8 @@ class StableDiffusionProcessing:
        source_image = devices.cond_cast_float(source_image)
        # HACK: Using introspection as the Depth2Image model doesn't appear to uniquely
        # identify itself with a field common to all models. The conditioning_key is also hybrid.
        if backend == Backend.DIFFUSERS: # TODO: Diffusers img2img_image_conditioning
        if backend == Backend.DIFFUSERS:
            log.warning('Diffusers not implemented: img2img_image_conditioning')
            return None
        if isinstance(self.sd_model, LatentDepth2ImageDiffusion):
            return self.depth2img_image_conditioning(source_image)

@ -682,11 +683,6 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
                x_samples_ddim = torch.stack(x_samples_ddim).float()
                x_samples_ddim = torch.clamp((x_samples_ddim + 1.0) / 2.0, min=0.0, max=1.0)
                del samples_ddim
                if shared.cmd_opts.lowvram or shared.cmd_opts.medvram:
                    lowvram.send_everything_to_cpu()
                devices.torch_gc()
                if p.scripts is not None:
                    p.scripts.postprocess_batch(p, x_samples_ddim, batch_number=n)

            elif backend == Backend.DIFFUSERS:
                generator_device = 'cpu' if shared.opts.diffusers_generator_device == "cpu" else shared.device

@ -695,7 +691,7 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
                sampler = sd_samplers.all_samplers_map.get(p.sampler_name, None)
                if sampler is None:
                    sampler = sd_samplers.all_samplers_map.get("UniPC")
                shared.sd_model.scheduler = sd_samplers.create_sampler(sampler.name, shared.sd_model) # TODO(Patrick): For wrapped pipelines this is currently a no-op
                # shared.sd_model.scheduler = sd_samplers.create_sampler(sampler.name, shared.sd_model) # TODO(Patrick): For wrapped pipelines this is currently a no-op

                cross_attention_kwargs={}
                if lora_state['active']:

@ -708,23 +704,48 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
                elif sd_models.get_diffusers_task(shared.sd_model) == sd_models.DiffusersTaskType.INPAINTING:
                    # TODO(PVP): change out to latents once possible with `diffusers`
                    task_specific_kwargs = {"image": p.init_images[0], "mask_image": p.image_mask, "strength": p.denoising_strength}
                output = shared.sd_model(
                output = shared.sd_model( # pylint: disable=not-callable
                    prompt=prompts,
                    negative_prompt=negative_prompts,
                    num_inference_steps=p.steps,
                    guidance_scale=p.cfg_scale,
                    generator=generator,
                    output_type="np",
                    output_type='np' if shared.sd_refiner is None else 'latent',
                    cross_attention_kwargs=cross_attention_kwargs,
                    **task_specific_kwargs
                )

                if shared.sd_refiner is not None:
                    init_image = output.images[0]
                    output = shared.sd_refiner( # pylint: disable=not-callable
                        prompt=prompts,
                        negative_prompt=negative_prompts,
                        num_inference_steps=p.steps,
                        guidance_scale=p.cfg_scale,
                        generator=generator,
                        output_type='np',
                        cross_attention_kwargs=cross_attention_kwargs,
                        image=init_image
                    )

                x_samples_ddim = output.images

                if p.enable_hr:
                    log.warning('Diffusers not implemented: hires fix')

                if lora_state['active']:
                    unload_diffusers_lora()

            else:
                raise ValueError(f"Unknown backend {backend}")

            if shared.cmd_opts.lowvram or shared.cmd_opts.medvram:
                lowvram.send_everything_to_cpu()
            devices.torch_gc()
            if p.scripts is not None:
                p.scripts.postprocess_batch(p, x_samples_ddim, batch_number=n)

            for i, x_sample in enumerate(x_samples_ddim):
                p.batch_index = i
                if backend == Backend.ORIGINAL:

@ -898,7 +919,23 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
        if self.hr_upscaler is not None:
            self.extra_generation_params["Hires upscaler"] = self.hr_upscaler

    def sample(self, conditioning, unconditional_conditioning, seeds, subseeds, subseed_strength, prompts): # TODO this is majority of processing time
    def sample(self, conditioning, unconditional_conditioning, seeds, subseeds, subseed_strength, prompts):

        def save_intermediate(image, index):
            """saves image before applying hires fix, if enabled in options; takes as an argument either an image or batch with latent space images"""
            if not opts.save or self.do_not_save_samples or not opts.save_images_before_highres_fix:
                return
            if not isinstance(image, Image.Image):
                image = sd_samplers.sample_to_image(image, index, approximation=0)
            orig1 = self.extra_generation_params
            orig2 = self.restore_faces
            self.extra_generation_params = {}
            self.restore_faces = False
            info = create_infotext(self, self.all_prompts, self.all_seeds, self.all_subseeds, [], iteration=self.iteration, position_in_batch=index)
            self.extra_generation_params = orig1
            self.restore_faces = orig2
            images.save_image(image, self.outpath_samples, "", seeds[index], prompts[index], opts.samples_format, info=info, suffix="-before-highres-fix")

        if backend == Backend.DIFFUSERS:
            sd_models.set_diffuser_pipe(self.sd_model, sd_models.DiffusersTaskType.TEXT_2_IMAGE)

@ -916,21 +953,6 @@ class StableDiffusionProcessingTxt2Img(StableDiffusionProcessing):
            target_width = self.hr_upscale_to_x
            target_height = self.hr_upscale_to_y

        def save_intermediate(image, index):
            """saves image before applying hires fix, if enabled in options; takes as an argument either an image or batch with latent space images"""
            if not opts.save or self.do_not_save_samples or not opts.save_images_before_highres_fix:
                return
            if not isinstance(image, Image.Image):
                image = sd_samplers.sample_to_image(image, index, approximation=0)
            orig1 = self.extra_generation_params
            orig2 = self.restore_faces
            self.extra_generation_params = {}
            self.restore_faces = False
            info = create_infotext(self, self.all_prompts, self.all_seeds, self.all_subseeds, [], iteration=self.iteration, position_in_batch=index)
            self.extra_generation_params = orig1
            self.restore_faces = orig2
            images.save_image(image, self.outpath_samples, "", seeds[index], prompts[index], opts.samples_format, info=info, suffix="-before-highres-fix")

        if latent_scale_mode is not None:
            for i in range(samples.shape[0]):
                save_intermediate(samples, i)

@ -23,6 +23,7 @@ from modules.timer import Timer
from modules.memstats import memory_stats
from modules.paths_internal import models_path


transformers_logging.set_verbosity_error()
model_dir = "Stable-diffusion"
model_path = os.path.abspath(os.path.join(paths.models_path, model_dir))

@ -127,7 +128,9 @@ def list_models():
    checkpoints_list.clear()
    checkpoint_aliases.clear()
    ext_filter=[".safetensors"] if shared.opts.sd_disable_ckpt else [".ckpt", ".safetensors"]
    model_list = modelloader.load_models(model_path=model_path, model_url=None, command_path=shared.opts.ckpt_dir, ext_filter=ext_filter, download_name=None, ext_blacklist=[".vae.ckpt", ".vae.safetensors"])
    model_list = []
    if shared.backend == shared.Backend.ORIGINAL or shared.opts.diffusers_pipeline == shared.pipelines[0]:
        model_list += modelloader.load_models(model_path=model_path, model_url=None, command_path=shared.opts.ckpt_dir, ext_filter=ext_filter, download_name=None, ext_blacklist=[".vae.ckpt", ".vae.safetensors"])
    if shared.backend == shared.Backend.DIFFUSERS:
        model_list += modelloader.load_diffusers_models(model_path=os.path.join(models_path, 'Diffusers'), command_path=shared.opts.diffusers_dir)
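The checkpoint discovery above filters by extension and skips standalone VAE weights; a small self-contained sketch of that filtering logic (the helper name is illustrative, not part of the codebase):

```python
def filter_checkpoints(files, disable_ckpt=False):
    # Mirror of the ext_filter / ext_blacklist logic: optionally exclude
    # legacy .ckpt files, and always skip .vae.* weight files.
    exts = (".safetensors",) if disable_ckpt else (".ckpt", ".safetensors")
    blacklist = (".vae.ckpt", ".vae.safetensors")
    return [f for f in files if f.endswith(exts) and not f.endswith(blacklist)]
```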

@ -210,11 +213,18 @@ def model_hash(filename):
    return 'NOHASH'


def select_checkpoint(model=True):
    model_checkpoint = shared.opts.sd_model_checkpoint if model else shared.opts.sd_model_dict
def select_checkpoint(op='model'):
    if op == 'model':
        model_checkpoint = shared.opts.sd_model_checkpoint
    elif op == 'dict':
        model_checkpoint = shared.opts.sd_model_dict
    elif op == 'refiner':
        model_checkpoint = shared.opts.data['sd_model_refiner']
    if model_checkpoint is None or model_checkpoint == 'None':
        return None
    checkpoint_info = get_closet_checkpoint_match(model_checkpoint)
    if checkpoint_info is not None:
        shared.log.debug(f'Select checkpoint: {checkpoint_info.title if checkpoint_info is not None else None}')
        shared.log.debug(f'Select checkpoint: {op} {checkpoint_info.title if checkpoint_info is not None else None}')
        return checkpoint_info
    if len(checkpoints_list) == 0:
        shared.log.error("Cannot run without a checkpoint")
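The `op`-based dispatch in `select_checkpoint` reduces to a key lookup over the settings store; a minimal sketch against a plain dict standing in for `shared.opts` (hypothetical helper, not the actual function):

```python
# Which settings key holds the checkpoint name for each operation
OP_KEYS = {
    'model': 'sd_model_checkpoint',
    'dict': 'sd_model_dict',
    'refiner': 'sd_model_refiner',
}

def checkpoint_name(opts, op='model'):
    # Resolve the configured checkpoint for an operation;
    # the string 'None' means no checkpoint is configured.
    name = opts.get(OP_KEYS[op])
    return None if name in (None, 'None') else name
```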
|
||||
|
|
@ -458,9 +468,10 @@ sd1_clip_weight = 'cond_stage_model.transformer.text_model.embeddings.token_embe
|
|||
sd2_clip_weight = 'cond_stage_model.model.transformer.resblocks.0.attn.in_proj_weight'
|
||||
|
||||
|
||||
class SdModelData:
|
||||
class ModelData:
|
||||
def __init__(self):
|
||||
self.sd_model = None
|
||||
self.sd_refiner = None
|
||||
self.sd_dict = 'None'
|
||||
self.initial = True
|
||||
self.lock = threading.Lock()
|
||||
|
|
@ -470,9 +481,9 @@ class SdModelData:
|
|||
with self.lock:
|
||||
try:
|
||||
if shared.backend == shared.Backend.ORIGINAL:
|
||||
reload_model_weights()
|
||||
reload_model_weights(op='model')
|
||||
elif shared.backend == shared.Backend.DIFFUSERS:
|
||||
load_diffuser()
|
||||
load_diffuser(op='model')
|
||||
else:
|
||||
shared.log.error(f"Unknown Stable Diffusion backend: {shared.backend}")
|
||||
self.initial = False
|
||||
|
|
@@ -483,11 +494,31 @@ class SdModelData:
         return self.sd_model

     def set_sd_model(self, v):
         shared.log.debug(f"Class model: {v}")
         self.sd_model = v

+    def get_sd_refiner(self):
+        if self.sd_model is None:
+            with self.lock:
+                try:
+                    if shared.backend == shared.Backend.ORIGINAL:
+                        reload_model_weights(op='refiner')
+                    elif shared.backend == shared.Backend.DIFFUSERS:
+                        load_diffuser(op='refiner')
+                    else:
+                        shared.log.error(f"Unknown Stable Diffusion backend: {shared.backend}")
+                    self.initial = False
+                except Exception as e:
+                    shared.log.error("Failed to load stable diffusion model")
+                    errors.display(e, "loading stable diffusion model")
+                    self.sd_refiner = None
+        return self.sd_refiner
+
+    def set_sd_refiner(self, v):
+        shared.log.debug(f"Class refiner: {v}")
+        self.sd_refiner = v

-model_data = SdModelData()
+model_data = ModelData()


 class PriorPipeline:
     def __init__(self, prior, main):
@@ -531,7 +562,7 @@ class PriorPipeline:
         return result


-def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=None): # pylint: disable=unused-argument
+def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=None, op='model'): # pylint: disable=unused-argument
     if timer is None:
         timer = Timer()
     import logging
@@ -548,27 +579,61 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=None)
     if shared.opts.data['sd_model_checkpoint'] == 'model.ckpt':
         shared.opts.data['sd_model_checkpoint'] = "runwayml/stable-diffusion-v1-5"

+    if op == 'model' or op == 'dict':
+        if model_data.sd_model is not None and (checkpoint_info.hash == model_data.sd_model.sd_checkpoint_info.hash): # trying to load the same model
+            return
+    else:
+        if model_data.sd_refiner is not None and (checkpoint_info.hash == model_data.sd_refiner.sd_checkpoint_info.hash): # trying to load the same model
+            return
+
     sd_model = None
     try:
-        devices.set_cuda_params() # todo
+        devices.set_cuda_params()
         if shared.cmd_opts.ckpt is not None and model_data.initial: # initial load
             model_name = modelloader.find_diffuser(shared.cmd_opts.ckpt)
             if model_name is not None:
-                shared.log.info(f'Loading diffuser model: {model_name}')
+                shared.log.info(f'Loading diffuser {op}: {model_name}')
                 model_file = modelloader.download_diffusers_model(hub_id=model_name)
                 sd_model = diffusers.DiffusionPipeline.from_pretrained(model_file, **diffusers_load_config)
                 list_models() # rescan for downloaded model
                 checkpoint_info = CheckpointInfo(model_name)

         if sd_model is None:
-            checkpoint_info = checkpoint_info or select_checkpoint()
-            shared.log.info(f'Loading diffuser model: {checkpoint_info.filename}')
+            checkpoint_info = checkpoint_info or select_checkpoint(op=op)
+            if checkpoint_info is None:
+                unload_model_weights(op=op)
+                return
+            shared.log.info(f'Loading diffuser {op}: {checkpoint_info.filename}')
             if not os.path.isfile(checkpoint_info.path):
                 sd_model = diffusers.DiffusionPipeline.from_pretrained(checkpoint_info.path, **diffusers_load_config)
             else:
                 diffusers_load_config["local_files_only"] = True
                 diffusers_load_config["extract_ema"] = shared.opts.diffusers_extract_ema
-                sd_model = diffusers.StableDiffusionPipeline.from_ckpt(checkpoint_info.path, **diffusers_load_config)
+                try:
+                    # pipelines = ['Stable Diffusion', 'Stable Diffusion XL', 'Kandinsky V1', 'Kandinsky V2', 'DeepFloyd IF', 'Shap-E']
+                    if shared.opts.diffusers_pipeline == shared.pipelines[0]:
+                        pipeline = diffusers.StableDiffusionPipeline
+                    elif shared.opts.diffusers_pipeline == shared.pipelines[1]:
+                        pipeline = diffusers.StableDiffusionXLPipeline
+                    elif shared.opts.diffusers_pipeline == shared.pipelines[2]:
+                        pipeline = diffusers.KandinskyPipeline
+                    elif shared.opts.diffusers_pipeline == shared.pipelines[3]:
+                        pipeline = diffusers.KandinskyV22Pipeline
+                    elif shared.opts.diffusers_pipeline == shared.pipelines[4]:
+                        pipeline = diffusers.IFPipeline
+                    elif shared.opts.diffusers_pipeline == shared.pipelines[5]:
+                        pipeline = diffusers.ShapEPipeline
+                    else:
+                        shared.log.error(f'Diffusers unknown pipeline: {shared.opts.diffusers_pipeline}')
+                except Exception as e:
+                    shared.log.error(f'Diffusers failed initializing pipeline: {shared.opts.diffusers_pipeline} {e}')
+                    return
+                try:
+                    sd_model = pipeline.from_ckpt(checkpoint_info.path, **diffusers_load_config)
+                except Exception as e:
+                    shared.log.error(f'Diffusers failed loading model using pipeline: {checkpoint_info.path} {shared.opts.diffusers_pipeline} {e}')
+                    return

         if "StableDiffusion" in sd_model.__class__.__name__:
             pass # scheduler is created on first use
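The hunk above replaces the hard-coded `StableDiffusionPipeline.from_ckpt` call with an if/elif chain that maps the `diffusers_pipeline` option to one of six pipeline classes. The same dispatch can be sketched as a name-to-class table; the classes below are empty stand-ins for the real diffusers pipelines:

```python
# Stand-ins for diffusers.StableDiffusionPipeline, diffusers.StableDiffusionXLPipeline, etc.
class StableDiffusionPipeline:
    pass

class StableDiffusionXLPipeline:
    pass

# Option value -> pipeline class, mirroring the if/elif chain in the diff
# (truncated to two entries here for brevity).
PIPELINE_CLASSES = {
    'Stable Diffusion': StableDiffusionPipeline,
    'Stable Diffusion XL': StableDiffusionXLPipeline,
}

def resolve_pipeline(name):
    # Mirrors the else-branch error path: unknown names are rejected.
    cls = PIPELINE_CLASSES.get(name)
    if cls is None:
        raise ValueError(f'unknown pipeline: {name}')
    return cls
```

A dict dispatch like this grows by one line per new model family, which is one reason pipeline lists and option choices in the diff are driven off a single shared `pipelines` list.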
@@ -615,7 +680,7 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=None)
             sd_model.unet.to(memory_format=torch.channels_last)
         if shared.opts.cuda_compile and torch.cuda.is_available():
             sd_model.to(devices.device)
-            import torch._dynamo # pylint: disable=unused-import
+            import torch._dynamo # pylint: disable=unused-import,redefined-outer-name
             log_level = logging.WARNING if shared.opts.cuda_compile_verbose else logging.CRITICAL # pylint: disable=protected-access
             torch._logging.set_logs(dynamo=log_level, aot=log_level, inductor=log_level) # pylint: disable=protected-access
             torch._dynamo.config.verbose = shared.opts.cuda_compile_verbose # pylint: disable=protected-access
@@ -632,7 +697,11 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=None)
     except Exception as e:
         shared.log.error("Failed to load diffusers model")
         errors.display(e, "loading Diffusers model")
-    shared.sd_model = sd_model
+
+    if op == 'refiner':
+        model_data.sd_refiner = sd_model
+    else:
+        model_data.sd_model = sd_model

     from modules.textual_inversion import textual_inversion
     embedding_db = textual_inversion.EmbeddingDatabase()
@@ -685,7 +754,7 @@ def set_diffuser_pipe(pipe, new_pipe_type):
     new_pipe.sd_model_checkpoint = sd_model_checkpoint
     new_pipe.sd_model_hash = sd_model_hash

-    shared.sd_model = new_pipe
+    model_data.sd_model = new_pipe
     shared.log.info(f"Pipeline class changed from {pipe.__class__.__name__} to {new_pipe_cls.__name__}")
@@ -700,21 +769,32 @@ def get_diffusers_task(pipe: diffusers.DiffusionPipeline) -> DiffusersTaskType:
     return DiffusersTaskType.TEXT_2_IMAGE


-def load_model(checkpoint_info=None, already_loaded_state_dict=None, timer=None):
+def load_model(checkpoint_info=None, already_loaded_state_dict=None, timer=None, op='model'):
     from modules import lowvram, sd_hijack
-    checkpoint_info = checkpoint_info or select_checkpoint()
+    checkpoint_info = checkpoint_info or select_checkpoint(op=op)
     if checkpoint_info is None:
         return
-    if model_data.sd_model is not None and (checkpoint_info.hash == model_data.sd_model.sd_checkpoint_info.hash): # trying to load the same model
-        return
-    shared.log.debug(f'Load model: name={checkpoint_info.filename} dict={already_loaded_state_dict is not None}')
+    if op == 'model' or op == 'dict':
+        if model_data.sd_model is not None and (checkpoint_info.hash == model_data.sd_model.sd_checkpoint_info.hash): # trying to load the same model
+            return
+    else:
+        if model_data.sd_refiner is not None and (checkpoint_info.hash == model_data.sd_refiner.sd_checkpoint_info.hash): # trying to load the same model
+            return
+    shared.log.debug(f'Load {op}: name={checkpoint_info.filename} dict={already_loaded_state_dict is not None}')
     if timer is None:
         timer = Timer()
     current_checkpoint_info = None
-    if model_data.sd_model is not None:
-        sd_hijack.model_hijack.undo_hijack(model_data.sd_model)
-        current_checkpoint_info = model_data.sd_model.sd_checkpoint_info
-        unload_model_weights()
+    if op == 'model' or op == 'dict':
+        if model_data.sd_model is not None:
+            sd_hijack.model_hijack.undo_hijack(model_data.sd_model)
+            current_checkpoint_info = model_data.sd_model.sd_checkpoint_info
+            unload_model_weights(op=op)
+    else:
+        if model_data.sd_refiner is not None:
+            sd_hijack.model_hijack.undo_hijack(model_data.sd_refiner)
+            current_checkpoint_info = model_data.sd_refiner.sd_checkpoint_info
+            unload_model_weights(op=op)

     do_inpainting_hijack()
     devices.set_cuda_params()
     if already_loaded_state_dict is not None:
@@ -760,7 +840,10 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None, timer=None)
             sd_model = torch.xpu.optimize(sd_model, dtype=devices.dtype, auto_kernel_selection=True, optimize_lstm=True,
                 graph_mode=True if shared.opts.cuda_compile and shared.opts.cuda_compile_mode == 'ipex' else False)
             shared.log.info("Applied IPEX Optimize")
-    model_data.sd_model = sd_model
+    if op == 'refiner':
+        model_data.sd_refiner = sd_model
+    else:
+        model_data.sd_model = sd_model
     sd_hijack.model_hijack.embedding_db.load_textual_inversion_embeddings(force_reload=True) # Reload embeddings after model load as they may or may not fit the model
     timer.record("embeddings")
     script_callbacks.model_loaded_callback(sd_model)
@@ -771,7 +854,7 @@ def load_model(checkpoint_info=None, already_loaded_state_dict=None, timer=None)
     shared.log.info(f'Model load finished: {memory_stats()} cached={len(checkpoints_loaded.keys())}')


-def reload_model_weights(sd_model=None, info=None, reuse_dict=False):
+def reload_model_weights(sd_model=None, info=None, reuse_dict=False, op='model'):
     load_dict = shared.opts.sd_model_dict != model_data.sd_dict
     global skip_next_load # pylint: disable=global-statement
     if skip_next_load:
@@ -779,15 +862,18 @@ def reload_model_weights(sd_model=None, info=None, reuse_dict=False):
         skip_next_load = False
         return
     from modules import lowvram, sd_hijack
-    checkpoint_info = info or select_checkpoint(model=not load_dict) # are we selecting model or dictionary
-    next_checkpoint_info = info or select_checkpoint(model=load_dict) if load_dict else None
+    checkpoint_info = info or select_checkpoint(op=op) # are we selecting model or dictionary
+    next_checkpoint_info = info or select_checkpoint(op='dict' if load_dict else 'model') if load_dict else None
     if checkpoint_info is None:
+        unload_model_weights(op=op)
         return
     if load_dict:
         shared.log.debug(f'Model dict: existing={sd_model is not None} target={checkpoint_info.filename} info={info}')
     else:
         model_data.sd_dict = 'None'
         shared.log.debug(f'Load model weights: existing={sd_model is not None} target={checkpoint_info.filename} info={info}')
     if not sd_model:
-        sd_model = model_data.sd_model
+        sd_model = model_data.sd_model if op == 'model' or op == 'dict' else model_data.sd_refiner
     if sd_model is None: # previous model load failed
         current_checkpoint_info = None
     else:
@@ -802,7 +888,7 @@ def reload_model_weights(sd_model=None, info=None, reuse_dict=False):
         shared.log.info('Reusing previous model dictionary')
         sd_hijack.model_hijack.undo_hijack(sd_model)
     else:
-        unload_model_weights()
+        unload_model_weights(op=op)
         sd_model = None
     timer = Timer()
     state_dict = get_checkpoint_state_dict(checkpoint_info, timer)
@@ -811,14 +897,14 @@ def reload_model_weights(sd_model=None, info=None, reuse_dict=False):
     if sd_model is None or checkpoint_config != sd_model.used_config:
         del sd_model
         if shared.backend == shared.Backend.ORIGINAL:
-            load_model(checkpoint_info, already_loaded_state_dict=state_dict, timer=timer)
+            load_model(checkpoint_info, already_loaded_state_dict=state_dict, timer=timer, op=op)
         else:
-            load_diffuser(checkpoint_info, already_loaded_state_dict=state_dict, timer=timer)
+            load_diffuser(checkpoint_info, already_loaded_state_dict=state_dict, timer=timer, op=op)
         if load_dict and next_checkpoint_info is not None:
             model_data.sd_dict = shared.opts.sd_model_dict
             shared.opts.data["sd_model_checkpoint"] = next_checkpoint_info.title
             reload_model_weights(reuse_dict=True) # ok we loaded dict now lets redo and load model on top of it
-        return model_data.sd_model
+        return model_data.sd_model if op == 'model' or op == 'dict' else model_data.sd_refiner
     try:
         load_model_weights(sd_model, checkpoint_info, state_dict, timer)
     except Exception:
@@ -835,17 +921,22 @@ def reload_model_weights(sd_model=None, info=None, reuse_dict=False):
     shared.log.info(f"Weights loaded in {timer.summary()}")


-def unload_model_weights(sd_model=None, _info=None):
+def unload_model_weights(op='model'):
     from modules import sd_hijack
-    if model_data.sd_model:
-        model_data.sd_model.to(devices.cpu)
-        if shared.backend == shared.Backend.ORIGINAL:
-            sd_hijack.model_hijack.undo_hijack(model_data.sd_model)
-        sd_model = None
-        model_data.sd_model = None
-        devices.torch_gc(force=True)
-        shared.log.debug(f'Model weights unloaded: {memory_stats()}')
-    return sd_model
+    if op == 'model' or op == 'dict':
+        if model_data.sd_model:
+            model_data.sd_model.to(devices.cpu)
+            if shared.backend == shared.Backend.ORIGINAL:
+                sd_hijack.model_hijack.undo_hijack(model_data.sd_model)
+            model_data.sd_model = None
+    else:
+        if model_data.sd_refiner:
+            model_data.sd_refiner.to(devices.cpu)
+            if shared.backend == shared.Backend.ORIGINAL:
+                sd_hijack.model_hijack.undo_hijack(model_data.sd_refiner)
+            model_data.sd_refiner = None
+    shared.log.debug(f'Weights unloaded {op}: {memory_stats()}')
+    devices.torch_gc(force=True)


 def apply_token_merging(sd_model, token_merging_ratio):
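Across the hunks above, the `op='model' | 'dict' | 'refiner'` parameter threaded through load, reload, and unload boils down to one holder object with separate slots for the base model and the refiner. A minimal, self-contained sketch of that pattern (names mirror the diff; `load_fn` stands in for the real `load_diffuser`/`reload_model_weights` work):

```python
import threading

class ModelData:
    """Holds the base model and the refiner; `op` selects which slot is touched."""
    def __init__(self):
        self.sd_model = None
        self.sd_refiner = None
        self.lock = threading.Lock()

    def get(self, op, load_fn):
        # 'model' and 'dict' share the base slot; only 'refiner' uses the other.
        attr = 'sd_refiner' if op == 'refiner' else 'sd_model'
        if getattr(self, attr) is None:
            with self.lock:
                if getattr(self, attr) is None:  # re-check under the lock
                    setattr(self, attr, load_fn(op))
        return getattr(self, attr)

    def unload(self, op):
        attr = 'sd_refiner' if op == 'refiner' else 'sd_model'
        setattr(self, attr, None)
```

The double-checked load under a lock matches the diff's `get_sd_model`/`get_sd_refiner` accessors, which lazy-load on first access so the refiner costs nothing until it is actually requested.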
@@ -38,6 +38,7 @@ hypernetworks = {}
 loaded_hypernetworks = []
 gradio_theme = gr.themes.Base()
 settings_components = None
+pipelines = ['Stable Diffusion', 'Stable Diffusion XL', 'Kandinsky V1', 'Kandinsky V2', 'DeepFloyd IF', 'Shap-E']
 latent_upscale_default_mode = "Latent"
 latent_upscale_modes = {
     "Latent": {"mode": "bilinear", "antialias": False},
@@ -217,6 +218,10 @@ class OptionInfo:
         self.comment_after += f"<span class='info'>({info})</span>"
         return self

+    def html(self, info):
+        self.comment_after += f"<span class='info'>{info}</span>"
+        return self
+
     def needs_restart(self):
         self.comment_after += " <span class='info'>(requires restart)</span>"
         return self
@@ -295,8 +300,9 @@ else: # cuda
     cross_attention_optimization_default = "Scaled-Dot-Product"

 options_templates.update(options_section(('sd', "Stable Diffusion"), {
-    "sd_model_checkpoint": OptionInfo(default_checkpoint, "Stable Diffusion checkpoint", gr.Dropdown, lambda: {"choices": list_checkpoint_tiles()}, refresh=refresh_checkpoints),
     "sd_checkpoint_autoload": OptionInfo(True, "Stable Diffusion checkpoint autoload on server start"),
+    "sd_model_checkpoint": OptionInfo(default_checkpoint, "Stable Diffusion checkpoint", gr.Dropdown, lambda: {"choices": list_checkpoint_tiles()}, refresh=refresh_checkpoints),
+    "sd_model_refiner": OptionInfo('None', "Stable Diffusion refiner", gr.Dropdown, lambda: {"choices": ['None'] + list_checkpoint_tiles()}, refresh=refresh_checkpoints),
     "sd_checkpoint_cache": OptionInfo(0, "Number of cached model checkpoints", gr.Slider, {"minimum": 0, "maximum": 10, "step": 1}),
     "sd_vae_checkpoint_cache": OptionInfo(0, "Number of cached VAE checkpoints", gr.Slider, {"minimum": 0, "maximum": 10, "step": 1}),
     "sd_vae": OptionInfo("Automatic", "Select VAE", gr.Dropdown, lambda: {"choices": shared_items.sd_vae_items()}, refresh=shared_items.refresh_vae_list),
@@ -341,10 +347,11 @@ options_templates.update(options_section(('cuda', "Compute Settings"), {
     "cuda_compile_fullgraph": OptionInfo(False, "Model compile fullgraph"),
     "cuda_compile_verbose": OptionInfo(False, "Model compile verbose mode"),
     "cuda_compile_errors": OptionInfo(True, "Model compile suppress errors"),
-    "disable_gc": OptionInfo(False, "Disable Torch memory garbage collection"),
+    "disable_gc": OptionInfo(True, "Disable Torch memory garbage collection on each generation"),
 }))

 options_templates.update(options_section(('diffusers', "Diffusers Settings"), {
+    "diffusers_pipeline": OptionInfo(pipelines[0], 'Diffuser Pipeline', gr.Dropdown, lambda: {"choices": pipelines}),
     "diffusers_extract_ema": OptionInfo(True, "Use model EMA weights when possible"),
     "diffusers_generator_device": OptionInfo("default", "Generator device", gr.Radio, lambda: {"choices": ["default", "cpu"]}),
     "diffusers_seq_cpu_offload": OptionInfo(False, "Enable sequential CPU offload"),
@@ -905,7 +912,6 @@ class Shared(sys.modules[__name__].__class__):
     @property
     def sd_model(self):
         import modules.sd_models # pylint: disable=W0621
-        # return modules.sd_models.model_data.sd_model
         return modules.sd_models.model_data.get_sd_model()

     @sd_model.setter
@@ -913,6 +919,17 @@ class Shared(sys.modules[__name__].__class__):
         import modules.sd_models # pylint: disable=W0621
         modules.sd_models.model_data.set_sd_model(value)

     # sd_model: LatentDiffusion = None # this var is here just for IDE's type checking; it cannot be accessed because the class field above will be accessed instead
+    @property
+    def sd_refiner(self):
+        import modules.sd_models # pylint: disable=W0621
+        return modules.sd_models.model_data.get_sd_refiner()
+
+    @sd_refiner.setter
+    def sd_refiner(self, value):
+        import modules.sd_models # pylint: disable=W0621
+        modules.sd_models.model_data.set_sd_refiner(value)


 sd_model = None
+sd_refiner = None
 sys.modules[__name__].__class__ = Shared
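The `sys.modules[__name__].__class__ = Shared` assignment that the new `sd_refiner` property hooks into is what lets plain attribute access like `shared.sd_refiner` run a lazy-loading getter. A small demonstration of the trick on a throwaway module; the `'lazily-loaded'` value is a placeholder for the real checkpoint load:

```python
import types

# A scratch module object standing in for the project's `shared` module.
mod = types.ModuleType('demo_shared')

class _DemoShared(types.ModuleType):
    _value = None

    @property
    def sd_model(self):
        # First access triggers the (placeholder) load, later accesses reuse it.
        if self._value is None:
            self._value = 'lazily-loaded'
        return self._value

    @sd_model.setter
    def sd_model(self, v):
        self._value = v

# Swapping the module's class makes the properties live on the module itself.
mod.__class__ = _DemoShared
```

Because properties are data descriptors on the module's type, `mod.sd_model` invokes the getter and `mod.sd_model = ...` the setter, exactly as attribute access on an ordinary instance would.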
@@ -145,8 +145,12 @@ class EmbeddingDatabase:
             self.word_embeddings[name] = embedding
         except Exception:
             self.skipped_embeddings[name] = embedding
-        text_inv_tokens = pipe.tokenizer.added_tokens_encoder.keys()
-        text_inv_tokens = [t for t in text_inv_tokens if not (len(t.split("_")) > 1 and t.split("_")[-1].isdigit())]
+        try:
+            text_inv_tokens = pipe.tokenizer.added_tokens_encoder.keys()
+            text_inv_tokens = [t for t in text_inv_tokens if not (len(t.split("_")) > 1 and t.split("_")[-1].isdigit())]
+        except Exception:
+            text_inv_tokens = []

     def load_from_file(self, path, filename):
         name, ext = os.path.splitext(filename)
@@ -1031,7 +1031,8 @@ def create_ui(startup_timer = None):
     create_dirty_indicator("show_all_pages", [], interactive=False)

     def unload_sd_weights():
-        modules.sd_models.unload_model_weights()
+        modules.sd_models.unload_model_weights(op='model')
+        modules.sd_models.unload_model_weights(op='refiner')

     def reload_sd_weights():
         modules.sd_models.reload_model_weights()
14  webui.py
@@ -167,14 +167,18 @@ def load_model():
     if opts.sd_checkpoint_autoload:
         shared.state.begin()
         shared.state.job = 'load model'
-        thread = Thread(target=lambda: shared.sd_model)
-        thread.start()
+        thread_model = Thread(target=lambda: shared.sd_model)
+        thread_model.start()
+        thread_refiner = Thread(target=lambda: shared.sd_refiner)
+        thread_refiner.start()
         shared.state.end()
-        thread.join()
+        thread_model.join()
+        thread_refiner.join()
     else:
         log.debug('Model auto load disabled')
-    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()), call=False)
-    shared.opts.onchange("sd_model_dict", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()), call=False)
+    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(op='model')), call=False)
+    shared.opts.onchange("sd_model_refiner", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(op='refiner')), call=False)
+    shared.opts.onchange("sd_model_dict", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(op='dict')), call=False)
     startup_timer.record("checkpoint")
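The startup change above loads the base model and the refiner concurrently: touching the lazy `shared.sd_model` / `shared.sd_refiner` properties from two threads triggers both loads in parallel, and `join()` waits for both. A reduced sketch of that pattern, where appending to a list replaces the real load work:

```python
from threading import Thread

class Shared:
    def __init__(self):
        self.loaded = []

    @property
    def sd_model(self):
        self.loaded.append('model')    # stands in for the real checkpoint load

    @property
    def sd_refiner(self):
        self.loaded.append('refiner')  # stands in for the real refiner load

shared = Shared()

# Evaluating the property in a thread target is enough to trigger the load.
thread_model = Thread(target=lambda: shared.sd_model)
thread_model.start()
thread_refiner = Thread(target=lambda: shared.sd_refiner)
thread_refiner.start()
thread_model.join()
thread_refiner.join()
```

Using separate named threads (rather than reusing one `thread` variable, as the removed lines did) is what allows both loads to be started before either is joined.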
2  wiki

@@ -1 +1 @@
-Subproject commit f941746c0ed2afcaa37c1ec77b86da4dee131bee
+Subproject commit 28e3cc15ef4566564764fa73542ab6b00d2b0959