experimental segmoe support

pull/2803/head
Vladimir Mandic 2024-02-05 10:38:42 -05:00
parent ff2c1db1cc
commit e32220ccc1
13 changed files with 1376 additions and 15 deletions


@@ -1,14 +1,14 @@
# Change Log for SD.Next
## Future
## TODO Future
- ipadapter multi image
- control second pass
- diffusers public callbacks
- image2video: pia and vgen pipelines
- video2video
- wuerstchen v3 [pr](https://github.com/huggingface/diffusers/pull/6487)
- more pipelines: <https://github.com/huggingface/diffusers/blob/main/examples/community/README.md>
- segmoe: <https://github.com/segmind/segmoe>
- control api
- masking api
- preprocess api
@@ -21,12 +21,12 @@
- update docs
- diffusers 0.26.2
## Update for 2023-02-04
## TODO Release notes
Another big release; highlights include:
- A lot more functionality in the **Control** module:
- Inpaint and outpaint support, flexible resizing options, optional hires
- Built-in support for many new processors and models which are auto-downloaded on first use
- Built-in support for many new processors and models, all auto-downloaded on first use
- Full support for scripts and extensions
- Complete **Face** module
Implements all variations of **FaceID** and **FaceSwap**, plus the latest **PhotoMaker** and **InstantID**
@@ -34,14 +34,21 @@ Another big release, highlights being:
- Brand new **Intelligent masking**, manual or automatic
Using ML models (*LAMA* object removal, *REMBG* background removal, *SAM* segmentation, etc.) with live previews
With granular blur, erode and dilate controls
- New models and pipelines:
**Segmind SegMoE**, **Mixture Tiling**, **InstaFlow**, **SAG**, **BlipDiffusion**
- Massive work integrating latest advances with [OpenVINO](https://github.com/vladmandic/automatic/wiki/OpenVINO), [IPEX](https://github.com/vladmandic/automatic/wiki/Intel-ARC) and [ONNX Olive](https://github.com/vladmandic/automatic/wiki/ONNX-Runtime-&-Olive)
- **New models** and pipelines: *Mixture Tiling*, *SAG*, *InstaFlow*, *BlipDiffusion*
- Full control over brightness, sharpness and color during the generation process, directly in latent space
Plus welcome additions to **UI performance, usability and accessibility** and more flexible deployment
It also includes fixes for all issues reported so far
As of this release, default backend is set to **diffusers** as it is more feature-rich than **original** and supports many additional models
As of this release, default backend is set to **diffusers** as it is more feature-rich than **original** and supports many additional models (the original backend remains fully supported)
- For basic instructions, see [README](https://github.com/vladmandic/automatic/blob/master/README.md)
- For more details on all new features see full [CHANGELOG](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md)
- For documentation, see [WIKI](https://github.com/vladmandic/automatic/wiki)
## Update for 2024-02-04
- **Control**:
- add **inpaint** support
@@ -129,6 +136,13 @@ As of this release, default backend is set to **diffusers** as its more feature
**SD15**: Base, Base ViT-G, Light, Plus, Plus Face, Full Face
**SDXL**: Base SXDL, Base ViT-H SXDL, Plus ViT-H SXDL, Plus Face ViT-H SXDL
- enable use via api, thanks @trojaner
- [Segmind SegMoE](https://github.com/segmind/segmoe)
- initial support for reference models
download & load via network -> models -> reference -> **SegMoE SD 4x2** (3.7GB), **SegMoE XL 2x1** (10GB), **SegMoE XL 4x2** (18GB)
- note: since SegMoE is essentially a sequential mix of UNets from multiple models, checkpoints can get large
SD 4x2 is ~4GB, XL 2x1 is ~10GB and XL 4x2 is ~18GB
- support for create and load custom mixes will be added in the future
- support for lora and other advanced features will be added in the future
- [Mixture Tiling](https://arxiv.org/abs/2302.02412)
- uses multiple prompts to guide different parts of the grid during the diffusion process
- can be used to create complex scenes with multiple subjects
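The Mixture Tiling bullet above can be sketched with a tiny helper that maps grid cells to prompts. This is an illustration of the idea only, not the diffusers community pipeline API, and the helper name is hypothetical:

```python
# Illustration of the Mixture Tiling idea: every cell of a prompt grid
# guides its own region of the image during diffusion. The real pipeline
# operates on latents; this only shows the region-to-prompt mapping.

def assign_prompts(grid_prompts):
    """Map (row, col) grid coordinates to per-region prompts."""
    regions = {}
    for r, row in enumerate(grid_prompts):
        for c, prompt in enumerate(row):
            regions[(r, c)] = prompt
    return regions

regions = assign_prompts([
    ["a snowy mountain peak", "a clear blue sky"],
    ["a pine forest", "a calm lake"],
])
print(regions[(1, 1)])  # prompt guiding the bottom-right tile: a calm lake
```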


@@ -42,7 +42,7 @@ class NetworkModuleLora(network.NetworkModule):
elif is_conv and (key == "lora_up.weight" or key == "dyn_down"):
module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], (1, 1), bias=False)
else:
raise AssertionError(f'Lora layer {self.network_key} matched a layer with unsupported type: {type(self.sd_module).__name__}')
raise AssertionError(f'Lora unsupported: layer={self.network_key} type={type(self.sd_module).__name__}')
with torch.no_grad():
if weight.shape != module.weight.shape:
weight = weight.reshape(module.weight.shape)


@@ -88,6 +88,21 @@
"desc": "Segmind's Tiny-SD offers a compact, efficient, and distilled version of Realistic Vision 4.0 and is up to 80% faster than SD1.5",
"preview": "segmind--tiny-sd.jpg"
},
"Segmind SegMoE SD 4x2": {
"path": "segmind/SegMoE-SD-4x2-v0",
"desc": "SegMoE-SD-4x2-v0 is an untrained Segmind Mixture of Diffusion Experts Model generated using segmoe from 4 Expert SD1.5 models. SegMoE is a powerful framework for dynamically combining Stable Diffusion Models into a Mixture of Experts within minutes without training",
"preview": "segmind--SegMoE-SD-4x2-v0.jpg"
},
"Segmind SegMoE XL 2x1": {
"path": "segmind/SegMoE-2x1-v0",
"desc": "SegMoE-2x1-v0 is an untrained Segmind Mixture of Diffusion Experts Model generated using segmoe from 2 Expert SDXL models. SegMoE is a powerful framework for dynamically combining Stable Diffusion Models into a Mixture of Experts within minutes without training",
"preview": "segmind--SegMoE-2x1-v0.jpg"
},
"Segmind SegMoE XL 4x2": {
"path": "segmind/SegMoE-4x2-v0",
"desc": "SegMoE-4x2-v0 is an untrained Segmind Mixture of Diffusion Experts Model generated using segmoe from 4 Expert SDXL models. SegMoE is a powerful framework for dynamically combining Stable Diffusion Models into a Mixture of Experts within minutes without training",
"preview": "segmind--SegMoE-4x2-v0.jpg"
},
"LCM SD-1.5 Dreamshaper 7": {
"path": "SimianLuo/LCM_Dreamshaper_v7",
"desc": "Latent Consistency Models enable swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion. By distilling classifier-free guidance into the model's input, LCM can generate high-quality images in very short inference time. LCM can generate quality images in as few as 3-4 steps, making it blazingly fast.",

3 binary files not shown (new preview images, 22 KiB each)


@@ -74,7 +74,7 @@ def ensemble_depths(
# objective function
def closure(x):
l = len(x) # noqa
l = len(x)
s = x[: int(l / 2)]
t = x[int(l / 2) :]
s = torch.from_numpy(s).to(dtype=dtype).to(device)
@@ -102,7 +102,7 @@ def ensemble_depths(
closure, x, method="BFGS", tol=tol, options={"maxiter": max_iter, "disp": False}
)
x = res.x
l = len(x) # noqa
l = len(x)
s = x[: int(l / 2)]
t = x[int(l / 2) :]


@@ -94,7 +94,7 @@ class PerceiverAttention(nn.Module):
x = self.norm1(x)
latents = self.norm2(latents)
b, l, _ = latents.shape # noqa:E741
b, l, _ = latents.shape
q = self.to_q(latents)
kv_input = torch.cat((x, latents), dim=-2)


@@ -158,7 +158,9 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
if hasattr(model, "set_progress_bar_config"):
model.set_progress_bar_config(bar_format='Progress {rate_fmt}{postfix} {bar} {percentage:3.0f}% {n_fmt}/{total_fmt} {elapsed} {remaining} ' + '\x1b[38;5;71m' + desc, ncols=80, colour='#327fba')
args = {}
signature = inspect.signature(type(model).__call__)
if hasattr(model, 'pipe'): # recurse
model = model.pipe
signature = inspect.signature(type(model).__call__, follow_wrapped=True)
possible = signature.parameters.keys()
debug(f'Diffusers pipeline possible: {possible}')
if shared.opts.diffusers_generator_device == "Unset":
@@ -201,14 +203,15 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
if 'generator' in possible and generator is not None:
args['generator'] = generator
if 'output_type' in possible:
args['output_type'] = 'np'
if hasattr(model, 'vae'):
args['output_type'] = 'np' # request np output only when the model has a vae to decode latents
if 'callback_steps' in possible:
args['callback_steps'] = 1
if 'callback' in possible:
args['callback'] = diffusers_callback_legacy
elif 'callback_on_step_end_tensor_inputs' in possible:
args['callback_on_step_end'] = diffusers_callback
if 'prompt_embeds' in possible and 'negative_prompt_embeds' in possible:
if 'prompt_embeds' in possible and 'negative_prompt_embeds' in possible and hasattr(model, '_callback_tensor_inputs'):
args['callback_on_step_end_tensor_inputs'] = model._callback_tensor_inputs # pylint: disable=protected-access
else:
args['callback_on_step_end_tensor_inputs'] = ['latents']
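The argument-building hunk above follows a simple pattern: inspect the pipeline's `__call__` signature and pass only the kwargs it accepts. A minimal runnable sketch of that filtering with a dummy pipeline (not the SD.Next code):

```python
import inspect

class DummyPipeline:
    # Stand-in for a diffusers pipeline; only some parameters are accepted.
    def __call__(self, prompt, num_inference_steps=50, generator=None):
        return f"{prompt} in {num_inference_steps} steps"

def build_args(model, candidates):
    """Keep only the kwargs that the model's __call__ signature accepts."""
    signature = inspect.signature(type(model).__call__, follow_wrapped=True)
    possible = signature.parameters.keys()
    return {k: v for k, v in candidates.items() if k in possible}

model = DummyPipeline()
args = build_args(model, {
    'prompt': 'a cat',
    'num_inference_steps': 20,
    'output_type': 'np',   # dropped: not in the signature
    'callback_steps': 1,   # dropped: not in the signature
})
print(model(**args))  # prints: a cat in 20 steps
```

Filtering like this lets one code path drive many pipeline classes whose `__call__` signatures differ.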
@@ -405,7 +408,7 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
desc='Base',
)
update_sampler(shared.sd_model)
shared.state.sampling_steps = base_args['num_inference_steps']
shared.state.sampling_steps = base_args.get('num_inference_steps', p.steps)
p.extra_generation_params['Pipeline'] = shared.sd_model.__class__.__name__
if shared.opts.scheduler_eta is not None and shared.opts.scheduler_eta > 0 and shared.opts.scheduler_eta < 1:
p.extra_generation_params["Sampler Eta"] = shared.opts.scheduler_eta


@@ -586,6 +586,10 @@ def detect_pipeline(f: str, op: str = 'model', warning=True):
if shared.backend == shared.Backend.ORIGINAL:
warn(f'Model detected as InstaFlow model, but attempting to load using backend=original: {op}={f} size={size} MB')
guess = 'InstaFlow'
if 'SegMoE' in f:
if shared.backend == shared.Backend.ORIGINAL:
warn(f'Model detected as SegMoE model, but attempting to load using backend=original: {op}={f} size={size} MB')
guess = 'SegMoE'
if 'PixArt' in f:
if shared.backend == shared.Backend.ORIGINAL:
warn(f'Model detected as PixArt Alpha model, but attempting to load using backend=original: {op}={f} size={size} MB')
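The detection above guesses a pipeline type from substrings of the checkpoint name. A simplified, first-match sketch of that pattern (the marker list and helper name are illustrative, not the full `detect_pipeline`):

```python
def guess_pipeline(filename: str, default: str = 'Stable Diffusion') -> str:
    """Guess a pipeline type from substrings in the checkpoint name."""
    # Ordered marker checks; the first match wins in this sketch.
    markers = [
        ('InstaFlow', 'InstaFlow'),
        ('SegMoE', 'SegMoE'),
        ('PixArt', 'PixArt Alpha'),
    ]
    for marker, guess in markers:
        if marker in filename:
            return guess
    return default

print(guess_pipeline('segmind/SegMoE-SD-4x2-v0'))  # prints: SegMoE
```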
@@ -794,6 +798,14 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=No
except Exception as e:
shared.log.error(f'Diffusers Failed loading {op}: {checkpoint_info.path} {e}')
return
if model_type in ['SegMoE']: # forced pipeline
try:
from modules.segmoe.segmoe_model import SegMoEPipeline
sd_model = SegMoEPipeline(checkpoint_info.path, cache_dir=shared.opts.diffusers_dir, **diffusers_load_config)
sd_model = sd_model.pipe # SegMoEPipeline does its work in __init__; __call__ delegates to the original pipeline
except Exception as e:
shared.log.error(f'Diffusers Failed loading {op}: {checkpoint_info.path} {e}')
return
elif 'ONNX' in model_type: # forced pipeline
sd_model = pipeline.from_pretrained(checkpoint_info.path)
else:
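As the comment in the hunk above notes, `SegMoEPipeline` does its assembly in `__init__` and exposes the underlying diffusers pipeline as `.pipe`, which the loader then keeps. The wrap-then-unwrap pattern, sketched with stand-in classes:

```python
class InnerPipeline:
    # Stand-in for the underlying diffusers pipeline.
    def __call__(self, prompt):
        return f"image for: {prompt}"

class WrapperPipeline:
    """Builds the real pipeline in __init__ and exposes it as .pipe."""
    def __init__(self, path):
        # Heavy assembly (merging expert models, etc.) would happen here.
        self.path = path
        self.pipe = InnerPipeline()

sd_model = WrapperPipeline('segmind/SegMoE-SD-4x2-v0')
sd_model = sd_model.pipe  # keep only the inner pipeline, as the loader does
print(sd_model('a cat'))  # prints: image for: a cat
```

Unwrapping means the rest of the code path sees an ordinary pipeline object, so no SegMoE-specific handling is needed downstream.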

File diff suppressed because it is too large


@@ -54,7 +54,8 @@ def get_pipelines():
'ONNX Stable Diffusion XL': getattr(diffusers, 'OnnxStableDiffusionXLPipeline', None),
'ONNX Stable Diffusion XL Img2Img': getattr(diffusers, 'OnnxStableDiffusionXLImg2ImgPipeline', None),
'Custom Diffusers Pipeline': getattr(diffusers, 'DiffusionPipeline', None),
'InstaFlow': getattr(diffusers, 'StableDiffusionPipeline', None) # dynamically redefined and loaded in sd_models.load_diffuser
'InstaFlow': getattr(diffusers, 'StableDiffusionPipeline', None), # dynamically redefined and loaded in sd_models.load_diffuser
'SegMoE': getattr(diffusers, 'StableDiffusionPipeline', None), # dynamically redefined and loaded in sd_models.load_diffuser
# Segmind SSD-1B, Segmind Tiny
}
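The pipeline table relies on `getattr(diffusers, name, None)` so that classes missing from an older diffusers install resolve to `None` instead of raising `AttributeError`. The same pattern with a stand-in module:

```python
import types

# Stand-in for the diffusers module: only one pipeline class exists here.
fake_diffusers = types.SimpleNamespace(StableDiffusionPipeline=object)

pipelines = {
    # present in this "diffusers" version
    'SegMoE': getattr(fake_diffusers, 'StableDiffusionPipeline', None),
    # absent: resolves to None instead of raising AttributeError
    'ONNX Stable Diffusion XL': getattr(fake_diffusers, 'OnnxStableDiffusionXLPipeline', None),
}
available = {name for name, cls in pipelines.items() if cls is not None}
print(available)  # prints: {'SegMoE'}
```

Callers can then skip or warn about `None` entries rather than crashing at import time.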


@@ -57,6 +57,7 @@ ignore = [
"C408", # Rewrite as a literal
"E402", # Module level import not at top of file
"E721", # Do not compare types, use `isinstance()`
"E741", # Do not use variables named `l`, `O`, or `I`
"EXE001", # Shebang present
"F401", # Imported but unused
"ISC003", # Implicit string concatenation