experimental segmoe support

pull/2803/head
Vladimir Mandic 2024-02-05 10:38:42 -05:00
parent ff2c1db1cc
commit e32220ccc1
13 changed files with 1376 additions and 15 deletions


@@ -1,14 +1,14 @@
# Change Log for SD.Next
## Future
## TODO Future
- ipadapter multi image
- control second pass
- diffusers public callbacks
- image2video: pia and vgen pipelines
- video2video
- wuerstchen v3 [pr](https://github.com/huggingface/diffusers/pull/6487)
- more pipelines: <https://github.com/huggingface/diffusers/blob/main/examples/community/README.md>
- segmoe: <https://github.com/segmind/segmoe>
- control api
- masking api
- preprocess api
@@ -21,12 +21,12 @@
- update docs
- diffusers 0.26.2
## Update for 2023-02-04
## TODO Release notes
Another big release; highlights include:
- A lot more functionality in the **Control** module:
- Inpaint and outpaint support, flexible resizing options, optional hires
- Built-in support for many new processors and models which are auto-downloaded on first use
- Built-in support for many new processors and models, all auto-downloaded on first use
- Full support for scripts and extensions
- Complete **Face** module
Implements all variations of **FaceID** and **FaceSwap**, plus the latest **PhotoMaker** and **InstantID**
@@ -34,14 +34,21 @@ Another big release, highlights being:
- Brand new **Intelligent masking**, manual or automatic
Using ML models (*LAMA* object removal, *REMBG* background removal, *SAM* segmentation, etc.) with live previews
With granular blur, erode and dilate controls
- New models and pipelines:
**Segmind SegMoE**, **Mixture Tiling**, **InstaFlow**, **SAG**, **BlipDiffusion**
- Massive work integrating latest advances with [OpenVINO](https://github.com/vladmandic/automatic/wiki/OpenVINO), [IPEX](https://github.com/vladmandic/automatic/wiki/Intel-ARC) and [ONNX Olive](https://github.com/vladmandic/automatic/wiki/ONNX-Runtime-&-Olive)
- **New models** and pipelines: *Mixture Tiling*, *SAG*, *InstaFlow*, *BlipDiffusion*
- Full control over brightness, sharpness and color during the generation process, directly in latent space
Plus welcome additions to **UI performance, usability and accessibility** and more flexible deployment
It also includes fixes for all issues reported so far
As of this release, default backend is set to **diffusers** as it is more feature-rich than **original** and supports many additional models
As of this release, default backend is set to **diffusers** as it is more feature-rich than **original** and supports many additional models (the original backend remains fully supported)
- For basic instructions, see [README](https://github.com/vladmandic/automatic/blob/master/README.md)
- For more details on all new features see full [CHANGELOG](https://github.com/vladmandic/automatic/blob/master/CHANGELOG.md)
- For documentation, see [WIKI](https://github.com/vladmandic/automatic/wiki)
## Update for 2024-02-04
- **Control**:
- add **inpaint** support
@@ -129,6 +136,13 @@ As of this release, default backend is set to **diffusers** as its more feature
**SD15**: Base, Base ViT-G, Light, Plus, Plus Face, Full Face
**SDXL**: Base SXDL, Base ViT-H SXDL, Plus ViT-H SXDL, Plus Face ViT-H SXDL
- enable use via api, thanks @trojaner
- [Segmind SegMoE](https://github.com/segmind/segmoe)
- initial support for reference models
download & load via network -> models -> reference -> **SegMoE SD 4x2** (3.7GB), **SegMoE XL 2x1** (10GB), **SegMoE XL 4x2** (18GB)
- note: since SegMoE is essentially a sequential mix of UNets from multiple models, checkpoints can get large
SD 4x2 is ~4GB, XL 2x1 is ~10GB and XL 4x2 is ~18GB
- support for create and load custom mixes will be added in the future
- support for lora and other advanced features will be added in the future
- [Mixture Tiling](https://arxiv.org/abs/2302.02412)
- uses multiple prompts to guide different parts of the grid during the diffusion process
- can be used to create complex scenes with multiple subjects
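The Mixture Tiling bullet above can be sketched with a tiny helper that maps grid cells to prompts. This is an illustration of the idea only, not the diffusers community pipeline API, and the helper name is hypothetical:

```python
# Illustration of the Mixture Tiling idea: every cell of a prompt grid
# guides its own region of the image during diffusion. The real pipeline
# operates on latents; this only shows the region-to-prompt mapping.

def assign_prompts(grid_prompts):
    """Map (row, col) grid coordinates to per-region prompts."""
    regions = {}
    for r, row in enumerate(grid_prompts):
        for c, prompt in enumerate(row):
            regions[(r, c)] = prompt
    return regions

regions = assign_prompts([
    ["a snowy mountain peak", "a clear blue sky"],
    ["a pine forest", "a calm lake"],
])
print(regions[(1, 1)])  # prompt guiding the bottom-right tile: a calm lake
```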


@@ -42,7 +42,7 @@ class NetworkModuleLora(network.NetworkModule):
elif is_conv and (key == "lora_up.weight" or key == "dyn_down"):
module = torch.nn.Conv2d(weight.shape[1], weight.shape[0], (1, 1), bias=False)
else:
raise AssertionError(f'Lora layer {self.network_key} matched a layer with unsupported type: {type(self.sd_module).__name__}')
raise AssertionError(f'Lora unsupported: layer={self.network_key} type={type(self.sd_module).__name__}')
with torch.no_grad():
if weight.shape != module.weight.shape:
weight = weight.reshape(module.weight.shape)


@@ -88,6 +88,21 @@
"desc": "Segmind's Tiny-SD offers a compact, efficient, and distilled version of Realistic Vision 4.0 and is up to 80% faster than SD1.5",
"preview": "segmind--tiny-sd.jpg"
},
"Segmind SegMoE SD 4x2": {
"path": "segmind/SegMoE-SD-4x2-v0",
"desc": "SegMoE-SD-4x2-v0 is an untrained Segmind Mixture of Diffusion Experts Model generated using segmoe from 4 Expert SD1.5 models. SegMoE is a powerful framework for dynamically combining Stable Diffusion Models into a Mixture of Experts within minutes without training",
"preview": "segmind--SegMoE-SD-4x2-v0.jpg"
},
"Segmind SegMoE XL 2x1": {
"path": "segmind/SegMoE-2x1-v0",
"desc": "SegMoE-2x1-v0 is an untrained Segmind Mixture of Diffusion Experts Model generated using segmoe from 2 Expert SDXL models. SegMoE is a powerful framework for dynamically combining Stable Diffusion Models into a Mixture of Experts within minutes without training",
"preview": "segmind--SegMoE-2x1-v0.jpg"
},
"Segmind SegMoE XL 4x2": {
"path": "segmind/SegMoE-4x2-v0",
"desc": "SegMoE-4x2-v0 is an untrained Segmind Mixture of Diffusion Experts Model generated using segmoe from 4 Expert SDXL models. SegMoE is a powerful framework for dynamically combining Stable Diffusion Models into a Mixture of Experts within minutes without training",
"preview": "segmind--SegMoE-4x2-v0.jpg"
},
"LCM SD-1.5 Dreamshaper 7": {
"path": "SimianLuo/LCM_Dreamshaper_v7",
"desc": "Latent Consistency Models enable swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion. By distilling classifier-free guidance into the model's input, LCM can generate high-quality images in very short inference time. LCM can generate quality images in as few as 3-4 steps, making it blazingly fast.",

3 binary files not shown (new preview images, 22 KiB each)


@@ -74,7 +74,7 @@ def ensemble_depths(
# objective function
def closure(x):
l = len(x) # noqa
l = len(x)
s = x[: int(l / 2)]
t = x[int(l / 2) :]
s = torch.from_numpy(s).to(dtype=dtype).to(device)
@@ -102,7 +102,7 @@ def ensemble_depths(
closure, x, method="BFGS", tol=tol, options={"maxiter": max_iter, "disp": False}
)
x = res.x
l = len(x) # noqa
l = len(x)
s = x[: int(l / 2)]
t = x[int(l / 2) :]


@@ -94,7 +94,7 @@ class PerceiverAttention(nn.Module):
x = self.norm1(x)
latents = self.norm2(latents)
b, l, _ = latents.shape # noqa:E741
b, l, _ = latents.shape
q = self.to_q(latents)
kv_input = torch.cat((x, latents), dim=-2)


@@ -158,7 +158,9 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
if hasattr(model, "set_progress_bar_config"):
model.set_progress_bar_config(bar_format='Progress {rate_fmt}{postfix} {bar} {percentage:3.0f}% {n_fmt}/{total_fmt} {elapsed} {remaining} ' + '\x1b[38;5;71m' + desc, ncols=80, colour='#327fba')
args = {}
signature = inspect.signature(type(model).__call__)
if hasattr(model, 'pipe'): # recurse
model = model.pipe
signature = inspect.signature(type(model).__call__, follow_wrapped=True)
possible = signature.parameters.keys()
debug(f'Diffusers pipeline possible: {possible}')
if shared.opts.diffusers_generator_device == "Unset":
@@ -201,14 +203,15 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
if 'generator' in possible and generator is not None:
args['generator'] = generator
if 'output_type' in possible:
args['output_type'] = 'np'
if hasattr(model, 'vae'):
args['output_type'] = 'np' # request np output only when the model has a vae to decode latents
if 'callback_steps' in possible:
args['callback_steps'] = 1
if 'callback' in possible:
args['callback'] = diffusers_callback_legacy
elif 'callback_on_step_end_tensor_inputs' in possible:
args['callback_on_step_end'] = diffusers_callback
if 'prompt_embeds' in possible and 'negative_prompt_embeds' in possible:
if 'prompt_embeds' in possible and 'negative_prompt_embeds' in possible and hasattr(model, '_callback_tensor_inputs'):
args['callback_on_step_end_tensor_inputs'] = model._callback_tensor_inputs # pylint: disable=protected-access
else:
args['callback_on_step_end_tensor_inputs'] = ['latents']
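The argument-building hunk above follows a simple pattern: inspect the pipeline's `__call__` signature and pass only the kwargs it accepts. A minimal runnable sketch of that filtering with a dummy pipeline (not the SD.Next code):

```python
import inspect

class DummyPipeline:
    # Stand-in for a diffusers pipeline; only some parameters are accepted.
    def __call__(self, prompt, num_inference_steps=50, generator=None):
        return f"{prompt} in {num_inference_steps} steps"

def build_args(model, candidates):
    """Keep only the kwargs that the model's __call__ signature accepts."""
    signature = inspect.signature(type(model).__call__, follow_wrapped=True)
    possible = signature.parameters.keys()
    return {k: v for k, v in candidates.items() if k in possible}

model = DummyPipeline()
args = build_args(model, {
    'prompt': 'a cat',
    'num_inference_steps': 20,
    'output_type': 'np',   # dropped: not in the signature
    'callback_steps': 1,   # dropped: not in the signature
})
print(model(**args))  # prints: a cat in 20 steps
```

Filtering like this lets one code path drive many pipeline classes whose `__call__` signatures differ.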
@@ -405,7 +408,7 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
desc='Base',
)
update_sampler(shared.sd_model)
shared.state.sampling_steps = base_args['num_inference_steps']
shared.state.sampling_steps = base_args.get('num_inference_steps', p.steps)
p.extra_generation_params['Pipeline'] = shared.sd_model.__class__.__name__
if shared.opts.scheduler_eta is not None and shared.opts.scheduler_eta > 0 and shared.opts.scheduler_eta < 1:
p.extra_generation_params["Sampler Eta"] = shared.opts.scheduler_eta


@@ -586,6 +586,10 @@ def detect_pipeline(f: str, op: str = 'model', warning=True):
if shared.backend == shared.Backend.ORIGINAL:
warn(f'Model detected as InstaFlow model, but attempting to load using backend=original: {op}={f} size={size} MB')
guess = 'InstaFlow'
if 'SegMoE' in f:
if shared.backend == shared.Backend.ORIGINAL:
warn(f'Model detected as SegMoE model, but attempting to load using backend=original: {op}={f} size={size} MB')
guess = 'SegMoE'
if 'PixArt' in f:
if shared.backend == shared.Backend.ORIGINAL:
warn(f'Model detected as PixArt Alpha model, but attempting to load using backend=original: {op}={f} size={size} MB')
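The detection above guesses a pipeline type from substrings of the checkpoint name. A simplified, first-match sketch of that pattern (the marker list and helper name are illustrative, not the full `detect_pipeline`):

```python
def guess_pipeline(filename: str, default: str = 'Stable Diffusion') -> str:
    """Guess a pipeline type from substrings in the checkpoint name."""
    # Ordered marker checks; the first match wins in this sketch.
    markers = [
        ('InstaFlow', 'InstaFlow'),
        ('SegMoE', 'SegMoE'),
        ('PixArt', 'PixArt Alpha'),
    ]
    for marker, guess in markers:
        if marker in filename:
            return guess
    return default

print(guess_pipeline('segmind/SegMoE-SD-4x2-v0'))  # prints: SegMoE
```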
@@ -794,6 +798,14 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=No
except Exception as e:
shared.log.error(f'Diffusers Failed loading {op}: {checkpoint_info.path} {e}')
return
if model_type in ['SegMoE']: # forced pipeline
try:
from modules.segmoe.segmoe_model import SegMoEPipeline
sd_model = SegMoEPipeline(checkpoint_info.path, cache_dir=shared.opts.diffusers_dir, **diffusers_load_config)
sd_model = sd_model.pipe # SegMoEPipeline does its work in __init__; __call__ delegates to the original pipeline
except Exception as e:
shared.log.error(f'Diffusers Failed loading {op}: {checkpoint_info.path} {e}')
return
elif 'ONNX' in model_type: # forced pipeline
sd_model = pipeline.from_pretrained(checkpoint_info.path)
else:
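As the comment in the hunk above notes, `SegMoEPipeline` does its assembly in `__init__` and exposes the underlying diffusers pipeline as `.pipe`, which the loader then keeps. The wrap-then-unwrap pattern, sketched with stand-in classes:

```python
class InnerPipeline:
    # Stand-in for the underlying diffusers pipeline.
    def __call__(self, prompt):
        return f"image for: {prompt}"

class WrapperPipeline:
    """Builds the real pipeline in __init__ and exposes it as .pipe."""
    def __init__(self, path):
        # Heavy assembly (merging expert models, etc.) would happen here.
        self.path = path
        self.pipe = InnerPipeline()

sd_model = WrapperPipeline('segmind/SegMoE-SD-4x2-v0')
sd_model = sd_model.pipe  # keep only the inner pipeline, as the loader does
print(sd_model('a cat'))  # prints: image for: a cat
```

Unwrapping means the rest of the code path sees an ordinary pipeline object, so no SegMoE-specific handling is needed downstream.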

File diff suppressed because it is too large


@@ -54,7 +54,8 @@ def get_pipelines():
'ONNX Stable Diffusion XL': getattr(diffusers, 'OnnxStableDiffusionXLPipeline', None),
'ONNX Stable Diffusion XL Img2Img': getattr(diffusers, 'OnnxStableDiffusionXLImg2ImgPipeline', None),
'Custom Diffusers Pipeline': getattr(diffusers, 'DiffusionPipeline', None),
'InstaFlow': getattr(diffusers, 'StableDiffusionPipeline', None) # dynamically redefined and loaded in sd_models.load_diffuser
'InstaFlow': getattr(diffusers, 'StableDiffusionPipeline', None), # dynamically redefined and loaded in sd_models.load_diffuser
'SegMoE': getattr(diffusers, 'StableDiffusionPipeline', None), # dynamically redefined and loaded in sd_models.load_diffuser
# Segmind SSD-1B, Segmind Tiny
}
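The pipeline table relies on `getattr(diffusers, name, None)` so that classes missing from an older diffusers install resolve to `None` instead of raising `AttributeError`. The same pattern with a stand-in module:

```python
import types

# Stand-in for the diffusers module: only one pipeline class exists here.
fake_diffusers = types.SimpleNamespace(StableDiffusionPipeline=object)

pipelines = {
    # present in this "diffusers" version
    'SegMoE': getattr(fake_diffusers, 'StableDiffusionPipeline', None),
    # absent: resolves to None instead of raising AttributeError
    'ONNX Stable Diffusion XL': getattr(fake_diffusers, 'OnnxStableDiffusionXLPipeline', None),
}
available = {name for name, cls in pipelines.items() if cls is not None}
print(available)  # prints: {'SegMoE'}
```

Callers can then skip or warn about `None` entries rather than crashing at import time.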


@@ -57,6 +57,7 @@ ignore = [
"C408", # Rewrite as a literal
"E402", # Module level import not at top of file
"E721", # Do not compare types, use `isinstance()`
"E741", # Do not use variables named `l`, `O`, or `I`
"EXE001", # Shebang present
"F401", # Imported but unused
"ISC003", # Implicit string concatenation