Merge pull request #3323 from vladmandic/dev

merge dev to master
pull/3335/head 2024-07-09
Vladimir Mandic 2024-07-09 15:52:56 -04:00 committed by GitHub
commit 99c9fd28cb
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
59 changed files with 2087 additions and 868 deletions


@ -26,14 +26,15 @@ body:
Easiest is to include top part of console log, for example:
```log
Starting SD.Next
Version: app=sd.next updated=2024-06-28 hash=1fc20e72 branch=dev url=https://github.com/vladmandic/automatic/tree/dev ui=dev
Branch sync failed: sdnext=dev ui=dev
Platform: arch=x86_64 cpu=x86_64 system=Linux release=5.15.153.1-microsoft-standard-WSL2 python=3.12.3
Torch allocator: "garbage_collection_threshold:0.80,max_split_size_mb:512"
Load packages: {'torch': '2.3.1+cu121', 'diffusers': '0.29.1', 'gradio': '3.43.2'}
Engine: backend=Backend.DIFFUSERS compute=cuda device=cuda attention="Scaled-Dot-Product" mode=no_grad
Device: device=NVIDIA GeForce RTX 4090 n=1 arch=sm_90 cap=(8, 9) cuda=12.1 cudnn=8902 driver=555.99
Extensions: enabled=['sd-webui-agent-scheduler', 'sd-extension-chainner', 'sd-extension-system-info', 'sdnext-modernui', 'Lora'] extensions-builtin
Extensions: enabled=[] extensions
```
- type: markdown
attributes:
@ -73,6 +74,18 @@ body:
default: 0
validations:
required: true
- type: dropdown
id: ui
attributes:
label: UI
description: Which UI are you using?
options:
- None
- Standard
- ModernUI
default: 1
validations:
required: true
- type: dropdown
id: branch
attributes:
@ -90,11 +103,12 @@ body:
label: Model
description: What is the model type you're using?
options:
- StableDiffusion 1.5
- StableDiffusion 2.1
- StableDiffusion XL
- StableDiffusion 3
- PixArt
- StableCascade
- Kandinsky
- Other
default: 0


@ -7,6 +7,12 @@ on:
jobs:
lint:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
flags:
- --debug --test --uv
- --debug --test
steps:
- name: checkout-code
uses: actions/checkout@main
@ -27,5 +33,5 @@ jobs:
msg: apply code formatting and linting auto-fixes
- name: test-startup
run: |
export COMMANDLINE_ARGS="--debug --test"
export COMMANDLINE_ARGS="${{ matrix.flags }}"
python launch.py


@ -1,13 +1,82 @@
# Change Log for SD.Next
## Update for 2024-07-09: WiP
### Pending
- Requires `diffusers==0.30.0`
- [AuraFlow](https://github.com/huggingface/diffusers/pull/8796) (previously known as LavenderFlow)
- [Kolors](https://github.com/huggingface/diffusers/pull/8812)
- [ControlNet Union](https://huggingface.co/xinsir/controlnet-union-sdxl-1.0) pipeline
- enable FlowMatchHeunDiscreteScheduler
### Highlights
Massive update to the Wiki with over 20 new pages and articles, now including guides for nearly all major features
Support for new models:
- [AlphaVLLM Lumina-Next-SFT](https://huggingface.co/Alpha-VLLM/Lumina-Next-SFT-diffusers)
- [Kwai Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
- [HunyuanDiT 1.2](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers)
What else? Just a bit... ;)
New **fast-install** mode, new **controlnet-union** *all-in-one* model, support for **DoRA** networks, additional **VLM** models, new **AuraSR** upscaler, and more...
### New Models
- [AlphaVLLM Lumina-Next-SFT](https://huggingface.co/Alpha-VLLM/Lumina-Next-SFT-diffusers)
to use, simply select from *networks -> reference*
use scheduler: Default, Euler FlowMatch, or Heun FlowMatch
note: this model uses T5 XXL variation of text encoder
(previous version of Lumina used Gemma 2B as text encoder)
- [Kwai Kolors](https://huggingface.co/Kwai-Kolors/Kolors)
to use, simply select from *networks -> reference*
note: this is an SDXL-style model that replaces the standard CLIP-L and CLIP-G text encoders with a massive `chatglm3-6b` encoder
however, this new encoder does support both English and Chinese prompting
- [HunyuanDiT 1.2](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers)
to use, simply select from *networks -> reference*
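For reference, a minimal sketch of what the *networks -> reference* entry does under the hood for Lumina, mirroring the `modules/model_lumina.py` loader added in this PR (prompt and settings below are placeholders):
```py
# Minimal sketch: load and run Lumina-Next-SFT directly with diffusers
# mirrors modules/model_lumina.py from this PR; prompt/steps are placeholders
import torch
import diffusers

pipe = diffusers.LuminaText2ImgPipeline.from_pretrained(
    'Alpha-VLLM/Lumina-Next-SFT-diffusers',
    torch_dtype=torch.float16,
).to('cuda')
image = pipe(prompt='a watercolor fox in a misty forest', num_inference_steps=30).images[0]
image.save('lumina.png')
```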
## Update for 2024-07-08
This release is primarily a service release with cumulative fixes and several improvements, but no breaking changes.
**New features...**
- massive updates to [Wiki](https://github.com/vladmandic/automatic/wiki)
with over 20 new pages and articles, now including guides for nearly all major features
*note*: this is a work-in-progress; if you have any feedback or suggestions, please let us know!
thanks @GenesisArtemis!
- support for **DoRA** networks, thanks @AI-Casanova!
- support for [uv](https://pypi.org/project/uv/), extremely fast installer, thanks @Yoinky3000!
to use, simply add `--uv` to your command line params
- [Xinsir ControlNet++ Union](https://huggingface.co/xinsir/controlnet-union-sdxl-1.0)
new SDXL *all-in-one* controlnet that can consume output from any preprocessor!
- [CogFlorence 2 Large](https://huggingface.co/thwri/CogFlorence-2-Large-Freeze) VLM model
to use, simply select in *process -> visual query*
- [AuraSR](https://huggingface.co/fal/AuraSR) high-quality 4x GAN-style upscaling model
note: this is a large upscaler at 2.5GB
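For context, the same AuraSR model can also be driven standalone through fal's `aura-sr` package; a hedged sketch (the package API is assumed from the model card, not part of this PR):
```py
# Hedged sketch: standalone AuraSR 4x upscale via the 'aura-sr' pip package
# (API assumed from the fal/AuraSR model card; SD.Next integrates the model internally instead)
from PIL import Image
from aura_sr import AuraSR

model = AuraSR.from_pretrained('fal/AuraSR')           # ~2.5GB checkpoint download
upscaled = model.upscale_4x(Image.open('input.png'))   # PIL image in, 4x PIL image out
upscaled.save('output_4x.png')
```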
**And fixes...**
- enable **Florence VLM** for all platforms, thanks @lshqqytiger!
- improve ROCm detection under WSL2, thanks @lshqqytiger!
- add SD3 with FP16 T5 to list of detected models
- fix executing extensions with zero params
- add support for embeddings bundled in LoRA, thanks @AI-Casanova!
- fix nncf for lora, thanks @Disty0!
- fix diffusers version detection for SD3
- fix current step for higher-order samplers
- fix control input type video
- fix reset pipeline at the end of each iteration
- fix faceswap when no faces detected
- multiple ModernUI fixes
## Update for 2024-06-23
### Highlights for 2024-06-23
Following the zero-day **SD3** release, here's a refresh 10 days later with 10+ improvements
including full prompt attention, support for compressed weights, additional text-encoder quantization modes.
But there's more than SD3:
- support for quantized **T5** text encoder *FP16/FP8/FP4/INT8* in all models that use T5: SD3, PixArt-Σ, etc.
- support for **PixArt-Sigma** in small/medium/large variants
- support for **HunyuanDiT 1.1**
@ -17,7 +86,7 @@ But there's more than SD3:
- additional efficiencies for users with low VRAM GPUs
- over 20 overall fixes
### Model Improvements for 2024-06-23
- **SD3**: enable tiny-VAE (TAESD) preview and non-full quality mode
- SD3: enable base LoRA support
@ -43,9 +112,9 @@ But there's more than SD3:
- **MS Florence**: integration of Microsoft Florence VLM/VQA Base and Large models
simply select in *process -> visual query*!
### General Improvements for 2024-06-23
- support FP4 quantized T5 text encoder, in addition to existing FP8 and FP16
- support for T5 text-encoder loader in **all** models that use T5
*example*: load FP4 or FP8 quantized T5 text-encoder into PixArt Sigma! (see the sketch after this list)
- support for `torch-directml` **0.2.2**, thanks @lshqqytiger!
@ -67,7 +136,7 @@ But there's more than SD3:
- Lora support without reloading the model
- ControlNet compression support
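As referenced above, a hedged sketch of wiring a quantized T5 text encoder into PixArt Sigma with stock diffusers/transformers; the model ids and the 8-bit flag are illustrative assumptions, not SD.Next internals:
```py
# Hedged sketch: attach a quantized T5 text encoder to PixArt Sigma
# model ids and 8-bit loading are illustrative; requires bitsandbytes for load_in_8bit
import torch
import transformers
import diffusers

t5 = transformers.T5EncoderModel.from_pretrained(
    'DeepFloyd/t5-v1_1-xxl',
    load_in_8bit=True,
    device_map='auto',
)
pipe = diffusers.PixArtSigmaPipeline.from_pretrained(
    'PixArt-alpha/PixArt-Sigma-XL-2-1024-MS',
    text_encoder=t5,
    torch_dtype=torch.float16,
)
```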
### Fixes for 2024-06-23
- fix unsaturated outputs, force apply vae config on model load
- fix hidiffusion handling of non-square aspect ratios, thanks @ShenZhang-Shin!
@ -105,7 +174,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
### Full Changelog for 2024-06-13
#### New Models for 2024-06-13
- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
yup, supported!
@ -116,7 +185,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
note: this is a very large model at ~17GB, but can be used with less VRAM using model offloading
simply select from networks -> models -> reference, model will be auto-downloaded on first use
#### New Functionality for 2024-06-13
- [MuLan](https://github.com/mulanai/MuLan) Multi-language prompts
write your prompts in ~110 auto-detected languages!
@ -153,7 +222,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
typical differences are not large, and it's disabled by default as it does have some performance impact
- new sampler: **Euler FlowMatch**
#### Improvements for 2024-06-13
- additional modernui themes
- reintroduce prompt attention normalization, disabled by default, enable in settings -> execution
@ -173,7 +242,7 @@ Plus tons of minor features such as optimized initial install experience, **T-Ga
- auto-synchronize modernui and core branches
- add option to pad prompt with zeros, thanks @Disty
#### Fixes for 2024-06-13
- cumulative fixes since the last release
- fix apply/unapply hidiffusion for sd15


@ -64,31 +64,31 @@ For screenshots and information on other available themes, see [Themes Wiki](ht
Additional models will be added as they become available and there is public interest in them
- [RunwayML Stable Diffusion](https://github.com/Stability-AI/stablediffusion/) 1.x and 2.x *(all variants)*
- [StabilityAI Stable Diffusion XL](https://github.com/Stability-AI/generative-models)
- [StabilityAI Stable Diffusion 3 Medium](https://stability.ai/news/stable-diffusion-3-medium)
- [StabilityAI Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) Base, XT 1.0, XT 1.1
- [LCM: Latent Consistency Models](https://github.com/openai/consistency_models)
- [Playground](https://huggingface.co/playgroundai/playground-v2-256px-base) *v1, v2 256, v2 512, v2 1024 and latest v2.5*
- [Stable Cascade](https://github.com/Stability-AI/StableCascade) *Full* and *Lite*
- [aMUSEd 256](https://huggingface.co/amused/amused-256) 256 and 512
- [Segmind Vega](https://huggingface.co/segmind/Segmind-Vega)
- [Segmind SSD-1B](https://huggingface.co/segmind/SSD-1B)
- [Segmind SegMoE](https://github.com/segmind/segmoe) *SD and SD-XL*
- [Kandinsky](https://github.com/ai-forever/Kandinsky-2) *2.1 and 2.2 and latest 3.0*
- [PixArt-α XL 2](https://github.com/PixArt-alpha/PixArt-alpha) *Medium and Large*
- [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma)
- [Warp Wuerstchen](https://huggingface.co/blog/wuertschen)
- [Tencent HunyuanDiT](https://github.com/Tencent/HunyuanDiT)
- [Tsinghua UniDiffusion](https://github.com/thu-ml/unidiffuser)
- [DeepFloyd IF](https://github.com/deep-floyd/IF) *Medium and Large*
- [ModelScope T2V](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b)
- [Segmind SD Distilled](https://huggingface.co/blog/sd_distillation) *(all variants)*
- [BLIP-Diffusion](https://dxli94.github.io/BLIP-Diffusion-website/)
- [KOALA 700M](https://github.com/youngwanLEE/sdxl-koala)
- [VGen](https://huggingface.co/ali-vilab/i2vgen-xl)
- [SDXS](https://github.com/IDKiro/sdxs)
- [Hyper-SD](https://huggingface.co/ByteDance/Hyper-SD)
Also supported are modifiers such as:
@ -226,6 +226,7 @@ List of available parameters, run `webui --help` for the full & up-to-date list:
--version Print version information
--ignore Ignore any errors and attempt to continue
--safe Run in safe mode with no user extensions
--uv Use uv as installer, default: False
Logging options:
--log LOG Set log file, default: None


@ -10,8 +10,7 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma
- init latents: variations, img2img
- diffusers public callbacks
- include reference styles
- lora: sc lora, dora, etc
- sd3 controlnet: <https://github.com/huggingface/diffusers/pull/8566>
- lora: sc lora, etc
## Experimental


@ -164,6 +164,8 @@ class KeyConvert:
def diffusers(self, key):
if self.is_sdxl:
if "diffusion_model" in key: # Fix NTC Slider naming error
key = key.replace("diffusion_model", "lora_unet")
map_keys = list(self.UNET_CONVERSION_MAP.keys()) # prefix of U-Net modules
map_keys.sort()
search_key = key.replace(self.LORA_PREFIX_UNET, "").replace(self.OFT_PREFIX_UNET, "").replace(self.LORA_PREFIX_TEXT_ENCODER1, "").replace(self.LORA_PREFIX_TEXT_ENCODER2, "")


@ -65,6 +65,7 @@ class Network: # LoraModule
self.unet_multiplier = [1.0] * 3
self.dyn_dim = None
self.modules = {}
self.bundle_embeddings = {}
self.mtime = None
self.mentioned_name = None
"""the text that was used to add the network to prompt - can be either name or an alias"""
@ -87,6 +88,8 @@ class NetworkModule:
self.bias = weights.w.get("bias")
self.alpha = weights.w["alpha"].item() if "alpha" in weights.w else None
self.scale = weights.w["scale"].item() if "scale" in weights.w else None
self.dora_scale = weights.w.get("dora_scale", None)
self.dora_norm_dims = len(self.shape) - 1
def multiplier(self):
unet_multiplier = 3 * [self.network.unet_multiplier] if not isinstance(self.network.unet_multiplier, list) else self.network.unet_multiplier
@ -108,6 +111,27 @@ class NetworkModule:
return self.alpha / self.dim
return 1.0
def apply_weight_decompose(self, updown, orig_weight):
# Match the device/dtype
orig_weight = orig_weight.to(updown.dtype)
dora_scale = self.dora_scale.to(device=orig_weight.device, dtype=updown.dtype)
updown = updown.to(orig_weight.device)
merged_scale1 = updown + orig_weight
merged_scale1_norm = (
merged_scale1.transpose(0, 1)
.reshape(merged_scale1.shape[1], -1)
.norm(dim=1, keepdim=True)
.reshape(merged_scale1.shape[1], *[1] * self.dora_norm_dims)
.transpose(0, 1)
)
dora_merged = (
merged_scale1 * (dora_scale / merged_scale1_norm)
)
final_updown = dora_merged - orig_weight
return final_updown
def finalize_updown(self, updown, orig_weight, output_shape, ex_bias=None):
if self.bias is not None:
updown = updown.reshape(self.bias.shape)
@ -119,6 +143,8 @@ class NetworkModule:
updown = updown.reshape(orig_weight.shape)
if ex_bias is not None:
ex_bias = ex_bias * self.multiplier()
if self.dora_scale is not None:
updown = self.apply_weight_decompose(updown, orig_weight)
return updown * self.calc_scale() * self.multiplier(), ex_bias
def calc_updown(self, target):
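For reference, a minimal standalone sketch of the DoRA weight decomposition that `apply_weight_decompose` implements above, using a hypothetical 2-D linear weight (the module code handles arbitrary shapes via the transpose/reshape steps):
```py
# Standalone sketch of the DoRA step above; tensors are hypothetical
import torch

orig_weight = torch.randn(768, 320)       # frozen base weight W, shape [out, in]
updown = 0.01 * torch.randn(768, 320)     # low-rank LoRA update dW
dora_scale = torch.rand(1, 320) + 0.5     # learned per-column magnitude m

merged = updown + orig_weight                      # W + dW
col_norm = merged.norm(dim=0, keepdim=True)        # per-column L2 norm, shape [1, in]
final_updown = merged * (dora_scale / col_norm) - orig_weight  # delta actually applied to W
```
The idea, following the DoRA paper, is to rescale the direction of the merged weight to a learned magnitude and then express the result as a plain additive delta, so the rest of the LoRA pipeline stays unchanged.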

File diff suppressed because it is too large

@ -1 +1 @@
Subproject commit dae2c67d826b631dcc343c028c60f478b0437877
Subproject commit 6a570df7ada9a048f3ce273851ade9cede9d5c26

Binary image changed: 49 KiB → 315 KiB


@ -182,13 +182,29 @@
"extras": "width: 1024, height: 1024, sampler: Default, cfg_scale: 2.0"
},
"Tencent HunyuanDiT 1.1": {
"path": "Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers",
"Tencent HunyuanDiT 1.2": {
"path": "Tencent-Hunyuan/HunyuanDiT-v1.2-Diffusers",
"desc": "Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding.",
"preview": "Tencent-Hunyuan-HunyuanDiT.jpg",
"extras": "width: 1024, height: 1024, sampler: Default, cfg_scale: 2.0"
},
"AlphaVLLM Lumina Next SFT": {
"path": "Alpha-VLLM/Lumina-Next-SFT-diffusers",
"desc": "The Lumina-Next-SFT is a Next-DiT model containing 2B parameters and utilizes Gemma-2B as the text encoder, enhanced through high-quality supervised fine-tuning (SFT).",
"preview": "Alpha-VLLM-Lumina-Next-SFT-diffusers.jpg",
"skip": true,
"extras": "width: 1024, height: 1024, sampler: Default"
},
"Kwai Kolors": {
"path": "Kwai-Kolors/Kolors",
"desc": "Kolors is a large-scale text-to-image generation model based on latent diffusion, developed by the Kuaishou Kolors team. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and proprietary models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters. Furthermore, Kolors supports both Chinese and English inputs",
"preview": "Kwai-Kolors.jpg",
"skip": true,
"extras": "width: 1024, height: 1024"
},
"Kandinsky 2.1": {
"path": "kandinsky-community/kandinsky-2-1",
"desc": "Kandinsky 2.1 is a text-conditional diffusion model based on unCLIP and latent diffusion, composed of a transformer-based image prior model, a unet diffusion model, and a decoder. Kandinsky 2.1 inherits best practices from Dall-E 2 and Latent diffusion while introducing some new ideas. It uses the CLIP model as a text and image encoder, and diffusion image prior (mapping) between latent spaces of CLIP modalities. This approach increases the visual performance of the model and unveils new horizons in blending images and text-guided image manipulation.",


@ -52,6 +52,7 @@ args = Dot({
'reinstall': False,
'version': False,
'ignore': False,
'uv': False,
})
git_commit = "unknown"
submodules_commit = {
@ -235,22 +236,25 @@ def uninstall(package, quiet = False):
@lru_cache()
def pip(arg: str, ignore: bool = False, quiet: bool = False):
def pip(arg: str, ignore: bool = False, quiet: bool = False, uv = True):
uv = uv and args.uv
pipCmd = "uv pip" if uv else "pip"
arg = arg.replace('>=', '==')
if not quiet and '-r ' not in arg:
log.info(f'Install: package="{arg.replace("install", "").replace("--upgrade", "").replace("--no-deps", "").replace("--force", "").replace("  ", " ").strip()}"')
log.info(f'Install: package="{arg.replace("install", "").replace("--upgrade", "").replace("--no-deps", "").replace("--force", "").replace("  ", " ").strip()}" mode={"uv" if uv else "pip"}')
env_args = os.environ.get("PIP_EXTRA_ARGS", "")
log.debug(f'Running: pip="{pip_log}{arg} {env_args}"')
result = subprocess.run(f'"{sys.executable}" -m pip {pip_log}{arg} {env_args}', shell=True, check=False, env=os.environ, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
all_args = f'{pip_log}{arg} {env_args}'.strip()
log.debug(f'Running: {pipCmd}="{all_args}"')
result = subprocess.run(f'"{sys.executable}" -m {pipCmd} {all_args}', shell=True, check=False, env=os.environ, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
txt = result.stdout.decode(encoding="utf8", errors="ignore")
if len(result.stderr) > 0:
txt += ('\n' if len(txt) > 0 else '') + result.stderr.decode(encoding="utf8", errors="ignore")
txt = txt.strip()
debug(f'Install pip: {txt}')
debug(f'Install {pipCmd}: {txt}')
if result.returncode != 0 and not ignore:
global errors # pylint: disable=global-statement
errors += 1
log.error(f'Error running pip: {arg}')
log.error(f'Error running {pipCmd}: {arg}')
log.debug(f'Pip output: {txt}')
return txt
@ -264,7 +268,7 @@ def install(package, friendly: str = None, ignore: bool = False, reinstall: bool
quick_allowed = False
if args.reinstall or reinstall or not installed(package, friendly, quiet=quiet):
deps = '' if not no_deps else '--no-deps '
res = pip(f"install --upgrade {deps}{package}", ignore=ignore)
res = pip(f"install{' --upgrade' if not args.uv else ''} {deps}{package}", ignore=ignore, uv=package != "uv")
try:
import imp # pylint: disable=deprecated-module
imp.reload(pkg_resources)
@ -454,6 +458,10 @@ def install_rocm_zluda(torch_command):
command = subprocess.run('hipinfo', shell=True, check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
amd_gpus = command.stdout.decode(encoding="utf8", errors="ignore").split('\n')
amd_gpus = [x.split(' ')[-1].strip() for x in amd_gpus if x.startswith('gcnArchName:')]
elif os.environ.get('WSL_DISTRO_NAME', None) is not None: # WSL does not have 'rocm_agent_enumerator'
command = subprocess.run('rocminfo', shell=True, check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
amd_gpus = command.stdout.decode(encoding="utf8", errors="ignore").split('\n')
amd_gpus = [x.strip().split(" ")[-1] for x in amd_gpus if x.startswith(' Name:') and "CPU" not in x]
else:
command = subprocess.run('rocm_agent_enumerator', shell=True, check=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
amd_gpus = command.stdout.decode(encoding="utf8", errors="ignore").split('\n')
@ -529,7 +537,10 @@ def install_rocm_zluda(torch_command):
if rocm_ver is None: # assume the latest if version check fails
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0')
elif rocm_ver == "6.1": # need nightlies
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision --pre --index-url https://download.pytorch.org/whl/nightly/rocm6.1')
if args.experimental:
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision --pre --index-url https://download.pytorch.org/whl/nightly/rocm6.1')
else:
torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0')
elif float(rocm_ver) < 5.5: # oldest supported version is 5.5
log.warning(f"Unsupported ROCm version detected: {rocm_ver}")
log.warning("Minimum supported ROCm version is 5.5")
@ -540,6 +551,27 @@ def install_rocm_zluda(torch_command):
ort_version = os.environ.get('ONNXRUNTIME_VERSION', None)
ort_package = os.environ.get('ONNXRUNTIME_PACKAGE', f"--pre onnxruntime-training{'' if ort_version is None else ('==' + ort_version)} --index-url https://pypi.lsh.sh/{rocm_ver[0]}{rocm_ver[2]} --extra-index-url https://pypi.org/simple")
install(ort_package, 'onnxruntime-training')
if bool(int(os.environ.get("TORCH_BLAS_PREFER_HIPBLASLT", "1"))):
supported_archs = []
hipblaslt_available = True
libpath = os.environ.get("HIPBLASLT_TENSILE_LIBPATH", "/opt/rocm/lib/hipblaslt/library")
for file in os.listdir(libpath):
if not file.startswith('extop_'):
continue
supported_archs.append(file[6:-3])
for gpu in amd_gpus:
if gpu not in supported_archs:
hipblaslt_available = False
break
log.info(f'hipBLASLt supported_archs={supported_archs}, available={hipblaslt_available}')
if hipblaslt_available:
import ctypes
# Preload hipBLASLt.
ctypes.CDLL("/opt/rocm/lib/libhipblaslt.so", mode=ctypes.RTLD_GLOBAL)
os.environ["HIPBLASLT_TENSILE_LIBPATH"] = libpath
else:
os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"
return torch_command
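The `file[6:-3]` slice above assumes Tensile library filenames of the form `extop_<arch>.co`; a hypothetical example:
```py
# Hypothetical filename as found under /opt/rocm/lib/hipblaslt/library
fname = 'extop_gfx1100.co'
arch = fname[6:-3]   # drop the 'extop_' prefix and '.co' suffix -> 'gfx1100'
assert arch == 'gfx1100'
```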
@ -680,6 +712,20 @@ def check_torch():
install('onnxruntime-gpu', 'onnxruntime-gpu', ignore=True, quiet=True)
elif is_rocm_available(allow_rocm):
torch_command = install_rocm_zluda(torch_command)
# WSL ROCm
if os.environ.get('WSL_DISTRO_NAME', None) is not None:
import ctypes
try:
# Preload stdc++ library. This will ignore Anaconda stdc++ library.
ctypes.CDLL("/lib/x86_64-linux-gnu/libstdc++.so.6", mode=ctypes.RTLD_GLOBAL)
except OSError:
pass
try:
# Preload HSA Runtime library.
ctypes.CDLL("/opt/rocm/lib/libhsa-runtime64.so", mode=ctypes.RTLD_GLOBAL)
except OSError:
log.error("Failed to preload HSA Runtime library.")
elif is_ipex_available(allow_ipex):
torch_command = install_ipex(torch_command)
elif allow_openvino and args.use_openvino:
@ -1052,20 +1098,20 @@ def check_ui(ver):
if not same(ver):
log.debug(f'Branch mismatch: sdnext={ver["branch"]} ui={ver["ui"]}')
cwd = os.getcwd()
try:
os.chdir('extensions-builtin/sdnext-modernui')
target = 'dev' if 'dev' in ver['branch'] else 'main'
git('checkout ' + target, ignore=True, optional=True)
os.chdir(cwd)
ver = get_version(force=True)
if not same(ver):
log.debug(f'Branch synchronized: {ver["branch"]}')
else:
log.debug(f'Branch sync failed: sdnext={ver["branch"]} ui={ver["ui"]}')
except Exception as e:
log.debug(f'Branch switch: {e}')
os.chdir(cwd)
# check version of the main repo and optionally upgrade it
@ -1165,7 +1211,7 @@ def check_timestamp():
def add_args(parser):
group = parser.add_argument_group('Setup options')
group.add_argument('--reset', default = os.environ.get("SD_RESET",False), action='store_true', help = "Reset main repository to latest version, default: %(default)s")
group.add_argument('--upgrade', default = os.environ.get("SD_UPGRADE",False), action='store_true', help = "Upgrade main repository to latest version, default: %(default)s")
group.add_argument('--upgrade', '--update', default = os.environ.get("SD_UPGRADE",False), action='store_true', help = "Upgrade main repository to latest version, default: %(default)s")
group.add_argument('--requirements', default = os.environ.get("SD_REQUIREMENTS",False), action='store_true', help = "Force re-check of requirements, default: %(default)s")
group.add_argument('--quick', default = os.environ.get("SD_QUICK",False), action='store_true', help = "Bypass version checks, default: %(default)s")
group.add_argument('--use-directml', default = os.environ.get("SD_USEDIRECTML",False), action='store_true', help = "Use DirectML if no compatible GPU is detected, default: %(default)s")
@ -1188,6 +1234,7 @@ def add_args(parser):
group.add_argument('--version', default = False, action='store_true', help = "Print version information")
group.add_argument('--ignore', default = os.environ.get("SD_IGNORE",False), action='store_true', help = "Ignore any errors and attempt to continue")
group.add_argument('--safe', default = os.environ.get("SD_SAFE",False), action='store_true', help = "Run in safe mode with no user extensions")
group.add_argument('--uv', default = os.environ.get("SD_UV",False), action='store_true', help = "Use uv instead of pip to install the packages")
group = parser.add_argument_group('Logging options')
group.add_argument("--log", type=str, default=os.environ.get("SD_LOG", None), help="Set log file, default: %(default)s")


@ -204,6 +204,8 @@ def main():
installer.log.info(f'Platform: {installer.print_dict(installer.get_platform())}')
if not args.skip_env:
installer.set_environment()
if args.uv:
installer.install("uv", "uv")
installer.check_torch()
installer.check_onnx()
installer.check_diffusers()

Binary image added: 124 KiB

Binary image added: 100 KiB


@ -39,10 +39,10 @@ def get_script(script_name, script_runner):
return script_runner.scripts[script_idx]
def init_default_script_args(script_runner):
#find max idx from the scripts in runner and generate a none array to init script_args
# find max idx from the scripts in runner and generate a none array to init script_args
last_arg_index = 1
for script in script_runner.scripts:
if last_arg_index < script.args_to:
if last_arg_index < script.args_to: # pylint: disable=consider-using-max-builtin
last_arg_index = script.args_to
# None everywhere except position 0 to initialize script args
script_args = [None]*last_arg_index


@ -282,67 +282,72 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
else:
pass
debug(f'Control: run type={unit_type} models={has_models}')
if has_models:
p.ops.append('control')
p.extra_generation_params["Control mode"] = unit_type # overriden later with pretty-print
p.extra_generation_params["Control conditioning"] = control_conditioning if isinstance(control_conditioning, list) else [control_conditioning]
p.extra_generation_params['Control start'] = control_guidance_start if isinstance(control_guidance_start, list) else [control_guidance_start]
p.extra_generation_params['Control end'] = control_guidance_end if isinstance(control_guidance_end, list) else [control_guidance_end]
p.extra_generation_params["Control model"] = ';'.join([(m.model_id or '') for m in active_model if m.model is not None])
p.extra_generation_params["Control conditioning"] = ';'.join([str(c) for c in p.extra_generation_params["Control conditioning"]])
p.extra_generation_params['Control start'] = ';'.join([str(c) for c in p.extra_generation_params['Control start']])
p.extra_generation_params['Control end'] = ';'.join([str(c) for c in p.extra_generation_params['Control end']])
if unit_type == 't2i adapter' and has_models:
p.extra_generation_params["Control mode"] = 'T2I-Adapter'
p.task_args['adapter_conditioning_scale'] = control_conditioning
instance = t2iadapter.AdapterPipeline(selected_models, shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: T2I-Adapter does not support separate init image')
elif unit_type == 'controlnet' and has_models:
p.extra_generation_params["Control mode"] = 'ControlNet'
p.task_args['controlnet_conditioning_scale'] = control_conditioning
p.task_args['control_guidance_start'] = control_guidance_start
p.task_args['control_guidance_end'] = control_guidance_end
p.task_args['guess_mode'] = p.guess_mode
instance = controlnet.ControlNetPipeline(selected_models, shared.sd_model)
pipe = instance.pipeline
elif unit_type == 'xs' and has_models:
p.extra_generation_params["Control mode"] = 'ControlNet-XS'
p.controlnet_conditioning_scale = control_conditioning
p.control_guidance_start = control_guidance_start
p.control_guidance_end = control_guidance_end
instance = xs.ControlNetXSPipeline(selected_models, shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: ControlNet-XS does not support separate init image')
elif unit_type == 'lite' and has_models:
p.extra_generation_params["Control mode"] = 'ControlLLLite'
p.controlnet_conditioning_scale = control_conditioning
instance = lite.ControlLLitePipeline(shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: ControlLLLite does not support separate init image')
elif unit_type == 'reference' and has_models:
p.extra_generation_params["Control mode"] = 'Reference'
p.extra_generation_params["Control attention"] = p.attention
p.task_args['reference_attn'] = 'Attention' in p.attention
p.task_args['reference_adain'] = 'Adain' in p.attention
p.task_args['attention_auto_machine_weight'] = p.query_weight
p.task_args['gn_auto_machine_weight'] = p.adain_weight
p.task_args['style_fidelity'] = p.fidelity
instance = reference.ReferencePipeline(shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: Reference does not support separate init image')
else: # run in txt2img/img2img mode
if len(active_strength) > 0:
p.strength = active_strength[0]
pipe = shared.sd_model
instance = None
def set_pipe():
global pipe, instance # pylint: disable=global-statement
pipe = None
if has_models:
p.ops.append('control')
p.extra_generation_params["Control mode"] = unit_type # overriden later with pretty-print
p.extra_generation_params["Control conditioning"] = control_conditioning if isinstance(control_conditioning, list) else [control_conditioning]
p.extra_generation_params['Control start'] = control_guidance_start if isinstance(control_guidance_start, list) else [control_guidance_start]
p.extra_generation_params['Control end'] = control_guidance_end if isinstance(control_guidance_end, list) else [control_guidance_end]
p.extra_generation_params["Control model"] = ';'.join([(m.model_id or '') for m in active_model if m.model is not None])
p.extra_generation_params["Control conditioning"] = ';'.join([str(c) for c in p.extra_generation_params["Control conditioning"]])
p.extra_generation_params['Control start'] = ';'.join([str(c) for c in p.extra_generation_params['Control start']])
p.extra_generation_params['Control end'] = ';'.join([str(c) for c in p.extra_generation_params['Control end']])
if unit_type == 't2i adapter' and has_models:
p.extra_generation_params["Control mode"] = 'T2I-Adapter'
p.task_args['adapter_conditioning_scale'] = control_conditioning
instance = t2iadapter.AdapterPipeline(selected_models, shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: T2I-Adapter does not support separate init image')
elif unit_type == 'controlnet' and has_models:
p.extra_generation_params["Control mode"] = 'ControlNet'
p.task_args['controlnet_conditioning_scale'] = control_conditioning
p.task_args['control_guidance_start'] = control_guidance_start
p.task_args['control_guidance_end'] = control_guidance_end
p.task_args['guess_mode'] = p.guess_mode
instance = controlnet.ControlNetPipeline(selected_models, shared.sd_model)
pipe = instance.pipeline
elif unit_type == 'xs' and has_models:
p.extra_generation_params["Control mode"] = 'ControlNet-XS'
p.controlnet_conditioning_scale = control_conditioning
p.control_guidance_start = control_guidance_start
p.control_guidance_end = control_guidance_end
instance = xs.ControlNetXSPipeline(selected_models, shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: ControlNet-XS does not support separate init image')
elif unit_type == 'lite' and has_models:
p.extra_generation_params["Control mode"] = 'ControlLLLite'
p.controlnet_conditioning_scale = control_conditioning
instance = lite.ControlLLitePipeline(shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: ControlLLLite does not support separate init image')
elif unit_type == 'reference' and has_models:
p.extra_generation_params["Control mode"] = 'Reference'
p.extra_generation_params["Control attention"] = p.attention
p.task_args['reference_attn'] = 'Attention' in p.attention
p.task_args['reference_adain'] = 'Adain' in p.attention
p.task_args['attention_auto_machine_weight'] = p.query_weight
p.task_args['gn_auto_machine_weight'] = p.adain_weight
p.task_args['style_fidelity'] = p.fidelity
instance = reference.ReferencePipeline(shared.sd_model)
pipe = instance.pipeline
if inits is not None:
shared.log.warning('Control: Reference does not support separate init image')
else: # run in txt2img/img2img mode
if len(active_strength) > 0:
p.strength = active_strength[0]
pipe = shared.sd_model
instance = None
debug(f'Control: run type={unit_type} models={has_models} pipe={pipe.__class__.__name__ if pipe is not None else None}')
return pipe
pipe = set_pipe()
debug(f'Control pipeline: class={pipe.__class__.__name__} args={vars(p)}')
t1, t2, t3 = time.time(), 0, 0
status = True
@ -351,6 +356,7 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
output_filename = None
index = 0
frames = 0
blended_image = None
# set pipeline
if pipe.__class__.__name__ != shared.sd_model.__class__.__name__:
@ -382,6 +388,7 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
codec = util.decode_fourcc(video.get(cv2.CAP_PROP_FOURCC))
status, frame = video.read()
if status:
shared.state.frame_count = 1 + frames // (video_skip_frames + 1)
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
shared.log.debug(f'Control: input video: path={inputs} frames={frames} fps={fps} size={w}x{h} codec={codec}')
except Exception as e:
@ -389,6 +396,9 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
return [], '', '', 'Error: video open failed'
while status:
if pipe is None: # pipe may have been reset externally
pipe = set_pipe()
debug(f'Control pipeline reinit: class={pipe.__class__.__name__}')
processed_image = None
if frame is not None:
inputs = [Image.fromarray(frame)] # cv2 to pil
@ -425,9 +435,10 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
else:
debug(f'Control Init image: {i % len(inits) + 1} of {len(inits)}')
init_image = inits[i % len(inits)]
index += 1
if video is not None and index % (video_skip_frames + 1) != 0:
index += 1
continue
index += 1
# resize before
if resize_mode_before != 0 and resize_name_before != 'None':
@ -477,7 +488,6 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
process.model = None
debug(f'Control processed: {len(processed_images)}')
blended_image = None
if len(processed_images) > 0:
try:
if len(p.extra_generation_params["Control process"]) == 0:
@ -593,10 +603,11 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
output = None
script_run = False
if pipe is not None: # run new pipeline
pipe.restore_pipeline = restore_pipeline
if not hasattr(pipe, 'restore_pipeline') and video is None:
pipe.restore_pipeline = restore_pipeline
debug(f'Control exec pipeline: task={sd_models.get_diffusers_task(pipe)} class={pipe.__class__}')
debug(f'Control exec pipeline: p={vars(p)}')
debug(f'Control exec pipeline: args={p.task_args} image={p.task_args.get("image", None)} control={p.task_args.get("control_image", None)} mask={p.task_args.get("mask_image", None) or p.image_mask} ref={p.task_args.get("ref_image", None)}')
# debug(f'Control exec pipeline: p={vars(p)}')
# debug(f'Control exec pipeline: args={p.task_args} image={p.task_args.get("image", None)} control={p.task_args.get("control_image", None)} mask={p.task_args.get("mask_image", None) or p.image_mask} ref={p.task_args.get("ref_image", None)}')
if sd_models.get_diffusers_task(pipe) != sd_models.DiffusersTaskType.TEXT_2_IMAGE: # force vae back to gpu if not in txt2img mode
sd_models.move_model(pipe.vae, devices.device)
@ -692,5 +703,4 @@ def control_run(units: List[unit.Unit] = [], inputs: List[Image.Image] = [], ini
if is_generator:
yield (output_images, blended_image, html_txt, output_filename)
else:
yield (output_images, blended_image, html_txt, output_filename)
return
return (output_images, blended_image, html_txt, output_filename)


@ -49,9 +49,12 @@ predefined_sdxl = {
'Canny XL': 'diffusers/controlnet-canny-sdxl-1.0',
'Depth Zoe XL': 'diffusers/controlnet-zoe-depth-sdxl-1.0',
'Depth Mid XL': 'diffusers/controlnet-depth-sdxl-1.0-mid',
'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0',
'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0/bin',
# 'OpenPose XL': 'thibaud/controlnet-openpose-sdxl-1.0/OpenPoseXL2.safetensors',
'Xinsir Union XL': 'xinsir/controlnet-union-sdxl-1.0',
'Xinsir OpenPose XL': 'xinsir/controlnet-openpose-sdxl-1.0',
'Xinsir Canny XL': 'xinsir/controlnet-canny-sdxl-1.0',
'Xinsir Depth XL': 'xinsir/controlnet-depth-sdxl-1.0',
'Xinsir Scribble XL': 'xinsir/controlnet-scribble-sdxl-1.0',
'Xinsir Anime Painter XL': 'xinsir/anime-painter',
# 'StabilityAI Canny R128': 'stabilityai/control-lora/control-LoRAs-rank128/control-lora-canny-rank128.safetensors',
@ -171,6 +174,9 @@ class ControlNet():
if model_path.endswith('.safetensors'):
self.load_safetensors(model_path)
else:
if '/bin' in model_path:
model_path = model_path.replace('/bin', '')
self.load_config['use_safetensors'] = False
self.model = ControlNetModel.from_pretrained(model_path, **self.load_config)
if self.dtype is not None:
self.model.to(self.dtype)
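For reference, a minimal sketch of loading the Union model registered above with stock diffusers; the SDXL base id and dtype are illustrative, and this mirrors the plain `ControlNetModel.from_pretrained` call used here:
```py
# Hedged sketch: Xinsir ControlNet Union for SDXL via stock diffusers
import torch
import diffusers

controlnet = diffusers.ControlNetModel.from_pretrained(
    'xinsir/controlnet-union-sdxl-1.0', torch_dtype=torch.float16)
pipe = diffusers.StableDiffusionXLControlNetPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',   # illustrative base model
    controlnet=controlnet, torch_dtype=torch.float16).to('cuda')
```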


@ -26,8 +26,7 @@ from diffusers.loaders import FromSingleFileMixin, LoraLoaderMixin, StableDiffus
from diffusers.models import AutoencoderKL, UNet2DConditionModel
from diffusers.models.attention_processor import (
AttnProcessor2_0,
LoRAAttnProcessor2_0,
LoRAXFormersAttnProcessor,
FusedAttnProcessor2_0,
XFormersAttnProcessor,
)
from diffusers.models.lora import adjust_lora_scale_text_encoder
@ -652,8 +651,7 @@ class StableDiffusionXLControlNetXSPipeline(
(
AttnProcessor2_0,
XFormersAttnProcessor,
LoRAXFormersAttnProcessor,
LoRAAttnProcessor2_0,
FusedAttnProcessor2_0,
),
)
# if xformers or torch_2_0 is used attention block does not need


@ -46,7 +46,7 @@ def get_gpu_info():
try:
if shared.cmd_opts.use_openvino:
return {
'device': get_openvino_device(),
'device': get_openvino_device(), # pylint: disable=used-before-assignment
'openvino': get_package_version("openvino"),
}
elif shared.cmd_opts.use_directml:
@ -166,7 +166,7 @@ def torch_gc(force=False):
after = { 'gpu': mem.get('gpu', {}).get('used', 0), 'ram': mem.get('ram', {}).get('used', 0), 'retries': mem.get('retries', 0), 'oom': mem.get('oom', 0) }
utilization = { 'gpu': used_gpu, 'ram': used_ram, 'threshold': threshold }
results = { 'collected': collected, 'saved': saved }
log.debug(f'GC: utilization={utilization} gc={results} beofre={before} after={after} device={torch.device(get_optimal_device_name())} fn={sys._getframe(1).f_code.co_name} time={round(t1 - t0, 2)}') # pylint: disable=protected-access
log.debug(f'GC: utilization={utilization} gc={results} before={before} after={after} device={torch.device(get_optimal_device_name())} fn={sys._getframe(1).f_code.co_name} time={round(t1 - t0, 2)}') # pylint: disable=protected-access
def set_cuda_sync_mode(mode):
@ -311,7 +311,7 @@ def set_cuda_params():
inference_context = contextlib.nullcontext
else:
inference_context = torch.no_grad
log_device_name = get_raw_openvino_device() if shared.cmd_opts.use_openvino else torch.device(get_optimal_device_name())
log_device_name = get_raw_openvino_device() if shared.cmd_opts.use_openvino else torch.device(get_optimal_device_name()) # pylint: disable=used-before-assignment
log.debug(f'Desired Torch parameters: dtype={shared.opts.cuda_dtype} no-half={shared.opts.no_half} no-half-vae={shared.opts.no_half_vae} upscast={shared.opts.upcast_sampling}')
log.info(f'Setting Torch parameters: device={log_device_name} dtype={dtype} vae={dtype_vae} unet={dtype_unet} context={inference_context.__name__} fp16={fp16_ok} bf16={bf16_ok} optimization={shared.opts.cross_attention_optimization}')


@ -22,6 +22,9 @@ def face_swap(p: processing.StableDiffusionProcessing, app, input_images: List[I
np_image = cv2.cvtColor(np.array(source_image), cv2.COLOR_RGB2BGR)
faces = app.get(np_image)
if faces is None or len(faces) == 0:
shared.log.warning('FaceSwap: No faces detected')
return
source_face = faces[0]
processed_images = []
for image in input_images:


@ -35,7 +35,7 @@ timer.startup.record("torch")
import transformers # pylint: disable=W0611,C0411
timer.startup.record("transformers")
import onnxruntime
import onnxruntime # pylint: disable=W0611,C0411
onnxruntime.set_default_logger_severity(3)
timer.startup.record("onnx")
@ -50,7 +50,7 @@ timer.startup.record("pydantic")
import diffusers # pylint: disable=W0611,C0411
import diffusers.loaders.single_file # pylint: disable=W0611,C0411
logging.getLogger("diffusers.loaders.single_file").setLevel(logging.ERROR)
from tqdm.rich import tqdm
from tqdm.rich import tqdm # pylint: disable=W0611,C0411
diffusers.loaders.single_file.logging.tqdm = partial(tqdm, unit='C')
timer.startup.record("diffusers")

modules/model_kolors.py (new file, +28 lines)

@ -0,0 +1,28 @@
import torch
import transformers
import diffusers
repo_id = 'Kwai-Kolors/Kolors'
encoder_id = 'THUDM/chatglm3-6b'
def load_kolors(_checkpoint_info, diffusers_load_config={}):
from modules import shared, devices, modelloader
modelloader.hf_login()
diffusers_load_config['variant'] = "fp16"
if 'torch_dtype' not in diffusers_load_config:
diffusers_load_config['torch_dtype'] = torch.float16
text_encoder = transformers.AutoModel.from_pretrained(encoder_id, torch_dtype=torch.float16, trust_remote_code=True, cache_dir=shared.opts.diffusers_dir)
# text_encoder = transformers.AutoModel.from_pretrained("THUDM/chatglm3-6b", torch_dtype=torch.float16, trust_remote_code=True).quantize(4).cuda()
tokenizer = transformers.AutoTokenizer.from_pretrained(encoder_id, trust_remote_code=True, cache_dir=shared.opts.diffusers_dir)
pipe = diffusers.StableDiffusionXLPipeline.from_pretrained(
repo_id,
tokenizer=tokenizer,
text_encoder=text_encoder,
cache_dir = shared.opts.diffusers_dir,
**diffusers_load_config,
)
devices.torch_gc()
return pipe
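A hedged usage sketch of the loader above; the returned object is a standard `StableDiffusionXLPipeline` with the chatglm encoder swapped in, and SD.Next handles prompt encoding through its own path:
```py
# Hypothetical: build the Kolors pipeline and inspect the swapped-in text encoder
pipe = load_kolors(None)
print(pipe.__class__.__name__)               # StableDiffusionXLPipeline
print(pipe.text_encoder.config.model_type)   # chatglm (THUDM/chatglm3-6b)
pipe.to('cuda')                              # needs enough VRAM for the 6B encoder
```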

modules/model_lumina.py (new file, +24 lines)

@ -0,0 +1,24 @@
import torch
import diffusers
def load_lumina(_checkpoint_info, diffusers_load_config={}):
from modules import shared, devices, modelloader
modelloader.hf_login()
# {'low_cpu_mem_usage': True, 'torch_dtype': torch.float16, 'load_connected_pipeline': True, 'safety_checker': None, 'requires_safety_checker': False}
if 'torch_dtype' not in diffusers_load_config:
diffusers_load_config['torch_dtype'] = torch.float16
if 'low_cpu_mem_usage' in diffusers_load_config:
del diffusers_load_config['low_cpu_mem_usage']
if 'load_connected_pipeline' in diffusers_load_config:
del diffusers_load_config['load_connected_pipeline']
if 'safety_checker' in diffusers_load_config:
del diffusers_load_config['safety_checker']
if 'requires_safety_checker' in diffusers_load_config:
del diffusers_load_config['requires_safety_checker']
pipe = diffusers.LuminaText2ImgPipeline.from_pretrained(
'Alpha-VLLM/Lumina-Next-SFT-diffusers',
cache_dir = shared.opts.diffusers_dir,
**diffusers_load_config,
)
devices.torch_gc()
return pipe


@ -13,9 +13,9 @@ def load_sd3(fn=None, cache_dir=None, config=None):
if fn is not None and fn.endswith('.safetensors') and os.path.exists(fn):
model_id = fn
loader = diffusers.StableDiffusion3Pipeline.from_single_file
diffusers_minor = int(diffusers.__version__.split('.')[1])
_diffusers_major, diffusers_minor, diffusers_micro = int(diffusers.__version__.split('.')[0]), int(diffusers.__version__.split('.')[1]), int(diffusers.__version__.split('.')[2])
fn_size = os.path.getsize(fn)
if diffusers_minor < 30 or fn_size < 5e9: # te1/te2 do not get loaded correctly in diffusers 0.29.0 or model is without te1/te2
if (diffusers_minor <= 29 and diffusers_micro < 1) or fn_size < 5e9: # te1/te2 do not get loaded correctly in diffusers 0.29.0 if model is without te1/te2
kwargs = {
'text_encoder': transformers.CLIPTextModelWithProjection.from_pretrained(
repo_id,


@ -75,3 +75,4 @@ def set_t5(pipe, module, t5=None, cache_dir=None):
else:
pipe.maybe_free_model_hooks()
devices.torch_gc()
return pipe


@ -3,7 +3,6 @@ import numpy as np
import torch
import diffusers
import onnxruntime as ort
import optimum.onnxruntime
initialized = False
@ -208,8 +207,6 @@ def initialize_onnx():
from .pipelines.onnx_stable_diffusion_img2img_pipeline import OnnxStableDiffusionImg2ImgPipeline
from .pipelines.onnx_stable_diffusion_inpaint_pipeline import OnnxStableDiffusionInpaintPipeline
from .pipelines.onnx_stable_diffusion_upscale_pipeline import OnnxStableDiffusionUpscalePipeline
from .pipelines.onnx_stable_diffusion_xl_pipeline import OnnxStableDiffusionXLPipeline
from .pipelines.onnx_stable_diffusion_xl_img2img_pipeline import OnnxStableDiffusionXLImg2ImgPipeline
OnnxRuntimeModel.__module__ = 'diffusers' # OnnxRuntimeModel Hijack.
diffusers.OnnxRuntimeModel = OnnxRuntimeModel
@ -225,6 +222,16 @@ def initialize_onnx():
diffusers.OnnxStableDiffusionUpscalePipeline = OnnxStableDiffusionUpscalePipeline
log.debug(f'ONNX: version={ort.__version__} provider={opts.onnx_execution_provider}, available={available_execution_providers}')
except Exception as e:
log.error(f'ONNX failed to initialize: {e}')
try:
# load xl pipelines. may fail if the user has the latest diffusers (0.30.x)
import optimum.onnxruntime
from .pipelines.onnx_stable_diffusion_xl_pipeline import OnnxStableDiffusionXLPipeline
from .pipelines.onnx_stable_diffusion_xl_img2img_pipeline import OnnxStableDiffusionXLImg2ImgPipeline
diffusers.OnnxStableDiffusionXLPipeline = OnnxStableDiffusionXLPipeline
diffusers.pipelines.auto_pipeline.AUTO_TEXT2IMAGE_PIPELINES_MAPPING["onnx-stable-diffusion-xl"] = diffusers.OnnxStableDiffusionXLPipeline
@ -235,10 +242,9 @@ def initialize_onnx():
diffusers.ORTStableDiffusionXLImg2ImgPipeline = diffusers.OnnxStableDiffusionXLImg2ImgPipeline
optimum.onnxruntime.modeling_diffusion._ORTDiffusionModelPart.to = ORTDiffusionModelPart_to # pylint: disable=protected-access
except Exception:
pass
log.debug(f'ONNX: version={ort.__version__} provider={opts.onnx_execution_provider}, available={available_execution_providers}')
except Exception as e:
log.error(f'ONNX failed to initialize: {e}')
initialized = True


@ -52,8 +52,8 @@ def get_execution_provider_options():
execution_provider_options = { "device_id": int(cmd_opts.device_id or 0) }
if opts.onnx_execution_provider == ExecutionProvider.ROCm:
if ExecutionProvider.ROCm in available_execution_providers:
execution_provider_options["tunable_op_enable"] = 1
execution_provider_options["tunable_op_tuning_enable"] = 1
execution_provider_options["tunable_op_enable"] = True
execution_provider_options["tunable_op_tuning_enable"] = True
elif opts.onnx_execution_provider == ExecutionProvider.OpenVINO:
from modules.intel.openvino import get_device as get_raw_openvino_device
device = get_raw_openvino_device()


@ -1,14 +1,12 @@
import os
import sys
import json
import shutil
import tempfile
from abc import ABCMeta
from typing import Type, Tuple, List, Any, Dict
from packaging import version
import torch
import diffusers
import onnxruntime as ort
import optimum.onnxruntime
from installer import log, install
from modules import shared
from modules.paths import sd_configs_path, models_path
@ -23,7 +21,6 @@ from modules.onnx_impl.execution_providers import ExecutionProvider, EP_TO_NAME,
SUBMODELS_SD = ("text_encoder", "unet", "vae_encoder", "vae_decoder",)
SUBMODELS_SDXL = ("text_encoder", "text_encoder_2", "unet", "vae_encoder", "vae_decoder",)
SUBMODELS_SDXL_REFINER = ("text_encoder_2", "unet", "vae_encoder", "vae_decoder",)
SUBMODELS_LARGE = ("text_encoder_2", "unet",)
@ -48,11 +45,13 @@ class PipelineBase(TorchCompatibleModule, diffusers.DiffusionPipeline, metaclass
module = getattr(self, name)
if isinstance(module, optimum.onnxruntime.modeling_diffusion._ORTDiffusionModelPart): # pylint: disable=protected-access
device = extract_device(args, kwargs)
if device is None:
return self
module.session = move_inference_session(module.session, device)
if "optimum.onnxruntime" in sys.modules:
import optimum.onnxruntime
if isinstance(module, optimum.onnxruntime.modeling_diffusion._ORTDiffusionModelPart): # pylint: disable=protected-access
device = extract_device(args, kwargs)
if device is None:
return self
module.session = move_inference_session(module.session, device)
if not isinstance(module, diffusers.OnnxRuntimeModel):
continue


@ -5,7 +5,6 @@ from typing import Any, Callable, Dict, List, Optional, Tuple, Union
import torch
import torch.nn.functional as F
from packaging import version
from transformers import (
CLIPImageProcessor,
@ -26,8 +25,6 @@ from diffusers.models import AutoencoderKL, ImageProjection, UNet2DConditionMode
from diffusers.models.attention_processor import (
AttnProcessor2_0,
FusedAttnProcessor2_0,
LoRAAttnProcessor2_0,
LoRAXFormersAttnProcessor,
XFormersAttnProcessor,
)
from diffusers.models.lora import adjust_lora_scale_text_encoder
@ -943,8 +940,6 @@ class StableDiffusionXLPAGPipeline(
(
AttnProcessor2_0,
XFormersAttnProcessor,
LoRAXFormersAttnProcessor,
LoRAAttnProcessor2_0,
FusedAttnProcessor2_0,
),
)


@ -0,0 +1,833 @@
# AuraSR: GAN-based Super-Resolution for real-world images, a reproduction of the GigaGAN* paper. Implementation is
# based on the unofficial lucidrains/gigagan-pytorch repository. Heavily modified from there.
#
# https://mingukkang.github.io/GigaGAN/
from math import log2, ceil
from functools import partial
from typing import Any, Optional, List, Iterable
import torch
from torchvision import transforms
from PIL import Image
from torch import nn, einsum, Tensor
import torch.nn.functional as F
from einops import rearrange, repeat, reduce
from einops.layers.torch import Rearrange
def get_same_padding(size, kernel, dilation, stride):
return ((size - 1) * (stride - 1) + dilation * (kernel - 1)) // 2
class AdaptiveConv2DMod(nn.Module):
def __init__(
self,
dim,
dim_out,
kernel,
*,
demod=True,
stride=1,
dilation=1,
eps=1e-8,
num_conv_kernels=1, # set this to be greater than 1 for adaptive
):
super().__init__()
self.eps = eps
self.dim_out = dim_out
self.kernel = kernel
self.stride = stride
self.dilation = dilation
self.adaptive = num_conv_kernels > 1
self.weights = nn.Parameter(
torch.randn((num_conv_kernels, dim_out, dim, kernel, kernel))
)
self.demod = demod
nn.init.kaiming_normal_(
self.weights, a=0, mode="fan_in", nonlinearity="leaky_relu"
)
def forward(
self, fmap, mod: Optional[Tensor] = None, kernel_mod: Optional[Tensor] = None
):
"""
notation
b - batch
n - convs
o - output
i - input
k - kernel
"""
b, h = fmap.shape[0], fmap.shape[-2]
# account for feature map that has been expanded by the scale in the first dimension
# due to multiscale inputs and outputs
if mod.shape[0] != b:
mod = repeat(mod, "b ... -> (s b) ...", s=b // mod.shape[0])
if exists(kernel_mod):
kernel_mod_has_el = kernel_mod.numel() > 0
assert self.adaptive or not kernel_mod_has_el
if kernel_mod_has_el and kernel_mod.shape[0] != b:
kernel_mod = repeat(
kernel_mod, "b ... -> (s b) ...", s=b // kernel_mod.shape[0]
)
# prepare weights for modulation
weights = self.weights
if self.adaptive:
weights = repeat(weights, "... -> b ...", b=b)
# determine an adaptive weight and 'select' the kernel to use with softmax
assert exists(kernel_mod) and kernel_mod.numel() > 0
kernel_attn = kernel_mod.softmax(dim=-1)
kernel_attn = rearrange(kernel_attn, "b n -> b n 1 1 1 1")
weights = reduce(weights * kernel_attn, "b n ... -> b ...", "sum")
# do the modulation, demodulation, as done in stylegan2
mod = rearrange(mod, "b i -> b 1 i 1 1")
weights = weights * (mod + 1)
if self.demod:
inv_norm = (
reduce(weights**2, "b o i k1 k2 -> b o 1 1 1", "sum")
.clamp(min=self.eps)
.rsqrt()
)
weights = weights * inv_norm
fmap = rearrange(fmap, "b c h w -> 1 (b c) h w")
weights = rearrange(weights, "b o ... -> (b o) ...")
padding = get_same_padding(h, self.kernel, self.dilation, self.stride)
fmap = F.conv2d(fmap, weights, padding=padding, groups=b)
return rearrange(fmap, "1 (b o) ... -> b o ...", b=b)
class Attend(nn.Module):
def __init__(self, dropout=0.0, flash=False):
super().__init__()
self.dropout = dropout
self.attn_dropout = nn.Dropout(dropout)
self.scale = nn.Parameter(torch.randn(1))
self.flash = flash
def flash_attn(self, q, k, v):
q, k, v = map(lambda t: t.contiguous(), (q, k, v))
out = F.scaled_dot_product_attention(
q, k, v, dropout_p=self.dropout if self.training else 0.0
)
return out
def forward(self, q, k, v):
if self.flash:
return self.flash_attn(q, k, v)
scale = q.shape[-1] ** -0.5
# similarity
sim = einsum("b h i d, b h j d -> b h i j", q, k) * scale
# attention
attn = sim.softmax(dim=-1)
attn = self.attn_dropout(attn)
# aggregate values
out = einsum("b h i j, b h j d -> b h i d", attn, v)
return out
def exists(x):
return x is not None
def default(val, d):
if exists(val):
return val
return d() if callable(d) else d
def cast_tuple(t, length=1):
if isinstance(t, tuple):
return t
return (t,) * length
def identity(t, *args, **kwargs):
return t
def is_power_of_two(n):
return log2(n).is_integer()
def null_iterator():
while True:
yield None
def Downsample(dim, dim_out=None):
return nn.Sequential(
Rearrange("b c (h p1) (w p2) -> b (c p1 p2) h w", p1=2, p2=2),
nn.Conv2d(dim * 4, default(dim_out, dim), 1),
)
class RMSNorm(nn.Module):
def __init__(self, dim):
super().__init__()
self.g = nn.Parameter(torch.ones(1, dim, 1, 1))
self.eps = 1e-4
def forward(self, x):
return F.normalize(x, dim=1) * self.g * (x.shape[1] ** 0.5)
# building block modules
class Block(nn.Module):
def __init__(self, dim, dim_out, groups=8, num_conv_kernels=0):
super().__init__()
self.proj = AdaptiveConv2DMod(
dim, dim_out, kernel=3, num_conv_kernels=num_conv_kernels
)
self.kernel = 3
self.dilation = 1
self.stride = 1
self.act = nn.SiLU()
def forward(self, x, conv_mods_iter: Optional[Iterable] = None):
conv_mods_iter = default(conv_mods_iter, null_iterator())
x = self.proj(x, mod=next(conv_mods_iter), kernel_mod=next(conv_mods_iter))
x = self.act(x)
return x
class ResnetBlock(nn.Module):
def __init__(
self, dim, dim_out, *, groups=8, num_conv_kernels=0, style_dims: List = []
):
super().__init__()
style_dims.extend([dim, num_conv_kernels, dim_out, num_conv_kernels])
self.block1 = Block(
dim, dim_out, groups=groups, num_conv_kernels=num_conv_kernels
)
self.block2 = Block(
dim_out, dim_out, groups=groups, num_conv_kernels=num_conv_kernels
)
self.res_conv = nn.Conv2d(dim, dim_out, 1) if dim != dim_out else nn.Identity()
def forward(self, x, conv_mods_iter: Optional[Iterable] = None):
h = self.block1(x, conv_mods_iter=conv_mods_iter)
h = self.block2(h, conv_mods_iter=conv_mods_iter)
return h + self.res_conv(x)
class LinearAttention(nn.Module):
def __init__(self, dim, heads=4, dim_head=32):
super().__init__()
self.scale = dim_head**-0.5
self.heads = heads
hidden_dim = dim_head * heads
self.norm = RMSNorm(dim)
self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias=False)
self.to_out = nn.Sequential(nn.Conv2d(hidden_dim, dim, 1), RMSNorm(dim))
def forward(self, x):
b, c, h, w = x.shape
x = self.norm(x)
qkv = self.to_qkv(x).chunk(3, dim=1)
q, k, v = map(
lambda t: rearrange(t, "b (h c) x y -> b h c (x y)", h=self.heads), qkv
)
q = q.softmax(dim=-2)
k = k.softmax(dim=-1)
q = q * self.scale
context = torch.einsum("b h d n, b h e n -> b h d e", k, v)
out = torch.einsum("b h d e, b h d n -> b h e n", context, q)
out = rearrange(out, "b h c (x y) -> b (h c) x y", h=self.heads, x=h, y=w)
return self.to_out(out)
class Attention(nn.Module):
def __init__(self, dim, heads=4, dim_head=32, flash=False):
super().__init__()
self.heads = heads
hidden_dim = dim_head * heads
self.norm = RMSNorm(dim)
self.attend = Attend(flash=flash)
self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias=False)
self.to_out = nn.Conv2d(hidden_dim, dim, 1)
def forward(self, x):
b, c, h, w = x.shape
x = self.norm(x)
qkv = self.to_qkv(x).chunk(3, dim=1)
q, k, v = map(
lambda t: rearrange(t, "b (h c) x y -> b h (x y) c", h=self.heads), qkv
)
out = self.attend(q, k, v)
out = rearrange(out, "b h (x y) d -> b (h d) x y", x=h, y=w)
return self.to_out(out)
# feedforward
def FeedForward(dim, mult=4):
return nn.Sequential(
RMSNorm(dim),
nn.Conv2d(dim, dim * mult, 1),
nn.GELU(),
nn.Conv2d(dim * mult, dim, 1),
)
# transformers
class Transformer(nn.Module):
def __init__(self, dim, dim_head=64, heads=8, depth=1, flash_attn=True, ff_mult=4):
super().__init__()
self.layers = nn.ModuleList([])
for _ in range(depth):
self.layers.append(
nn.ModuleList(
[
Attention(
dim=dim, dim_head=dim_head, heads=heads, flash=flash_attn
),
FeedForward(dim=dim, mult=ff_mult),
]
)
)
def forward(self, x):
for attn, ff in self.layers:
x = attn(x) + x
x = ff(x) + x
return x
class LinearTransformer(nn.Module):
def __init__(self, dim, dim_head=64, heads=8, depth=1, ff_mult=4):
super().__init__()
self.layers = nn.ModuleList([])
for _ in range(depth):
self.layers.append(
nn.ModuleList(
[
LinearAttention(dim=dim, dim_head=dim_head, heads=heads),
FeedForward(dim=dim, mult=ff_mult),
]
)
)
def forward(self, x):
for attn, ff in self.layers:
x = attn(x) + x
x = ff(x) + x
return x
class NearestNeighborhoodUpsample(nn.Module):
def __init__(self, dim, dim_out=None):
super().__init__()
dim_out = default(dim_out, dim)
self.conv = nn.Conv2d(dim, dim_out, kernel_size=3, stride=1, padding=1)
def forward(self, x):
if x.shape[0] >= 64:
x = x.contiguous()
x = F.interpolate(x, scale_factor=2.0, mode="nearest")
x = self.conv(x)
return x
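# EqualLinear below implements the equalized-learning-rate trick from StyleGAN:
# weights are stored at unit scale and multiplied by lr_mul at runtime, lowering
# their effective learning rate relative to the rest of the network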
class EqualLinear(nn.Module):
def __init__(self, dim, dim_out, lr_mul=1, bias=True):
super().__init__()
self.weight = nn.Parameter(torch.randn(dim_out, dim))
if bias:
self.bias = nn.Parameter(torch.zeros(dim_out))
self.lr_mul = lr_mul
    def forward(self, input):
        bias = self.bias * self.lr_mul if hasattr(self, "bias") else None # guard against bias=False, where self.bias is never created
        return F.linear(input, self.weight * self.lr_mul, bias=bias)
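# StyleGanNetwork below is the StyleGAN-style mapping MLP: a normalized latent,
# optionally concatenated with a text latent, passes through depth x (EqualLinear
# + LeakyReLU) to produce the style vector that modulates every adaptive convolution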
class StyleGanNetwork(nn.Module):
def __init__(self, dim_in=128, dim_out=512, depth=8, lr_mul=0.1, dim_text_latent=0):
super().__init__()
self.dim_in = dim_in
self.dim_out = dim_out
self.dim_text_latent = dim_text_latent
layers = []
for i in range(depth):
is_first = i == 0
if is_first:
dim_in_layer = dim_in + dim_text_latent
else:
dim_in_layer = dim_out
dim_out_layer = dim_out
layers.extend(
[EqualLinear(dim_in_layer, dim_out_layer, lr_mul), nn.LeakyReLU(0.2)]
)
self.net = nn.Sequential(*layers)
def forward(self, x, text_latent=None):
x = F.normalize(x, dim=1)
if self.dim_text_latent > 0:
assert exists(text_latent)
x = torch.cat((x, text_latent), dim=-1)
return self.net(x)
class UnetUpsampler(torch.nn.Module):
def __init__(
self,
dim: int,
*,
image_size: int,
input_image_size: int,
init_dim: Optional[int] = None,
out_dim: Optional[int] = None,
style_network: Optional[dict] = None,
up_dim_mults: tuple = (1, 2, 4, 8, 16),
down_dim_mults: tuple = (4, 8, 16),
channels: int = 3,
resnet_block_groups: int = 8,
full_attn: tuple = (False, False, False, True, True),
flash_attn: bool = True,
self_attn_dim_head: int = 64,
self_attn_heads: int = 8,
attn_depths: tuple = (2, 2, 2, 2, 4),
mid_attn_depth: int = 4,
num_conv_kernels: int = 4,
resize_mode: str = "bilinear",
unconditional: bool = True,
skip_connect_scale: Optional[float] = None,
):
super().__init__()
self.style_network = style_network = StyleGanNetwork(**style_network)
self.unconditional = unconditional
assert not (
unconditional
and exists(style_network)
and style_network.dim_text_latent > 0
)
assert is_power_of_two(image_size) and is_power_of_two(
input_image_size
), "both output image size and input image size must be power of 2"
assert (
input_image_size < image_size
), "input image size must be smaller than the output image size, thus upsampling"
self.image_size = image_size
self.input_image_size = input_image_size
style_embed_split_dims = []
self.channels = channels
input_channels = channels
init_dim = default(init_dim, dim)
up_dims = [init_dim, *map(lambda m: dim * m, up_dim_mults)]
init_down_dim = up_dims[len(up_dim_mults) - len(down_dim_mults)]
down_dims = [init_down_dim, *map(lambda m: dim * m, down_dim_mults)]
self.init_conv = nn.Conv2d(input_channels, init_down_dim, 7, padding=3)
up_in_out = list(zip(up_dims[:-1], up_dims[1:]))
down_in_out = list(zip(down_dims[:-1], down_dims[1:]))
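        # the up path has more stages than the down path because the output is larger
        # than the input; the down path therefore starts partway up the up_dims ladder,
        # at up_dims[len(up_dim_mults) - len(down_dim_mults)]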
block_klass = partial(
ResnetBlock,
groups=resnet_block_groups,
num_conv_kernels=num_conv_kernels,
style_dims=style_embed_split_dims,
)
FullAttention = partial(Transformer, flash_attn=flash_attn)
*_, mid_dim = up_dims
self.skip_connect_scale = default(skip_connect_scale, 2**-0.5)
self.downs = nn.ModuleList([])
self.ups = nn.ModuleList([])
block_count = 6
for ind, (
(dim_in, dim_out),
layer_full_attn,
layer_attn_depth,
) in enumerate(zip(down_in_out, full_attn, attn_depths)):
attn_klass = FullAttention if layer_full_attn else LinearTransformer
blocks = []
for i in range(block_count):
blocks.append(block_klass(dim_in, dim_in))
self.downs.append(
nn.ModuleList(
[
nn.ModuleList(blocks),
nn.ModuleList(
[
(
attn_klass(
dim_in,
dim_head=self_attn_dim_head,
heads=self_attn_heads,
depth=layer_attn_depth,
)
if layer_full_attn
else None
),
nn.Conv2d(
dim_in, dim_out, kernel_size=3, stride=2, padding=1
),
]
),
]
)
)
self.mid_block1 = block_klass(mid_dim, mid_dim)
self.mid_attn = FullAttention(
mid_dim,
dim_head=self_attn_dim_head,
heads=self_attn_heads,
depth=mid_attn_depth,
)
self.mid_block2 = block_klass(mid_dim, mid_dim)
*_, last_dim = up_dims
for ind, (
(dim_in, dim_out),
layer_full_attn,
layer_attn_depth,
) in enumerate(
zip(
reversed(up_in_out),
reversed(full_attn),
reversed(attn_depths),
)
):
attn_klass = FullAttention if layer_full_attn else LinearTransformer
blocks = []
input_dim = dim_in * 2 if ind < len(down_in_out) else dim_in
for i in range(block_count):
blocks.append(block_klass(input_dim, dim_in))
self.ups.append(
nn.ModuleList(
[
nn.ModuleList(blocks),
nn.ModuleList(
[
NearestNeighborhoodUpsample(
last_dim if ind == 0 else dim_out,
dim_in,
),
(
attn_klass(
dim_in,
dim_head=self_attn_dim_head,
heads=self_attn_heads,
depth=layer_attn_depth,
)
if layer_full_attn
else None
),
]
),
]
)
)
self.out_dim = default(out_dim, channels)
self.final_res_block = block_klass(dim, dim)
self.final_to_rgb = nn.Conv2d(dim, channels, 1)
self.resize_mode = resize_mode
self.style_to_conv_modulations = nn.Linear(
style_network.dim_out, sum(style_embed_split_dims)
)
self.style_embed_split_dims = style_embed_split_dims
@property
def allowable_rgb_resolutions(self):
input_res_base = int(log2(self.input_image_size))
output_res_base = int(log2(self.image_size))
allowed_rgb_res_base = list(range(input_res_base, output_res_base))
return [*map(lambda p: 2**p, allowed_rgb_res_base)]
@property
def device(self):
return next(self.parameters()).device
@property
def total_params(self):
return sum([p.numel() for p in self.parameters()])
def resize_image_to(self, x, size):
return F.interpolate(x, (size, size), mode=self.resize_mode)
def forward(
self,
lowres_image: torch.Tensor,
styles: Optional[torch.Tensor] = None,
noise: Optional[torch.Tensor] = None,
global_text_tokens: Optional[torch.Tensor] = None,
return_all_rgbs: bool = False,
):
x = lowres_image
noise_scale = 0.001 # Adjust the scale of the noise as needed
noise_aug = torch.randn_like(x) * noise_scale
x = x + noise_aug
x = x.clamp(0, 1)
shape = x.shape
batch_size = shape[0]
assert shape[-2:] == ((self.input_image_size,) * 2)
# styles
if not exists(styles):
assert exists(self.style_network)
noise = default(
noise,
torch.randn(
(batch_size, self.style_network.dim_in), device=self.device
),
)
styles = self.style_network(noise, global_text_tokens)
# project styles to conv modulations
conv_mods = self.style_to_conv_modulations(styles)
conv_mods = conv_mods.split(self.style_embed_split_dims, dim=-1)
conv_mods = iter(conv_mods)
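        # conv_mods is consumed strictly in module-construction order: every Block
        # pulls one (mod, kernel_mod) pair per call, mirroring style_embed_split_dims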
x = self.init_conv(x)
h = []
for blocks, (attn, downsample) in self.downs:
for block in blocks:
x = block(x, conv_mods_iter=conv_mods)
h.append(x)
if attn is not None:
x = attn(x)
x = downsample(x)
x = self.mid_block1(x, conv_mods_iter=conv_mods)
x = self.mid_attn(x)
x = self.mid_block2(x, conv_mods_iter=conv_mods)
for (
blocks,
(
upsample,
attn,
),
) in self.ups:
x = upsample(x)
for block in blocks:
if h != []:
res = h.pop()
res = res * self.skip_connect_scale
x = torch.cat((x, res), dim=1)
x = block(x, conv_mods_iter=conv_mods)
if attn is not None:
x = attn(x)
x = self.final_res_block(x, conv_mods_iter=conv_mods)
rgb = self.final_to_rgb(x)
if not return_all_rgbs:
return rgb
return rgb, []
def tile_image(image, chunk_size=64):
c, h, w = image.shape
h_chunks = ceil(h / chunk_size)
w_chunks = ceil(w / chunk_size)
tiles = []
for i in range(h_chunks):
for j in range(w_chunks):
tile = image[:, i * chunk_size:(i + 1) * chunk_size, j * chunk_size:(j + 1) * chunk_size]
tiles.append(tile)
return tiles, h_chunks, w_chunks
def merge_tiles(tiles, h_chunks, w_chunks, chunk_size=64):
# Determine the shape of the output tensor
c = tiles[0].shape[0]
h = h_chunks * chunk_size
w = w_chunks * chunk_size
# Create an empty tensor to hold the merged image
merged = torch.zeros((c, h, w), dtype=tiles[0].dtype)
# Iterate over the tiles and place them in the correct position
for idx, tile in enumerate(tiles):
i = idx // w_chunks
j = idx % w_chunks
h_start = i * chunk_size
w_start = j * chunk_size
tile_h, tile_w = tile.shape[1:]
merged[:, h_start:h_start+tile_h, w_start:w_start+tile_w] = tile
return merged
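# note: tile_image does not pad, so edge tiles may be smaller than chunk_size and
# merge_tiles places each tile by its actual size; upscale_4x reflection-pads the
# input beforehand so every tile it produces is full-sized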
class AuraSR:
def __init__(self, config: dict[str, Any], device: str = "cuda"):
self.upsampler = UnetUpsampler(**config).to(device)
self.input_image_size = config["input_image_size"]
@classmethod
def from_pretrained(cls, model_id: str = "fal-ai/AuraSR", use_safetensors: bool = True):
import json
import torch
from pathlib import Path
from huggingface_hub import snapshot_download
# Check if model_id is a local file
if Path(model_id).is_file():
local_file = Path(model_id)
if local_file.suffix == '.safetensors':
use_safetensors = True
elif local_file.suffix == '.ckpt':
use_safetensors = False
else:
raise ValueError(f"Unsupported file format: {local_file.suffix}. Please use .safetensors or .ckpt files.")
# For local files, we need to provide the config separately
config_path = local_file.with_name('config.json')
if not config_path.exists():
raise FileNotFoundError(
f"Config file not found: {config_path}. "
f"When loading from a local file, ensure that 'config.json' "
f"is present in the same directory as '{local_file.name}'. "
f"If you're trying to load a model from Hugging Face, "
f"please provide the model ID instead of a file path."
)
config = json.loads(config_path.read_text())
hf_model_path = local_file.parent
else:
hf_model_path = Path(snapshot_download(model_id))
config = json.loads((hf_model_path / "config.json").read_text())
model = cls(config)
if use_safetensors:
try:
from safetensors.torch import load_file
checkpoint = load_file(hf_model_path / "model.safetensors" if not Path(model_id).is_file() else model_id)
except ImportError:
raise ImportError(
"The safetensors library is not installed. "
"Please install it with `pip install safetensors` "
"or use `use_safetensors=False` to load the model with PyTorch."
)
else:
checkpoint = torch.load(hf_model_path / "model.ckpt" if not Path(model_id).is_file() else model_id)
model.upsampler.load_state_dict(checkpoint, strict=True)
return model
@torch.no_grad()
def upscale_4x(self, image: Image.Image, max_batch_size=8) -> Image.Image:
tensor_transform = transforms.ToTensor()
device = self.upsampler.device
image_tensor = tensor_transform(image).unsqueeze(0)
_, _, h, w = image_tensor.shape
pad_h = (self.input_image_size - h % self.input_image_size) % self.input_image_size
pad_w = (self.input_image_size - w % self.input_image_size) % self.input_image_size
# Pad the image
image_tensor = torch.nn.functional.pad(image_tensor, (0, pad_w, 0, pad_h), mode='reflect').squeeze(0)
tiles, h_chunks, w_chunks = tile_image(image_tensor, self.input_image_size)
# Batch processing of tiles
num_tiles = len(tiles)
batches = [tiles[i:i + max_batch_size] for i in range(0, num_tiles, max_batch_size)]
reconstructed_tiles = []
for batch in batches:
model_input = torch.stack(batch).to(device)
generator_output = self.upsampler(
lowres_image=model_input,
noise=torch.randn(model_input.shape[0], 128, device=device)
)
reconstructed_tiles.extend(list(generator_output.clamp_(0, 1).detach().cpu()))
merged_tensor = merge_tiles(reconstructed_tiles, h_chunks, w_chunks, self.input_image_size * 4)
unpadded = merged_tensor[:, :h * 4, :w * 4]
to_pil = transforms.ToPILImage()
return to_pil(unpadded)
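
For orientation, a minimal usage sketch of the class above. The `fal-ai/AuraSR` default repo and the 4x scale come from the code itself; the image file names are hypothetical and a CUDA device is assumed (the constructor default):

```python
from PIL import Image
from modules.postprocess.aurasr_arch import AuraSR

# load weights from the default repo, or pass a local .safetensors/.ckpt path
# with a config.json sitting next to it
model = AuraSR.from_pretrained("fal-ai/AuraSR")
lowres = Image.open("input.png").convert("RGB")  # hypothetical input file
upscaled = model.upscale_4x(lowres)  # pads, tiles, upsamples each tile 4x, merges, unpads
upscaled.save("output.png")
```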

View File

@@ -0,0 +1,37 @@
import torch
import diffusers
from PIL import Image
from modules import shared, devices
from modules.upscaler import Upscaler, UpscalerData
from installer import install
class UpscalerAuraSR(Upscaler):
def __init__(self, dirname): # pylint: disable=super-init-not-called
self.name = "AuraSR"
self.user_path = dirname
self.model = None
if not shared.native:
super().__init__()
return
self.scalers = [
UpscalerData(name="Aura SR 4x", path="stabilityai/sd-x2-latent-upscaler", upscaler=self, model=None, scale=4),
]
def callback(self, _step: int, _timestep: int, _latents: torch.FloatTensor):
pass
def do_upscale(self, img: Image.Image, selected_model):
from modules.postprocess.aurasr_arch import AuraSR
if self.model is None:
self.model = AuraSR.from_pretrained("vladmandic/aurasr", use_safetensors=False)
devices.torch_gc()
self.model.upsampler.to(devices.device)
image = self.model.upscale_4x(img)
self.model.upsampler.to(devices.cpu)
if shared.opts.upscaler_unload and selected_model in self.models:
self.model = None
shared.log.debug(f"Upscaler unloaded: type={self.name} model={selected_model}")
devices.torch_gc(force=True)
return image

View File

@@ -389,6 +389,8 @@ def process_images_inner(p: StableDiffusionProcessing) -> Processed:
if hasattr(shared.sd_model, 'restore_pipeline') and shared.sd_model.restore_pipeline is not None:
shared.sd_model.restore_pipeline()
if shared.native: # reset pipeline for each iteration
shared.sd_model = sd_models.set_diffuser_pipe(shared.sd_model, sd_models.DiffusersTaskType.TEXT_2_IMAGE)
t1 = time.time()
shared.log.info(f'Processed: images={len(output_images)} time={t1 - t0:.2f} its={(p.steps * len(output_images)) / (t1 - t0):.2f} memory={memstats.memory_stats()}')

View File

@@ -38,7 +38,8 @@ def diffusers_callback(pipe, step: int, timestep: int, kwargs: dict):
return kwargs
latents = kwargs.get('latents', None)
debug_callback(f'Callback: step={step} timestep={timestep} latents={latents.shape if latents is not None else None} kwargs={list(kwargs)}')
shared.state.sampling_step = step
order = getattr(pipe.scheduler, "order", 1) if hasattr(pipe, 'scheduler') else 1
shared.state.sampling_step = step // order
if shared.state.interrupted or shared.state.skipped:
raise AssertionError('Interrupted...')
if shared.state.paused:

View File

@@ -85,7 +85,7 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
shared.sd_model = update_pipeline(shared.sd_model, p)
shared.log.info(f'Base: class={shared.sd_model.__class__.__name__}')
update_sampler(p, shared.sd_model) # TODO SD3
update_sampler(p, shared.sd_model)
base_args = set_pipeline_args(
p=p,
model=shared.sd_model,
@@ -104,7 +104,7 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
clip_skip=p.clip_skip,
desc='Base',
)
shared.state.sampling_steps = base_args.get('prior_num_inference_steps', None) or base_args.get('num_inference_steps', None) or p.steps
shared.state.sampling_steps = base_args.get('prior_num_inference_steps', None) or p.steps or base_args.get('num_inference_steps', None)
if shared.opts.scheduler_eta is not None and shared.opts.scheduler_eta > 0 and shared.opts.scheduler_eta < 1:
p.extra_generation_params["Sampler Eta"] = shared.opts.scheduler_eta
output = None
@@ -215,7 +215,7 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
desc='Hires',
)
shared.state.job = 'HiRes'
shared.state.sampling_steps = hires_args.get('prior_num_inference_steps', None) or hires_args.get('num_inference_steps', None) or p.steps
shared.state.sampling_steps = hires_args.get('prior_num_inference_steps', None) or p.steps or hires_args.get('num_inference_steps', None)
try:
sd_models_compile.check_deepcache(enable=True)
output = shared.sd_model(**hires_args) # pylint: disable=not-callable
@@ -280,7 +280,7 @@ def process_diffusers(p: processing.StableDiffusionProcessing):
clip_skip=p.clip_skip,
desc='Refiner',
)
shared.state.sampling_steps = refiner_args.get('prior_num_inference_steps', None) or refiner_args.get('num_inference_steps', None) or p.steps
shared.state.sampling_steps = refiner_args.get('prior_num_inference_steps', None) or p.steps or refiner_args.get('num_inference_steps', None)
try:
if 'requires_aesthetics_score' in shared.sd_refiner.config: # sdxl-model needs false and sdxl-refiner needs true
shared.sd_refiner.register_to_config(requires_aesthetics_score = getattr(shared.sd_refiner, 'tokenizer', None) is None)

View File

@@ -62,7 +62,7 @@ def progressapi(req: ProgressRequest):
paused = shared.state.paused
if not active:
return InternalProgressResponse(job=shared.state.job, active=active, queued=queued, paused=paused, completed=completed, id_live_preview=-1, textinfo="Queued..." if queued else "Waiting...")
shared.state.job_count = max(shared.state.job_count, shared.state.job_no)
shared.state.job_count = max(shared.state.frame_count, shared.state.job_count, shared.state.job_no)
batch_x = max(shared.state.job_no, 0)
batch_y = max(shared.state.job_count, 1)
step_x = max(shared.state.sampling_step, 0)

View File

@@ -389,7 +389,7 @@ def get_weighted_text_embeddings(pipe, prompt: str = "", neg_prompt: str = "", c
except Exception:
pooled_prompt_embeds = None
negative_pooled_prompt_embeds = None
debug(f'Prompt: pooled shape={pooled_prompt_embeds[0].shape} time={(time.time() - t0):.3f}')
debug(f'Prompt: pooled shape={pooled_prompt_embeds[0].shape if pooled_prompt_embeds is not None else None} time={(time.time() - t0):.3f}')
prompt_embeds = torch.cat(prompt_embeds, dim=-1) if len(prompt_embeds) > 1 else prompt_embeds[0]
negative_prompt_embeds = torch.cat(negative_prompt_embeds, dim=-1) if len(negative_prompt_embeds) > 1 else \

View File

@@ -489,10 +489,9 @@ class ScriptRunner:
s = ScriptSummary('before-process')
for script in self.alwayson_scripts:
try:
args = p.script_args[script.args_from:script.args_to]
if len(args) == 0:
continue
script.before_process(p, *args, **kwargs)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.before_process(p, *args, **kwargs)
except Exception as e:
errors.display(e, f"Error running before process: {script.filename}")
s.record(script.title())
@@ -502,10 +501,9 @@ class ScriptRunner:
s = ScriptSummary('process')
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
script.process(p, *args, **kwargs)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.process(p, *args, **kwargs)
except Exception as e:
errors.display(e, f'Running script process: {script.filename}')
s.record(script.title())
@@ -516,10 +514,9 @@ class ScriptRunner:
processed = None
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
processed = script.process_images(p, *args, **kwargs)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
processed = script.process_images(p, *args, **kwargs)
except Exception as e:
errors.display(e, f'Running script process images: {script.filename}')
s.record(script.title())
@@ -530,10 +527,9 @@ class ScriptRunner:
s = ScriptSummary('before-process-batch')
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
script.before_process_batch(p, *args, **kwargs)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.before_process_batch(p, *args, **kwargs)
except Exception as e:
errors.display(e, f'Running script before process batch: {script.filename}')
s.record(script.title())
@@ -543,10 +539,9 @@ class ScriptRunner:
s = ScriptSummary('process-batch')
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
script.process_batch(p, *args, **kwargs)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.process_batch(p, *args, **kwargs)
except Exception as e:
errors.display(e, f'Running script process batch: {script.filename}')
s.record(script.title())
@@ -556,10 +551,9 @@ class ScriptRunner:
s = ScriptSummary('postprocess')
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
script.postprocess(p, processed, *args)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.postprocess(p, processed, *args)
except Exception as e:
errors.display(e, f'Running script postprocess: {script.filename}')
s.record(script.title())
@@ -569,10 +563,9 @@ class ScriptRunner:
s = ScriptSummary('postprocess-batch')
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
script.postprocess_batch(p, *args, images=images, **kwargs)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.postprocess_batch(p, *args, images=images, **kwargs)
except Exception as e:
errors.display(e, f'Running script before postprocess batch: {script.filename}')
s.record(script.title())
@@ -582,10 +575,9 @@ class ScriptRunner:
s = ScriptSummary('postprocess-batch-list')
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
script.postprocess_batch_list(p, pp, *args, **kwargs)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.postprocess_batch_list(p, pp, *args, **kwargs)
except Exception as e:
errors.display(e, f'Running script before postprocess batch list: {script.filename}')
s.record(script.title())
@@ -595,10 +587,9 @@ class ScriptRunner:
s = ScriptSummary('postprocess-image')
for script in self.alwayson_scripts:
try:
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
if len(args) == 0:
continue
script.postprocess_image(p, pp, *args)
if (script.args_to > 0) and (script.args_to >= script.args_from):
args = p.per_script_args.get(script.title(), p.script_args[script.args_from:script.args_to])
script.postprocess_image(p, pp, *args)
except Exception as e:
errors.display(e, f'Running script postprocess image: {script.filename}')
s.record(script.title())

View File

@@ -547,7 +547,7 @@ def change_backend():
shared.native = shared.backend == shared.Backend.DIFFUSERS
checkpoints_loaded.clear()
from modules.sd_samplers import list_samplers
list_samplers(shared.backend)
list_samplers()
list_models()
from modules.sd_vae import refresh_vae_list
refresh_vae_list()
@@ -586,7 +586,7 @@ def detect_pipeline(f: str, op: str = 'model', warning=True, quiet=False):
guess = 'Stable Diffusion XL Instruct'
elif (size > 3138 and size < 3142): #3140
guess = 'Stable Diffusion XL'
elif (size > 5692 and size < 5698) or (size > 4134 and size < 4138) or (size > 10362 and size < 10366):
elif (size > 5692 and size < 5698) or (size > 4134 and size < 4138) or (size > 10362 and size < 10366) or (size > 15028 and size < 15228):
guess = 'Stable Diffusion 3'
# guess by name
"""
@@ -611,6 +611,10 @@ def detect_pipeline(f: str, op: str = 'model', warning=True, quiet=False):
guess = 'Stable Cascade'
if 'pixart-sigma' in f.lower():
guess = 'PixArt-Sigma'
if 'lumina-next' in f.lower():
guess = 'Lumina-Next'
if 'kolors' in f.lower():
guess = 'Kolors'
# switch for specific variant
if guess == 'Stable Diffusion' and 'inpaint' in f.lower():
guess = 'Stable Diffusion Inpaint'
@@ -992,6 +996,24 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=No
if debug_load:
errors.display(e, 'Load')
return
elif model_type in ['Lumina-Next']: # forced pipeline
try:
from modules.model_lumina import load_lumina
sd_model = load_lumina(checkpoint_info, diffusers_load_config)
except Exception as e:
shared.log.error(f'Diffusers Failed loading {op}: {checkpoint_info.path} {e}')
if debug_load:
errors.display(e, 'Load')
return
elif model_type in ['Kolors']: # forced pipeline
try:
from modules.model_kolors import load_kolors
sd_model = load_kolors(checkpoint_info, diffusers_load_config)
except Exception as e:
shared.log.error(f'Diffusers Failed loading {op}: {checkpoint_info.path} {e}')
if debug_load:
errors.display(e, 'Load')
return
elif model_type in ['Stable Diffusion 3']:
try:
from modules.model_sd3 import load_sd3
@@ -1150,7 +1172,7 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=No
timer.record("options")
set_diffuser_offload(sd_model, op)
if op == 'model':
if op == 'model' and not (os.path.isdir(checkpoint_info.path) or checkpoint_info.type == 'huggingface'):
sd_vae.apply_vae_config(shared.sd_model.sd_checkpoint_info.filename, vae_file, sd_model)
if op == 'refiner' and shared.opts.diffusers_move_refiner:
shared.log.debug('Moving refiner model to CPU')

View File

@@ -309,6 +309,9 @@ def check_deepcache(enable: bool):
def compile_deepcache(sd_model):
global deepcache_worker # pylint: disable=global-statement
if not hasattr(sd_model, 'unet'):
shared.log.warning(f'Model compile using deep-cache: {sd_model.__class__} not supported')
return sd_model
try:
from DeepCache import DeepCacheSDHelper
except Exception as e:

View File

@@ -14,7 +14,7 @@ samplers_map = {}
loaded_config = None
def list_samplers(backend_name = shared.backend):
def list_samplers():
global all_samplers # pylint: disable=global-statement
global all_samplers_map # pylint: disable=global-statement
global samplers # pylint: disable=global-statement

View File

@@ -1,5 +1,5 @@
# TODO a1111 compatibility module
# TODO cfg_denoiser implementation missing
# a1111 compatibility module
# cfg_denoiser implementation missing
import torch
from modules import prompt_parser, devices, sd_samplers_common
@@ -95,7 +95,7 @@ class CFGDenoiser(torch.nn.Module):
if state.interrupted or state.skipped:
raise sd_samplers_common.InterruptedException
# TODO cfg_scale implementation missing
# cfg_scale implementation missing for original backend
# if sd_samplers_common.apply_refiner(self):
# cond = self.sampler.sampler_extra_args['cond']
# uncond = self.sampler.sampler_extra_args['uncond']

View File

@@ -33,6 +33,7 @@ try:
PNDMScheduler,
SASolverScheduler,
FlowMatchEulerDiscreteScheduler,
# FlowMatchHeunDiscreteScheduler,
)
except Exception as e:
import diffusers
@@ -67,6 +68,7 @@ config = {
'DPM++ 2M EDM': { 'solver_order': 2, 'solver_type': 'midpoint', 'final_sigmas_type': 'zero', 'algorithm_type': 'dpmsolver++' },
'CMSI': { }, #{ 'sigma_min': 0.002, 'sigma_max': 80.0, 'sigma_data': 0.5, 's_noise': 1.0, 'rho': 7.0, 'clip_denoised': True },
'Euler FlowMatch': { 'shift': 1, },
# 'Heun FlowMatch': { 'shift': 1, },
'IPNDM': { },
}
@@ -99,6 +101,7 @@ samplers_data_diffusers = [
sd_samplers_common.SamplerData('TCD', lambda model: DiffusionSampler('TCD', TCDScheduler, model), [], {}),
sd_samplers_common.SamplerData('CMSI', lambda model: DiffusionSampler('CMSI', CMStochasticIterativeScheduler, model), [], {}),
sd_samplers_common.SamplerData('Euler FlowMatch', lambda model: DiffusionSampler('Euler FlowMatch', FlowMatchEulerDiscreteScheduler, model), [], {}),
# sd_samplers_common.SamplerData('Heun FlowMatch', lambda model: DiffusionSampler('Heun FlowMatch', FlowMatchHeunDiscreteScheduler, model), [], {}),
sd_samplers_common.SamplerData('Same as primary', None, [], {}),
]

View File

@@ -1,4 +1,4 @@
# TODO a1111 compatibility module
# a1111 compatibility module
import torch
from modules import sd_samplers_common, sd_samplers_timesteps_impl, sd_samplers_compvis

View File

@@ -1,4 +1,4 @@
# TODO a1111 compatibility module
# a1111 compatibility module
import torch
import tqdm

View File

@@ -71,15 +71,20 @@ def get_pipelines():
'Kandinsky 3': getattr(diffusers, 'Kandinsky3Pipeline', None),
'DeepFloyd IF': getattr(diffusers, 'IFPipeline', None),
'Custom Diffusers Pipeline': getattr(diffusers, 'DiffusionPipeline', None),
'Kolors': getattr(diffusers, 'StableDiffusionXLPipeline', None),
'InstaFlow': getattr(diffusers, 'StableDiffusionPipeline', None), # dynamically redefined and loaded in sd_models.load_diffuser
'SegMoE': getattr(diffusers, 'StableDiffusionPipeline', None), # dynamically redefined and loaded in sd_models.load_diffuser
}
if hasattr(diffusers, 'OnnxStableDiffusionXLPipeline'):
if hasattr(diffusers, 'OnnxStableDiffusionPipeline'):
onnx_pipelines = {
'ONNX Stable Diffusion': getattr(diffusers, 'OnnxStableDiffusionPipeline', None),
'ONNX Stable Diffusion Img2Img': getattr(diffusers, 'OnnxStableDiffusionImg2ImgPipeline', None),
'ONNX Stable Diffusion Inpaint': getattr(diffusers, 'OnnxStableDiffusionInpaintPipeline', None),
'ONNX Stable Diffusion Upscale': getattr(diffusers, 'OnnxStableDiffusionUpscalePipeline', None),
}
pipelines.update(onnx_pipelines)
if hasattr(diffusers, 'OnnxStableDiffusionXLPipeline'):
onnx_pipelines = {
'ONNX Stable Diffusion XL': getattr(diffusers, 'OnnxStableDiffusionXLPipeline', None),
'ONNX Stable Diffusion XL Img2Img': getattr(diffusers, 'OnnxStableDiffusionXLImg2ImgPipeline', None),
}
@@ -95,6 +100,8 @@ def get_pipelines():
if hasattr(diffusers, 'StableDiffusion3Pipeline'):
pipelines['Stable Diffusion 3'] = getattr(diffusers, 'StableDiffusion3Pipeline', None)
pipelines['Stable Diffusion 3 Img2Img'] = getattr(diffusers, 'StableDiffusion3Img2ImgPipeline', None)
if hasattr(diffusers, 'LuminaText2ImgPipeline'):
pipelines['Lumina-Next'] = getattr(diffusers, 'LuminaText2ImgPipeline', None)
for k, v in pipelines.items():
if k != 'Autodetect' and v is None:

View File

@@ -12,6 +12,7 @@ class State:
job = ""
job_no = 0
job_count = 0
frame_count = 0
total_jobs = 0
job_timestamp = '0'
sampling_step = 0
@@ -71,6 +72,7 @@ class State:
self.interrupted = False
self.job = title
self.job_count = -1
self.frame_count = -1
self.job_no = 0
self.job_timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
self.paused = False
@@ -93,6 +95,7 @@ class State:
self.job = ""
self.job_count = 0
self.job_no = 0
self.frame_count = 0
self.paused = False
self.interrupted = False
self.skipped = False

View File

@@ -1,7 +1,6 @@
from typing import List, Union
import os
import time
from collections import namedtuple
import torch
import safetensors.torch
from PIL import Image
@@ -12,7 +11,6 @@ from modules.files_cache import directory_files, directory_mtime, extension_filt
debug = shared.log.trace if os.environ.get('SD_TI_DEBUG', None) is not None else lambda *args, **kwargs: None
debug('Trace: TEXTUAL INVERSION')
TokenToAdd = namedtuple("TokenToAdd", ["clip_l", "clip_g"])
def list_embeddings(*dirs):
@@ -21,6 +19,134 @@ def list_embeddings(*dirs):
return list(filter(lambda fp: is_ext(fp) and is_not_preview(fp) and os.stat(fp).st_size > 0, directory_files(*dirs)))
def open_embeddings(filename):
"""
    Load embedding files from disk. Image embeddings are not currently supported.
"""
if filename is None:
return
    filenames = [filename] if isinstance(filename, str) else list(filename) # a bare string would otherwise be split into characters
exts = [".SAFETENSORS", '.BIN', '.PT']
embeddings = []
skipped = []
for _filename in filenames:
# debug(f'Embedding check: {filename}')
fullname = _filename
_filename = os.path.basename(fullname)
fn, ext = os.path.splitext(_filename)
name = os.path.basename(fn)
embedding = Embedding(vec=[], name=name, filename=fullname)
try:
if ext.upper() not in exts:
debug(f'extension `{ext}` is invalid, expected one of: {exts}')
                skipped.append(embedding) # keep Embedding objects so callers can read .name
continue
if ext.upper() in ['.SAFETENSORS']:
with safetensors.torch.safe_open(embedding.filename, framework="pt") as f: # type: ignore
for k in f.keys():
embedding.vec.append(f.get_tensor(k))
else: # fallback for sd1.5 pt embeddings
vectors = torch.load(fullname, map_location=devices.device)["string_to_param"]["*"]
embedding.vec.append(vectors)
embedding.tokens = [embedding.name if i == 0 else f"{embedding.name}_{i}" for i in range(len(embedding.vec[0]))]
except Exception as e:
debug(f"Could not load embedding file {fullname} {e}")
if embedding.vec:
embeddings.append(embedding)
else:
skipped.append(embedding)
return embeddings, skipped
def convert_bundled(data):
"""
    Bundled embeddings are passed as a dict from LoRA loading; convert them to Embedding objects and return them as a list.
"""
embeddings = []
for key in data.keys():
embedding = Embedding(vec=[], name=key, filename=None)
for vector in data[key].values():
embedding.vec.append(vector)
embedding.tokens = [embedding.name if i == 0 else f"{embedding.name}_{i}" for i in range(len(embedding.vec[0]))]
embeddings.append(embedding)
return embeddings, []
def get_text_encoders():
"""
Select all text encoder and tokenizer pairs from known pipelines, and index them based on the dimensionality of
their embedding layers.
"""
pipe = shared.sd_model
te_names = ["text_encoder", "text_encoder_2", "text_encoder_3"]
tokenizers_names = ["tokenizer", "tokenizer_2", "tokenizer_3"]
text_encoders = []
tokenizers = []
hidden_sizes = []
for te, tok in zip(te_names, tokenizers_names):
text_encoder = getattr(pipe, te, None)
if text_encoder is None:
continue
tokenizer = getattr(pipe, tok, None)
hidden_size = text_encoder.get_input_embeddings().weight.data.shape[-1] or None
if all([text_encoder, tokenizer, hidden_size]):
text_encoders.append(text_encoder)
tokenizers.append(tokenizer)
hidden_sizes.append(hidden_size)
return text_encoders, tokenizers, hidden_sizes
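# the hidden sizes double as a routing key (768 = CLIP-L, 1280 = SDXL CLIP-G,
# 4096 = SD3 T5-XXL): embedding vectors are matched to encoders by width, not name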
def deref_tokenizers(tokens, tokenizers):
"""
    Bundled embeddings may share a name with a separately loaded embedding, or there may be multiple LoRAs
    with differing numbers of vectors. By editing the AddedToken objects and deleting the dict keys pointing
    to them, we ensure that a smaller embedding will not be tokenized as itself plus the leftover vectors of
    the previous one.
"""
for tokenizer in tokenizers:
if len(tokens) > 1:
last_token = tokens[-1]
suffix = int(last_token.split("_")[-1])
newsuffix = suffix + 1
while last_token.replace(str(suffix), str(newsuffix)) in tokenizer.get_vocab():
idx = tokenizer.convert_tokens_to_ids(last_token.replace(str(suffix), str(newsuffix)))
debug(f"Textual inversion: deref idx={idx}")
del tokenizer._added_tokens_encoder[last_token.replace(str(suffix), str(newsuffix))] # pylint: disable=protected-access
tokenizer._added_tokens_decoder[idx].content = str(time.time()) # pylint: disable=protected-access
newsuffix += 1
def insert_tokens(embeddings: list, tokenizers: list):
"""
Add all tokens to each tokenizer in the list, with one call to each.
"""
tokens = []
for embedding in embeddings:
tokens += embedding.tokens
for tokenizer in tokenizers:
tokenizer.add_tokens(tokens)
def insert_vectors(embedding, tokenizers, text_encoders, hiddensizes):
"""
Insert embeddings into the input embedding layer of a list of text encoders, matched based on embedding size,
not by name.
    Future warning: if another text encoder becomes available with embedding dimensions in [768, 1280, 4096],
    this may cause collisions.
"""
for vector, size in zip(embedding.vec, embedding.vector_sizes):
if size not in hiddensizes:
continue
idx = hiddensizes.index(size)
unk_token_id = tokenizers[idx].convert_tokens_to_ids(tokenizers[idx].unk_token)
if text_encoders[idx].get_input_embeddings().weight.data.shape[0] != len(tokenizers[idx]):
text_encoders[idx].resize_token_embeddings(len(tokenizers[idx]))
for token, v in zip(embedding.tokens, vector.unbind()):
token_id = tokenizers[idx].convert_tokens_to_ids(token)
if token_id > unk_token_id:
text_encoders[idx].get_input_embeddings().weight.data[token_id] = v
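# e.g. a three-vector embedding named "style" registers tokens "style", "style_1"
# and "style_2"; each newly added token id (above the unk token) gets its vector
# written into the matching text encoder's input embedding matrix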
class Embedding:
def __init__(self, vec, name, filename=None, step=None):
self.vec = vec
@@ -35,6 +161,7 @@ class Embedding:
self.sd_checkpoint = None
self.sd_checkpoint_name = None
self.optimizer_state_dict = None
self.tokens = None
def save(self, filename):
embedding_data = {
@@ -82,6 +209,10 @@ class DirWithTextualInversionEmbeddings:
def convert_embedding(tensor, text_encoder, text_encoder_2):
"""
Given a tensor of shape (b, embed_dim) and two text encoders whose tokenizers match, return a tensor with
    approximately matching meaning, or padding if the input tensor is dissimilar to any frozen text embedding
"""
with torch.no_grad():
vectors = []
clip_l_embeds = text_encoder.get_input_embeddings().weight.data.clone().to(device=devices.device)
@@ -91,7 +222,7 @@ def convert_embedding(tensor, text_encoder, text_encoder_2):
if values < 0.707: # Arbitrary similarity to cutoff, here 45 degrees
indices *= 0 # Use SDXL padding vector 0
vectors.append(indices)
vectors = torch.stack(vectors)
vectors = torch.stack(vectors).to(text_encoder_2.device)
output = text_encoder_2.get_input_embeddings().weight.data[vectors]
return output
@@ -135,123 +266,48 @@ class EmbeddingDatabase:
vec = shared.sd_model.cond_stage_model.encode_embedding_init_text(",", 1)
return vec.shape[1]
def load_diffusers_embedding(self, filename: Union[str, List[str]]):
_loaded_pre = len(self.word_embeddings)
embeddings_to_load = []
loaded_embeddings = {}
skipped_embeddings = []
def load_diffusers_embedding(self, filename: Union[str, List[str]] = None, data: dict = None):
"""
        File names take precedence over bundled embeddings passed as a dict.
Bundled embeddings are automatically set to overwrite previous embeddings.
"""
overwrite = bool(data)
if not shared.sd_loaded:
return 0
tokenizer = getattr(shared.sd_model, 'tokenizer', None)
tokenizer_2 = getattr(shared.sd_model, 'tokenizer_2', None)
clip_l = getattr(shared.sd_model, 'text_encoder', None)
clip_g = getattr(shared.sd_model, 'text_encoder_2', None)
if clip_g and tokenizer_2:
model_type = 'SDXL'
elif clip_l and tokenizer:
model_type = 'SD'
else:
embeddings, skipped = open_embeddings(filename) or convert_bundled(data)
for skip in skipped:
            self.skipped_embeddings[skip.name] = skip
if not embeddings:
return 0
filenames = list(filename)
exts = [".SAFETENSORS", '.BIN', '.PT', '.PNG', '.WEBP', '.JXL', '.AVIF']
for _filename in filenames:
# debug(f'Embedding check: {filename}')
fullname = _filename
_filename = os.path.basename(fullname)
fn, ext = os.path.splitext(_filename)
name = os.path.basename(fn)
embedding = Embedding(vec=None, name=name, filename=fullname)
tokenizer_vocab = tokenizer.get_vocab()
try:
if ext.upper() not in exts:
raise ValueError(f'extension `{ext}` is invalid, expected one of: {exts}')
if name in tokenizer.get_vocab() or f"{name}_1" in tokenizer.get_vocab():
loaded_embeddings[name] = embedding
debug(f'Embedding already loaded: {name}')
embeddings_to_load.append(embedding)
except Exception as e:
skipped_embeddings.append(embedding)
debug(f'Embedding skipped: "{name}" {e}')
continue
embeddings_to_load = sorted(embeddings_to_load, key=lambda e: exts.index(os.path.splitext(e.filename)[1].upper()))
tokens_to_add = {}
for embedding in embeddings_to_load:
try:
if embedding.name in tokens_to_add or embedding.name in loaded_embeddings:
raise ValueError('duplicate token')
embeddings_dict = {}
_, ext = os.path.splitext(embedding.filename)
if ext.upper() in ['.SAFETENSORS']:
with safetensors.torch.safe_open(embedding.filename, framework="pt") as f: # type: ignore
for k in f.keys():
embeddings_dict[k] = f.get_tensor(k)
else: # fallback for sd1.5 pt embeddings
embeddings_dict["clip_l"] = self.load_from_file(embedding.filename, embedding.filename)
if 'emb_params' in embeddings_dict and 'clip_l' not in embeddings_dict:
embeddings_dict["clip_l"] = embeddings_dict["emb_params"]
if 'clip_l' not in embeddings_dict:
raise ValueError('Invalid Embedding, dict missing required key `clip_l`')
if 'clip_g' not in embeddings_dict and model_type == "SDXL" and shared.opts.diffusers_convert_embed:
embeddings_dict["clip_g"] = convert_embedding(embeddings_dict["clip_l"], clip_l, clip_g)
if 'clip_g' in embeddings_dict:
embedding_type = 'SDXL'
else:
embedding_type = 'SD'
if embedding_type != model_type:
raise ValueError(f'Unable to load {embedding_type} Embedding "{embedding.name}" into {model_type} Model')
_tokens_to_add = {}
for i in range(len(embeddings_dict["clip_l"])):
if len(clip_l.get_input_embeddings().weight.data[0]) == len(embeddings_dict["clip_l"][i]):
token = embedding.name if i == 0 else f"{embedding.name}_{i}"
if token in tokenizer_vocab:
raise RuntimeError(f'Multi-Vector Embedding would add pre-existing Token in Vocabulary: {token}')
if token in tokens_to_add:
raise RuntimeError(f'Multi-Vector Embedding would add duplicate Token to Add: {token}')
_tokens_to_add[token] = TokenToAdd(
embeddings_dict["clip_l"][i],
embeddings_dict["clip_g"][i] if 'clip_g' in embeddings_dict else None
)
if not _tokens_to_add:
raise ValueError('no valid tokens to add')
tokens_to_add.update(_tokens_to_add)
loaded_embeddings[embedding.name] = embedding
except Exception as e:
debug(f"Embedding loading: {embedding.filename} {e}")
continue
if len(tokens_to_add) > 0:
tokenizer.add_tokens(list(tokens_to_add.keys()))
clip_l.resize_token_embeddings(len(tokenizer))
if model_type == 'SDXL':
tokenizer_2.add_tokens(list(tokens_to_add.keys())) # type: ignore
clip_g.resize_token_embeddings(len(tokenizer_2)) # type: ignore
unk_token_id = tokenizer.convert_tokens_to_ids(tokenizer.unk_token)
for token, data in tokens_to_add.items():
token_id = tokenizer.convert_tokens_to_ids(token)
if token_id > unk_token_id:
clip_l.get_input_embeddings().weight.data[token_id] = data.clip_l
if model_type == 'SDXL':
clip_g.get_input_embeddings().weight.data[token_id] = data.clip_g # type: ignore
for embedding in loaded_embeddings.values():
if not embedding:
continue
self.register_embedding(embedding, shared.sd_model)
if embedding in embeddings_to_load:
embeddings_to_load.remove(embedding)
skipped_embeddings.extend(embeddings_to_load)
for embedding in skipped_embeddings:
if loaded_embeddings.get(embedding.name, None) == embedding:
continue
self.skipped_embeddings[embedding.name] = embedding
try:
if model_type == 'SD':
debug(f"Embeddings loaded: text-encoder={shared.sd_model.text_encoder.get_input_embeddings().weight.data.shape[0]}")
if model_type == 'SDXL':
debug(f"Embeddings loaded: text-encoder-1={shared.sd_model.text_encoder.get_input_embeddings().weight.data.shape[0]} text-encoder-2={shared.sd_model.text_encoder_2.get_input_embeddings().weight.data.shape[0]}")
except Exception:
pass
return len(self.word_embeddings) - _loaded_pre
text_encoders, tokenizers, hiddensizes = get_text_encoders()
if not all([text_encoders, tokenizers, hiddensizes]):
return 0
for embedding in embeddings:
embedding.vector_sizes = [v.shape[-1] for v in embedding.vec]
if shared.opts.diffusers_convert_embed and 768 in hiddensizes and 1280 in hiddensizes and 1280 not in embedding.vector_sizes and 768 in embedding.vector_sizes:
embedding.vec.append(
convert_embedding(embedding.vec[embedding.vector_sizes.index(768)], text_encoders[hiddensizes.index(768)],
text_encoders[hiddensizes.index(1280)]))
embedding.vector_sizes.append(1280)
            if (not all(vs in hiddensizes for vs in embedding.vector_sizes) or # skip e.g. SD2.1 embeddings in SD1.5/SDXL/SD3 and vice versa
len(embedding.vector_sizes) > len(hiddensizes) or # Skip SDXL/SD3 in SD1.5
(len(embedding.vector_sizes) < len(hiddensizes) and len(embedding.vector_sizes) != 2)): # SD3 no T5
embedding.tokens = []
self.skipped_embeddings[embedding.name] = embedding
if overwrite:
shared.log.info(f"Loading Bundled embeddings: {list(data.keys())}")
for embedding in embeddings:
if embedding.name not in self.skipped_embeddings:
deref_tokenizers(embedding.tokens, tokenizers)
insert_tokens(embeddings, tokenizers)
for embedding in embeddings:
if embedding.name not in self.skipped_embeddings:
try:
insert_vectors(embedding, tokenizers, text_encoders, hiddensizes)
self.register_embedding(embedding, shared.sd_model)
except Exception as e:
shared.log.error(f'Embedding load: name={embedding.name} fn={embedding.filename} {e}')
return
def load_from_file(self, path, filename):
name, ext = os.path.splitext(filename)
@@ -259,14 +315,14 @@ class EmbeddingDatabase:
if ext in ['.PNG', '.WEBP', '.JXL', '.AVIF']:
if '.preview' in filename.lower():
return
return None
embed_image = Image.open(path)
if hasattr(embed_image, 'text') and 'sd-ti-embedding' in embed_image.text:
data = embedding_from_b64(embed_image.text['sd-ti-embedding'])
else:
data = extract_image_data_embed(embed_image)
if not data: # if data is None, means this is not an embeding, just a preview image
return
return None
elif ext in ['.BIN', '.PT']:
data = torch.load(path, map_location="cpu")
elif ext in ['.SAFETENSORS']:
@@ -284,7 +340,7 @@ class EmbeddingDatabase:
elif type(data) == dict and type(next(iter(data.values()))) == torch.Tensor:
if len(data.keys()) != 1:
self.skipped_embeddings[name] = Embedding(None, name=name, filename=path)
return
return None
emb = next(iter(data.values()))
if len(emb.shape) == 1:
emb = emb.unsqueeze(0)

View File

@@ -407,4 +407,6 @@ def update_token_counter(text, steps):
ids = getattr(ids, 'input_ids', [])
token_count = len(ids) - int(has_bos_token) - int(has_eos_token)
max_length = shared.sd_model.tokenizer.model_max_length - int(has_bos_token) - int(has_eos_token)
if max_length is None or max_length < 0 or max_length > 10000:
max_length = 0
return f"<span class='gr-box gr-text-input'>{token_count}/{max_length}</span>"

View File

@@ -206,7 +206,7 @@ def uninstall_extension(extension_path, search_text, sort_column):
if len(found) > 0 and os.path.isdir(extension_path):
found = found[0]
try:
shutil.rmtree(found.path, ignore_errors=False, onerror=errorRemoveReadonly)
shutil.rmtree(found.path, ignore_errors=False, onerror=errorRemoveReadonly) # pylint: disable=deprecated-argument
# extensions.extensions = [extension for extension in extensions.extensions if os.path.abspath(found.path) != os.path.abspath(extension_path)]
except Exception as e:
shared.log.warning(f'Extension uninstall failed: {found.path} {e}')

View File

@@ -1,3 +1,4 @@
import json
import torch
import transformers
import transformers.dynamic_module_utils
@@ -11,6 +12,7 @@ loaded: str = None
MODELS = {
"MS Florence 2 Base": "microsoft/Florence-2-base", # 0.5GB
"MS Florence 2 Large": "microsoft/Florence-2-large", # 1.5GB
"CogFlorence 2 Large": "thwri/CogFlorence-2-Large-Freeze", # 1.6GB
"Moondream 2": "vikhyatk/moondream2", # 3.7GB
"GIT TextCaps Base": "microsoft/git-base-textcaps", # 0.7GB
"GIT VQA Base": "microsoft/git-base-vqav2", # 0.7GB
@@ -166,6 +168,11 @@ def florence(question: str, image: Image.Image, repo: str = None):
if 'task' in response:
response = response['task']
if 'answer' in response:
response = response['answer']
if isinstance(response, dict):
response = json.dumps(response)
response = response.replace('\n', '').replace('\r', '').replace('\t', '').strip()
shared.log.debug(f'VQA: task={task} response="{response}"')
return response

View File

@@ -24,8 +24,7 @@ from diffusers.loaders import FromSingleFileMixin, LoraLoaderMixin, TextualInver
from diffusers.models import AutoencoderKL
from diffusers.models.attention_processor import (
AttnProcessor2_0,
LoRAAttnProcessor2_0,
LoRAXFormersAttnProcessor,
FusedAttnProcessor2_0,
XFormersAttnProcessor,
)
from diffusers.schedulers import KarrasDiffusionSchedulers
@@ -558,8 +557,7 @@ class StableDiffusionXLAdapterPipeline(DiffusionPipeline, FromSingleFileMixin, L
(
AttnProcessor2_0,
XFormersAttnProcessor,
LoRAXFormersAttnProcessor,
LoRAAttnProcessor2_0,
FusedAttnProcessor2_0,
),
)
# if xformers or torch_2_0 is used attention block does not need

View File

@@ -30,8 +30,7 @@ from diffusers.models import AutoencoderKL, ControlNetModel
from diffusers.models.attention_processor import (
AttnProcessor2_0,
LoRAAttnProcessor2_0,
LoRAXFormersAttnProcessor,
FusedAttnProcessor2_0,
XFormersAttnProcessor,
)
from diffusers.schedulers import KarrasDiffusionSchedulers
@@ -572,8 +571,7 @@ class StableDiffusionXLAdapterControlnetPipeline(DiffusionPipeline, FromSingleFi
(
AttnProcessor2_0,
XFormersAttnProcessor,
LoRAXFormersAttnProcessor,
LoRAAttnProcessor2_0,
FusedAttnProcessor2_0,
),
)
# if xformers or torch_2_0 is used attention block does not need

View File

@@ -31,8 +31,7 @@ from diffusers.models import AutoencoderKL, ControlNetModel
from diffusers.models.attention_processor import (
AttnProcessor2_0,
LoRAAttnProcessor2_0,
LoRAXFormersAttnProcessor,
FusedAttnProcessor2_0,
XFormersAttnProcessor,
)
from diffusers.schedulers import KarrasDiffusionSchedulers
@@ -571,8 +570,7 @@ class StableDiffusionXLAdapterControlnetI2IPipeline(DiffusionPipeline, FromSingl
(
AttnProcessor2_0,
XFormersAttnProcessor,
LoRAXFormersAttnProcessor,
LoRAAttnProcessor2_0,
FusedAttnProcessor2_0,
),
)
# if xformers or torch_2_0 is used attention block does not need

View File

@@ -33,7 +33,7 @@ def install(zluda_path: os.PathLike) -> None:
if os.path.exists(zluda_path):
return
if platform.system() != 'Windows': # TODO
if platform.system() != 'Windows': # Windows-only. (PyTorch should be rebuilt on Linux)
return
urllib.request.urlretrieve(f'https://github.com/lshqqytiger/ZLUDA/releases/download/{RELEASE}/ZLUDA-windows-amd64.zip', '_zluda')

View File

@@ -53,7 +53,7 @@ pandas
protobuf==4.25.3
pytorch_lightning==1.9.4
tokenizers==0.19.1
transformers==4.41.2
transformers==4.42.3
urllib3==1.26.19
Pillow==10.3.0
timm==0.9.16

View File

@@ -8,7 +8,7 @@ from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokeniz
from diffusers.image_processor import VaeImageProcessor
from diffusers.loaders import FromSingleFileMixin, LoraLoaderMixin, TextualInversionLoaderMixin
from diffusers.models import AutoencoderKL, UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0, LoRAAttnProcessor2_0, LoRAXFormersAttnProcessor, XFormersAttnProcessor
from diffusers.models.attention_processor import AttnProcessor2_0, FusedAttnProcessor2_0, XFormersAttnProcessor
from diffusers.models.lora import adjust_lora_scale_text_encoder
from diffusers.schedulers import KarrasDiffusionSchedulers
from diffusers.utils import is_accelerate_available, is_accelerate_version
@@ -484,8 +484,7 @@ class DemoFusionSDXLPipeline(DiffusionPipeline, FromSingleFileMixin, LoraLoaderM
(
AttnProcessor2_0,
XFormersAttnProcessor,
LoRAXFormersAttnProcessor,
LoRAAttnProcessor2_0,
FusedAttnProcessor2_0,
),
)
# if xformers or torch_2_0 is used attention block does not need

View File

@@ -22,8 +22,7 @@ from diffusers.loaders import FromSingleFileMixin, LoraLoaderMixin, TextualInver
from diffusers.models import AutoencoderKL, UNet2DConditionModel
from diffusers.models.attention_processor import (
AttnProcessor2_0,
LoRAAttnProcessor2_0,
LoRAXFormersAttnProcessor,
FusedAttnProcessor2_0,
XFormersAttnProcessor,
)
from diffusers.configuration_utils import FrozenDict
@@ -631,8 +630,7 @@ class StableDiffusionXLDiffImg2ImgPipeline(DiffusionPipeline, FromSingleFileMixi
(
AttnProcessor2_0,
XFormersAttnProcessor,
LoRAXFormersAttnProcessor,
LoRAAttnProcessor2_0,
FusedAttnProcessor2_0,
),
)
# if xformers or torch_2_0 is used attention block does not need

wiki

@@ -1 +1 @@
Subproject commit c5c9e89981c8bd35b51823315418a4a4864bb5e1
Subproject commit 68fa996e9231572c244548ef2690adbce018d70b