mirror of https://github.com/vladmandic/automatic

wiki update all for syntax
Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: vladmandic <mandic00@live.com>
branch master · parent 91d0f3df45 · commit 17d4552d57
Controls *when* MIOpen is allowed to update the cache:

### Commands

**Linux — temporary (current session only):**

```bash
export MIOPEN_FIND_MODE=3
export MIOPEN_FIND_ENFORCE=3
```

**Linux — permanent (add to `~/.bashrc` or `~/.profile`):**

```bash
echo 'export MIOPEN_FIND_MODE=3' >> ~/.bashrc
echo 'export MIOPEN_FIND_ENFORCE=3' >> ~/.bashrc
source ~/.bashrc
```
**Windows — Command Prompt (temporary):**

```cmd
set MIOPEN_FIND_MODE=3
set MIOPEN_FIND_ENFORCE=3
```

**Windows — PowerShell (temporary):**

```powershell
$env:MIOPEN_FIND_MODE = "3"
$env:MIOPEN_FIND_ENFORCE = "3"
```
4. Add `MIOPEN_FIND_MODE` = `3`, then repeat for `MIOPEN_FIND_ENFORCE` = `3`

**Windows — Alternatively, edit `webui.bat`** and add the `set` lines before the launch command:

```bat
set MIOPEN_FIND_MODE=3
set MIOPEN_FIND_ENFORCE=3
```

Then launch SD.Next as normal. On Windows with ROCm:

```powershell
.\webui.bat --use-rocm
```
The cache contains two file types:

- **`.ufdb`** (User Find Database) — stores full benchmark results for all tested solvers. Used for analysis.

**You can override the cache location** with:

```bash
export MIOPEN_USER_DB_PATH=/path/to/your/cache
```
export PYTHON=python3.12

Install ROCm SDK:

> [!NOTE]
> ROCm SDK is optional. It is only required for building flash attention or similar custom kernels.
> ROCm SDK uses 26 GB of disk space.
Then run SD.Next with this command:

## Running SD.Next with Docker for ROCm

Check out the [Docker wiki](https://github.com/vladmandic/sdnext/wiki/Docker) if you want to build a custom Docker image.

> [!NOTE]
> Installing ROCm on your system is not required when using Docker, as Docker has no access to it anyway.

**API.md**
You will get back a JSON dictionary with these keys:

- `progress`: a number ranging from 0 (idle / generation not started) to 1 (generation completed)
- `eta_relative`: an indication of the time remaining relative to the current step of the generation
- `images`: a list of base64-encoded images, one or more depending on how many you generate in a single batch (if `skip_current_image` is `true`, this list is empty)
- `state`: a JSON dictionary with the following keys:
  - `job_count`: the number of jobs running; if 0, the server is idle; if greater than 0, the server is running one or more jobs
  - `sampling_step`: the current sampling step of the current job
  - `sampling_steps`: the number of sampling steps in the current job
  - `skipped`: a boolean value, `true` if the current generation was skipped
  - `interrupted`: a boolean value, `true` if the current generation was interrupted
  - `job`: a string with the current job ID
  - `job_no`: the current job number

A typical use of this API would be to poll the progress endpoint while a generation is running, collecting information, and stopping when `progress` reaches 1. You may also want to check that `job_count` is 0, but be aware that it might be 0 right before a task starts.
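A minimal polling loop following the description above might look like this. This is a sketch: it assumes a SD.Next server reachable at `localhost:7860`, and only the completion check is shown as a standalone helper.

```python
import json
import time
import urllib.request

def generation_finished(resp):
    # Done when progress reaches 1; job_count == 0 alone is not a reliable
    # signal, since it can be 0 right before a task starts
    return resp.get("progress", 0) >= 1

def poll_progress(base_url="http://localhost:7860"):
    # Poll /sdapi/v1/progress until the current generation completes
    while True:
        with urllib.request.urlopen(f"{base_url}/sdapi/v1/progress") as r:
            resp = json.load(r)
        print(resp["progress"], resp["state"]["sampling_step"])
        if generation_finished(resp):
            return resp
        time.sleep(1)
```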
To do so with the API, you will first need:

- The number of ControlNet units to use (minimum 1)
- For each unit, the name of the specific ControlNet model to use (you can obtain a list by querying the `/sdapi/v1/controlnets` endpoint)
- The override image to use for each unit
- For each unit, the name of the specific preprocessor to apply to the image (a list can be obtained by querying the `/sdapi/v1/preprocessors` endpoint), or `None` if no preprocessing is required

Then you need to prepare your payload to be sent via POST to the `/sdapi/v1/control` endpoint. You can refer to the documentation of the `/sdapi/v1/txt2img` endpoint for most of the parameters like sampler, steps, hires, and so on. Here are the ones that are specific to this endpoint:
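As an illustration of assembling the requirements above into a payload: the field names used below (`units`, `process`, etc.) are assumptions for the sketch, not the confirmed endpoint schema — verify them against the actual `/sdapi/v1/control` documentation before use.

```python
import base64

def make_unit(model, image_bytes, preprocessor=None):
    # One ControlNet unit: model name (from /sdapi/v1/controlnets),
    # base64-encoded override image, and an optional preprocessor name
    # (from /sdapi/v1/preprocessors); field names are illustrative
    return {
        "model": model,
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "process": preprocessor,
    }

payload = {
    # txt2img-style parameters (sampler, steps, ...) plus the units list
    "prompt": "a photo of a cat",
    "steps": 20,
    "units": [make_unit("canny", b"abc", None)],
}
```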
Basic tests using UI:

## Environment

- Hardware: Intel ARC 770 LE 16GB with R7 5800X3D & MSI B350M Mortar (PCI-E 3.0) & 48 GB 3200 MHz CL18 RAM
- OS: Arch Linux with this Docker environment: <https://github.com/Disty0/docker-sdnext-ipex>
- Packages: Torch 2.1.0a0+cxx11.abi with IPEX 2.1.10+xpu and MKL / DPCPP 2024.0.0
- Params: model=SD15 | batch-size=1 | batch-count=1 | steps=40 | resolution=512px | sampler=Euler a | CFG 6
- `api-interrogate.py`: interrogate images using CLIP

- `run-benchmark.py`: run benchmark tests

## JavaScript
Control images are *required* for ControlNet to work: you can't use it without one.

### Generate control images on the fly

SD.Next can generate the appropriate control image from any input image you supply using a "preprocessor", which, depending on the model used, turns your input image into a form suitable for use in ControlNet. There are as many preprocessors as ControlNet models, so use the model choice section to guide yourself.

Note that preprocessors are additional models, so using them *consumes more VRAM* (you can choose to unload or move them to the CPU after use in the SD.Next options, if this is a concern). Also, depending on the data they have been trained on, the accuracy of the resulting image varies.

Canny, Depth, Segmentation, and Lineart preprocessors are recommended in case you do not have control images at hand.

### Use a pre-existing control image

In particular for Openpose, the accuracy of the preprocessor may not be enough, or you may know how to generate the control image yourself, so you can supply a pre-made one. You will not require the additional VRAM for preprocessing, but of course you need to know how to make one.
Several examples of the available software that can be used to generate controlnet images:

> The following step-by-step guide was created using SD.Next ModernUI
> The same options exist in StandardUI as well, although their location in the UI may differ

First, enable **Control** by clicking on the "Control" checkbox near the preview area.



A new tab will appear; make sure "ControlNet" is selected.



Now you have to decide how many "units" (ControlNet models) to use. For most uses one is sufficient, but for particularly complex scenarios you may need more than one. You can control the number of units by increasing the "Units" number. This guide assumes you use one unit; the workflow is the same the more units you add. Bear in mind that the more units you use, the more VRAM will be used.


Ensure the unit is enabled by checking if the checkbox under "ControlNet Unit 1" is checked.



Now you have to select the ControlNet model you want to use. Click on the "reload" icon to load the available ControlNet models and select the one you want from the list. It will be automatically downloaded and made available to SD.Next. You can check the console output or the log for progress information.


Specific preprocessor settings can be changed in the Control settings section.

> [!TIP]
> If you have messed up, hit the "Reset" icon and all values will be reset to their defaults.

Once everything is set up, write your prompt, set your image parameters, and hit Generate. You will get a preview of your control image and generation will begin.
which have not been updated in several years and compatibility with latest versions...

You may try to manually install DWPose dependencies using the following procedure:

- Install the full [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) as mmengine requires the NVIDIA compiler (nvcc)
  *note*: the CUDA version should match the version of CUDA that comes with `torch` in the SD.Next log:
  > Torch: torch==2.6.0+cu126 torchvision==0.21.0+cu126
  here `cu126` means CUDA version 12.6
- Install build tools for your platform
Models from different families such as Mediapipe are not supported

The list of models is typically presented as a dropdown list where one or more models can be selected
Alternatively, you can use the button next to the list to convert it to a text box which allows manual input of models and (optionally) their parameters
Text is parsed as a list of models which can be separated by comma `,`, newline `\n` or semicolon `;`
Each model can have parameters added with the colon `:` character
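The separator and parameter rules above could be implemented roughly like this — an illustrative sketch, not SD.Next's actual parser:

```python
import re

def parse_model_list(text):
    # Split on comma, newline, or semicolon; each entry may carry
    # colon-separated parameters after the model name
    entries = [e.strip() for e in re.split(r"[,;\n]", text) if e.strip()]
    models = []
    for entry in entries:
        name, *params = entry.split(":")
        models.append((name, params))
    return models
```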

Example:
Example using [RunPod](https://runpod.io/):

2. Create Pod
   1. Select platform with desired GPU
   2. Edit template:
      > Container image: *username*/sdnext-cuda:latest
      > Expose HTTP port: 7860
   3. Deploy Pod
      Wait for deployment to complete

**Docs.md**
Any standard SD.Next installation already includes a cloned copy of the wiki repo...

If you want to create a separate copy, the [SD.Next Wiki](https://github.com/vladmandic/sdnext/wiki) is a sub-repository of the main SD.Next GitHub repo and can be cloned by simply adding `.wiki` to the end of the URL

> git clone <https://github.com/vladmandic/sdnext.wiki>

Standard format for all Wiki/Docs documents is [Markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github)

**FAQ.md**
## How do I use an AMD GPU on Windows?

*Add the --use-zluda command-line flag when starting the app.*
Check out the [Zluda wiki](ZLUDA) for more info.

## How can I create large images (e.g., 2048x2048) with limited VRAM?
This will automatically create the folder structure as expected by sdnext for huggingface...

> [!NOTE]
> `hf` command [cli guide](https://huggingface.co/docs/huggingface_hub/guides/cli)

> [!IMPORTANT]
> You must set the folder to a correct `Diffusers` folder as specified in your SD.Next config

*Example*:
For example: browser, `git lfs`, `wget`, `curl`, `hf` cli, etc.

- > `hf download --local-dir /sdnext/models/Diffusers/noobai-XL-Vpred-1.0 Laxhar/noobai-XL-Vpred-1.0`

> [!IMPORTANT]
> In this case, you specify the full path to the model folder directly and that name is what SD.Next will use to identify the model
> The model folder should not be prefixed with `models--<author>--` as per the standard HuggingFace folder structure, so as to indicate to SD.Next that this is a manually downloaded model and should be treated as such
> The folder name MUST contain the original model name as on HuggingFace to allow SD.Next to match the model type and select the correct loader for it

**Intel-ARC.md**
### Running SD.Next on Windows

Open the CMD in the folder where you want to install SD.Next and install SD.Next from GitHub with this command:

```shell
git clone https://github.com/vladmandic/sdnext
```

Then enter the sdnext folder:

```shell
cd sdnext
```

Then run SD.Next with this command:

```shell
.\webui.bat --use-ipex
```
## Install Guide for Linux or WSL

> [!NOTE]
> **Do not use Linux Kernel 6.8 or 6.9 with Linux.**
>
> <https://github.com/intel/compute-runtime/issues/726>
> Update your kernel to at least 6.10 or update to the latest available kernel.
>
> *Updating the kernel is not necessary for WSL because it uses Windows GPU drivers instead.*

### Install Guide for Ubuntu Linux or Ubuntu with WSL

The following instructions are for Ubuntu 24.04.
Install the base packages:

```shell
sudo apt update && sudo apt install -y software-properties-common build-essential ca-certificates wget gpg git
```
```shell
sudo pacman -S intel-compute-runtime level-zero-headers level-zero-loader base-devel
```

Install Python 3.12 (or anything between 3.10 and 3.13):

```shell
git clone https://aur.archlinux.org/python312.git
cd python312
```
export PYTHON=python3.12

### Running SD.Next on Linux

Open the terminal in the folder where you want to install SD.Next and install SD.Next from GitHub with this command:

```shell
git clone https://github.com/vladmandic/sdnext
```

Then enter the sdnext folder:

```shell
cd sdnext
```

Then run SD.Next with this command:

```shell
./webui.sh --use-ipex
```
> It will install the necessary libraries at the first run so it will take a while depending on your internet.

## Running SD.Next with Docker

Check out the [Docker wiki](https://github.com/vladmandic/sdnext/wiki/Docker) if you want to build a custom Docker image.

Using Docker with a prebuilt image:

```shell
export SDNEXT_DOCKER_ROOT_FOLDER=~/sdnext
sudo docker run -it \
  --name sdnext-ipex \
  ...
```
- `-1` will force disable dynamic attention slicing even if the GPU doesn't support 64 bit.
- `0` will automatically enable or disable dynamic attention based on the GPU.

- `IPEXRUN`: Specify to launch the webui with ipexrun. Set it to `True` to use ipexrun. The default is unset.
Use *remove* to delete the currently selected object.

Kanvas keeps a history of stage edits and supports both toolbar and keyboard undo/redo:
- Use toolbar buttons: *Undo* (`⟲`) and *Redo* (`⟳`)
- Keyboard:
  - Undo: `Ctrl+Z` (or `Cmd+Z` on macOS)
  - Redo: `Ctrl+Shift+Z`, `Cmd+Shift+Z`, or `Ctrl+Y`

History tracks structural edits (for example upload, remove, resize, filters, text, outpaint, stage operations), as well as paint and wand operations.

**LoRA.md**
You can combine any number of LoRAs in a single prompt to get the desired output

### Component selection

By default, LoRA is applied to all model components it was trained on.
However, you can also specify which component to apply LoRA to by adding `:module=xxx` to the LoRA tag.

Example:
> `<lora:test_lora:1.0:module=unet>`
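As an illustration of the tag syntax, a `<lora:name:weight:module=xxx>` tag can be decomposed like this — a sketch only, SD.Next's actual parser may differ:

```python
def parse_lora_tag(tag):
    # Decompose "<lora:name:weight:key=value...>" into its parts
    body = tag.strip("<>")
    parts = body.split(":")
    assert parts[0] == "lora"
    name, weight = parts[1], float(parts[2])
    # any remaining segments are key=value options such as module=unet
    options = dict(p.split("=", 1) for p in parts[3:])
    return name, weight, options
```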
|
|||
|
|
@ -10,7 +10,7 @@ Homebrew's Python is there to support other packages. Importantly, Homebrew del
|
|||
|
||||
The solution is to use another way to manage the Python version(s) that you use on your own. I use [asdf](https://asdf-vm.com/), which has a [Python plugin](https://github.com/asdf-community/asdf-python), but there are others if you prefer something else.
|
||||
|
||||
Sources / Further Reading:
|
||||
Sources / Further Reading:
|
||||
- <https://justinmayer.com/posts/homebrew-python-is-not-for-you/>
|
||||
- <https://hackercodex.com/guide/python-development-environment-on-mac-osx/>
|
||||
- <https://github.com/asdf-community/asdf-python>
|
||||
|
|
```zsh
brew update --force --quiet
chmod -R go-w "$(brew --prefix)/share/zsh"
```

1. Install asdf and python build dependencies:

```zsh
brew install asdf openssl readline sqlite3 xz zlib
```

1. Add asdf to `.zshrc` to use it immediately and persistently:

```zsh
. $(brew --prefix asdf)/asdf.sh
echo -e "\n. $(brew --prefix asdf)/asdf.sh" >> ~/.zshrc
```

1. Add the python asdf plugin:

```zsh
asdf plugin add python
```
```zsh
asdf install python 3.10.14
asdf install python latest
```

1. Set the default global version of python:

   Since you will always want 3.10 for SD.Next, you will want to always specifically use that version.
   You will probably want to use the command `python` in most contexts, and `python3.10` for version-specific uses.

```zsh
asdf global python 3.12.2
# or whatever version you installed
```

1. Run SD.Next using python3.10:

```zsh
export PYTHON=$(which python3.10)
```
# Memory Allocator

The combination of the OS default memory allocator `malloc` and Python's default memory allocator is pessimistic when it comes to system memory garbage collection, and it will sometimes hold on to allocated memory longer than necessary even if GC is triggered explicitly
This appears to the user as a memory leak, as process memory usage grows over time
This is especially noticeable when frequently loading/unloading large objects such as models or LoRAs
See [models overview](Models) for details on each model, including their architecture...

- [Black Forest Labs FLUX.1](https://blackforestlabs.ai/announcing-black-forest-labs/) Kontext-Dev
- [Black Forest Labs FLUX.2](https://bfl.ai/blog/flux-2) Dev and Klein
- [lodestones Chroma](https://huggingface.co/lodestones/Chroma) Standard, Detail Calibrated and Flash
- [FreePik F-Lite](https://huggingface.co/Freepik/F-Lite) Standard, Texture and 7B
- [NVLabs Sana](https://nvlabs.github.io/Sana/) 1.0 and 1.5
- [nVidia Cosmos-Predict2 T2I](https://research.nvidia.com/labs/dir/cosmos-predict2/) 2B and 14B
- [AuraFlow](https://huggingface.co/fal/AuraFlow) 0.3 and 0.2
Checks for updates for all checkpoint models.

## Extract LoRa

Allows you to tweak a LoRA to the preferred strength.

**Notes.md**
SD.Next is feature-rich with a focus on performance, flexibility, and user experience.

SD.Next includes many features not found in other WebUIs, such as:
- **SDNQ**: State-of-the-Art quantization engine
  Use pre-quantized models or run quantization on the fly for up to 4x VRAM reduction with minimal quality and performance impact.
- **Balanced Offload**: Dynamically balance CPU and GPU memory to run larger models on limited hardware
- **Captioning** with 150+ **OpenCLiP** models, **Tagger** with **WaifuDiffusion** and **DeepDanbooru** models, and 25+ built-in **VLMs**
- **Image Processing** with full image correction color-grading suite of tools
```
Finished processing dependencies for nunchaku==0.2.0+torch2.6
```

> python

```shell
>>> import sys
>>> import platform
```
Under development.

## FAQ

### I'm getting `OnnxStableDiffusionPipeline.__init__() missing 4 required positional arguments: 'vae_encoder', 'vae_decoder', 'text_encoder', and 'unet'`

It's due to a broken model cache previously generated by a failed conversion or Olive run. Find it in `models/ONNX/cache` and remove it. You can also use the `ONNX` tab in the UI. (You should enable it in settings to make it show up.)
Balanced offload works differently than all other offloading methods as it perfo...

- Recommended for compatible high VRAM GPUs
- Faster, but requires a compatible platform and sufficient VRAM
- Balanced offload moves parts of the model depending on the user-specified threshold, allowing you to control how much VRAM is used
- High threshold sets the maximum memory usage allowed for the model weights of a single model component
- Low threshold decides when to offload unused models back to RAM
  If the VRAM usage is higher than the low threshold, it will offload; otherwise it will do nothing
- Configure threshold in *Settings -> Models & Loading -> Balanced offload GPU high / low watermark*
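The high/low watermark behavior described above can be sketched as follows — an illustrative model of the decision logic, not SD.Next's actual offload code:

```python
def should_offload(vram_used, vram_total, low=0.2):
    # Low watermark: offload unused models back to RAM once the
    # VRAM usage fraction exceeds it; otherwise do nothing
    return vram_used / vram_total > low

def component_budget(vram_total, high=0.6):
    # High watermark: maximum VRAM allowed for the model weights
    # of a single model component
    return vram_total * high
```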

Balanced offloading default behavior is based on detected GPU memory:
- **default**: offload=balanced gpu-min=0.2 gpu-max=0.6 gc-threshold=0.7

**OpenVINO.md**
It is basically a TensorRT / Olive competitor that works with any hardware.

- Install [Git and Python](Installation#install-python-and-git)

> [!NOTE]
> Do not mix OpenVINO with your old install. Treat OpenVINO as a separate backend.

### Running SD.Next with OpenVINO
Linux:

> It will install the necessary libraries at the first run so it will take a while depending on your internet.

## Running SD.Next with Docker

Check out the [Docker wiki](https://github.com/vladmandic/sdnext/wiki/Docker) if you want to build a custom Docker image.

Using Docker with a prebuilt image:

```shell
export SDNEXT_DOCKER_ROOT_FOLDER=~/sdnext
sudo docker run -it \
  --name sdnext-openvino \
  ...
```

> [!NOTE]
> It will install the necessary libraries at the first run so it will take a while depending on your internet.
> Resulting docker image will use 1.1 GB disk space (uncompressed) for the docker image and 2.5 GB for the venv.

## More Info

### Limitations

- The same limitations as TensorRT / Olive apply here too.
- Compilation takes a few minutes and using LoRAs will trigger recompilation.
- Attention Slicing and HyperTile will not work.
Controls the models and methods used to interpret the text prompt.

| Parameter | Type | Default | Details / Syntax | Description | UI Label |
| :--- | :--- | :--- | :--- | :--- | :--- |
| `restore_faces` | `bool` | `False` | `Values: False, codeformer, gfpgan` <br> `Syntax: restore_faces: True` | Apply face restoration post-processing using Codeformer or GFPGAN neural network. Automatically detects and enhances faces in the image for better facial details. *Not* the same as detailer, generally inferior, legacy option. | |
| `face_restoration_model` | `str` | `"None"` | `Values: 'None', 'CodeFormer', 'GFPGAN', etc.` | Selects the model to use for the face restoration post-processing step. | |
| `code_former_weight` | `float` | `0.2` | `Range: 0.0 to 1.0` | Blending strength for CodeFormer. `0` shows the original, `1` shows the fully restored face. | CodeFormer weight parameter |
TODO

## Notes

1. **Memory Considerations**: Higher batch_size, steps, and resolution require more GPU memory
2. **Quality vs Speed**: More steps generally improve quality but increase generation time
3. **Compatibility**: Some parameters may not work with all models or samplers
4. **Defaults**: Default values are optimized for SDXL models at 1024x1024 resolution
which means that each operation needs to be upcast to the original precision b...

> [!IMPORTANT]
> Quantization considerations

Before deciding which quantization method to use, you need to consider the following:

- Compatibility with your platform
You can specify quantization for each model component:

- **LLM**
  Applies to VLM models during captioning, interrogate, and prompt enhance features
- **Control**
  Applies to ControlNets
- **VAE**
  Applies to VAE; quantization of the VAE module is not recommended
Limitations:

- `bitsandbytes` relies on `triton` packages which are not available on Windows unless manually compiled/installed
  without them, performance is significantly reduced
  - for nVidia: automatically installed as needed
  - for AMD/ROCm: [link](https://huggingface.co/docs/bitsandbytes/main/en/installation?backend=AMD+ROCm#amd-gpu)
  - for Intel/IPEX: [link](https://huggingface.co/docs/bitsandbytes/main/en/installation?backend=Intel+CPU+%2B+GPU#multi-backend)

### Optimum-Quanto
@ -184,6 +184,7 @@ Same as `Quantization type` but for the Text Encoders.

## Quantized MatMul type

Overrides the Quantized MatMul type.
Default is `auto`, which will use INT8 MatMul with INT and UINT types, FP8 MatMul with FP types below 16 bits, and Quantized FP16 MatMul with FP types above 16 bits.
This option has no effect if the `Use Quantized MatMul` option is disabled.

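The `auto` selection rule described above can be sketched as follows. This is a simplified illustration of the rule, not SDNQ's actual source; the dtype names and the `>= 16 bits` boundary handling are assumptions.

```python
def select_matmul_type(quant_dtype: str, bits: int) -> str:
    """Simplified sketch of the 'auto' MatMul selection rule described above."""
    if quant_dtype.startswith(("int", "uint")):
        return "INT8 MatMul"        # INT and UINT types use INT8 MatMul
    if bits < 16:
        return "FP8 MatMul"         # FP types below 16 bits use FP8 MatMul
    return "Quantized FP16 MatMul"  # FP types at/above 16 bits (assumed boundary)

print(select_matmul_type("int4", 4))    # INT8 MatMul
print(select_matmul_type("fp8", 8))     # FP8 MatMul
print(select_matmul_type("fp16", 16))   # Quantized FP16 MatMul
```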
@ -207,6 +208,7 @@ For example, `minimum_6bit` will quantize the specified modules to `int6` if you

Default is empty.

An example dict:

```json
{
  "int8": ["transformer_blocks.0.img_mod.1.weight", "transformer_blocks.0.*"],
@ -228,18 +230,21 @@ Using Quantized MatMul will disable group sizes if the number of bits is a 6 or

### SVD rank size

The rank size to use for SVD quantization.
Higher values have better quality but with less performance and more memory usage.
Default is `32`.

### SVD steps

The number of steps to use in the lowrank SVD estimation.
Higher values have better quality but take longer to quantize.
Default is `8`.
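The rank/steps trade-off can be illustrated with a generic randomized lowrank SVD via power iteration. This is a sketch of the general technique, not SDNQ's implementation; the helper name is hypothetical, and the RNG is seeded here for reproducibility even though, as noted below, SDNQ's lowrank SVD is not deterministic.

```python
import numpy as np

def svd_lowrank_sketch(w: np.ndarray, rank: int = 32, steps: int = 8):
    """Randomized lowrank SVD via power iteration (illustrative helper)."""
    rng = np.random.default_rng(0)  # seeded for the example only
    q, _ = np.linalg.qr(w @ rng.standard_normal((w.shape[1], rank)))
    for _ in range(steps):          # more steps -> better subspace estimate, slower
        q, _ = np.linalg.qr(w @ (w.T @ q))
    ub, s, vt = np.linalg.svd(q.T @ w, full_matrices=False)
    return q @ ub, s, vt

w = np.arange(12.0).reshape(4, 3)           # a rank-2 matrix
u, s, vt = svd_lowrank_sketch(w, rank=2, steps=4)
print(np.allclose(u @ np.diag(s) @ vt, w))  # rank 2 captures this matrix
```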

### Dynamic loss threshold

The target threshold to use with `Dynamic quantization`.
SDNQ uses STD normalized MSE loss to calculate its quantization loss, and this option sets the loss target for it.
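An STD-normalized MSE loss can be sketched as below. The toy 8-bit symmetric quantizer is hypothetical and only there to produce a round-trip error; the exact normalization SDNQ uses may differ.

```python
import numpy as np

def quant_loss(w: np.ndarray, w_dq: np.ndarray) -> float:
    """MSE between original and dequantized weights, normalized by the weight STD."""
    return float(np.mean((w - w_dq) ** 2) / np.std(w) ** 2)

# round-trip through a toy 8-bit symmetric quantizer (for scale only)
w = np.linspace(-1.0, 1.0, 101)
scale = np.abs(w).max() / 127
w_dq = np.round(w / scale) * scale
print(quant_loss(w, w_dq))  # a tiny loss, well under typical thresholds
```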
@ -258,14 +263,15 @@ Some recommended target presets are:

2 bit: `1e-1`

These targets are not set in stone and might require some trial and error depending on the model.
Default is `None`.

### Use SVD quantization

Enabling this option will apply SVD quantization on top of SDNQ quantization.
SVD has much higher quality but runs slower.
SVD also makes Loras usable with 4 bit quantization.
More info on SVD quantization: <https://arxiv.org/abs/2411.05007>
Disabled by default.

Note: SVD lowrank used by SDNQ is not deterministic.

@ -273,6 +279,7 @@ Meaning that you will get slightly different quantization results every time.

### Use Dynamic quantization

Enabling this option will dynamically select a per-layer quantization type based on the `Dynamic loss threshold`.
The current `Quantization type` will be used as the minimum allowed quantization type when this option is enabled.
This option takes longer to quantize but has much higher quality depending on your settings.
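The per-layer selection described above can be sketched as follows. This is an illustration of the idea, not SDNQ's code; the dtype names, candidate ordering, and per-dtype losses are hypothetical.

```python
def pick_layer_dtype(losses: dict, minimum: str, order: list, threshold: float) -> str:
    """Pick the lowest-bit dtype, at or above the configured minimum, whose
    quantization loss stays under the threshold (illustrative sketch)."""
    candidates = order[order.index(minimum):]   # minimum allowed type and up
    for dtype in candidates:
        if losses[dtype] <= threshold:
            return dtype
    return candidates[-1]                       # fall back to highest precision

# hypothetical per-dtype losses for one layer
losses = {"int4": 2e-2, "int6": 4e-3, "int8": 8e-4}
print(pick_layer_dtype(losses, "int4", ["int4", "int6", "int8"], 1e-3))  # int8
```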
@ -309,7 +316,7 @@ Enabled by default if Triton is available.

### Use Quantized MatMul

Enabling this option will use quantized INT8 or FP8 MatMul instead of BF16 / FP16 when running the model.
Has significantly higher performance on GPUs with INT8 or FP8 support.
Requires Triton. Disabled by default.
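The basic shape of a quantized matmul is: scale inputs to int8, accumulate in int32, then rescale the result. This is a generic sketch of the technique, not SDNQ's Triton kernel, and the per-tensor scaling scheme is an assumption.

```python
import numpy as np

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Sketch of a quantized INT8 matmul with per-tensor symmetric scales."""
    sa = np.abs(a).max() / 127.0
    sb = np.abs(b).max() / 127.0
    qa = np.round(a / sa).astype(np.int8)
    qb = np.round(b / sb).astype(np.int8)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)  # int32 accumulator
    return acc * (sa * sb)                           # rescale back to float

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8))
b = rng.standard_normal((8, 4))
print(np.max(np.abs(int8_matmul(a, b) - a @ b)))  # small quantization error
```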

**Supported GPUs**

@ -1,13 +1,13 @@

# Schedulers

- [Introduction](#introduction)
  - [The Core Concept](#the-core-concept)
  - [Speed vs Quality Trade-offs](#speed-vs-quality-trade-offs)
  - [Model Compatibility](#model-compatibility)
- [Complete List](#list)
  - [Diffusers Library](#diffusers)
  - [SDNext Extensions](#sdnext)
  - [RES4LYF Custom Suite](#res4lyf)
- [Table with Capabilities](#schedulers-capabilities)

## Introduction

@ -50,6 +50,7 @@ List of schedulers available in SD.Next is broken down into 3 groups:

## Diffusers

### 1. Foundational Gaussian Schedulers

The original samplers that defined the diffusion era, focusing on iterative denoising through Markovian and non-Markovian processes.

- **Foundational Math**: Based on the original Ho et al. and Song et al. formulations.
- **Versatility**: Support for inversion and parallel sampling.

@ -61,6 +62,7 @@ The original samplers that defined the diffusion era, focusing on iterative deno

| **PNDMScheduler** | Pseudo Numerical Methods for Diffusion Models; uses multi-step Runge-Kutta logic. | 1 |

### 2. DPM-Solver Family

A suite of high-order solvers specifically designed to solve the Probability Flow ODE of diffusion models with fewer steps.

- **ODE Efficiency**: Purpose-built for the semi-linear structure of diffusion ODEs.
- **Karras Support**: Deep integration with Karras-style noise schedules.

@ -73,6 +75,7 @@ A suite of high-order solvers specifically designed to solve the Probability Flo

| **EDMDPMSolverMultistepScheduler** | DPM-Solver implementation optimized for the EDM (Elucidating Design Space) framework. | 1 |

### 3. Euler & Heun (Karras-style) Schedulers

Classical numerical methods adapted for discrete-time diffusion, often preferred for their predictable and clean convergence.

- **Prediction Modes**: Heavy support for `epsilon`, `v_prediction`, and `sample` targets.
- **Ancestral Sampling**: Includes `ancestral` variants that add noise at each step.

@ -85,6 +88,7 @@ Classical numerical methods adapted for discrete-time diffusion, often preferred

| **LMSDiscreteScheduler** | Linear Multi-step solver using a history of gradients to improve convergence. | 2 (PT, Flax) |

### 4. Modern Distillation & High-Speed Solvers

State-of-the-art solvers designed for 1-4 step inference through consistency or distillation techniques.

- **Extreme Speed**: Enables near real-time generation.
- **Consistency Models**: Based on the Consistency Models (CM) and Latent Consistency (LCM) research.

@ -97,6 +101,7 @@ State-of-the-art solvers designed for 1-4 step inference through consistency or

| **ConsistencyDecoderScheduler** | Specialized scheduler for the DALL-E 3 consistency decoder models. | 1 |

### 5. Flow Matching & Rectified Flow

Schedulers for the latest generation of "Flow" models (like Flux, SD3, and AuraFlow) which use linear velocity targets.

- **Velocity Prediction**: Designed specifically for models trained on flow-matching objectives.
- **Linear Trajectories**: Optimal for models that denoise along a straight line.

@ -109,6 +114,7 @@ Schedulers for the latest generation of "Flow" models (like Flux, SD3, and AuraF

## SDNext

### 1. High-Precision & Advanced ODE Solvers

Solvers designed for superior convergence and precision, often using predictor-corrector or boundary-diffusion frameworks.

| Scheduler | Description | Variants |

@ -118,6 +124,7 @@ Solvers designed for superior convergence and precision, often using predictor-c

| **TDDScheduler** | Time-Dependent Diffusion; experimental sampler that extends DPMSolverSinglestep with special jump logic and TDD-specific training step support. | 3 |

### 2. Flow Matching Optimized Solvers

Specialized solvers designed for the latest generation of Flow-based models (e.g., Flux, SD3, AuraFlow), supporting resolution-aware trajectory shifting.

| Scheduler | Description | Variants |

@ -127,6 +134,7 @@ Specialized solvers designed for the latest generation of Flow-based models (e.g

| **FlashFlowMatchEulerDiscreteScheduler** | Optimized Euler-based scheduler for FlashFlow models, featuring resolution-aware dynamic shifting (mu/base/max shift). | 1 |

### 3. Fast-Step & Distillation Solvers

Production-grade solvers optimized for extremely low step counts (1-4 steps) while maintaining visual fidelity.

| Scheduler | Description | Variants |

@ -134,6 +142,7 @@ Production-grade solvers optimized for extremely low step counts (1-4 steps) whi

| **UFOGenScheduler** | Diffusion GAN-based sampler implementing both one-step and multi-step sampling trajectories with thresholding support. | 1 |

### 4. Continuous & Variational Frameworks

Schedulers based on specific mathematical foundations for variational objectives and continuous time formulations.

| Scheduler | Description | Variants |

@ -141,6 +150,7 @@ Schedulers based on specific mathematical foundations for variational objectives

| **VDMScheduler** | Variational Diffusion Models; supports both discrete and continuous formulations of VDM objectives (linear, cosine, or sigmoid schedules). | 1 |

## Unique Features

- **Dynamic Compensation (DC)**: Implements dynamic extrapolation to correct for trajectory drift during the denoising process.
- **Brownian Tree Noise Sampler**: Provides significantly more stable convergence in Flow Matching compared to standard random noise.
- **Resolution-Aware Shifting**: Automatically adjusts noise schedules based on the image sequence length (total pixels) to optimize quality across resolutions.
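Resolution-aware shifting of this kind is often a linear interpolation of a timestep-shift parameter (mu) by sequence length. The sketch below mirrors common Flux-style defaults; treat both the formula and the constants as illustrative rather than SD.Next's exact implementation.

```python
def calculate_shift(seq_len: int,
                    base_len: int = 256, max_len: int = 4096,
                    base_shift: float = 0.5, max_shift: float = 1.15) -> float:
    """Interpolate the timestep-shift parameter mu from the image sequence length.
    Defaults mirror common Flux-style settings (assumed, for illustration)."""
    m = (max_shift - base_shift) / (max_len - base_len)
    return seq_len * m + (base_shift - m * base_len)

print(round(calculate_shift(256), 6))   # base shift at the base sequence length
print(round(calculate_shift(4096), 6))  # max shift at the max sequence length
```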
@ -149,6 +159,7 @@ Schedulers based on specific mathematical foundations for variational objectives

## RES4LYF

### 1. RES Family (Refined Exponential Solvers)

The core of the suite, implementing state-of-the-art exponential integration with high-order accuracy and perfect variance tracking.

- **High-Order Convergence**: Maintains structural integrity at low step counts.
- **Variance Preservation**: Eliminates brightness drift and color shift during generation.

@ -163,6 +174,7 @@ The core of the suite, implementing state-of-the-art exponential integration wit

| **RESSinglestepSDEScheduler** | Stochastic variant of the singlestep solver for high-quality, diverse image generation. | 4 |

### 2. Exponential Time Differencing (ETD) & Lawson

Advanced integrators that solve the linear part of the Probability Flow ODE exactly, providing superior stability for high-order updates.

- **Exact ODE Handling**: Solves the deterministic part of the diffusion process without approximation.
- **Superior Stability**: Prevents numerical explosions in high-order (3rd and 4th) sampling steps.

@ -175,6 +187,7 @@ Advanced integrators that solve the linear part of the Probability Flow ODE exac

| **DEISMultistepScheduler** | Diffusion Exponential Integrator Sampler utilizing multistep polynomial extrapolation. | 3 |

### 3. Classical Numerical Integrators

Standard mathematical integrators optimized and refactored for the specific dynamics of the diffusion reverse process.

- **Familiar Tableaus**: Uses proven RK, Radau, and Lobatto logic.
- **Modern Refactor**: Fully updated to operate in normalized signal space for VP/VE compatibility.

@ -189,6 +202,7 @@ Standard mathematical integrators optimized and refactored for the specific dyna

| **GaussLegendreScheduler** | High-precision symmetric solvers based on Gauss-Legendre quadrature. | 3 |

### 4. Flow Matching & Physics-Inspired Samplers

Solvers designed for Flow Matching/Rectified Flow models and physics-based sampling dynamics.

- **Non-Euclidean Flows**: Supports Hyperbolic, Spherical, and Lorentzian geometries.
- **Stochastic Refinement**: Uses Langevin and tangent-based methods for unique textures.

@ -206,6 +220,7 @@ Solvers designed for Flow Matching/Rectified Flow models and physics-based sampl

| **SimpleExponentialScheduler** | Lightweight solver using simple exponential decay for fast, low-step sampling. | 1 |

### 5. Utilities & Sigma Generators

Infrastructure for controlling noise profiles and driving the integration process.

| Scheduler | Description | Variants |

Scripts.md

@ -201,16 +201,16 @@ Indicates how much regional prompting is applied to the image generation.

ResAdapter, a plug-and-play resolution adapter, enables any diffusion model to generate resolution-free images: no additional training, no additional inference, and no style transfer.

| Models | Parameters | Resolution Range | Ratio Range |
|:---------------------------------:|:----------:|:-----------------:|:----------------:|
| resadapter_v2_sd1.5 | 0.9M | 128 <= x <= 1024 | 0.28 <= r <= 3.5 |
| resadapter_v2_sdxl | 0.5M | 256 <= x <= 1536 | 0.28 <= r <= 3.5 |
| resadapter_v1_sd1.5 | 0.9M | 128 <= x <= 1024 | 0.5 <= r <= 2 |
| resadapter_v1_sd1.5_extrapolation | 0.9M | 512 <= x <= 1024 | 0.5 <= r <= 2 |
| resadapter_v1_sd1.5_interpolation | 0.9M | 128 <= x <= 512 | 0.5 <= r <= 2 |
| resadapter_v1_sdxl | 0.5M | 256 <= x <= 1536 | 0.5 <= r <= 2 |
| resadapter_v1_sdxl_extrapolation | 0.5M | 1024 <= x <= 1536 | 0.5 <= r <= 2 |
| resadapter_v1_sdxl_interpolation | 0.5M | 256 <= x <= 1024 | 0.5 <= r <= 2 |

#### Weight

@ -17,7 +17,7 @@ Original repo: <https://github.com/Stability-AI/StableCascade>

SD.Next automatically chooses the BF16 variation when downloading from networks -> reference
since it's smaller and can be used with either BF16 or FP32 compute precision

### UNet models

1. Put the UNet safetensors in the `models/UNet` folder and put the text encoder (if you use one) in there too. The text encoder name should be the UNet name + `_text_encoder`
2. Load the Stable Cascade base (or a custom decoder) from Huggingface as the main model first, then load the UNet (prior) model as the UNet model from settings.

@ -15,7 +15,7 @@ As they have the same name, we recommend doing them one at a time and then renam

1. Load your preferred SD 1.5 or SD-XL model that you want to use LCM with
2. Load the correct **LCM lora** (**lcm-lora-sdv1-5 or lcm-lora-sdxl**) into your prompt, ex: `<lora:lcm-lora-sdv1-5:1>`
3. Set your **sampler** to **LCM**
4. Set number of steps to a low number, e.g. **4-6 steps** for SD 1.5, **2-8 steps** for SD-XL
5. Set your **CFG Scale to 1 or 2** (or somewhere between, play with it for best quality)
6. Optionally, turning on **Hypertile and/or FreeU** will greatly increase speed and quality of output images

WSL.md

@ -148,7 +148,7 @@ Start from Windows using WSL shortcut or from command prompt:

And then from *bash*:

> cd
> git clone https://github.com/vladmandic/sdnext/ sdnext
> cd sdnext
> ./webui.sh --debug

@ -223,7 +223,7 @@ Optionally install SMB client (`samba`) in Ubuntu, export models folder from Win

> sudo apt install cifs-utils

1. Create credentials file `.cred` in your home folder (e.g. `/home/myuser/.cred`) with the following content:

```shell
touch .cred
@ -235,7 +235,7 @@ echo "password=yourwindowspassword" >> .cred

> [!TIP] How to get internal-loopback IP of your Windows host?
> `ip route show | grep -i default | awk '{ print $3}'`

1. Then mount the folder in WSL:

```shell
sudo mount -t cifs -o async,noatime,rw,mfsymlinks,iocharset=utf8,uid=1000,vers=3.1.1,cache=loose,nostrictsync,resilienthandles,cred=/home/myuser/.cred //$HOST_IP/Models /mnt/models
```

ZLUDA.md

@ -276,7 +276,7 @@ Change `gfx906;gfx1012` to your GPU LLVM Target. If you want to build multiple o

Upon successful compilation, rocblas.dll will be generated. In this example, the file path is `C:\ROCm\rocBLAS-rocm-5.7.0\build\release\staging\rocblas.dll`. In addition, some Tensile data files will also be produced in `C:\ROCm\rocBLAS-rocm-5.7.0\build\release\Tensile\library`.

To compile HIP SDK programs that use hipBLAS/rocBLAS, you need to replace the rocblas.dll file in the SDK with the one that you have just built yourself. Then, place `rocblas.dll` into `C:\Program Files\AMD\ROCm\5.7\bin` and the Tensile data files into `C:\Program Files\AMD\ROCm\5.7\bin\rocblas\library`.

Your programs should run smooth as silk on the designated graphics card now.

@ -286,7 +286,7 @@ This guide will walk you through building rocBLAS using the official ROCm docume

This guide is for users with AMD GPUs lacking official ROCm/[HIP SDK](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html) support, or those wanting to enable HIP SDK support for HIP SDK 5.7 and 6.1.2 on Windows for integrated AMD GPUs (iGPUs).

If you already have the libraries, you can skip this section!

**Prerequisites:** Ensure the following software is installed on your PC. `python`, `git`, and the `HIP SDK` are
essential. The script `rdeps.py` will automatically download any missing dependencies when you run it.

@ -299,7 +299,7 @@ essential. The script `rdeps.py` will automatically download any missing depend

* **Git:** (Download from [https://git-scm.com/](https://git-scm.com/))
* **HIP SDK:** (Download from [https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html](https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html))

### Downloading the Source Code

1. **rocBLAS:** Download the latest version ([https://github.com/ROCm/rocBLAS](https://github.com/ROCm/rocBLAS/releases)).
   * **ROCm 5.7.0:** Download `rocBLAS 3.1.0`

@ -314,20 +314,20 @@ essential. The script `rdeps.py` will automatically download any missing depend

* **ROCm 6.1.2:** Download `Tensile 4.40.0`
  [Tensile 4.40.0 for ROCm 6.1.2](https://github.com/ROCm/Tensile/releases/tag/rocm-6.1.2)

### Patching Tensile for ROCm (For Advanced Users, Not-a-must-Do)

These steps are necessary for specific configurations of ROCm and may not be required in all cases.
If you have optimized logic for your GPU architecture, you may skip these steps, especially when building libraries with xnack- features.

### Determine Your ROCm Version

* **ROCm 5.7.0:** Follow the instructions for "**For hip 5.7**" below.
* **ROCm 6.1.2:** Follow the instructions for "**For hip 6.1.2**" below.

### Patches for Tensile

#### For hip 5.7.0

1. Download
   [Tensile-fix-fallback-arch-build.patch](https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/blob/main/Tensile-fix-fallback-arch-build.patch).

@ -337,13 +337,15 @@ If you had a optimized logic for you gpu arche,you may skip this steps.Especily

3. Open a terminal within the `Tensile` folder.

4. Apply the patch:

```bash
git apply Tensile-fix-fallback-arch-build.patch
```

* If nothing appears after applying, it's patched successfully. Otherwise, you may need to manually add the
patch content to `TensileCreateLibrary.py`. You may also skip this step if you have optimized logic available.

#### For hip 6.1.2

1. Download
[Tensile-fix-fallback-arch-build-hip-6.1.2.patch](https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/blob/main/Tensile-fix-fallback-arch-build-hip-6.1.2.patch).

@ -353,6 +355,7 @@ patch content to `TensileCreateLibrary.py`, you may also skip this steps if you

3. Open a terminal within the `Tensile` folder.

4. Apply the patch:

```bash
git apply Tensile-fix-fallback-arch-build-hip-6.1.2.patch
```

@ -360,11 +363,14 @@ patch content to `TensileCreateLibrary.py`, you may also skip this steps if you

* If nothing appears after applying, it's patched successfully. Otherwise, you may need to manually add the
patch content to `TensileCreateLibrary.py`.

### ( Skip this step for ROCm 6.1.2 )

Note: for rocBLAS, edit line 41 in `rdeps.py`. The old repo references an outdated vcpkg, which will lead to a failed build. Update the vcpkg by replacing that line with the following:

```
git clone -b 2024.02.14 https://github.com/microsoft/vcpkg
```

to update the vcpkg version.

* **vcpkg Version:** If your vcpkg version was built after April 2023, replace `CMakeLists.txt` in

@ -373,22 +379,27 @@ to udpate the vckpg version.

folder (e.g., `rocm`).
* For more information, see the [official ROCm
guide](https://rocmdocs.amd.com/projects/rocBLAS/en/latest/install/Windows_Install_Guide.html#windows-install).

### Build with rdeps and rmake

1. Navigate to the `rocm/rocBLAS` directory in your terminal.
2. Run `python rdeps.py`. This script will configure your environment and download necessary packages.

```
python rdeps.py
```

(Use `install.sh -d` on Linux. If you encounter any errors, try to fix them with a web search, or try again.)
After it completes, continue to the next step.

3. After `rdeps.py` completes, run

```
python rmake.py -a "gfx1101;gfx1103" --lazy-library-loading --no-merge-architectures -t "C:\rocm\Tensile-rocm-5.7.0"
```

(adjust paths and architectures as needed).

**Important:**

@ -403,9 +414,9 @@ After successfully building rocBLAS from source, you need to replace the default

version for your HIP programs to utilize it. Here's how:

1. **Locate your Compiled Files:**
   * `rocblas.dll`: Located in `C:\ROCM\rocBLAS-rocm-5.7.0\build\release\staging\` (or a similar path based on
   your build location).
   * Tensile data files: Found within `C:\ROCM\rocBLAS-rocm-5.7.0\build\release\Tensile\library\` (adjust the
   path if needed).

2. **Replace the Default rocBLAS:**

@ -431,8 +442,9 @@ Tensile data files.

* Make sure the ROCm version in the `bin` directory matches the version of rocBLAS you built.

### Note: Editing Tensile/Common.py

This file contains general parameters used by the Tensile library. To ensure compatibility with your GPU, you need
to update two specific settings. Update the values of `globalParameters["SupportedISA"]` and `CACHED_ASM_CAPS` with your GPU ISA and info, and choose the most similar GPU architecture (e.g. RDNA2 for gfx1031, RDNA2 for gfx1032), then copy it in below with your GPU number and other available GPU data. For HIP SDK 6.1.2, the `CACHED_ASM_CAPS` info moved to `Tensile/AsmCaps.py`; also edit `architectureMap` (lines 299 to 310) to add your arch information and map it to the correct logic file. However, some optimized logic does not exist in the official release, so you may need to create it; otherwise the build will create a fallback, non-optimized rocblas and library.

**Here's a step-by-step guide:**

@ -480,15 +492,15 @@ your desired GPU architecture (e.g., `gfx1031`).

**Important Files to Modify:**

* **Tensile:** Within the Tensile folder, make changes to:
  * `CMakeLists.txt`: This file configures the build process and needs adjustments for new architectures.
  * `AMDGPU.hpp`: Defines the architecture-specific interface.
  * `PlaceholderLibrary.hpp`, `Predicaters.hpp`, `OclUtiles.cpp`: These files contain code related to specific
  functionalities, which might require modifications for your target GPU.

* **rocBLAS:** In the rocBLAS folder:
  * `CMakeLists.txt`: Similar to Tensile, update this file for your new architecture.
  * `handle.cpp`, `tensile_host.cpp`, `handle.hpp`: These files are likely involved in communication and
  interactions between rocBLAS and the GPU.

**Caution:**