Merge branch 'dev' into master

pull/1902/head
bmaltais 2024-01-25 13:24:27 -05:00 committed by GitHub
commit 82a9308f5f
25 changed files with 1167 additions and 778 deletions

README.md

@ -47,11 +47,6 @@ The GUI allows you to set the training parameters and generate and run the requi
- [ControlNet-LLLite](#controlnet-lllite)
- [Sample image generation during training](#sample-image-generation-during-training-1)
- [Change History](#change-history)
- [Jan 15, 2024 / 2024/1/15: v0.8.0](#jan-15-2024--2024115-v080)
- [Naming of LoRA](#naming-of-lora)
- [LoRAの名称について](#loraの名称について)
- [Sample image generation during training](#sample-image-generation-during-training-2)
- [Change History](#change-history-1)
## 🦒 Colab
@ -484,78 +479,7 @@ save_file(state_dict, file)
ControlNet-LLLite, a novel method for ControlNet with SDXL, is added. See [documentation](./docs/train_lllite_README.md) for details.
## Change History
### Jan 15, 2024 / 2024/1/15: v0.8.0
- Diffusers, Accelerate, Transformers and other related libraries have been updated. Please update the libraries with [Upgrade](#upgrade).
- Some model files (Text Encoder without position_id) based on the latest Transformers can be loaded.
- `torch.compile` is supported (experimental). PR [#1024](https://github.com/kohya-ss/sd-scripts/pull/1024) Thanks to p1atdev!
- This feature works only on Linux or WSL.
- Please specify `--torch_compile` option in each training script.
- You can select the backend with `--dynamo_backend` option. The default is `"inductor"`. `inductor` or `eager` seems to work.
- Please use the `--sdpa` option instead of the `--xformers` option.
- PyTorch 2.1 or later is recommended.
- Please see [PR](https://github.com/kohya-ss/sd-scripts/pull/1024) for details.
- The session name for wandb can be specified with `--wandb_run_name` option. PR [#1032](https://github.com/kohya-ss/sd-scripts/pull/1032) Thanks to hopl1t!
- IPEX library is updated. PR [#1030](https://github.com/kohya-ss/sd-scripts/pull/1030) Thanks to Disty0!
- Fixed a bug where models could not be saved in Diffusers format.
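The new flags above can be sketched with a minimal argparse parser. This is a hypothetical reduction for illustration: the real training scripts define many more options and hand the backend on to Accelerate.

```python
import argparse

# Minimal sketch of the flags described above (hypothetical subset of
# the real training-script CLI).
parser = argparse.ArgumentParser()
parser.add_argument("--torch_compile", action="store_true")
parser.add_argument(
    "--dynamo_backend",
    type=str,
    default="inductor",
    choices=["eager", "aot_eager", "inductor"],  # reduced list for the sketch
)
parser.add_argument("--sdpa", action="store_true")

# e.g. `train_network.py --torch_compile --dynamo_backend eager --sdpa`
args = parser.parse_args(["--torch_compile", "--dynamo_backend", "eager", "--sdpa"])
print(args.torch_compile, args.dynamo_backend, args.sdpa)
```

Note that `--sdpa` replaces `--xformers` here because the two attention backends are mutually exclusive when `--torch_compile` is used.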
Please read [Releases](https://github.com/kohya-ss/sd-scripts/releases) for recent updates.
### Naming of LoRA
The LoRA supported by `train_network.py` has been named to avoid confusion. The documentation has been updated. The following are the names of LoRA types in this repository.
1. __LoRA-LierLa__ : (LoRA for __Li__ n __e__ a __r__ __La__ yers)
LoRA for Linear layers and Conv2d layers with 1x1 kernel
2. __LoRA-C3Lier__ : (LoRA for __C__ onvolutional layers with __3__ x3 Kernel and __Li__ n __e__ a __r__ layers)
In addition to 1., LoRA for Conv2d layers with 3x3 kernel
LoRA-LierLa is the default LoRA type for `train_network.py` (without `conv_dim` network arg). LoRA-LierLa can be used with [our extension](https://github.com/kohya-ss/sd-webui-additional-networks) for AUTOMATIC1111's Web UI, or with the built-in LoRA feature of the Web UI.
To use LoRA-C3Lier with Web UI, please use our extension.
## Sample image generation during training
A prompt file might look like this, for example
@ -579,6 +503,56 @@ masterpiece, best quality, 1boy, in business suit, standing at street, looking b
## Change History
* 2024/01/02 (v22.6.0)
- Merge sd-scripts v0.8.2 code update
- [Experimental] The `--fp8_base` option is added to the training scripts for LoRA etc. The base model (U-Net, and Text Encoder when training modules for Text Encoder) can be trained with fp8. PR [#1057](https://github.com/kohya-ss/sd-scripts/pull/1057) Thanks to KohakuBlueleaf!
- Please specify `--fp8_base` in `train_network.py` or `sdxl_train_network.py`.
- PyTorch 2.1 or later is required.
- If you use xformers with PyTorch 2.1, please see [xformers repository](https://github.com/facebookresearch/xformers) and install the appropriate version according to your CUDA version.
- The sample image generation during training consumes a lot of memory. It is recommended to turn it off.
- [Experimental] The network multiplier can be specified for each dataset in the training scripts for LoRA etc.
- This is an experimental option and may be removed or changed in the future.
- For example, if you train with state A as `1.0` and state B as `-1.0`, you may be able to generate by switching between state A and B depending on the LoRA application rate.
- Also, if you prepare five states and train them as `0.2`, `0.4`, `0.6`, `0.8`, and `1.0`, you may be able to generate by switching the states smoothly depending on the application rate.
- Please specify `network_multiplier` in `[[datasets]]` in `.toml` file.
- Some options are added to `networks/extract_lora_from_models.py` to reduce the memory usage.
- `--load_precision` option can be used to specify the precision when loading the model. If the model is saved in fp16, you can reduce the memory usage by specifying `--load_precision fp16` without losing precision.
- `--load_original_model_to` option can be used to specify the device to load the original model. `--load_tuned_model_to` option can be used to specify the device to load the derived model. The default is `cpu` for both options, but you can specify `cuda` etc. You can reduce the memory usage by loading one of them to GPU. This option is available only for SDXL.
- The gradient synchronization in LoRA training with multi-GPU is improved. PR [#1064](https://github.com/kohya-ss/sd-scripts/pull/1064) Thanks to KohakuBlueleaf!
- The code for Intel IPEX support is improved. PR [#1060](https://github.com/kohya-ss/sd-scripts/pull/1060) Thanks to akx!
- Fixed a bug in multi-GPU Textual Inversion training.
- `.toml` example for network multiplier
```toml
[general]
[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = 1.0
... subset settings ...
[[datasets]]
resolution = 512
batch_size = 8
network_multiplier = -1.0
... subset settings ...
```
- Merge sd-scripts v0.8.1 code update
- Fixed a bug where VRAM usage without Text Encoder training was larger than before in the training scripts for LoRA etc. (`train_network.py`, `sdxl_train_network.py`).
- Text Encoders were not moved to CPU.
- Fixed typos. Thanks to akx! [PR #1053](https://github.com/kohya-ss/sd-scripts/pull/1053)
* 2024/01/15 (v22.5.0)
- Merged sd-scripts v0.8.0 updates
- Diffusers, Accelerate, Transformers and other related libraries have been updated. Please update the libraries with [Upgrade](#upgrade).


@ -1,141 +1,127 @@
Hi! I have translated the main content of the Japanese README file as follows:

SDXL is now supported. The sdxl branch has been merged into the main branch. When updating the repository, please run the upgrade steps. Since the accelerate version has also been upgraded, please run accelerate config again.

## About this repository

For information on SDXL training, see [here](./README.md#sdxl-training) (English).

This is a repository for Stable Diffusion model training, image generation, and other scripts.

[English README](./README.md) <-- update information is here

A GUI, PowerShell scripts, and other features that make the repository easier to use are provided in [bmaltais's repository](https://github.com/bmaltais/kohya_ss) (English); please refer to it as well. Many thanks to bmaltais.

The repository includes the following scripts:

* Support for training DreamBooth, U-Net, and Text Encoder
* Support for fine-tuning
* Support for LoRA training
* Image generation
* Model conversion (between Stable Diffusion ckpt/safetensors and Diffusers formats)
## Usage (Chinese users only need to follow this installation tutorial)

- Go to the root of the kohya_ss folder and double-click setup.bat to start the installer *(an internet proxy may be required)*
- Follow the English options shown in the interface:

Kohya_ss GUI setup menu:
1. Install kohya_ss gui
2. (Optional) Install cudnn files (avoid unless you really need it)
3. (Optional) Install specific bitsandbytes versions
4. (Optional) Manually configure accelerate
5. (Optional) Start Kohya_ss GUI in browser
6. Quit

Enter your choice: 1

1. Torch 1 (legacy, no longer supported. Will be removed in v21.9.x)
2. Torch 2 (recommended)
3. Cancel

Enter your choice: 2

The environment dependencies will then be installed. Answer the prompts that follow with the options below:

```txt
- This machine
- No distributed training
- NO
- NO
- NO
- all
- bf16
```

Once everything has been selected, close the terminal window and double-click gui.bat (or kohya中文启动器.bat) to run kohya.
When there are related articles in this repository or on note.com, please refer to them. (They may all be moved here in the future.)

* [Training, common part](./docs/train_README-ja.md): data preparation, options, etc.
* [Dataset configuration](./docs/config_README-ja.md)
* [DreamBooth training information](./docs/train_db_README-ja.md)
* [Fine-tuning guide](./docs/fine_tune_README_ja.md)
* [LoRA training information](./docs/train_network_README-ja.md)
* [Textual Inversion training information](./docs/train_ti_README-ja.md)
* [Image generation script](./docs/gen_img_README-ja.md)
* note.com [Model conversion script](https://note.com/kohya_ss/n/n374f316fe4ad)
## Required programs on Windows

Python 3.10.6 and Git are required.

- Python 3.10.6: https://www.python.org/ftp/python/3.10.6/python-3.10.6-amd64.exe
- git: https://git-scm.com/download/win

To use venv from PowerShell, change the security settings as follows.
(Note that this makes script execution possible in general, not just for venv.)

- Open PowerShell as administrator.
- Enter "Set-ExecutionPolicy Unrestricted" and answer Y.
- Close the administrator PowerShell.
## Installation on Windows

The scripts have been tested with PyTorch 2.0.1. PyTorch 1.12.1 should also work.

The example below installs PyTorch 2.0.1 for CUDA 11.8. If you use CUDA 11.6 or PyTorch 1.12.1, change the commands accordingly.

(Note: if only "python" is installed as "py", replace "python" in the examples below with "py".)

Open a regular (non-administrator) PowerShell and run the following in order:

```powershell
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

python -m venv venv
.\venv\Scripts\activate

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install --upgrade -r requirements.txt
pip install xformers==0.0.20

accelerate config
```
The same applies in the Command Prompt.

(Note: ``python -m venv venv`` is used instead of ``python -m venv --system-site-packages venv``, as the latter causes various problems when packages are installed in the global Python.)

Answer the accelerate config prompts as follows. (If you train with bf16, answer bf16 to the last question.)

Note: from 0.15.0, selecting with the arrow keys crashes in Japanese environments (...). Please select with the number keys 0, 1, 2, ...

```txt
- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16
```
### Notes on PyTorch and xformers versions

A ``ValueError: fp16 mixed precision requires a GPU`` error may occur. In that case, answer "0" to the sixth question (``What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:``). (The GPU with id `0` will be used.)

Training may fail with other versions. Unless there is a particular reason, use the specified versions.
### Optional: ``bitsandbytes`` (8-bit optimizers)

`bitsandbytes` is now optional. On Linux it can be installed normally via pip (0.41.1 or later is recommended).

On Windows, 0.35.0 or 0.41.1 is recommended.

- `bitsandbytes` 0.35.0: appears to be a stable version. AdamW8bit can be used, but some other 8-bit optimizers and the `full_bf16` training option cannot.
- `bitsandbytes` 0.41.1: supports Lion8bit, PagedAdamW8bit, and PagedLion8bit. `full_bf16` can be used.

Note: versions of `bitsandbytes` from 0.35.0 through 0.41.0 appear to have issues. https://github.com/TimDettmers/bitsandbytes/issues/659

Install `bitsandbytes` as follows.

### Using 0.35.0

The following is a PowerShell example. In the Command Prompt, use copy instead of cp.

```powershell
cd sd-scripts
.\venv\Scripts\activate
pip install bitsandbytes==0.35.0
cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py
```

### Using 0.41.1

Install the Windows whl files published by jllllll from [here](https://github.com/jllllll/bitsandbytes-windows-webui) or elsewhere.

```powershell
python -m pip install bitsandbytes==0.41.1 --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui
```

### Optional: Using Lion8bit


@ -1,11 +1,7 @@
 import torch
-try:
-    import intel_extension_for_pytorch as ipex
-
-    if torch.xpu.is_available():
-        from library.ipex import ipex_init
-
-        ipex_init()
-except Exception:
-    pass
+from library.ipex_interop import init_ipex
+
+init_ipex()

 from typing import Union, List, Optional, Dict, Any, Tuple
 from diffusers.models.unet_2d_condition import UNet2DConditionOutput

config_README-zh.md (new file)

@ -0,0 +1,289 @@
For non-Japanese speakers: this README is provided only in Japanese in the current state. Sorry for inconvenience. We will provide English version in the near future.
This is a description of the configuration file that can be passed with `--dataset_config`.

## Overview

By passing a configuration file, the user can make more detailed settings.

* Multiple datasets can be configured
    * For example, you can set a `resolution` for each dataset and mix them for training.
    * With training methods that support both the DreamBooth approach and the fine-tuning approach, DreamBooth-style and fine-tuning-style datasets can be mixed.
* Settings can be changed for each subset
    * A subset is a part of a dataset, split by image directory or by metadata. Several subsets make up a dataset.
    * Options such as `keep_tokens` and `flip_aug` can be set per subset. Options such as `resolution` and `batch_size` can be set per dataset, and their values are shared by the subsets belonging to the same dataset. Details are explained below.

The configuration file format can be JSON or TOML. For ease of writing, [TOML](https://toml.io/zh/v1.0.0-rc.2) is recommended. The explanation below assumes TOML.

Here is an example configuration file written in TOML.
```toml
[general]
shuffle_caption = true
caption_extension = '.txt'
keep_tokens = 1

# This is a DreamBooth-style dataset
[[datasets]]
resolution = 512
batch_size = 4
keep_tokens = 2

  [[datasets.subsets]]
  image_dir = 'C:\hoge'
  class_tokens = 'hoge girl'
  # This subset has keep_tokens = 2 (the value of the dataset it belongs to is used)

  [[datasets.subsets]]
  image_dir = 'C:\fuga'
  class_tokens = 'fuga boy'
  keep_tokens = 3

  [[datasets.subsets]]
  is_reg = true
  image_dir = 'C:\reg'
  class_tokens = 'human'
  keep_tokens = 1

# This is a fine-tuning-style dataset
[[datasets]]
resolution = [768, 768]
batch_size = 2

  [[datasets.subsets]]
  image_dir = 'C:\piyo'
  metadata_file = 'C:\piyo\piyo_md.json'
  # This subset has keep_tokens = 1 (the value from general is used)
```

In this example, three directories are trained as DreamBooth-style datasets at 512x512 (batch size 4), and one directory is trained as a fine-tuning-style dataset at 768x768 (batch size 2).
## Settings for datasets and subsets

Settings related to datasets and subsets are divided into several sections where they can be registered.

* `[general]`
    * The section for options that apply to all datasets or all subsets.
    * If an option with the same name exists in a per-dataset or per-subset setting, the dataset or subset setting takes precedence.
* `[[datasets]]`
    * `datasets` is the section for per-dataset settings, i.e., options that apply to each dataset.
    * If a per-subset setting exists, the subset setting takes precedence.
* `[[datasets.subsets]]`
    * `datasets.subsets` is the section for per-subset settings, i.e., options that apply to each subset.

Here is a diagram of the correspondence between the image directories in the earlier example and the sections.

```
C:\
├─ hoge  -> [[datasets.subsets]] No.1 ┐                     ┐
├─ fuga  -> [[datasets.subsets]] No.2 |-> [[datasets]] No.1 |-> [general]
├─ reg   -> [[datasets.subsets]] No.3 ┘                     |
└─ piyo  -> [[datasets.subsets]] No.4 --> [[datasets]] No.2 ┘
```

Each image directory corresponds to one `[[datasets.subsets]]`. One or more `[[datasets.subsets]]` then make up a `[[datasets]]`. All `[[datasets]]` and `[[datasets.subsets]]` belong to `[general]`.

Different options can be specified in each section, but if an option with the same name is specified, the value in the lower-level section takes precedence. Checking how the `keep_tokens` option is handled in the earlier example will help you understand this.
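The precedence rule can be sketched as a small lookup. This is a hypothetical helper for illustration, not part of the repository:

```python
# Hypothetical sketch of the precedence rule described above:
# a subset value overrides its dataset's value, which overrides [general].
def resolve_option(name, general, dataset, subset):
    for scope in (subset, dataset, general):
        if name in scope:
            return scope[name]
    return None

general = {"keep_tokens": 1}
dataset = {"keep_tokens": 2}
subset_without_own_value = {}
subset_with_own_value = {"keep_tokens": 3}

print(resolve_option("keep_tokens", general, dataset, subset_without_own_value))  # 2
print(resolve_option("keep_tokens", general, dataset, subset_with_own_value))     # 3
```

This mirrors how the `keep_tokens` values behave in the TOML example earlier.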
Furthermore, the options that can be specified vary depending on the techniques the training method supports.

* DreamBooth-only options
* Fine-tuning-only options
* Options available when the caption dropout technique can be used

With training methods that support both the DreamBooth approach and the fine-tuning approach, the two can be used together.
Note that whether a dataset is DreamBooth style or fine-tuning style is determined per dataset, so DreamBooth-style subsets and fine-tuning-style subsets cannot coexist within the same dataset.
In other words, to mix the two styles, the subsets of different styles must belong to different datasets.
In terms of program behavior, a subset is judged to be fine-tuning style when the `metadata_file` option is present.
Therefore, the subsets belonging to the same dataset must either all have the `metadata_file` option or all lack it.
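That all-or-none rule can be sketched as follows. This is a hypothetical helper mirroring the described behavior, not the actual validation code:

```python
# Hypothetical sketch: a subset is fine-tuning style iff it has
# `metadata_file`; one dataset must not mix the two styles.
def dataset_style(subsets):
    styles = {"fine tuning" if "metadata_file" in s else "DreamBooth" for s in subsets}
    if len(styles) != 1:
        raise ValueError("DreamBooth and fine-tuning subsets cannot share a dataset")
    return styles.pop()

print(dataset_style([{"image_dir": r"C:\piyo", "metadata_file": r"C:\piyo\piyo_md.json"}]))  # fine tuning
print(dataset_style([{"image_dir": r"C:\hoge"}, {"image_dir": r"C:\fuga"}]))  # DreamBooth
```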
The available options are explained below. For options with the same name as a command-line argument, the explanation is generally omitted. Please refer to the other README files.
### Options common to all training methods

Options that can be specified regardless of the training method.

#### Dataset-level options

Options related to dataset configuration. They cannot be written in `datasets.subsets`.

| Option name | Example | `[general]` | `[[datasets]]` |
| ---- | ---- | ---- | ---- |
| `batch_size` | `1` | o | o |
| `bucket_no_upscale` | `true` | o | o |
| `bucket_reso_steps` | `64` | o | o |
| `enable_bucket` | `true` | o | o |
| `max_bucket_reso` | `1024` | o | o |
| `min_bucket_reso` | `128` | o | o |
| `resolution` | `256`, `[512, 512]` | o | o |

* `batch_size`
    * Equivalent to the command-line argument `--train_batch_size`.

These settings are fixed per dataset.
That is, the subsets belonging to the dataset share them.
For example, to prepare datasets with different resolutions, define them as separate datasets as in the example above, so that a different resolution can be set for each.

#### Subset-level options

Options related to subset configuration.

| Option name | Example | `[general]` | `[[datasets]]` | `[[dataset.subsets]]` |
| ---- | ---- | ---- | ---- | ---- |
| `color_aug` | `false` | o | o | o |
| `face_crop_aug_range` | `[1.0, 3.0]` | o | o | o |
| `flip_aug` | `true` | o | o | o |
| `keep_tokens` | `2` | o | o | o |
| `num_repeats` | `10` | o | o | o |
| `random_crop` | `false` | o | o | o |
| `shuffle_caption` | `true` | o | o | o |
| `caption_prefix` | `"masterpiece, best quality, "` | o | o | o |
| `caption_suffix` | `", from side"` | o | o | o |

* `num_repeats`
    * Specifies how many times the images in the subset are repeated. Equivalent to `--dataset_repeats` in fine tuning, but `num_repeats` can be specified with any training method.
* `caption_prefix`, `caption_suffix`
    * Strings prepended/appended to captions. Shuffling is performed with these strings included, so be careful when `keep_tokens` is specified.
### DreamBooth-only options

DreamBooth-style options exist only as subset options.

#### Subset-level options

Options for configuring DreamBooth-style subsets.

| Option name | Example | `[general]` | `[[datasets]]` | `[[dataset.subsets]]` |
| ---- | ---- | ---- | ---- | ---- |
| `image_dir` | `C:\hoge` | - | - | o (required) |
| `caption_extension` | `".txt"` | o | o | o |
| `class_tokens` | `"sks girl"` | - | - | o |
| `is_reg` | `false` | - | - | o |

First, note that `image_dir` must be a path with the image files placed directly under it. Unlike the conventional DreamBooth method, images placed in subdirectories are not supported. Also, even if you use a folder name like `5_cat`, the repeat count and class name are not reflected. To set these individually, specify them explicitly with `num_repeats` and `class_tokens`.

* `image_dir`
    * Specifies the image directory path. Required.
    * Images must be placed directly under the directory.
* `class_tokens`
    * Sets the class tokens.
    * Used during training only when no caption file exists for an image; the decision is made per image. If `class_tokens` is not specified and no caption file is found, an error is raised.
* `is_reg`
    * Specifies whether the images in the subset are regularization images. If not specified, it defaults to `false`, i.e., the images are not regularization images.
### Fine-tuning-only options

Fine-tuning-style options exist only as subset options.

#### Subset-level options

Options for configuring fine-tuning-style subsets.

| Option name | Example | `[general]` | `[[datasets]]` | `[[dataset.subsets]]` |
| ---- | ---- | ---- | ---- | ---- |
| `image_dir` | `C:\hoge` | - | - | o |
| `metadata_file` | `'C:\piyo\piyo_md.json'` | - | - | o (required) |

* `image_dir`
    * Specifies the image directory path. Unlike the DreamBooth style, it is not required, but setting it is recommended.
    * It does not need to be specified if `--full_path` was used when generating the metadata.
    * Images must be placed directly under the directory.
* `metadata_file`
    * Specifies the path of the metadata file used by the subset. Required.
    * Equivalent to the command-line argument `--in_json`.
    * Because a subset must specify its metadata file, avoid creating a single metadata file that spans directories. It is strongly recommended to prepare a metadata file for each image directory and register them as separate subsets.
### Options available when the caption dropout technique can be used

Caption dropout options exist only as subset options.
Regardless of DreamBooth style or fine-tuning style, they can be specified as long as the training method supports caption dropout.

#### Subset-level options

Options for configuring subsets that can use caption dropout.

| Option name | `[general]` | `[[datasets]]` | `[[dataset.subsets]]` |
| ---- | ---- | ---- | ---- |
| `caption_dropout_every_n_epochs` | o | o | o |
| `caption_dropout_rate` | o | o | o |
| `caption_tag_dropout_rate` | o | o | o |

## Behavior when duplicate subsets exist

For DreamBooth-style datasets, subsets with the same `image_dir` are considered duplicates.
For fine-tuning-style datasets, subsets with the same `metadata_file` are considered duplicates.
If duplicate subsets exist in a dataset, the second and subsequent ones are ignored.
On the other hand, subsets belonging to different datasets are not considered duplicates.
For example, placing subsets with the same `image_dir` in different datasets, as below, is not considered a duplicate.
This is useful when you want to train the same images at different resolutions.
```toml
# Even if they exist in different datasets, they are not considered duplicates, and both are used for training
[[datasets]]
resolution = 512

  [[datasets.subsets]]
  image_dir = 'C:\hoge'

[[datasets]]
resolution = 768

  [[datasets.subsets]]
  image_dir = 'C:\hoge'
```
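The de-duplication behavior can be sketched per dataset. This is a hypothetical helper for illustration; the real implementation differs:

```python
# Hypothetical sketch of the rule above: within one dataset, later
# subsets with the same image_dir are ignored; across datasets they are kept.
def keep_unique_subsets(subsets, key="image_dir"):
    seen = set()
    kept = []
    for subset in subsets:
        if subset[key] in seen:
            continue  # second and later duplicates are ignored
        seen.add(subset[key])
        kept.append(subset)
    return kept

dataset = [{"image_dir": r"C:\hoge"}, {"image_dir": r"C:\hoge"}, {"image_dir": r"C:\fuga"}]
print(len(keep_unique_subsets(dataset)))  # 2
```

Because the check runs per dataset, the two `C:\hoge` subsets in the TOML example above survive: they live in different `[[datasets]]` tables.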
## Combining with command-line arguments

Some options in the configuration file play the same role as command-line argument options.

If a configuration file is passed, the following command-line argument options are ignored.

* `--train_data_dir`
* `--reg_data_dir`
* `--in_json`

If the following command-line argument options are specified both on the command line and in the configuration file, the value in the configuration file takes precedence over the command-line value. Unless otherwise noted, the option names are the same.

| Command-line argument option | Prevailing configuration file option |
| ---- | ---- |
| `--bucket_no_upscale` | |
| `--bucket_reso_steps` | |
| `--caption_dropout_every_n_epochs` | |
| `--caption_dropout_rate` | |
| `--caption_extension` | |
| `--caption_tag_dropout_rate` | |
| `--color_aug` | |
| `--dataset_repeats` | `num_repeats` |
| `--enable_bucket` | |
| `--face_crop_aug_range` | |
| `--flip_aug` | |
| `--keep_tokens` | |
| `--min_bucket_reso` | |
| `--random_crop` | |
| `--resolution` | |
| `--shuffle_caption` | |
| `--train_batch_size` | `batch_size` |

## Error guide

Currently, an external library is used to check whether the configuration file is written correctly, but it is not well maintained and the error messages are hard to understand.
We plan to improve this in the future.

As a second-best measure, some common errors and solutions are listed here.
If you get an error even though you are sure the settings are correct, or if you cannot make sense of an error at all, it may be a bug, so please contact us.

* `voluptuous.error.MultipleInvalid: required key not provided @ ...`: an error saying a required option was not provided. You may have forgotten to specify it or misspelled the option name.
    * The `...` part shows where the error occurred. For example, `voluptuous.error.MultipleInvalid: required key not provided @ data['datasets'][0]['subsets'][0]['image_dir']` means that `image_dir` is missing from the settings of the 0th `subsets` of the 0th `datasets`.
* `voluptuous.error.MultipleInvalid: expected int for dictionary value @ ...`: an error saying the value format is wrong. The format of the specified value is probably incorrect. Refer to the "Example" column for the option in this README.
* `voluptuous.error.MultipleInvalid: extra keys not allowed @ ...`: an unsupported option name is present. You may have misspelled or mistakenly entered an option name.


@ -11,15 +11,10 @@ import toml
from tqdm import tqdm
import torch
-try:
-    import intel_extension_for_pytorch as ipex
-
-    if torch.xpu.is_available():
-        from library.ipex import ipex_init
-
-        ipex_init()
-except Exception:
-    pass
+from library.ipex_interop import init_ipex
+
+init_ipex()
from accelerate.utils import set_seed
from diffusers import DDPMScheduler


@ -66,15 +66,10 @@ import diffusers
import numpy as np
import torch
-try:
-    import intel_extension_for_pytorch as ipex
-
-    if torch.xpu.is_available():
-        from library.ipex import ipex_init
-
-        ipex_init()
-except Exception:
-    pass
+from library.ipex_interop import init_ipex
+
+init_ipex()
import torchvision
from diffusers import (
AutoencoderKL,

File diff suppressed because it is too large

library/ipex_interop.py (new file)

@ -0,0 +1,24 @@
import torch


def init_ipex():
    """
    Try to import `intel_extension_for_pytorch`, and apply
    the hijacks using `library.ipex.ipex_init`.

    If IPEX is not installed, this function does nothing.
    """
    try:
        import intel_extension_for_pytorch as ipex  # noqa
    except ImportError:
        return

    try:
        from library.ipex import ipex_init

        if torch.xpu.is_available():
            is_initialized, error_message = ipex_init()
            if not is_initialized:
                print("failed to initialize ipex:", error_message)
    except Exception as e:
        print("failed to initialize ipex:", e)


@ -5,15 +5,9 @@ import math
import os
import torch
-try:
-    import intel_extension_for_pytorch as ipex
-
-    if torch.xpu.is_available():
-        from library.ipex import ipex_init
-
-        ipex_init()
-except Exception:
-    pass
+from library.ipex_interop import init_ipex
+
+init_ipex()
import diffusers
from transformers import CLIPTextModel, CLIPTokenizer, CLIPTextConfig, logging
from diffusers import AutoencoderKL, DDIMScheduler, StableDiffusionPipeline # , UNet2DConditionModel


@ -1262,9 +1262,9 @@ class CrossAttnUpBlock2D(nn.Module):
for attn in self.attentions:
attn.set_use_memory_efficient_attention(xformers, mem_eff)
-    def set_use_sdpa(self, spda):
+    def set_use_sdpa(self, sdpa):
         for attn in self.attentions:
-            attn.set_use_sdpa(spda)
+            attn.set_use_sdpa(sdpa)
def forward(
self,


@ -923,7 +923,11 @@ class SdxlStableDiffusionLongPromptWeightingPipeline:
if up1 is not None:
uncond_pool = up1
-        dtype = self.unet.dtype
+        unet_dtype = self.unet.dtype
+        dtype = unet_dtype
+        if hasattr(dtype, "itemsize") and dtype.itemsize == 1:  # fp8
+            dtype = torch.float16
+            self.unet.to(dtype)
# 4. Preprocess image and mask
if isinstance(image, PIL.Image.Image):
@ -1028,6 +1032,7 @@ class SdxlStableDiffusionLongPromptWeightingPipeline:
if is_cancelled_callback is not None and is_cancelled_callback():
return None
self.unet.to(unet_dtype)
return latents
def latents_to_image(self, latents):


@ -558,6 +558,7 @@ class BaseDataset(torch.utils.data.Dataset):
tokenizer: Union[CLIPTokenizer, List[CLIPTokenizer]],
max_token_length: int,
resolution: Optional[Tuple[int, int]],
network_multiplier: float,
debug_dataset: bool,
) -> None:
super().__init__()
@ -567,6 +568,7 @@ class BaseDataset(torch.utils.data.Dataset):
self.max_token_length = max_token_length
# width/height is used when enable_bucket==False
self.width, self.height = (None, None) if resolution is None else resolution
self.network_multiplier = network_multiplier
self.debug_dataset = debug_dataset
self.subsets: List[Union[DreamBoothSubset, FineTuningSubset]] = []
@ -1106,7 +1108,9 @@ class BaseDataset(torch.utils.data.Dataset):
for image_key in bucket[image_index : image_index + bucket_batch_size]:
image_info = self.image_data[image_key]
subset = self.image_to_subset[image_key]
-                loss_weights.append(self.prior_loss_weight if image_info.is_reg else 1.0)
+                loss_weights.append(
+                    self.prior_loss_weight if image_info.is_reg else 1.0
+                )  # in case of fine tuning, is_reg is always False
flipped = subset.flip_aug and random.random() < 0.5 # not flipped or flipped with 50% chance
@ -1272,6 +1276,8 @@ class BaseDataset(torch.utils.data.Dataset):
example["target_sizes_hw"] = torch.stack([torch.LongTensor(x) for x in target_sizes_hw])
example["flippeds"] = flippeds
example["network_multipliers"] = torch.FloatTensor([self.network_multiplier] * len(captions))
if self.debug_dataset:
example["image_keys"] = bucket[image_index : image_index + self.batch_size]
return example
@ -1346,15 +1352,16 @@ class DreamBoothDataset(BaseDataset):
tokenizer,
max_token_length,
resolution,
network_multiplier: float,
enable_bucket: bool,
min_bucket_reso: int,
max_bucket_reso: int,
bucket_reso_steps: int,
bucket_no_upscale: bool,
prior_loss_weight: float,
-        debug_dataset,
+        debug_dataset: bool,
     ) -> None:
-        super().__init__(tokenizer, max_token_length, resolution, debug_dataset)
+        super().__init__(tokenizer, max_token_length, resolution, network_multiplier, debug_dataset)
assert resolution is not None, f"resolution is required / resolution解像度指定は必須です"
@ -1520,14 +1527,15 @@ class FineTuningDataset(BaseDataset):
tokenizer,
max_token_length,
resolution,
network_multiplier: float,
enable_bucket: bool,
min_bucket_reso: int,
max_bucket_reso: int,
bucket_reso_steps: int,
bucket_no_upscale: bool,
-        debug_dataset,
+        debug_dataset: bool,
     ) -> None:
-        super().__init__(tokenizer, max_token_length, resolution, debug_dataset)
+        super().__init__(tokenizer, max_token_length, resolution, network_multiplier, debug_dataset)
self.batch_size = batch_size
@ -1724,14 +1732,15 @@ class ControlNetDataset(BaseDataset):
tokenizer,
max_token_length,
resolution,
network_multiplier: float,
enable_bucket: bool,
min_bucket_reso: int,
max_bucket_reso: int,
bucket_reso_steps: int,
bucket_no_upscale: bool,
-        debug_dataset,
+        debug_dataset: bool,
     ) -> None:
-        super().__init__(tokenizer, max_token_length, resolution, debug_dataset)
+        super().__init__(tokenizer, max_token_length, resolution, network_multiplier, debug_dataset)
db_subsets = []
for subset in subsets:
@ -2039,6 +2048,8 @@ def debug_dataset(train_dataset, show_input_ids=False):
print(
f'{ik}, size: {train_dataset.image_data[ik].image_size}, loss weight: {lw}, caption: "{cap}", original size: {orgsz}, crop top left: {crptl}, target size: {trgsz}, flipped: {flpdz}'
)
if "network_multipliers" in example:
print(f"network multiplier: {example['network_multipliers'][j]}")
if show_input_ids:
print(f"input ids: {iid}")
@ -2105,8 +2116,8 @@ def glob_images_pathlib(dir_path, recursive):
class MinimalDataset(BaseDataset):
-    def __init__(self, tokenizer, max_token_length, resolution, debug_dataset=False):
-        super().__init__(tokenizer, max_token_length, resolution, debug_dataset)
+    def __init__(self, tokenizer, max_token_length, resolution, network_multiplier, debug_dataset=False):
+        super().__init__(tokenizer, max_token_length, resolution, network_multiplier, debug_dataset)
self.num_train_images = 0 # update in subclass
self.num_reg_images = 0 # update in subclass
@ -2850,14 +2861,14 @@ def add_training_arguments(parser: argparse.ArgumentParser, support_dreambooth:
)
parser.add_argument("--torch_compile", action="store_true", help="use torch.compile (requires PyTorch 2.0) / torch.compile を使う")
    parser.add_argument(
        "--dynamo_backend",
        type=str,
        default="inductor",
        # available backends:
        # https://github.com/huggingface/accelerate/blob/d1abd59114ada8ba673e1214218cb2878c13b82d/src/accelerate/utils/dataclasses.py#L376-L388C5
        # https://pytorch.org/docs/stable/torch.compiler.html
        choices=["eager", "aot_eager", "inductor", "aot_ts_nvfuser", "nvprims_nvfuser", "cudagraphs", "ofi", "fx2trt", "onnxrt"],
        help="dynamo backend type (default is inductor) / dynamoのbackendの種類(デフォルトは inductor)",
    )
parser.add_argument("--xformers", action="store_true", help="use xformers for CrossAttention / CrossAttentionにxformersを使う")
parser.add_argument(
@ -2904,6 +2915,7 @@ def add_training_arguments(parser: argparse.ArgumentParser, support_dreambooth:
parser.add_argument(
"--full_bf16", action="store_true", help="bf16 training including gradients / 勾配も含めてbf16で学習する"
) # TODO move to SDXL training, because it is not supported by SD1/2
parser.add_argument("--fp8_base", action="store_true", help="use fp8 for base model / base modelにfp8を使う")
parser.add_argument(
"--ddp_timeout",
type=int,
@ -3886,7 +3898,7 @@ def prepare_accelerator(args: argparse.Namespace):
os.environ["WANDB_DIR"] = logging_dir
if args.wandb_api_key is not None:
wandb.login(key=args.wandb_api_key)
# torch.compile のオプション。 NO の場合は torch.compile は使わない
dynamo_backend = "NO"
if args.torch_compile:


@ -43,6 +43,9 @@ def svd(
clamp_quantile=0.99,
min_diff=0.01,
no_metadata=False,
load_precision=None,
load_original_model_to=None,
load_tuned_model_to=None,
):
def str_to_dtype(p):
if p == "float":
@ -57,28 +60,51 @@ def svd(
if v_parameterization is None:
v_parameterization = v2
load_dtype = str_to_dtype(load_precision) if load_precision else None
save_dtype = str_to_dtype(save_precision)
work_device = "cpu"
# load models
if not sdxl:
print(f"loading original SD model : {model_org}")
text_encoder_o, _, unet_o = model_util.load_models_from_stable_diffusion_checkpoint(v2, model_org)
text_encoders_o = [text_encoder_o]
if load_dtype is not None:
text_encoder_o = text_encoder_o.to(load_dtype)
unet_o = unet_o.to(load_dtype)
print(f"loading tuned SD model : {model_tuned}")
text_encoder_t, _, unet_t = model_util.load_models_from_stable_diffusion_checkpoint(v2, model_tuned)
text_encoders_t = [text_encoder_t]
if load_dtype is not None:
text_encoder_t = text_encoder_t.to(load_dtype)
unet_t = unet_t.to(load_dtype)
model_version = model_util.get_model_version_str_for_sd1_sd2(v2, v_parameterization)
else:
device_org = load_original_model_to if load_original_model_to else "cpu"
device_tuned = load_tuned_model_to if load_tuned_model_to else "cpu"
print(f"loading original SDXL model : {model_org}")
text_encoder_o1, text_encoder_o2, _, unet_o, _, _ = sdxl_model_util.load_models_from_sdxl_checkpoint(
sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, model_org, "cpu"
sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, model_org, device_org
)
text_encoders_o = [text_encoder_o1, text_encoder_o2]
if load_dtype is not None:
text_encoder_o1 = text_encoder_o1.to(load_dtype)
text_encoder_o2 = text_encoder_o2.to(load_dtype)
unet_o = unet_o.to(load_dtype)
print(f"loading original SDXL model : {model_tuned}")
text_encoder_t1, text_encoder_t2, _, unet_t, _, _ = sdxl_model_util.load_models_from_sdxl_checkpoint(
sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, model_tuned, "cpu"
sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, model_tuned, device_tuned
)
text_encoders_t = [text_encoder_t1, text_encoder_t2]
if load_dtype is not None:
text_encoder_t1 = text_encoder_t1.to(load_dtype)
text_encoder_t2 = text_encoder_t2.to(load_dtype)
unet_t = unet_t.to(load_dtype)
model_version = sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0
# create LoRA network to extract weights: Use dim (rank) as alpha
@ -100,38 +126,54 @@ def svd(
lora_name = lora_o.lora_name
module_o = lora_o.org_module
module_t = lora_t.org_module
diff = module_t.weight - module_o.weight
diff = module_t.weight.to(work_device) - module_o.weight.to(work_device)
# clear weight to save memory
module_o.weight = None
module_t.weight = None
# Text Encoder might be same
if not text_encoder_different and torch.max(torch.abs(diff)) > min_diff:
text_encoder_different = True
print(f"Text encoder is different. {torch.max(torch.abs(diff))} > {min_diff}")
diff = diff.float()
diffs[lora_name] = diff
# clear target Text Encoder to save memory
for text_encoder in text_encoders_t:
del text_encoder
if not text_encoder_different:
print("Text encoder is same. Extract U-Net only.")
lora_network_o.text_encoder_loras = []
diffs = {}
diffs = {} # clear diffs
for i, (lora_o, lora_t) in enumerate(zip(lora_network_o.unet_loras, lora_network_t.unet_loras)):
lora_name = lora_o.lora_name
module_o = lora_o.org_module
module_t = lora_t.org_module
diff = module_t.weight - module_o.weight
diff = diff.float()
diff = module_t.weight.to(work_device) - module_o.weight.to(work_device)
if args.device:
diff = diff.to(args.device)
# clear weight to save memory
module_o.weight = None
module_t.weight = None
diffs[lora_name] = diff
# clear LoRA network, target U-Net to save memory
del lora_network_o
del lora_network_t
del unet_t
# make LoRA with svd
print("calculating by svd")
lora_weights = {}
with torch.no_grad():
for lora_name, mat in tqdm(list(diffs.items())):
if args.device:
mat = mat.to(args.device)
mat = mat.to(torch.float) # calc by float
# if conv_dim is None, diffs do not include LoRAs for conv2d-3x3
conv2d = len(mat.size()) == 4
kernel_size = None if not conv2d else mat.size()[2:4]
@ -171,8 +213,8 @@ def svd(
U = U.reshape(out_dim, rank, 1, 1)
Vh = Vh.reshape(rank, in_dim, kernel_size[0], kernel_size[1])
U = U.to("cpu").contiguous()
Vh = Vh.to("cpu").contiguous()
U = U.to(work_device, dtype=save_dtype).contiguous()
Vh = Vh.to(work_device, dtype=save_dtype).contiguous()
lora_weights[lora_name] = (U, Vh)
@ -230,6 +272,13 @@ def setup_parser() -> argparse.ArgumentParser:
parser.add_argument(
"--sdxl", action="store_true", help="load Stable Diffusion SDXL base model / Stable Diffusion SDXL baseのモデルを読み込む"
)
parser.add_argument(
"--load_precision",
type=str,
default=None,
choices=[None, "float", "fp16", "bf16"],
help="precision in loading, model default if omitted / 読み込み時に精度を変更して読み込む、省略時はモデルファイルによる"
)
parser.add_argument(
"--save_precision",
type=str,
@ -285,6 +334,18 @@ def setup_parser() -> argparse.ArgumentParser:
help="do not save sai modelspec metadata (minimum ss_metadata for LoRA is saved) / "
+ "sai modelspecのメタデータを保存しないLoRAの最低限のss_metadataは保存される",
)
parser.add_argument(
"--load_original_model_to",
type=str,
default=None,
help="location to load original model, cpu or cuda, cuda:0, etc, default is cpu, only for SDXL / 元モデル読み込み先、cpuまたはcuda、cuda:0など、省略時はcpu、SDXLのみ有効",
)
parser.add_argument(
"--load_tuned_model_to",
type=str,
default=None,
help="location to load tuned model, cpu or cuda, cuda:0, etc, default is cpu, only for SDXL / 派生モデル読み込み先、cpuまたはcuda、cuda:0など、省略時はcpu、SDXLのみ有効",
)
return parser

View File

@ -18,15 +18,10 @@ import diffusers
import numpy as np
import torch
try:
import intel_extension_for_pytorch as ipex
from library.ipex_interop import init_ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
init_ipex()
ipex_init()
except Exception:
pass
import torchvision
from diffusers import (
AutoencoderKL,

View File

@ -9,13 +9,11 @@ import random
from einops import repeat
import numpy as np
import torch
try:
import intel_extension_for_pytorch as ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
ipex_init()
except Exception:
pass
from library.ipex_interop import init_ipex
init_ipex()
from tqdm import tqdm
from transformers import CLIPTokenizer
from diffusers import EulerDiscreteScheduler

View File

@ -11,15 +11,10 @@ import toml
from tqdm import tqdm
import torch
try:
import intel_extension_for_pytorch as ipex
from library.ipex_interop import init_ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
init_ipex()
ipex_init()
except Exception:
pass
from accelerate.utils import set_seed
from diffusers import DDPMScheduler
from library import sdxl_model_util

View File

@ -14,13 +14,11 @@ import toml
from tqdm import tqdm
import torch
try:
import intel_extension_for_pytorch as ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
ipex_init()
except Exception:
pass
from library.ipex_interop import init_ipex
init_ipex()
from torch.nn.parallel import DistributedDataParallel as DDP
from accelerate.utils import set_seed
import accelerate

View File

@ -11,13 +11,11 @@ import toml
from tqdm import tqdm
import torch
try:
import intel_extension_for_pytorch as ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
ipex_init()
except Exception:
pass
from library.ipex_interop import init_ipex
init_ipex()
from torch.nn.parallel import DistributedDataParallel as DDP
from accelerate.utils import set_seed
from diffusers import DDPMScheduler, ControlNetModel

View File

@ -1,15 +1,10 @@
import argparse
import torch
try:
import intel_extension_for_pytorch as ipex
from library.ipex_interop import init_ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
init_ipex()
ipex_init()
except Exception:
pass
from library import sdxl_model_util, sdxl_train_util, train_util
import train_network
@ -95,8 +90,8 @@ class SdxlNetworkTrainer(train_network.NetworkTrainer):
unet.to(org_unet_device)
else:
# Text Encoderから毎回出力を取得するので、GPUに乗せておく
text_encoders[0].to(accelerator.device)
text_encoders[1].to(accelerator.device)
text_encoders[0].to(accelerator.device, dtype=weight_dtype)
text_encoders[1].to(accelerator.device, dtype=weight_dtype)
def get_text_cond(self, args, accelerator, batch, tokenizers, text_encoders, weight_dtype):
if "text_encoder_outputs1_list" not in batch or batch["text_encoder_outputs1_list"] is None:

View File

@ -3,13 +3,9 @@ import os
import regex
import torch
try:
import intel_extension_for_pytorch as ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
ipex_init()
except Exception:
pass
from library.ipex_interop import init_ipex
init_ipex()
import open_clip
from library import sdxl_model_util, sdxl_train_util, train_util

View File

@ -12,15 +12,10 @@ import toml
from tqdm import tqdm
import torch
try:
import intel_extension_for_pytorch as ipex
from library.ipex_interop import init_ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
init_ipex()
ipex_init()
except Exception:
pass
from torch.nn.parallel import DistributedDataParallel as DDP
from accelerate.utils import set_seed
from diffusers import DDPMScheduler, ControlNetModel

View File

@ -12,15 +12,10 @@ import toml
from tqdm import tqdm
import torch
try:
import intel_extension_for_pytorch as ipex
from library.ipex_interop import init_ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
init_ipex()
ipex_init()
except Exception:
pass
from accelerate.utils import set_seed
from diffusers import DDPMScheduler

View File

@ -14,15 +14,10 @@ from tqdm import tqdm
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
try:
import intel_extension_for_pytorch as ipex
from library.ipex_interop import init_ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
init_ipex()
ipex_init()
except Exception:
pass
from accelerate.utils import set_seed
from diffusers import DDPMScheduler
from library import model_util
@ -117,7 +112,7 @@ class NetworkTrainer:
self, args, accelerator, unet, vae, tokenizers, text_encoders, data_loader, weight_dtype
):
for t_enc in text_encoders:
t_enc.to(accelerator.device)
t_enc.to(accelerator.device, dtype=weight_dtype)
def get_text_cond(self, args, accelerator, batch, tokenizers, text_encoders, weight_dtype):
input_ids = batch["input_ids"].to(accelerator.device)
@ -278,6 +273,7 @@ class NetworkTrainer:
accelerator.wait_for_everyone()
# 必要ならテキストエンコーダーの出力をキャッシュする: Text Encoderはcpuまたはgpuへ移される
# cache text encoder outputs if needed: Text Encoder is moved to cpu or gpu
self.cache_text_encoder_outputs_if_needed(
args, accelerator, unet, vae, tokenizers, text_encoders, train_dataset_group, weight_dtype
)
@ -309,6 +305,7 @@ class NetworkTrainer:
)
if network is None:
return
network_has_multiplier = hasattr(network, "set_multiplier")
if hasattr(network, "prepare_network"):
network.prepare_network(args)
@ -389,17 +386,33 @@ class NetworkTrainer:
accelerator.print("enable full bf16 training.")
network.to(weight_dtype)
unet_weight_dtype = te_weight_dtype = weight_dtype
# Experimental Feature: Put base model into fp8 to save vram
if args.fp8_base:
assert torch.__version__ >= "2.1.0", "fp8_base requires torch>=2.1.0 / fp8を使う場合はtorch>=2.1.0が必要です。"
assert (
args.mixed_precision != "no"
), "fp8_base requires mixed precision='fp16' or 'bf16' / fp8を使う場合はmixed_precision='fp16'または'bf16'が必要です。"
accelerator.print("enable fp8 training.")
unet_weight_dtype = torch.float8_e4m3fn
te_weight_dtype = torch.float8_e4m3fn
unet.requires_grad_(False)
unet.to(dtype=weight_dtype)
unet.to(dtype=unet_weight_dtype)
for t_enc in text_encoders:
t_enc.requires_grad_(False)
# acceleratorがなんかよろしくやってくれるらしい
# TODO めちゃくちゃ冗長なのでコードを整理する
# in case of cpu, dtype is already set to fp32 because cpu does not support fp8/fp16/bf16
if t_enc.device.type != "cpu":
t_enc.to(dtype=te_weight_dtype)
# nn.Embedding not support FP8
t_enc.text_model.embeddings.to(dtype=(weight_dtype if te_weight_dtype != weight_dtype else te_weight_dtype))
# acceleratorがなんかよろしくやってくれるらしい / accelerator will do something good
if train_unet:
unet = accelerator.prepare(unet)
else:
unet.to(accelerator.device, dtype=weight_dtype) # move to device because unet is not prepared by accelerator
unet.to(accelerator.device, dtype=unet_weight_dtype) # move to device because unet is not prepared by accelerator
if train_text_encoder:
if len(text_encoders) > 1:
text_encoder = text_encoders = [accelerator.prepare(t_enc) for t_enc in text_encoders]
@ -407,8 +420,8 @@ class NetworkTrainer:
text_encoder = accelerator.prepare(text_encoder)
text_encoders = [text_encoder]
else:
for t_enc in text_encoders:
t_enc.to(accelerator.device, dtype=weight_dtype)
pass # if text_encoder is not trained, no need to prepare. and device and dtype are already set
network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(network, optimizer, train_dataloader, lr_scheduler)
if args.gradient_checkpointing:
@ -421,9 +434,6 @@ class NetworkTrainer:
if train_text_encoder:
t_enc.text_model.embeddings.requires_grad_(True)
# set top parameter requires_grad = True for gradient checkpointing works
if not train_text_encoder: # train U-Net only
unet.parameters().__next__().requires_grad_(True)
else:
unet.eval()
for t_enc in text_encoders:
@ -685,7 +695,7 @@ class NetworkTrainer:
if accelerator.is_main_process:
init_kwargs = {}
if args.wandb_run_name:
init_kwargs['wandb'] = {'name': args.wandb_run_name}
init_kwargs["wandb"] = {"name": args.wandb_run_name}
if args.log_tracker_config is not None:
init_kwargs = toml.load(args.log_tracker_config)
accelerator.init_trackers(
@ -754,7 +764,17 @@ class NetworkTrainer:
accelerator.print("NaN found in latents, replacing with zeros")
latents = torch.nan_to_num(latents, 0, out=latents)
latents = latents * self.vae_scale_factor
b_size = latents.shape[0]
# get multiplier for each sample
if network_has_multiplier:
multipliers = batch["network_multipliers"]
# if all multipliers are same, use single multiplier
if torch.all(multipliers == multipliers[0]):
multipliers = multipliers[0].item()
else:
raise NotImplementedError("multipliers for each sample is not supported yet")
# print(f"set multiplier: {multipliers}")
network.set_multiplier(multipliers)
with torch.set_grad_enabled(train_text_encoder), accelerator.autocast():
# Get the text embedding for conditioning
@ -778,10 +798,24 @@ class NetworkTrainer:
args, noise_scheduler, latents
)
# ensure the hidden state will require grad
if args.gradient_checkpointing:
for x in noisy_latents:
x.requires_grad_(True)
for t in text_encoder_conds:
t.requires_grad_(True)
# Predict the noise residual
with accelerator.autocast():
noise_pred = self.call_unet(
args, accelerator, unet, noisy_latents, timesteps, text_encoder_conds, batch, weight_dtype
args,
accelerator,
unet,
noisy_latents.requires_grad_(train_unet),
timesteps,
text_encoder_conds,
batch,
weight_dtype,
)
if args.v_parameterization:
@ -808,10 +842,11 @@ class NetworkTrainer:
loss = loss.mean() # 平均なのでbatch_sizeで割る必要なし
accelerator.backward(loss)
self.all_reduce_network(accelerator, network) # sync DDP grad manually
if accelerator.sync_gradients and args.max_grad_norm != 0.0:
params_to_clip = accelerator.unwrap_model(network).get_trainable_params()
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
if accelerator.sync_gradients:
self.all_reduce_network(accelerator, network) # sync DDP grad manually
if args.max_grad_norm != 0.0:
params_to_clip = accelerator.unwrap_model(network).get_trainable_params()
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
optimizer.step()
lr_scheduler.step()

View File

@ -8,15 +8,10 @@ import toml
from tqdm import tqdm
import torch
try:
import intel_extension_for_pytorch as ipex
from library.ipex_interop import init_ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
init_ipex()
ipex_init()
except Exception:
pass
from accelerate.utils import set_seed
from diffusers import DDPMScheduler
from transformers import CLIPTokenizer
@ -505,7 +500,7 @@ class TextualInversionTrainer:
if accelerator.is_main_process:
init_kwargs = {}
if args.wandb_run_name:
init_kwargs['wandb'] = {'name': args.wandb_run_name}
init_kwargs["wandb"] = {"name": args.wandb_run_name}
if args.log_tracker_config is not None:
init_kwargs = toml.load(args.log_tracker_config)
accelerator.init_trackers(
@ -730,14 +725,13 @@ class TextualInversionTrainer:
is_main_process = accelerator.is_main_process
if is_main_process:
text_encoder = accelerator.unwrap_model(text_encoder)
updated_embs = text_encoder.get_input_embeddings().weight[token_ids].data.detach().clone()
accelerator.end_training()
if args.save_state and is_main_process:
train_util.save_state_on_train_end(args, accelerator)
updated_embs = text_encoder.get_input_embeddings().weight[token_ids].data.detach().clone()
if is_main_process:
ckpt_name = train_util.get_last_ckpt_name(args, "." + args.save_model_as)
save_model(ckpt_name, updated_embs_list, global_step, num_train_epochs, force_sync_upload=True)

View File

@ -8,13 +8,11 @@ from multiprocessing import Value
from tqdm import tqdm
import torch
try:
import intel_extension_for_pytorch as ipex
if torch.xpu.is_available():
from library.ipex import ipex_init
ipex_init()
except Exception:
pass
from library.ipex_interop import init_ipex
init_ipex()
from accelerate.utils import set_seed
import diffusers
from diffusers import DDPMScheduler