mirror of https://github.com/bmaltais/kohya_ss
Fix Docker setup issues and rewrite README.md
Identified and documented 12+ critical issues with the Docker setup:

- Missing docker.md documentation (broken link in README)
- Duplicate volume mounts in docker-compose.yaml
- Hard-coded UID causing permission issues
- No health checks or restart policies
- Oversized TensorBoard image
- Missing resource limits and .env documentation
- Platform support ambiguities

Created comprehensive Docker documentation (docs/docker.md):

- Complete prerequisites for Windows/Linux/macOS
- Detailed setup and troubleshooting guides
- Configuration examples and best practices
- Advanced usage patterns (multi-GPU, resource limits)
- Security and performance tips

Rewrote README.md with improved structure:

- Better organization with clear navigation
- Fixed broken docker.md link
- Enhanced Docker installation section
- Improved quick start guide with comparison table
- Expanded troubleshooting section
- Better formatting and readability
- Added quick reference section

Branch: pull/3474/head
parent 4161d1d80a · commit 5b3d3ab806 · README.md
[](LICENSE.md)
[](https://github.com/bmaltais/kohya_ss/issues)

A comprehensive GUI and CLI toolkit for training Stable Diffusion models, LoRAs, and other diffusion model variants.

## Overview

This project provides a user-friendly **Gradio-based interface** for [Kohya's Stable Diffusion training scripts](https://github.com/kohya-ss/sd-scripts), making it accessible for both beginners and advanced users to fine-tune diffusion models.

**Key Features:**

- **Easy-to-use GUI** for configuring training parameters
- **Automatic CLI command generation** for advanced users
- **Multiple training methods**: LoRA, Dreambooth, fine-tuning, SDXL, Flux.1, SD3
- **Cross-platform support**: Windows, Linux, macOS
- **Flexible deployment**: local installation, Docker, or cloud-based
## Table of Contents

- [Quick Start](#quick-start)
- [Installation Options](#installation-options)
  - [Local Installation](#local-installation)
  - [Docker Installation](#docker-installation)
  - [Cloud-Based Solutions](#cloud-based-solutions)
- [Configuration](#configuration)
- [Training Features](#training-features)
  - [LoRA Training](#lora-training)
  - [SDXL Training](#sdxl-training)
  - [Sample Image Generation](#sample-image-generation)
  - [Masked Loss](#masked-loss)
- [Troubleshooting](#troubleshooting)
- [Advanced Usage](#advanced-usage)
- [Interesting Forks](#interesting-forks)
- [Contributing](#contributing)
- [License](#license)
- [Change History](#change-history)
  - [v25.2.1 (Current)](#v2521-current)
  - [v25.0.3](#v2503)
  - [v25.0.2](#v2502)
  - [v25.0.1](#v2501)
  - [v25.0.0](#v2500)
## Quick Start

Choose your preferred installation method:

| Method | Best For | Time to Set Up |
|--------|----------|----------------|
| **Docker** | Quick start, consistency across systems | 5-10 minutes |
| **uv (recommended)** | Latest features, faster dependency management | 10-15 minutes |
| **pip** | Traditional Python users, easier debugging | 15-20 minutes |
| **Cloud (Colab)** | No local GPU, testing, or limited resources | 2-5 minutes |

**Fastest way to get started:**

```bash
# Docker (if you have Docker + an NVIDIA GPU)
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
docker compose up -d
# Access the GUI at http://localhost:7860

# OR local installation with uv (Linux/Windows)
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
# See the installation guides below for platform-specific steps
```
## Installation Options

You can run `kohya_ss` either **locally on your machine** or via **cloud-based solutions** such as Colab or RunPod.

- If you have a GPU-equipped PC and want full control, install it locally using `uv` or `pip`.
- If your system doesn't meet the requirements or you prefer a browser-based setup, use Colab or a paid GPU provider such as RunPod or Novita.
- If you are a developer or DevOps user, Docker is also supported.

### Local Installation

Install `kohya_ss` directly on your machine for maximum flexibility and performance.

#### System Requirements

- **GPU**: NVIDIA GPU with CUDA support (8 GB+ VRAM recommended)
- **RAM**: 16 GB minimum (32 GB recommended for SDXL)
- **Storage**: 20 GB+ free space
- **Python**: 3.10 or 3.11 (3.12 is not yet supported)

#### Installation Methods

| Platform | Recommended | Alternative | Installation Guide |
|--------------|-------------|-------------|-------------------|
| **Windows** | uv | pip | [uv_windows.md](./docs/Installation/uv_windows.md) / [pip_windows.md](./docs/Installation/pip_windows.md) |
| **Linux** | uv | pip | [uv_linux.md](./docs/Installation/uv_linux.md) / [pip_linux.md](./docs/Installation/pip_linux.md) |
| **macOS** | pip | uv | [pip_linux.md](./docs/Installation/pip_linux.md) |

#### `uv` vs `pip` - Which Should I Choose?

**Use `uv` if:**

- You want the fastest installation and updates
- You prefer automatic dependency isolation
- You're setting up a new environment
- You want minimal configuration hassle

**Use `pip` if:**

- You're experienced with Python package management
- You need fine-grained control over dependencies
- You're integrating with existing Python tooling
- You encounter issues with `uv`

**Still unsure?** Start with `uv`. If you encounter problems, fall back to `pip`.
### Docker Installation

**Best for:** Consistent environment, easy updates, isolation from system Python.

Docker provides the fastest and most reliable way to run Kohya_ss with all dependencies pre-configured.

#### Prerequisites

- Docker Desktop (Windows/macOS) or Docker Engine (Linux)
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit (Linux) or WSL2 with GPU support (Windows)

#### Quick Start with Docker

```bash
# Clone the repository with submodules
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Start the services
docker compose up -d

# Access the GUI
# Kohya GUI:   http://localhost:7860
# TensorBoard: http://localhost:6006
```
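Once the services are started, the GUI can take a few seconds before it begins answering on port 7860. A small generic Python helper (not part of kohya_ss) can poll until a port is ready:

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0):
    """Return True once a TCP port accepts connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry shortly
    return False
```

For example, `wait_for_port("127.0.0.1", 7860)` after `docker compose up -d`; the same check works for TensorBoard on port 6006.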
#### Updating the Docker Installation

```bash
# Stop the containers
docker compose down

# Pull the latest images and restart
docker compose up -d --pull always
```

**Complete Docker documentation:** [docs/docker.md](./docs/docker.md)

**Platform-specific setup:**

- **Windows**: [Docker Desktop + WSL2 GPU Setup](./docs/docker.md#windows)
- **Linux**: [NVIDIA Container Toolkit Setup](./docs/docker.md#linux)
- **macOS**: Docker does not support NVIDIA GPUs (use cloud or native installation)
### Cloud-Based Solutions

No local GPU? Use these cloud alternatives:

#### Google Colab (Free)

**Pros:** Free GPU access, no installation required, browser-based
**Cons:** Session limits, possible disconnects, shared resources

[](https://colab.research.google.com/github/camenduru/kohya_ss-colab/blob/main/kohya_ss_colab.ipynb)

- **Repository:** <https://github.com/camenduru/kohya_ss-colab>
- **Maintained by:** camenduru (community contributor)
- **Note:** Report Colab-specific issues to camenduru's repository

**Special thanks to camenduru for maintaining the Colab version!**

#### RunPod (Paid)

**Pros:** Dedicated GPUs, persistent storage, no session limits
**Cons:** Costs money, requires account setup

- **Setup Guide:** [docs/installation_runpod.md](docs/installation_runpod.md)
- **Templates available** with pre-configured environments

#### Novita (Paid)

**Pros:** Integrated UI, easy setup, good for beginners
**Cons:** Costs money, platform-specific

- **Setup Guide:** [docs/installation_novita.md](docs/installation_novita.md)
## Configuration

### Custom Path Defaults with `config.toml`

Streamline your workflow by setting default paths for models, datasets, and outputs. The `config.toml` file is loaded automatically when the GUI starts and pre-fills directory fields so you don't have to reselect them every session.

#### Quick Setup

1. **Copy the example configuration:**

   ```bash
   cp "config example.toml" config.toml
   ```

2. **Edit `config.toml`** with your preferred paths:

   ```toml
   # Example configuration
   model_dir = "C:/ai/models/Stable-diffusion"
   lora_model_dir = "C:/ai/models/Lora"
   output_dir = "C:/ai/outputs"
   dataset_dir = "C:/ai/datasets"
   ```

3. **Use absolute paths** or paths relative to the `kohya_ss` root directory.

4. **Use forward slashes** (`/`) even on Windows for compatibility with TOML and Python.

5. **Make sure the specified directories exist** on your system.

#### Configuration Structure

The `config.toml` file supports multiple sections for different training modes:

```toml
# General settings
model_dir = "/path/to/models"
lora_model_dir = "/path/to/lora"
vae_dir = "/path/to/vae"
output_dir = "/path/to/outputs"
logging_dir = "/path/to/logs"

# Dreambooth specific
db_model_dir = "/path/to/models"
db_reg_image_dir = "/path/to/regularization"

# LoRA specific
lc_model_dir = "/path/to/models"
lc_output_dir = "/path/to/outputs/lora"
lc_dataset_dir = "/path/to/datasets"

# See 'config example.toml' for the complete list of options
```

#### Using a Custom Config Path

Specify a different config file location with the `--config` command-line argument:

```bash
# Windows
gui.bat --config D:/my_configs/kohya_settings.toml

# Linux/macOS
./gui.sh --config /home/user/my_configs/kohya_settings.toml
```

**Full configuration reference:** See `config example.toml` in the root directory
## Training Features

### LoRA Training

LoRA (Low-Rank Adaptation) allows efficient fine-tuning of Stable Diffusion models with minimal computational requirements.

**Training a LoRA:**

1. Use the GUI's LoRA training tab
2. Configure your dataset and parameters
3. Start training (the GUI runs `train_network.py` for you)

**Using trained LoRAs:**

- Install the [Additional Networks extension](https://github.com/kohya-ss/sd-webui-additional-networks) for Auto1111
- Or load the LoRA in your preferred Stable Diffusion UI

**Documentation:**

- [LoRA Training Guide](docs/LoRA/top_level.md) - comprehensive overview
- [LoRA Training Options](docs/LoRA/options.md) - advanced configuration

### SDXL Training

Support for Stable Diffusion XL model training with optimized settings.

**Resources:**

- [Official SDXL Training Guide](https://github.com/kohya-ss/sd-scripts/blob/main/README.md#sdxl-training)
- [LoRA Training Guide](docs/LoRA/top_level.md) (includes SDXL sections)

### Sample Image Generation

Generate sample images during training to monitor progress and quality.

#### Creating a Prompt File

Create a text file with prompts and generation parameters:

```txt
# prompt 1
masterpiece, best quality, (1girl), in white shirts, upper body, looking at viewer

# prompt 2
masterpiece, best quality, 1boy, in business suit, standing at street, looking back --n (low quality, worst quality), bad anatomy, bad composition, poor, low effort --w 576 --h 832 --d 2 --l 5.5 --s 40
```
#### Available Options

Lines beginning with `#` are comments. You can specify generation options such as `--n` after the prompt:

- `--n`: Negative prompt (text to avoid)
- `--w`: Image width in pixels
- `--h`: Image height in pixels
- `--d`: Seed for reproducibility
- `--l`: CFG scale (guidance strength)
- `--s`: Number of sampling steps

**Note:** Prompt weighting with `()` and `[]` is supported.
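The option syntax above can be illustrated with a small parser sketch (`parse_prompt_line` is a hypothetical helper for illustration, not part of the training scripts):

```python
# Options recognized in a sample-prompt line, per the list above.
OPTION_KEYS = {"n", "w", "h", "d", "l", "s"}


def parse_prompt_line(line):
    """Split a prompt line into (positive prompt, {option: value}).

    Each '--x' token starts a new option; everything until the next
    option token belongs to its value (so '--n' can hold a full
    negative prompt with commas).
    """
    prompt_tokens, options, current = [], {}, None
    for tok in line.split():
        if tok.startswith("--") and tok[2:] in OPTION_KEYS:
            current = tok[2:]
            options[current] = []
        elif current is None:
            prompt_tokens.append(tok)
        else:
            options[current].append(tok)
    return " ".join(prompt_tokens), {k: " ".join(v) for k, v in options.items()}
```

Feeding it the second example prompt above yields the positive prompt plus options like `{"w": "576", "h": "832", "d": "2", ...}`.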
### Masked Loss

Enable masked loss to train only specific regions of images. Masked loss is supported in each training script.

**Activation:** Add the `--masked_loss` option to your training configuration

**How it works:**

- Uses the ControlNet dataset format to supply masks
- Masks are RGB images whose red channel value determines the weight:
  - 255 (full weight) = train this area
  - 128 (half weight) = partial training
  - 0 (no weight) = ignore this area
- Pixel values 0-255 map to loss weights 0.0-1.0

**Documentation:** [LLLite Training Guide](./docs/train_lllite_README.md#preparing-the-dataset)

**Warning:** This feature is not fully tested and may contain bugs. Please report issues on GitHub.
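The 0-255 to 0.0-1.0 mapping can be sketched in a few lines (illustrative arithmetic only, not the actual training-script implementation):

```python
def mask_weight(red_value):
    """Map a mask pixel's red channel (0-255) to a loss weight (0.0-1.0)."""
    return red_value / 255.0


def masked_loss(per_pixel_error, red_channel):
    """Average the per-pixel error, each pixel weighted by its mask value."""
    weighted = [e * mask_weight(r) for e, r in zip(per_pixel_error, red_channel)]
    return sum(weighted) / len(weighted)
```

A fully red pixel (255) contributes its whole error, a black pixel (0) contributes nothing, and 128 contributes roughly half.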
## Troubleshooting

### Common Issues

#### Page File Limit (Windows)

**Symptom:** Errors mentioning the page file size

**Solution:** Increase the Windows virtual memory (page file) size:

1. System Properties > Advanced > Performance Settings
2. Virtual Memory > Change
3. Set a custom size (16 GB+ recommended)

#### No module called 'tkinter'

**Symptom:** Import error for the `tkinter` module

**Solutions:**

- **Windows:** Reinstall Python 3.10 or 3.11 with the "tcl/tk" option enabled
- **Linux:** `sudo apt-get install python3-tk`
- **macOS:** Reinstall Python from python.org (not Homebrew)

#### GPU Not Being Used / Low GPU Utilization

**Symptoms:** Training is slow, GPU usage stays at 0-10%

**Solutions:**

1. Verify the CUDA installation: `nvidia-smi`
2. Check PyTorch GPU access:

   ```python
   import torch
   print(torch.cuda.is_available())
   print(torch.cuda.get_device_name(0))
   ```

3. Increase the batch size
4. Disable CPU offloading options
5. See: [Tesla V100 Troubleshooting](docs/troubleshooting_tesla_v100.md)

#### Out of Memory Errors

**Solutions:**

- Reduce the batch size
- Enable gradient checkpointing
- Use mixed-precision training (fp16)
- Lower the resolution
- Enable CPU offloading
- Close other GPU applications

#### Docker-Specific Issues

See the comprehensive [Docker Troubleshooting Guide](./docs/docker.md#troubleshooting) for:

- GPU not detected in the container
- Permission denied errors
- Volume mount issues
- Port conflicts

### Getting Help

If you're stuck:

1. **Search existing issues:** <https://github.com/bmaltais/kohya_ss/issues>
2. **Check the documentation:** see the `/docs` directory
3. **Open a new issue** with:
   - Operating system and version
   - Installation method (Docker/uv/pip)
   - Python version
   - Full error message and logs
   - Steps to reproduce
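A small optional snippet (not part of kohya_ss; `issue_environment_report` is a hypothetical helper) can gather some of the environment details above for pasting into an issue:

```python
import platform
import sys


def issue_environment_report(install_method="unknown"):
    """Collect basic environment details requested when filing an issue."""
    return "\n".join([
        "OS: " + platform.platform(),
        "Python: " + sys.version.split()[0],
        "Install method: " + install_method,
    ])


print(issue_environment_report("uv"))
```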
## Advanced Usage

### Accelerate Configuration for Multi-GPU

Use the "Accelerate launch" tab in the GUI to configure GPU selection:

1. Open the "Accelerate launch" tab
2. For a single GPU: uncheck "Multi-GPU" and set the GPU ID (e.g., "0" or "1")
3. For multi-GPU: check "Multi-GPU" and configure the device IDs

### Running Multiple Instances (Linux)

Run separate GUI instances for different training jobs:

```bash
# Start the first instance on port 7860
nohup ./gui.sh --listen 0.0.0.0 --server_port 7860 --headless > log_7860.log 2>&1 &

# Start a second instance on port 7861
nohup ./gui.sh --listen 0.0.0.0 --server_port 7861 --headless > log_7861.log 2>&1 &
```

**Monitoring:** Open each GUI in a separate browser tab; for terminal access over SSH, use `tmux` or `screen`

**More details:** [GitHub Issue #2577](https://github.com/bmaltais/kohya_ss/issues/2577)

### Command-Line Usage

The GUI generates CLI commands that can also be run directly:

```bash
# Activate the virtual environment first
source venv/bin/activate  # Linux/macOS
# or
venv\Scripts\activate.bat  # Windows

# Run a training script directly
python sd-scripts/train_network.py \
  --pretrained_model_name_or_path=/path/to/model.safetensors \
  --train_data_dir=/path/to/dataset \
  --output_dir=/path/to/output
  # ... additional parameters
```

### Using Different Python Versions

Kohya_ss supports Python 3.10 and 3.11:

```bash
# Create an environment with a specific version
uv venv --python 3.11
# or
python3.11 -m venv venv
```
## Interesting Forks

Community-maintained variants with additional features:

- **HunyuanDiT Support:** fine-tune HunyuanDiT models or create LoRAs for them
  - Repository: <https://github.com/Tencent/HunyuanDiT/tree/main/kohya_ss-hydit>
## Contributing

Contributions are welcome! Help improve Kohya_ss by:

**Reporting Issues:**

- Use [GitHub Issues](https://github.com/bmaltais/kohya_ss/issues)
- Include detailed reproduction steps
- Provide system information and logs

**Submitting Code:**

- Fork the repository
- Create a feature branch
- Follow the existing code style
- Test thoroughly before submitting a PR
- Document new features

**Security Issues:**

- See [SECURITY.md](SECURITY.md) for responsible disclosure
## License

This project is licensed under the **Apache License 2.0**. See [LICENSE.md](LICENSE.md) for complete terms.
## Change History
|
||||
|
||||
### v25.2.1 (Current)
|
||||
|
||||
- Latest stable release
|
||||
- Python 3.11 support
|
||||
- Updated dependencies
|
||||
|
||||
### v25.0.3

- Upgraded Gradio, diffusers, and huggingface-hub to fix ASGI issues
- New simplified setup scripts:
  - `gui-uv.bat` (Windows) and `gui-uv.sh` (Linux)
  - No need to run separate setup scripts anymore

### v25.0.2

- Forced Gradio upgrade to 5.14.0+ for critical updates

### v25.0.1

- Fixed requirements versioning issues affecting Hugging Face downloads

### v25.0.0

- **Major update:** Added support for Flux.1 and SD3
- Aligned GUI with latest sd-scripts features
- Breaking changes: Previous workflows may need adjustment

**Note:** For the pre-Flux.1/SD3 version, check out tag `v24.1.7`:

```bash
git checkout v24.1.7
```

**Flux.1 and SD3 Parameters:**

- See [sd-scripts README](https://github.com/kohya-ss/sd-scripts/blob/sd3/README.md)

### Older Versions

For complete version history, see [GitHub Releases](https://github.com/bmaltais/kohya_ss/releases).

---

## Quick Reference

### Important Links

- **Main Repository:** <https://github.com/bmaltais/kohya_ss>
- **SD-Scripts (Core Training):** <https://github.com/kohya-ss/sd-scripts>
- **Issues & Support:** <https://github.com/bmaltais/kohya_ss/issues>
- **Colab Version:** <https://github.com/camenduru/kohya_ss-colab>

### Default Ports

- **Kohya GUI:** 7860
- **TensorBoard:** 6006

### File Locations

- **Config:** `config.toml` (root directory)
- **Training Scripts:** `sd-scripts/` (submodule)
- **Documentation:** `docs/`
- **Examples:** `examples/`

### Supported Models

- Stable Diffusion 1.x, 2.x
- Stable Diffusion XL (SDXL)
- Stable Diffusion 3 (SD3)
- Flux.1
- Custom fine-tuned models

### Training Methods

- LoRA (Low-Rank Adaptation)
- Dreambooth
- Fine-tuning
- Textual Inversion
- LLLite

---

**Need help?** Check the [documentation](./docs/) or open an [issue](https://github.com/bmaltais/kohya_ss/issues)!
# Docker Setup Guide for Kohya_ss

This guide provides comprehensive instructions for running Kohya_ss in Docker containers.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Usage](#usage)
- [Troubleshooting](#troubleshooting)
- [Advanced Configuration](#advanced-configuration)

## Prerequisites

### System Requirements

- **GPU**: NVIDIA GPU with CUDA support (compute capability 7.0+)
- **RAM**: Minimum 16GB recommended
- **Storage**: At least 50GB free space for models and datasets
- **OS**: Linux, Windows 10/11 with WSL2, or macOS (limited support)

### Required Software

#### Windows

1. **Docker Desktop** (version 4.0+)
   - Download from: <https://www.docker.com/products/docker-desktop/>
   - Ensure the WSL2 backend is enabled

2. **NVIDIA CUDA Toolkit**
   - Download from: <https://developer.nvidia.com/cuda-downloads>
   - Version 12.8 or compatible

3. **NVIDIA Windows Driver**
   - Download from: <https://www.nvidia.com/Download/index.aspx>
   - Version 525.60.11 or newer

4. **WSL2 with GPU Support**
   - Enable WSL2: <https://docs.docker.com/desktop/wsl/#turn-on-docker-desktop-wsl-2>
   - Verify GPU support: <https://docs.docker.com/desktop/wsl/use-wsl/#gpu-support>

**Official Documentation:**

- <https://docs.nvidia.com/cuda/wsl-user-guide/index.html#nvidia-compute-software-support-on-wsl-2>

#### Linux

1. **Docker Engine** or **Docker Desktop**
   - Install guide: <https://docs.docker.com/engine/install/>

2. **NVIDIA GPU Driver**
   - Install the latest driver for your GPU
   - Guide: <https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html>

3. **NVIDIA Container Toolkit**
   - Required for GPU access in containers
   - Install guide: <https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html>

#### macOS

Docker on macOS does not support NVIDIA GPU acceleration. For GPU-accelerated training on Mac:

- Use cloud-based solutions (see [Cloud Alternatives](#cloud-alternatives))
- Or install natively using the installation guides in `/docs/Installation/`

## Quick Start

### Using Pre-built Images (Recommended)

This is the fastest way to get started. The images are automatically built and published to GitHub Container Registry.

```bash
# Clone the repository recursively (important!)
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Start the services
docker compose up -d

# View logs
docker compose logs -f
```

**Access the GUI:**

- Kohya GUI: <http://localhost:7860>
- TensorBoard: <http://localhost:6006>

### Building Locally

If you need to modify the Dockerfile or want to build from source:

```bash
# Clone recursively to include submodules
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Build and start
docker compose up -d --build
```

**Note:** The initial build may take 15-30 minutes depending on your internet connection and hardware.

## Configuration

### Environment Variables

Create a `.env` file in the root directory to customize settings:

```bash
# .env file example
TENSORBOARD_PORT=6006
UID=1000
```

**Available Variables:**

| Variable | Description | Default |
|----------|-------------|---------|
| `TENSORBOARD_PORT` | Port for TensorBoard web interface | `6006` |
| `UID` | User ID for file permissions | `1000` |

### User ID Configuration

The `UID` parameter is critical for file permissions. To find your user ID:

```bash
# Linux/macOS/WSL
id -u

# Then set it in docker-compose.yaml or .env
```

If you encounter permission errors, ensure the UID in docker-compose.yaml matches your host user ID.

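On Linux, macOS, or WSL, one way to keep the compose configuration in sync with your host user is to record your UID in `.env` once. This is a small sketch, not part of the shipped setup; it assumes you run it from the repository root, and note that it appends rather than replacing an existing `UID=` entry:

```shell
# Append the invoking user's UID to .env so docker compose
# builds and runs the container with matching file permissions.
echo "UID=$(id -u)" >> .env
```

After this, `docker compose up -d` will pick up the value automatically.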
### Volume Mounts

The Docker setup uses the following directory structure:

```
kohya_ss/
├── dataset/              # Your training datasets
│   ├── images/           # Training images
│   ├── logs/             # TensorBoard logs
│   ├── outputs/          # Trained models output
│   └── regularization/   # Regularization images
├── models/               # Pre-trained models
└── .cache/               # Cache directories
    ├── config/
    ├── user/
    ├── triton/
    ├── nv/
    └── keras/
```

**Important:** All training data must be placed in the `dataset/` directory or its subdirectories.

### Directory Setup

Before first use, ensure these directories exist:

```bash
mkdir -p dataset/images dataset/logs dataset/outputs dataset/regularization
mkdir -p models
mkdir -p .cache/{config,user,triton,nv,keras}
```

## Usage

### Starting the Services

```bash
# Start in detached mode
docker compose up -d

# Start with logs visible
docker compose up

# Start only a specific service
docker compose up -d kohya-ss-gui
```

### Stopping the Services

```bash
# Stop all services
docker compose down

# Stop and remove volumes (warning: deletes data)
docker compose down -v
```

### Updating

To update to the latest version:

```bash
# Pull latest images
docker compose down
docker compose pull
docker compose up -d

# Or with auto-pull
docker compose down && docker compose up -d --pull always
```

If you're building locally:

```bash
# Update code
git pull
git submodule update --init --recursive

# Rebuild and restart
docker compose down
docker compose up -d --build --pull always
```

### Viewing Logs

```bash
# All services
docker compose logs -f

# Specific service
docker compose logs -f kohya-ss-gui

# Last 100 lines
docker compose logs --tail=100
```

## Troubleshooting

### GPU Not Detected

**Symptoms:** Training is slow, no GPU utilization in `nvidia-smi`

**Solutions:**

1. Verify the GPU is visible to Docker:

   ```bash
   docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
   ```

2. Check the NVIDIA Container Toolkit:

   ```bash
   # Linux
   nvidia-ctk --version

   # If not installed, see prerequisites
   ```

3. Windows WSL2 users:
   - Ensure Docker Desktop is using the WSL2 backend
   - Verify CUDA is working in WSL: run `nvidia-smi` in a WSL terminal

### Permission Denied Errors

**Symptoms:** Cannot read/write files in mounted volumes

**Solutions:**

1. Check your user ID:

   ```bash
   id -u
   ```

2. Update docker-compose.yaml:

   ```yaml
   services:
     kohya-ss-gui:
       user: YOUR_UID:0  # Replace YOUR_UID with actual UID
       build:
         args:
           - UID=YOUR_UID  # Same here
   ```

3. Fix ownership of existing files:

   ```bash
   sudo chown -R YOUR_UID:YOUR_UID dataset/ models/ .cache/
   ```

### Out of Memory Errors

**Symptoms:** Container crashes, training fails with OOM

**Solutions:**

1. Add memory limits to docker-compose.yaml:

   ```yaml
   services:
     kohya-ss-gui:
       deploy:
         resources:
           limits:
             memory: 32G  # Adjust based on your system
   ```

2. Reduce the batch size in your training parameters
3. Use gradient checkpointing
4. Enable CPU offloading in your training settings

### Container Won't Start

**Symptoms:** Container exits immediately or shows errors

**Solutions:**

1. Check the logs:

   ```bash
   docker compose logs kohya-ss-gui
   ```

2. Verify all submodules are cloned:

   ```bash
   git submodule update --init --recursive
   ```

3. Remove old containers and images:

   ```bash
   docker compose down
   docker system prune -a
   docker compose up -d --build
   ```

### File Picker Not Working

**Note:** This is a known limitation of the Docker setup.

**Workaround:** Manually type the full path instead of using the file picker. Paths must be container paths under `/app` or `/dataset`, for example:

- Training images: `/dataset/images/my_dataset`
- Model output: `/dataset/outputs/my_model`
- Pretrained model: `/app/models/sd_xl_base_1.0.safetensors`

|
||||
|
||||
**Symptoms:** Cannot access TensorBoard at localhost:6006
|
||||
|
||||
**Solutions:**
|
||||
|
||||
1. Check if container is running:
|
||||
```bash
|
||||
docker compose ps
|
||||
```
|
||||
|
||||
2. Verify logs are being written:
|
||||
```bash
|
||||
ls -la dataset/logs/
|
||||
```
|
||||
|
||||
3. Check port conflicts:
|
||||
```bash
|
||||
# Linux/macOS
|
||||
sudo lsof -i :6006
|
||||
|
||||
# Windows PowerShell
|
||||
netstat -ano | findstr :6006
|
||||
```
|
||||
|
||||
4. Change port in .env file if needed:
|
||||
```bash
|
||||
echo "TENSORBOARD_PORT=6007" > .env
|
||||
docker compose down && docker compose up -d
|
||||
```
|
||||
|
||||
## Advanced Configuration

### Custom CUDA Version

If you need a different CUDA version, modify the Dockerfile:

```dockerfile
# Line 39-40
ENV CUDA_VERSION=12.8
ENV NVIDIA_REQUIRE_CUDA=cuda>=12.8

# Line 61
ENV UV_INDEX=https://download.pytorch.org/whl/cu128
```

### Resource Limits

Add resource limits to prevent the container from consuming all system resources:

```yaml
# docker-compose.yaml
services:
  kohya-ss-gui:
    deploy:
      resources:
        limits:
          cpus: '8'
          memory: 32G
        reservations:
          cpus: '4'
          memory: 16G
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ["0"]  # Specific GPU
```

### Multiple GPU Setup

To use specific GPUs, adjust the device reservation:

```yaml
# Use GPU 0 and 1
device_ids: ["0", "1"]

# Use all GPUs (use count instead of device_ids)
count: all
```

In the container, you can also use `CUDA_VISIBLE_DEVICES`:

```yaml
environment:
  CUDA_VISIBLE_DEVICES: "0,1"
```

### Restart Policies

Add automatic restart on failure:

```yaml
services:
  kohya-ss-gui:
    restart: unless-stopped
  tensorboard:
    restart: unless-stopped
```

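A restart policy can be paired with a healthcheck so Docker notices when the GUI's web server stops responding. The snippet below is a sketch, not part of the shipped compose file; it assumes `curl` is available inside the image and that the GUI listens on its default port 7860:

```yaml
services:
  kohya-ss-gui:
    restart: unless-stopped
    healthcheck:
      # Assumes curl exists in the image; swap for wget if not
      test: ["CMD-SHELL", "curl -sf http://localhost:7860/ || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 120s  # Allow time for first-run setup
```

The generous `start_period` avoids marking the container unhealthy while dependencies are still loading on first launch.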
### Using Different Base Images

For development or debugging, you can switch base images:

```dockerfile
# Use full CUDA toolkit instead of minimal
FROM docker.io/nvidia/cuda:12.8.0-devel-ubuntu22.04 AS base
```

## Docker Design Philosophy

This Docker setup follows these principles:

1. **Disposable Containers**: Containers can be destroyed and recreated at any time. All important data is stored in mounted volumes.
2. **Data Separation**: Training data, models, and outputs are kept outside the container in the `dataset/` directory.
3. **No Built-in File Picker**: Due to container isolation, the GUI file picker is disabled. Use manual path entry instead.
4. **Separate TensorBoard**: TensorBoard runs in its own container for better resource isolation and easier updates.
5. **Minimal Image Size**: Only essential CUDA libraries are included to reduce image size from ~8GB to ~3GB.

## Cloud Alternatives

If Docker on your local machine isn't suitable:

- **RunPod**: See [docs/installation_runpod.md](installation_runpod.md)
- **Novita**: See [docs/installation_novita.md](installation_novita.md)
- **Colab**: See [README.md](../README.md#-colab) for a free cloud-based option

## Community Docker Builds

Alternative Docker implementations with different features:

- **P2Enjoy's Linux-optimized build**: <https://github.com/P2Enjoy/kohya_ss-docker>
  - Fewer limitations on Linux
  - Different architecture
- **Ashley Kleynhans' RunPod templates**:
  - Standalone: <https://github.com/ashleykleynhans/kohya-docker>
  - With Auto1111: <https://github.com/ashleykleynhans/stable-diffusion-docker>

## Getting Help

If you encounter issues:

1. Check this troubleshooting guide
2. Review container logs: `docker compose logs`
3. Search existing issues: <https://github.com/bmaltais/kohya_ss/issues>
4. Open a new issue with:
   - Your OS and Docker version
   - Complete error logs
   - Steps to reproduce

## Performance Tips

1. **Use SSD storage** for dataset and model directories
2. **Increase the Docker memory limit** in Docker Desktop settings (Windows/macOS)
3. **Use tmpfs for temporary files** (already configured in docker-compose.yaml)
4. **Enable BuildKit** for faster builds:

   ```bash
   export DOCKER_BUILDKIT=1
   ```

5. **Use pillow-simd** (automatically enabled on x86_64 in the Dockerfile)

## Security Notes

1. The container runs as a non-root user (UID 1000 by default)
2. Only necessary ports are exposed
3. Sensitive data should not be included in the image build
4. Use `.dockerignore` to exclude credentials and secrets
5. Keep base images updated for security patches
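As a starting point for item 4, a `.dockerignore` along these lines keeps secrets and large local data out of the build context. The entries are illustrative, based on the directory layout described above; adjust them to your local setup:

```
# Secrets and local configuration
.env

# Large local data that should be volume-mounted, not baked in
dataset/
models/
.cache/

# Repository metadata
.git/
```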