
Docker Setup Guide for Kohya_ss

This guide provides comprehensive instructions for running Kohya_ss in Docker containers.

Table of Contents

  • Prerequisites
  • Quick Start
  • Building Locally
  • Configuration
  • Usage
  • Troubleshooting
  • Advanced Configuration
  • Docker Design Philosophy
  • Cloud Alternatives
  • Getting Help
  • Performance Tips
  • Security Notes

Prerequisites

System Requirements

  • GPU: NVIDIA GPU with CUDA support (compute capability 7.0+)
  • RAM: Minimum 16GB recommended
  • Storage: At least 50GB free space for models and datasets
  • OS: Linux, Windows 10/11 with WSL2, or macOS (limited support)
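A quick way to sanity-check these requirements from a terminal (a sketch; the `compute_cap` query field requires a reasonably recent NVIDIA driver):

```shell
# Check GPU name, compute capability, and VRAM (requires the NVIDIA driver)
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,compute_cap,memory.total --format=csv
else
  echo "nvidia-smi not found - install the NVIDIA driver first"
fi

# Check free disk space where you plan to keep models and datasets
df -h .
```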

Required Software

Windows

  1. Docker Desktop (version 4.0+)

  2. NVIDIA CUDA Toolkit

  3. NVIDIA Windows Driver

  4. WSL2 with GPU Support


Linux

  1. Docker Engine or Docker Desktop

  2. NVIDIA GPU Driver

  3. NVIDIA Container Toolkit
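After installing, you can verify that the toolkit is present and registered with Docker. This is a sketch; the `runtime configure` command comes from NVIDIA's install instructions:

```shell
# Check whether the NVIDIA Container Toolkit CLI is installed
if command -v nvidia-ctk >/dev/null 2>&1; then
  nvidia-ctk --version
else
  echo "nvidia-container-toolkit is missing."
  echo "After installing it, register the runtime and restart Docker:"
  echo "  sudo nvidia-ctk runtime configure --runtime=docker"
  echo "  sudo systemctl restart docker"
fi
```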

macOS

Docker on macOS does not support NVIDIA GPU acceleration. For GPU-accelerated training on Mac:

  • Use cloud-based solutions (see Cloud Alternatives)
  • Or install natively using the installation guides in /docs/Installation/

Quick Start

This is the fastest way to get started. The images are automatically built and published to GitHub Container Registry.

# Clone the repository recursively (important!)
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Start the services
docker compose up -d

# View logs
docker compose logs -f

Access the GUI in your browser at http://localhost:7860 (the default port mapping).
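To confirm the GUI is serving (assuming the default port 7860; adjust if your compose file maps a different port):

```shell
# Probe the GUI endpoint; prints a hint if the container is still starting up
curl -fsS -o /dev/null http://localhost:7860 \
  && echo "GUI is up at http://localhost:7860" \
  || echo "GUI not reachable yet - check 'docker compose logs -f'"
```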

Building Locally

If you need to modify the Dockerfile or want to build from source:

# Clone recursively to include submodules
git clone --recursive https://github.com/bmaltais/kohya_ss.git
cd kohya_ss

# Build and start
docker compose up -d --build

Note: Initial build may take 15-30 minutes depending on your internet connection and hardware.

Configuration

Environment Variables

Create a .env file in the root directory to customize settings:

# .env file example
TENSORBOARD_PORT=6006
UID=1000

Available Variables:

| Variable         | Description                        | Default |
|------------------|------------------------------------|---------|
| TENSORBOARD_PORT | Port for TensorBoard web interface | 6006    |
| UID              | User ID for file permissions       | 1000    |

User ID Configuration

The UID parameter is critical for file permissions. To find your user ID:

# Linux/macOS/WSL
id -u

# Then set it in docker-compose.yaml or .env

If you encounter permission errors, ensure the UID in docker-compose.yaml matches your host user ID.
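For example, to record your UID in the .env file (this appends a line; run it once, or edit .env directly):

```shell
# Append the current user's ID to .env so the container user matches the host
echo "UID=$(id -u)" >> .env
grep "^UID=" .env
```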

Volume Mounts

The Docker setup uses the following directory structure:

kohya_ss/
├── dataset/              # Your training datasets
│   ├── images/          # Training images
│   ├── logs/            # TensorBoard logs
│   ├── outputs/         # Trained models output
│   └── regularization/  # Regularization images
├── models/              # Pre-trained models
└── .cache/              # Cache directories
    ├── config/
    ├── user/
    ├── triton/
    ├── nv/
    └── keras/

Important: All training data must be placed in the dataset/ directory or its subdirectories.

Directory Setup

Before first use, ensure these directories exist:

mkdir -p dataset/images dataset/logs dataset/outputs dataset/regularization
mkdir -p models
mkdir -p .cache/{config,user,triton,nv,keras}

Usage

Starting the Services

# Start in detached mode
docker compose up -d

# Start with logs visible
docker compose up

# Start only specific service
docker compose up -d kohya-ss-gui

Stopping the Services

# Stop all services
docker compose down

# Stop and remove volumes (warning: deletes data)
docker compose down -v

Updating

To update to the latest version:

# Pull latest images
docker compose down
docker compose pull
docker compose up -d

# Or with auto-pull
docker compose down && docker compose up -d --pull always

If you're building locally:

# Update code
git pull
git submodule update --init --recursive

# Rebuild and restart
docker compose down
docker compose up -d --build --pull always

Viewing Logs

# All services
docker compose logs -f

# Specific service
docker compose logs -f kohya-ss-gui

# Last 100 lines
docker compose logs --tail=100

Troubleshooting

GPU Not Detected

Symptoms: Training is slow, no GPU utilization in nvidia-smi

Solutions:

  1. Verify GPU is visible to Docker:

    docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
    
  2. Check NVIDIA Container Toolkit:

    # Linux
    nvidia-ctk --version
    
    # If not installed, see prerequisites
    
  3. Windows WSL2 users:

    • Ensure Docker Desktop is using WSL2 backend
    • Verify CUDA is working in WSL: run nvidia-smi in a WSL terminal

Permission Denied Errors

Symptoms: Cannot read/write files in mounted volumes

Solutions:

  1. Check your user ID:

    id -u
    
  2. Update docker-compose.yaml:

    services:
      kohya-ss-gui:
        user: YOUR_UID:0  # Replace YOUR_UID with actual UID
        build:
          args:
            - UID=YOUR_UID  # Same here
    
  3. Fix ownership of existing files:

    sudo chown -R YOUR_UID:YOUR_UID dataset/ models/ .cache/
    

Out of Memory Errors

Symptoms: Container crashes, training fails with OOM

Solutions:

  1. Add memory limits to docker-compose.yaml:

    services:
      kohya-ss-gui:
        deploy:
          resources:
            limits:
              memory: 32G  # Adjust based on your system
    
  2. Reduce batch size in training parameters

  3. Use gradient checkpointing

  4. Enable CPU offloading in training settings

Container Won't Start

Symptoms: Container exits immediately or shows errors

Solutions:

  1. Check logs:

    docker compose logs kohya-ss-gui
    
  2. Verify all submodules are cloned:

    git submodule update --init --recursive
    
  3. Remove old containers and images:

    docker compose down
    docker system prune -a
    docker compose up -d --build
    

File Picker Not Working

Note: This is a known limitation of the Docker setup.

Workaround: Manually type the full path instead of using the file picker. Use the container paths under /app or /dataset:

Examples:

  • Training images: /dataset/images/my_dataset
  • Model output: /dataset/outputs/my_model
  • Pretrained model: /app/models/sd_xl_base_1.0.safetensors
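You can verify that the container sees your files at these paths (the service name kohya-ss-gui is the one used elsewhere in this guide; the check is skipped if the stack isn't running):

```shell
# List the mount points from inside the running container
if docker compose ps 2>/dev/null | grep -q kohya-ss-gui; then
  docker compose exec kohya-ss-gui ls /dataset/images
  docker compose exec kohya-ss-gui ls /app/models
else
  echo "kohya-ss-gui is not running - start it with 'docker compose up -d'"
fi
```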

TensorBoard Not Accessible

Symptoms: Cannot access TensorBoard at localhost:6006

Solutions:

  1. Check if container is running:

    docker compose ps
    
  2. Verify logs are being written:

    ls -la dataset/logs/
    
  3. Check port conflicts:

    # Linux/macOS
    sudo lsof -i :6006
    
    # Windows PowerShell
    netstat -ano | findstr :6006
    
  4. Change port in .env file if needed:

    # Append to .env (or edit it) so existing variables are preserved
    echo "TENSORBOARD_PORT=6007" >> .env
    docker compose down && docker compose up -d
    

Advanced Configuration

Custom CUDA Version

If you need a different CUDA version, modify the Dockerfile:

# Line 39-40
ENV CUDA_VERSION=12.8
ENV NVIDIA_REQUIRE_CUDA=cuda>=12.8

# Line 61
ENV UV_INDEX=https://download.pytorch.org/whl/cu128

Resource Limits

Add resource limits to prevent container from consuming all system resources:

# docker-compose.yaml
services:
  kohya-ss-gui:
    deploy:
      resources:
        limits:
          cpus: '8'
          memory: 32G
        reservations:
          cpus: '4'
          memory: 16G
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ["0"]  # Specific GPU

Multiple GPU Setup

To use specific GPUs:

# Use GPU 0 and 1
device_ids: ["0", "1"]

# Use all GPUs (the Compose spec uses count here instead of device_ids)
count: all

In the container, you can also use CUDA_VISIBLE_DEVICES:

environment:
  CUDA_VISIBLE_DEVICES: "0,1"

Restart Policies

Add automatic restart on failure:

services:
  kohya-ss-gui:
    restart: unless-stopped
  tensorboard:
    restart: unless-stopped

Using Different Base Images

For development or debugging, you can switch base images:

# Use full CUDA toolkit instead of minimal
FROM docker.io/nvidia/cuda:12.8.0-devel-ubuntu22.04 AS base

Docker Design Philosophy

This Docker setup follows these principles:

  1. Disposable Containers: Containers can be destroyed and recreated at any time. All important data is stored in mounted volumes.

  2. Data Separation: Training data, models, and outputs are kept outside the container in the dataset/ directory.

  3. No Built-in File Picker: Due to container isolation, the GUI file picker is disabled. Use manual path entry instead.

  4. Separate TensorBoard: TensorBoard runs in its own container for better resource isolation and easier updates.

  5. Minimal Image Size: Only essential CUDA libraries are included to reduce image size from ~8GB to ~3GB.

Cloud Alternatives

If Docker on your local machine isn't suitable, consider a cloud GPU provider or a native installation (see the guides in /docs/Installation/).

Community Docker Builds

Alternative Docker implementations with different feature sets are maintained by the community.

Getting Help

If you encounter issues:

  1. Check this troubleshooting guide
  2. Review container logs: docker compose logs
  3. Search existing issues: https://github.com/bmaltais/kohya_ss/issues
  4. Open a new issue with:
    • Your OS and Docker version
    • Complete error logs
    • Steps to reproduce

Performance Tips

  1. Use SSD storage for dataset and model directories
  2. Increase Docker memory limit in Docker Desktop settings (Windows/macOS)
  3. Use tmpfs for temporary files (already configured in docker-compose.yaml)
  4. Enable BuildKit for faster builds:
    export DOCKER_BUILDKIT=1
    
  5. Use pillow-simd (automatically enabled on x86_64 in Dockerfile)

Security Notes

  1. The container runs as a non-root user (UID 1000 by default)
  2. Only necessary ports are exposed
  3. Sensitive data should not be included in the image build
  4. Use .dockerignore to exclude credentials and secrets
  5. Keep base images updated for security patches
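A minimal .dockerignore sketch illustrating point 4 (the entries are examples; tailor them to your repository):

```
# Example .dockerignore - keep secrets and bulky data out of the build context
.env
.git
.cache/
dataset/
models/
*.pem
*.key
```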