update

2025-10-23 15:10:47 +09:00 · 2025-10-23 15:10:47 +09:00 · 579a1e57bd
parent 4161d1d80a
commit 579a1e57bd
10 changed files with 1221 additions and 0 deletions
--- a/config_files/config-5080.json
+++ b/config_files/config-5080.json
@@ -0,0 +1,42 @@
+{
+    "general": {
+      "shuffle_caption": true,
+      "caption_extension": ".txt",
+      "keep_tokens": 1,
+      "seed": 1234
+    },
+    "model": {
+      "pretrained_model_name_or_path": "stabilityai/stable-diffusion-xl-base-1.0",
+      "vae": "stabilityai/sd-vae-ft-mse"
+    },
+    "training": {
+      "resolution": "768,768",
+      "batch_size": 1,
+      "learning_rate": 0.00015,
+      "lr_scheduler": "cosine_with_restarts",
+      "max_train_steps": 4000,
+      "optimizer": "adamw8bit",
+      "mixed_precision": "fp16",
+      "gradient_checkpointing": true,
+      "clip_skip": 2,
+      "network_dim": 32,
+      "network_alpha": 16,
+      "save_precision": "fp16",
+      "save_every_n_steps": 1000
+    },
+    "folders": {
+      "train_data_dir": "./data/train",
+      "reg_data_dir": "./data/reg",
+      "output_dir": "./output_5080",
+      "logging_dir": "./logs_5080"
+    },
+    "advanced": {
+      "bucket_reso_steps": 64,
+      "bucket_no_upscale": true,
+      "xformers": true,
+      "cache_latents": true,
+      "min_bucket_reso": 320,
+      "max_bucket_reso": 768
+    }
+  }
+  
--- a/config_files/config-5090.json
+++ b/config_files/config-5090.json
@@ -0,0 +1,41 @@
+{
+  "general": {
+    "shuffle_caption": true,
+    "caption_extension": ".txt",
+    "keep_tokens": 1,
+    "seed": 42
+  },
+  "model": {
+    "pretrained_model_name_or_path": "stabilityai/stable-diffusion-3.5",
+    "vae": "stabilityai/sd-vae-ft-mse"
+  },
+  "training": {
+    "resolution": "1024,1024",
+    "batch_size": 2,
+    "learning_rate": 0.0001,
+    "lr_scheduler": "cosine_with_restarts",
+    "max_train_steps": 6000,
+    "optimizer": "adamw8bit",
+    "mixed_precision": "bf16",
+    "gradient_checkpointing": false,
+    "clip_skip": 2,
+    "network_dim": 64,
+    "network_alpha": 32,
+    "save_precision": "bf16",
+    "save_every_n_steps": 1000
+  },
+  "folders": {
+    "train_data_dir": "./data/train",
+    "reg_data_dir": "./data/reg",
+    "output_dir": "./output_5090",
+    "logging_dir": "./logs_5090"
+  },
+  "advanced": {
+    "bucket_reso_steps": 64,
+    "bucket_no_upscale": true,
+    "xformers": true,
+    "cache_latents": true,
+    "min_bucket_reso": 512,
+    "max_bucket_reso": 1024
+  }
+}
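Review note: the two config files share the same key layout, so a small consistency check can catch typos before a long training run. The sketch below is a hypothetical helper, not part of this commit. The three rules it applies (largest bucket covers the training resolution, bucket limits are multiples of `bucket_reso_steps`, `network_alpha` does not exceed `network_dim`) are assumptions drawn from common practice rather than requirements stated in these files.

```python
import json

# Values copied from config_files/config-5080.json; only the keys the checks read.
SAMPLE_CONFIG = json.loads("""
{
  "training": {"resolution": "768,768", "network_dim": 32, "network_alpha": 16},
  "advanced": {"bucket_reso_steps": 64, "min_bucket_reso": 320, "max_bucket_reso": 768}
}
""")


def check_config(cfg: dict) -> list[str]:
    """Hypothetical helper: return a list of human-readable problems."""
    problems = []
    training, advanced = cfg["training"], cfg["advanced"]
    width, height = (int(v) for v in training["resolution"].split(","))
    step = advanced["bucket_reso_steps"]

    # Assumption: the largest bucket should be able to hold the training resolution.
    if advanced["max_bucket_reso"] < max(width, height):
        problems.append("max_bucket_reso is smaller than resolution")
    # Assumption: bucket limits should be multiples of bucket_reso_steps.
    for key in ("min_bucket_reso", "max_bucket_reso"):
        if advanced[key] % step != 0:
            problems.append(f"{key} is not a multiple of bucket_reso_steps")
    # Assumption (convention, not a hard rule): alpha does not exceed dim.
    if training["network_alpha"] > training["network_dim"]:
        problems.append("network_alpha is larger than network_dim")
    return problems


if __name__ == "__main__":
    print(check_config(SAMPLE_CONFIG) or "no problems found")
```

To check a real file, load it with `json.load(open(path, encoding="utf-8"))` and pass the result to `check_config`.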
--- a/epoch와-steps설정.md
+++ b/epoch와-steps설정.md
@@ -0,0 +1,180 @@
+```
+Max train epoch: training epochs (overrides max_train_steps). 0 = no override
+```
+
+### **What it means:**
+- **"Max train epoch"**: the maximum number of training epochs
+- **"overrides max_train_steps"**: when this value is set, max_train_steps is **ignored**
+- **"0 = no override"**: when set to `0`, max_train_steps is **used**
+
+---
+
+## 🎯 How to use it
+
+### **Case 1: Train by epochs** ⭐ most common
+```
+Max train epoch: 10
+Max train steps: 0 (or leave empty)
+```
+**Result:** trains for 10 epochs
+
+---
+
+### **Case 2: Train by steps**
+```
+Max train epoch: 0
+Max train steps: 2000
+```
+**Result:** trains for 2000 steps
+
+---
+
+### **Case 3: Both set (epochs win)**
+```
+Max train epoch: 10
+Max train steps: 5000
+```
+**Result:** trains for 10 epochs only (max_train_steps is **ignored**)
+
+---
+
+## 🔍 Precedence summary
+```
+Max train epoch > 0  →  only this is used (steps ignored)
+Max train epoch = 0  →  max_train_steps is used
+```
+
+---
+
+## 💡 Practical settings
+
+### **Typical LoRA training:**
+```
+Max train epoch: 10           ← set only this
+Max train steps: 0            ← 0 or leave empty
+Save every N epochs: 1
+```
+
+### **When you need precise step control:**
+```
+Max train epoch: 0            ← set to 0
+Max train steps: 2500         ← set this
+Save every N steps: 500
+```
+
+---
+
+## 📊 Worked example
+
+### **With 50 images, 4 repeats, batch size 1:**
+
+#### **Setting A: epochs take precedence**
+```
+Max train epoch: 10
+Max train steps: 999999  ← ignored no matter how large
+```
+**Actual training:** 50 × 4 × 10 = **2000 steps**
+
+#### **Setting B: steps take precedence**
+```
+Max train epoch: 0
+Max train steps: 1500
+```
+**Actual training:** **1500 steps** (7.5 epochs)
+
+---
+
+## ⚠️ Common mistake
+
+### ❌ **Wrong:**
+```
+Max train epoch: 10
+Max train steps: 2000
+```
+→ The steps value is **ignored** (only the epoch count applies)
+
+### ✅ **Right:**
+
+**To train by epochs:**
+```
+Max train epoch: 10
+Max train steps: 0
+```
+
+**To train by steps:**
+```
+Max train epoch: 0
+Max train steps: 2000
+```
+
+---
+
+## 🎯 **Final answer**
+
+### **Can I put the same value in both fields?**
+❌ **No.**
+
+### **How should I set them?**
+
+#### **In most cases (recommended):**
+```
+Max train epoch: 10     ← the number of epochs you want
+Max train steps: 0      ← set to 0
+```
+
+#### **To specify an exact step count:**
+```
+Max train epoch: 0      ← set to 0
+Max train steps: 2500   ← the number of steps you want
+```
+
+---
+
+Example: reaching 3000 steps with 100 images and 2 repeats
+```
+total steps = images × repeats × epochs
+
+3000 = 100 × 2 × epochs
+3000 = 200 × epochs
+epochs = 3000 ÷ 200
+epochs = 15
+```
+
+---
+
+## ✅ Answer: **15 epochs**
+
+### **Settings:**
+```
+Folder name: 2_character_name
+Images: 100
+Max train epoch: 15
+Max train steps: 0
+```
+
+### **Result:**
+```
+1 epoch = 100 × 2 = 200 steps
+15 epochs = 200 × 15 = 3000 steps ✅
+```
+
+---
+
+### ✅ **Fixed seed (recommended)**
+Any fixed number works as the seed, e.g. 42, 1234, or 777.
+
+```
+Seed: 42
+```
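+**Benefits:**
+- **Reproducibility** - the same result can be produced again
+- **Experiment comparison** - a fair comparison when testing other hyperparameters
+- **Debugging** - a problem can be reproduced when it occurs
+- **Collaboration** - other people can get the same result on the same setup
+
+**When to use it:**
+- Most cases ✅
+- Hyperparameter tuning
+- When you want stable training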
+
+---
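Review note: the step arithmetic in this file can be checked with a few lines of Python. This is a minimal sketch using hypothetical helper names. The `batch_size` parameter is an assumption added here: the formulas in the note implicitly use batch size 1, and gradient accumulation and bucketing effects are ignored.

```python
import math


def steps_per_epoch(images: int, repeats: int, batch_size: int = 1) -> int:
    # Assumption: one step consumes one batch, and a partial final batch still counts.
    return math.ceil(images * repeats / batch_size)


def total_steps(images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    # total steps = images x repeats x epochs (at batch size 1)
    return steps_per_epoch(images, repeats, batch_size) * epochs


def epochs_for_target(target_steps: int, images: int, repeats: int, batch_size: int = 1) -> int:
    # Round up so the run reaches at least the target step count.
    return math.ceil(target_steps / steps_per_epoch(images, repeats, batch_size))


if __name__ == "__main__":
    print(total_steps(50, 4, 10))                 # 2000, matches "Setting A"
    print(epochs_for_target(3000, 100, 2))        # 15, matches the 15-epoch answer
    print(total_steps(100, 2, 15, batch_size=2))  # 1500, half as many steps at batch size 2
```

The last line shows why the batch size matters: the same folder and epoch count produce half the steps when the batch size doubles.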
--- a/generate-captions-standalone.py
+++ b/generate-captions-standalone.py
@@ -0,0 +1,365 @@
+"""
+Standalone BLIP + WD14 hybrid caption generator
+Combined caption generation script for photorealistic LoRA training
+
+Requires:
+pip install transformers pillow torch torchvision onnxruntime-gpu huggingface_hub pandas numpy tqdm
+"""
+
+import os
+import sys
+from pathlib import Path
+from tqdm import tqdm
+import argparse
+from PIL import Image
+import torch
+
+
+# ==============================
+# ⚙️ 설정 (수정 가능)
+# ==============================
+
+class Config:
+    # 데이터셋 경로
+    DATASET_DIRS = [
+        "./dataset/mainchar",  # 메인 캐릭터
+        "./dataset/bg",        # 배경/보조
+    ]
+    
+    # 모델 설정
+    BLIP_MODEL = "Salesforce/blip-image-captioning-large"
+    WD14_MODEL = "SmilingWolf/wd-v1-4-moat-tagger-v2"
+    
+    # WD14 임계값
+    WD14_GENERAL_THRESHOLD = 0.35
+    WD14_CHARACTER_THRESHOLD = 0.85
+    
+    # BLIP 설정
+    BLIP_MAX_LENGTH = 75
+    BLIP_NUM_BEAMS = 1  # 1=greedy, >1=beam search
+    
+    # 제거할 WD14 메타 태그
+    REMOVE_TAGS = [
+        # 메타 태그
+        "1girl", "1boy", "solo", "2girls", "3girls", "multiple girls",
+        "looking at viewer", "facing viewer", "solo focus",
+        # 배경
+        "simple background", "white background", "grey background",
+        "transparent background", "gradient background",
+        # 품질/메타데이터
+        "highres", "absurdres", "lowres", "bad anatomy",
+        "signature", "watermark", "artist name", "dated",
+        "commentary", "username",
+        # Danbooru 메타
+        "rating:safe", "rating:questionable", "rating:explicit",
+        "safe", "questionable", "explicit",
+    ]
+    
+    # 출력 설정
+    OUTPUT_ENCODING = "utf-8"
+    OVERWRITE_EXISTING = False
+    CREATE_BACKUP = True
+    
+    # 디바이스
+    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+    
+    # 캡션 포맷
+    # "blip_first": BLIP 문장이 먼저
+    # "tags_first": WD14 태그가 먼저
+    CAPTION_FORMAT = "blip_first"
+
+
+# ==============================
+# 🔧 유틸리티 함수
+# ==============================
+
+def normalize_tags(tags_str):
+    """태그 정규화: 소문자, 공백 정리, 중복 제거"""
+    if not tags_str:
+        return []
+    tags = [tag.strip().lower() for tag in tags_str.split(',')]
+    # 중복 제거 (순서 유지)
+    seen = set()
+    unique_tags = []
+    for tag in tags:
+        if tag and tag not in seen:
+            seen.add(tag)
+            unique_tags.append(tag)
+    return unique_tags
+
+
+def remove_unwanted_tags(tags_list, remove_list):
+    """불필요한 태그 제거"""
+    remove_set = set(tag.lower() for tag in remove_list)
+    return [tag for tag in tags_list if tag not in remove_set]
+
+
+def merge_captions(blip_caption, wd14_tags, remove_tags, format_type="blip_first"):
+    """
+    BLIP 캡션과 WD14 태그 병합
+    """
+    # BLIP 정규화
+    blip_normalized = blip_caption.strip().lower() if blip_caption else ""
+    
+    # WD14 태그 정규화 및 필터링
+    wd14_normalized = normalize_tags(wd14_tags)
+    wd14_filtered = remove_unwanted_tags(wd14_normalized, remove_tags)
+    
+    # BLIP 문장의 단어들 (중복 제거용)
+    blip_words = set(blip_normalized.replace(',', ' ').split()) if blip_normalized else set()
+    
+    # WD14에서 BLIP 중복 제거
+    wd14_deduped = []
+    for tag in wd14_filtered:
+        # 단순 중복 체크 (선택적으로 비활성화 가능)
+        tag_words = set(tag.replace('_', ' ').split())
+        if not tag_words.intersection(blip_words):
+            wd14_deduped.append(tag)
+    
+    # 최종 병합
+    if format_type == "blip_first":
+        # BLIP 문장 + WD14 태그
+        if blip_normalized and wd14_deduped:
+            merged = f"{blip_normalized}, {', '.join(wd14_deduped)}"
+        elif blip_normalized:
+            merged = blip_normalized
+        elif wd14_deduped:
+            merged = ', '.join(wd14_deduped)
+        else:
+            merged = ""
+    else:
+        # WD14 태그 + BLIP 문장
+        if wd14_deduped and blip_normalized:
+            merged = f"{', '.join(wd14_deduped)}, {blip_normalized}"
+        elif wd14_deduped:
+            merged = ', '.join(wd14_deduped)
+        elif blip_normalized:
+            merged = blip_normalized
+        else:
+            merged = ""
+    
+    return merged
+
+
+# ==============================
+# 🎨 BLIP 캡션 생성
+# ==============================
+
+class BLIPCaptioner:
+    def __init__(self, model_name, device, max_length=75, num_beams=1):
+        from transformers import BlipProcessor, BlipForConditionalGeneration
+        
+        print(f"  → BLIP 모델 로딩... ({model_name})")
+        self.processor = BlipProcessor.from_pretrained(model_name)
+        self.model = BlipForConditionalGeneration.from_pretrained(model_name).to(device)
+        self.model.eval()
+        self.device = device
+        self.max_length = max_length
+        self.num_beams = num_beams
+    
+    def generate(self, image_path):
+        try:
+            image = Image.open(image_path).convert("RGB")
+            inputs = self.processor(image, return_tensors="pt").to(self.device)
+            
+            with torch.no_grad():
+                outputs = self.model.generate(
+                    **inputs,
+                    max_length=self.max_length,
+                    num_beams=self.num_beams,
+                )
+            
+            caption = self.processor.decode(outputs[0], skip_special_tokens=True)
+            return caption.strip()
+        
+        except Exception as e:
+            print(f"⚠️ BLIP 실패 ({Path(image_path).name}): {e}")
+            return ""
+
+
+# ==============================
+# 🏷️ WD14 태그 생성
+# ==============================
+
+class WD14Tagger:
+    def __init__(self, model_name, device, general_thresh=0.35, character_thresh=0.85):
+        import onnxruntime as ort
+        from huggingface_hub import hf_hub_download
+        
+        print(f"  → WD14 모델 로딩... ({model_name})")
+        
+        # 모델 다운로드
+        model_path = hf_hub_download(model_name, filename="model.onnx")
+        tags_path = hf_hub_download(model_name, filename="selected_tags.csv")
+        
+        # ONNX 세션
+        providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if device == "cuda" else ['CPUExecutionProvider']
+        self.session = ort.InferenceSession(model_path, providers=providers)
+        
+        # 태그 로드
+        import pandas as pd
+        self.tags_df = pd.read_csv(tags_path)
+        self.general_thresh = general_thresh
+        self.character_thresh = character_thresh
+    
+    def generate(self, image_path):
+        try:
+            import numpy as np
+            
+            # Preprocess: WD14 ONNX taggers expect BGR order and raw 0-255 float32 values
+            image = Image.open(image_path).convert("RGB")
+            image = image.resize((448, 448))
+            image_array = np.array(image)[:, :, ::-1].astype(np.float32)
+            image_array = np.expand_dims(image_array, axis=0)
+            
+            # 추론
+            input_name = self.session.get_inputs()[0].name
+            output = self.session.run(None, {input_name: image_array})[0]
+            
+            # 태그 필터링
+            tags = []
+            for i, score in enumerate(output[0]):
+                tag_type = self.tags_df.iloc[i]['category']
+                threshold = self.character_thresh if tag_type == 4 else self.general_thresh
+                
+                if score >= threshold:
+                    tag_name = self.tags_df.iloc[i]['name'].replace('_', ' ')
+                    tags.append(tag_name)
+            
+            return ', '.join(tags)
+        
+        except Exception as e:
+            print(f"⚠️ WD14 실패 ({Path(image_path).name}): {e}")
+            return ""
+
+
+# ==============================
+# 📁 파일 처리
+# ==============================
+
+def get_image_files(directory):
+    """Find image files in a directory (non-recursive)."""
+    extensions = {'.jpg', '.jpeg', '.png', '.webp', '.bmp'}
+    image_files = []
+    
+    dir_path = Path(directory)
+    for ext in extensions:
+        image_files.extend(dir_path.glob(f"*{ext}"))
+        image_files.extend(dir_path.glob(f"*{ext.upper()}"))
+    # set() drops duplicates: on case-insensitive filesystems both globs can match the same file
+    return sorted(set(image_files))
+
+
+def create_backup(caption_path):
+    """백업 생성"""
+    if caption_path.exists():
+        backup_dir = caption_path.parent / "caption_backup"
+        backup_dir.mkdir(exist_ok=True)
+        
+        import shutil
+        backup_path = backup_dir / caption_path.name
+        shutil.copy2(caption_path, backup_path)
+
+
+# ==============================
+# 🚀 메인 프로세스
+# ==============================
+
+def process_directory(directory, blip_captioner, wd14_tagger, config):
+    """디렉토리 처리"""
+    print(f"\n📁 처리 중: {directory}")
+    
+    image_files = get_image_files(directory)
+    
+    if not image_files:
+        print(f"⚠️ 이미지 없음: {directory}")
+        return 0
+    
+    print(f"📸 {len(image_files)}개 이미지 발견")
+    
+    success_count = 0
+    skip_count = 0
+    
+    for image_path in tqdm(image_files, desc="캡션 생성"):
+        caption_path = image_path.with_suffix('.txt')
+        
+        # 기존 파일 확인
+        if caption_path.exists() and not config.OVERWRITE_EXISTING:
+            skip_count += 1
+            continue
+        
+        # 백업
+        if config.CREATE_BACKUP and caption_path.exists():
+            create_backup(caption_path)
+        
+        try:
+            # BLIP 생성
+            blip_caption = blip_captioner.generate(image_path)
+            
+            # WD14 생성
+            wd14_tags = wd14_tagger.generate(image_path)
+            
+            # 병합
+            merged = merge_captions(
+                blip_caption, wd14_tags, 
+                config.REMOVE_TAGS, 
+                config.CAPTION_FORMAT
+            )
+            
+            # 저장
+            if merged:
+                with open(caption_path, 'w', encoding=config.OUTPUT_ENCODING) as f:
+                    f.write(merged)
+                success_count += 1
+            else:
+                print(f"⚠️ 빈 캡션: {image_path.name}")
+        
+        except Exception as e:
+            print(f"❌ 실패 ({image_path.name}): {e}")
+            continue
+    
+    print(f"✅ 완료: {success_count}개 생성, {skip_count}개 스킵")
+    return success_count
+
+
+def main():
+    parser = argparse.ArgumentParser(description="BLIP + WD14 하이브리드 캡션")
+    parser.add_argument("--dirs", nargs="+", help="처리할 디렉토리")
+    parser.add_argument("--overwrite", action="store_true", help="덮어쓰기")
+    parser.add_argument("--device", default=None, help="cuda/cpu")
+    parser.add_argument("--format", choices=["blip_first", "tags_first"], help="캡션 포맷")
+    
+    args = parser.parse_args()
+    
+    config = Config()
+    if args.dirs:
+        config.DATASET_DIRS = args.dirs
+    if args.overwrite:
+        config.OVERWRITE_EXISTING = True
+    if args.device:
+        config.DEVICE = args.device
+    if args.format:
+        config.CAPTION_FORMAT = args.format
+    
+    print("=" * 60)
+    print("🎨 BLIP + WD14 하이브리드 캡션 생성기")
+    print("=" * 60)
+    print(f"📁 대상: {config.DATASET_DIRS}")
+    print(f"💾 덮어쓰기: {config.OVERWRITE_EXISTING}")
+    print(f"🖥️ 디바이스: {config.DEVICE}")
+    print(f"📝 포맷: {config.CAPTION_FORMAT}")
+    print("=" * 60)
+    
+    # 모델 로드
+    print("\n🔄 모델 로딩 중...")
+    
+    try:
+        blip_captioner = BLIPCaptioner(
+            config.BLIP_MODEL,
+            config.DEVICE,
+            config.BLIP_MAX_LENGTH,
+            config.BLIP_NUM_BEAMS
+        )
+        
+        wd14_tagger = WD14Tagger(
+            config.WD14_MODEL,
--- a/generate-captions.cmd
+++ b/generate-captions.cmd
@@ -0,0 +1 @@
+python generate-captions.py --dirs ./dataset/mainchar/2_karina --char "karina" --device cuda:3
--- a/generate-captions.py
+++ b/generate-captions.py
@@ -0,0 +1,369 @@
+"""
+BLIP + WD14 hybrid caption generator
+Combined caption generation script for photorealistic LoRA training
+
+Required environment: kohya_ss (sd-scripts)
+"""
+
+import os
+import sys
+from pathlib import Path
+from tqdm import tqdm
+import argparse
+
+# Kohya_ss module imports
+try:
+    # BLIP itself is loaded through transformers in main(),
+    # so nothing BLIP-specific is imported from sd-scripts here.
+    # WD14 tagger; this module must be importable from the
+    # current working directory (the sd-scripts/ folder).
+    from wd14_tagger import WD14Tagger
+except ImportError:
+    print("❌ Please run this inside the Kohya_ss environment!")
+    print("Location: run from inside the sd-scripts/ folder")
+    sys.exit(1)
+
+
+# ==============================
+# ⚙️ 설정 (수정 가능)
+# ==============================
+
+class Config:
+    # 데이터셋 경로
+    DATASET_DIRS = [
+        "./dataset/mainchar",  # 메인 캐릭터
+        "./dataset/bg",        # 배경/보조
+    ]
+    
+    # 모델 설정
+    BLIP_MODEL_PATH = "Salesforce/blip-image-captioning-large"
+    WD14_MODEL_PATH = "SmilingWolf/wd-v1-4-moat-tagger-v2"
+    
+    # WD14 임계값
+    WD14_GENERAL_THRESHOLD = 0.35
+    WD14_CHARACTER_THRESHOLD = 0.85
+    
+    # BLIP 설정
+    BLIP_MAX_LENGTH = 75
+    BLIP_NUM_BEAMS = 1  # 1 = greedy, >1 = beam search
+    
+    # 제거할 WD14 메타 태그
+    REMOVE_TAGS = [
+        "1girl", "1boy", "solo", "looking at viewer",
+        "simple background", "white background", "grey background",
+        "highres", "absurdres", "lowres", "bad anatomy",
+        "signature", "watermark", "artist name", "dated",
+        "rating:safe", "rating:questionable", "rating:explicit",
+    ]
+    
+    # 출력 설정
+    OUTPUT_ENCODING = "utf-8"
+    OVERWRITE_EXISTING = False  # True면 기존 캡션 덮어쓰기
+    
+    # 디바이스
+    DEVICE = "cuda"  # 또는 "cpu"
+    
+    # 백업 생성
+    CREATE_BACKUP = True
+
+
+# ==============================
+# 🔧 유틸리티 함수
+# ==============================
+
+def normalize_tags(tags_str):
+    """태그 정규화: 소문자 변환, 공백 정리, 중복 제거"""
+    tags = [tag.strip().lower() for tag in tags_str.split(',')]
+    # 중복 제거 (순서 유지)
+    seen = set()
+    unique_tags = []
+    for tag in tags:
+        if tag and tag not in seen:
+            seen.add(tag)
+            unique_tags.append(tag)
+    return unique_tags
+
+
+def remove_unwanted_tags(tags_list, remove_list):
+    """불필요한 태그 제거"""
+    remove_set = set(tag.lower() for tag in remove_list)
+    return [tag for tag in tags_list if tag not in remove_set]
+
+
+def merge_captions(blip_caption, wd14_tags, remove_tags):
+    """
+    BLIP 캡션과 WD14 태그 병합
+    
+    형식: [BLIP 문장], [WD14 태그들]
+    """
+    # BLIP 정규화
+    blip_normalized = blip_caption.strip().lower()
+    
+    # WD14 태그 정규화 및 필터링
+    wd14_normalized = normalize_tags(wd14_tags)
+    wd14_filtered = remove_unwanted_tags(wd14_normalized, remove_tags)
+    
+    # Words in the BLIP sentence (used for de-duplication)
+    blip_words = set(blip_normalized.replace(',', ' ').split())
+    
+    # Drop WD14 tags that share a whole word with the BLIP sentence (optional).
+    # Whole-word matching avoids substring hits such as "a" matching "jacket".
+    wd14_deduped = []
+    for tag in wd14_filtered:
+        # Keep the tag only if none of its words appear in the BLIP sentence
+        if not set(tag.replace('_', ' ').split()) & blip_words:
+            wd14_deduped.append(tag)
+    
+    # Final merge: BLIP sentence first, then the WD14 tags.
+    # Empty parts are skipped so a failed BLIP caption
+    # does not leave a leading comma in the output.
+    parts = [blip_normalized] + wd14_deduped
+    merged = ", ".join(part for part in parts if part)
+    
+    return merged
+
+
+# ==============================
+# 🎨 캡션 생성 함수
+# ==============================
+
+def generate_blip_caption(image_path, model, processor, config):
+    """BLIP으로 자연어 캡션 생성"""
+    from PIL import Image
+    
+    try:
+        image = Image.open(image_path).convert("RGB")
+        
+        inputs = processor(image, return_tensors="pt").to(config.DEVICE)
+        
+        outputs = model.generate(
+            **inputs,
+            max_length=config.BLIP_MAX_LENGTH,
+            num_beams=config.BLIP_NUM_BEAMS,
+        )
+        
+        caption = processor.decode(outputs[0], skip_special_tokens=True)
+        return caption.strip()
+        
+    except Exception as e:
+        print(f"⚠️ BLIP 생성 실패 ({image_path.name}): {e}")
+        return ""
+
+
+def generate_wd14_tags(image_path, tagger, config):
+    """WD14로 태그 생성"""
+    try:
+        tags = tagger.tag(
+            str(image_path),
+            general_threshold=config.WD14_GENERAL_THRESHOLD,
+            character_threshold=config.WD14_CHARACTER_THRESHOLD,
+        )
+        
+        # 태그를 콤마로 연결
+        tag_string = ", ".join(tags)
+        return tag_string
+        
+    except Exception as e:
+        print(f"⚠️ WD14 생성 실패 ({image_path.name}): {e}")
+        return ""
+
+
+# ==============================
+# 📁 파일 처리
+# ==============================
+
+def get_image_files(directory):
+    """Find image files in a directory (non-recursive)."""
+    image_extensions = {'.jpg', '.jpeg', '.png', '.webp', '.bmp'}
+    
+    image_files = []
+    for ext in image_extensions:
+        image_files.extend(Path(directory).glob(f"*{ext}"))
+        image_files.extend(Path(directory).glob(f"*{ext.upper()}"))
+    # set() drops duplicates: on case-insensitive filesystems both globs can match the same file
+    return sorted(set(image_files))
+
+
+def create_backup(caption_path):
+    """기존 캡션 파일 백업"""
+    if caption_path.exists():
+        backup_dir = caption_path.parent / "caption_backup"
+        backup_dir.mkdir(exist_ok=True)
+        
+        backup_path = backup_dir / caption_path.name
+        import shutil
+        shutil.copy2(caption_path, backup_path)
+
+
+# ==============================
+# 🚀 메인 프로세스
+# ==============================
+
+def process_directory(directory, blip_model, blip_processor, wd14_tagger, config):
+    """단일 디렉토리 처리"""
+    
+    print(f"\n📁 처리 중: {directory}")
+    
+    # 이미지 파일 찾기
+    image_files = get_image_files(directory)
+    
+    if not image_files:
+        print(f"⚠️ 이미지 파일을 찾을 수 없습니다: {directory}")
+        return 0
+    
+    print(f"📸 {len(image_files)}개 이미지 발견")
+    
+    success_count = 0
+    skip_count = 0
+    
+    for image_path in tqdm(image_files, desc="캡션 생성"):
+        
+        # 캡션 파일 경로
+        caption_path = image_path.with_suffix('.txt')
+        
+        # 기존 파일 존재 확인
+        if caption_path.exists() and not config.OVERWRITE_EXISTING:
+            skip_count += 1
+            continue
+        
+        # 백업 생성
+        if config.CREATE_BACKUP and caption_path.exists():
+            create_backup(caption_path)
+        
+        try:
+            # 1. BLIP 캡션 생성
+            blip_caption = generate_blip_caption(
+                image_path, blip_model, blip_processor, config
+            )
+            
+            # 2. WD14 태그 생성
+            wd14_tags = generate_wd14_tags(image_path, wd14_tagger, config)
+            
+            # 3. 병합
+            merged_caption = merge_captions(
+                blip_caption, wd14_tags, config.REMOVE_TAGS
+            )
+            # Prepend the character name as a weighted trigger token
+            if merged_caption and getattr(config, "CHARACTER_PREFIX", ""):
+                char_token = config.CHARACTER_PREFIX.strip()
+                # sd-scripts parses "(token:1.3)" weights only when --weighted_captions is enabled
+                if not char_token.endswith(")"):
+                    char_token = f"({char_token}:1.3)"
+                merged_caption = f"{char_token}, {merged_caption}"
+            
+            # 4. 저장
+            if merged_caption:
+                with open(caption_path, 'w', encoding=config.OUTPUT_ENCODING) as f:
+                    f.write(merged_caption)
+                success_count += 1
+            else:
+                print(f"⚠️ 빈 캡션: {image_path.name}")
+            
+        except Exception as e:
+            print(f"❌ 처리 실패 ({image_path.name}): {e}")
+            continue
+    
+    print(f"✅ 완료: {success_count}개 생성, {skip_count}개 스킵")
+    return success_count
+
+
+def main():
+    parser = argparse.ArgumentParser(description="BLIP + WD14 하이브리드 캡션 생성")
+    parser.add_argument(
+        "--dirs", 
+        nargs="+", 
+        default=Config.DATASET_DIRS,
+        help="처리할 디렉토리 목록"
+    )
+    parser.add_argument(
+        "--overwrite",
+        action="store_true",
+        help="기존 캡션 덮어쓰기"
+    )
+    parser.add_argument(
+        "--device",
+        default=Config.DEVICE,
+        help="디바이스 (cuda/cpu)"
+    )
+    parser.add_argument(
+        "--char",
+        type=str,
+        default="",
+        help="모든 캡션 앞에 붙일 캐릭터명 (예: 'anya character')"
+    )
+
+    args = parser.parse_args()
+    config = Config()
+    config.DATASET_DIRS = args.dirs
+    config.OVERWRITE_EXISTING = args.overwrite
+    config.DEVICE = args.device
+    config.CHARACTER_PREFIX = args.char  # read by process_directory() via getattr
+    print("=" * 60)
+    print("🎨 BLIP + WD14 하이브리드 캡션 생성기")
+    print("=" * 60)
+    print(f"📁 대상 디렉토리: {config.DATASET_DIRS}")
+    print(f"💾 덮어쓰기: {config.OVERWRITE_EXISTING}")
+    print(f"🖥️ 디바이스: {config.DEVICE}")
+    print("=" * 60)
+    
+    # 모델 로드
+    print("\n🔄 모델 로딩 중...")
+    
+    try:
+        # BLIP 로드
+        from transformers import BlipProcessor, BlipForConditionalGeneration
+        
+        print("  → BLIP 모델 로딩...")
+        blip_processor = BlipProcessor.from_pretrained(config.BLIP_MODEL_PATH)
+        blip_model = BlipForConditionalGeneration.from_pretrained(
+            config.BLIP_MODEL_PATH
+        ).to(config.DEVICE)
+        blip_model.eval()
+        
+        # WD14 로드
+        print("  → WD14 Tagger 로딩...")
+        wd14_tagger = WD14Tagger(
+            model_dir=config.WD14_MODEL_PATH,
+            device=config.DEVICE
+        )
+        
+        print("✅ 모델 로딩 완료!\n")
+        
+    except Exception as e:
+        print(f"❌ 모델 로딩 실패: {e}")
+        sys.exit(1)
+    
+    # 각 디렉토리 처리
+    total_success = 0
+    
+    for directory in config.DATASET_DIRS:
+        if not Path(directory).exists():
+            print(f"⚠️ 디렉토리 없음: {directory}")
+            continue
+        
+        count = process_directory(
+            directory, blip_model, blip_processor, wd14_tagger, config
+        )
+        total_success += count
+    
+    # 완료 메시지
+    print("\n" + "=" * 60)
+    print(f"🎉 전체 완료!")
+    print(f"✅ 총 {total_success}개 캡션 생성됨")
+    print("=" * 60)
+    
+    # 결과 예시 출력
+    print("\n📝 생성 예시:")
+    for directory in config.DATASET_DIRS:
+        txt_files = list(Path(directory).glob("*.txt"))
+        if txt_files:
+            example_file = txt_files[0]
+            with open(example_file, 'r', encoding='utf-8') as f:
+                content = f.read()
+            print(f"\n{example_file.name}:")
+            print(f"  {content[:100]}...")
+            break
+
+
+if __name__ == "__main__":
+    main()
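Review note: de-duplicating WD14 tags against a BLIP sentence can be done by substring test or by whole-word intersection (the approach `generate-captions-standalone.py` uses). The sketch below compares the two on a made-up caption and tag list. The function names and sample data are hypothetical and only illustrate the difference in behavior.

```python
def caption_words(caption: str) -> set[str]:
    # Same tokenization the scripts use: lowercase, commas treated as spaces.
    return set(caption.lower().replace(",", " ").split())


def substring_dedup(tags: list[str], caption: str) -> list[str]:
    # Drops a tag if any caption word is a substring of it, or the reverse.
    words = caption_words(caption)
    return [t for t in tags if not any(w in t or t in w for w in words)]


def word_dedup(tags: list[str], caption: str) -> list[str]:
    # Drops a tag only if it shares a whole word with the caption.
    words = caption_words(caption)
    return [t for t in tags if not set(t.replace("_", " ").split()) & words]


if __name__ == "__main__":
    caption = "a woman with long hair"
    tags = ["long hair", "black hair", "jacket", "smile"]
    print(substring_dedup(tags, caption))  # ['smile']
    print(word_dedup(tags, caption))       # ['jacket', 'smile']
```

With the substring test, the article "a" removes every tag that contains the letter a, so almost nothing survives. Whole-word matching keeps "jacket" but still drops "black hair" because it shares "hair" with the caption, which may or may not be what you want for color tags.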
--- a/run-train-extd.cmd
+++ b/run-train-extd.cmd
@@ -0,0 +1,31 @@
+set CUDA_VISIBLE_DEVICES=1
+
+accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 ^
+  sdxl_train_network.py ^
+  --pretrained_model_name_or_path="./models/stable-diffusion-xl-base-1.0" ^
+  --train_data_dir="./train_data" ^
+  --output_dir="./output_model" ^
+  --logging_dir="./logs" ^
+  --output_name="karina" ^
+  --network_module=networks.lora ^
+  --network_dim=32 ^
+  --network_alpha=16 ^
+  --learning_rate=1e-4 ^
+  --optimizer_type="AdamW8bit" ^
+  --lr_scheduler="cosine" ^
+  --lr_warmup_steps=100 ^
+  --max_train_epochs=15 ^
+  --save_every_n_epochs=1 ^
+  --mixed_precision="bf16" ^
+  --save_precision="bf16" ^
+  --cache_latents ^
+  --cache_latents_to_disk ^
+  --cache_text_encoder_outputs ^
+  --gradient_checkpointing ^
+  --xformers ^
+  --seed=42 ^
+  --bucket_no_upscale ^
+  --min_bucket_reso=512 ^
+  --max_bucket_reso=2048 ^
+  --bucket_reso_steps=64 ^
+  --resolution="1024,1024"
--- a/run-train.cmd
+++ b/run-train.cmd
@@ -0,0 +1 @@
+accelerate launch --num_cpu_threads_per_process 8 train_network.py --config_file=config_files/config-5080.json
--- a/run-venv.cmd
+++ b/run-venv.cmd
@@ -0,0 +1 @@
+call venv\Scripts\activate.bat
--- a/로컬-설치가이드.md
+++ b/로컬-설치가이드.md
@@ -0,0 +1,190 @@
+CUDA 12.4
+CUDNN 9.1.0
+```
+# 1. Clone
+git clone https://github.com/kohya-ss/sd-scripts.git
+cd sd-scripts
+
+# 2. Create a virtual environment
+python -m venv venv
+.\venv\Scripts\activate  # Windows
+# or
+source venv/bin/activate  # Linux/Mac
+
+# 3. Install PyTorch (CUDA 11.8 build)
+pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
+
+# 4. Install dependencies
+pip install --upgrade -r requirements.txt
+pip install xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118
+
+# 5. Configure Accelerate
+accelerate config
+```
+Answer the prompts as follows:
+
+Compute environment: This machine
+Machine type: No distributed training
+CPU only?: NO
+torch dynamo?: NO
+DeepSpeed?: NO
+GPU ids: all (or 0)
+Mixed precision: fp16 for 8GB VRAM, bf16 for 12GB or more (bf16 needs an Ampere-generation or newer GPU)
+
+
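+The SDXL (Stable Diffusion XL) architecture ships as several large files; the two main components are the base model and the optional refiner model. File sizes vary with the file format used (e.g. .safetensors or .ckpt).
+Official SDXL 1.0 files
+Typical file sizes for the official SDXL 1.0 release:
+Base model: about 6.94GB. It generates the initial image from the text prompt.
+Refiner model: about 6.08GB. It runs as a second stage that adds detail to the image produced by the base model and improves its quality.
+Total pipeline size: base and refiner together come to about 13GB, although many users now rely mainly on the base model for a faster workflow.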
+
+
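+3. Image resizing and bucketing
+Optimal resolution: the recommended resolution depends on the base model being trained.
+Stable Diffusion 1.5: 512x512, 512x768, 768x512, etc.
+SDXL: 1024x1024
+Bucketing: Kohya_ss provides a "bucket" feature that handles images of several different resolutions efficiently.
+Open the preprocessing tab in the GUI: in the Utilities tab of the Kohya GUI, select Prepare training data.
+Set the folders: set Source directory to the folder of original images and Destination directory to the folder where processed images will be saved.
+Set the min/max resolution: set Min resolution and Max resolution, and enable Use buckets.
+Automatic resizing: running Process images resizes and crops the images to fit the chosen bucket resolutions.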
+
+
+💡 Additional tips
+A maximum of 3000 steps or 30 epochs is recommended
+50 training images with 4 repeats is recommended
+Batch size by VRAM:
+
+8GB: batch_size 1-2
+12GB: batch_size 2-3
+More VRAM: batch_size 5+
+
+
+## 💡 Choosing the repeat count
+
+| Images | Recommended repeats | Example folder name | Total steps at 10 epochs |
+|----------|---------|------------|-------------------|
+| 10 | 10 | `10_character` | 1000 steps |
+| 20 | 5 | `5_character` | 1000 steps |
+| 50 | 4 | `4_character` | 2000 steps |
+| 100 | 2 | `2_character` | 2000 steps |
+| 200 | 1 | `1_character` | 2000 steps |
+
+**Goal:** adjust so the total lands around **1500~3000** steps
+
+
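+```
+High quality (consistent style/pose/outfit):
+→ 4 repeats is enough
+
+Low quality (varied angles/outfits/backgrounds):
+→ 10~20 repeats may be needed
+```
+
+### **2. The goals differ**
+```
+Gemini's guideline: learn the character's face/features firmly
+Kohya docs: avoid overfitting
+```
+
+### **3. How total steps are calculated**
+```
+50 images × 4 repeats × 10 epochs = 2000 steps
+50 images × 10 repeats × 5 epochs = 2500 steps
+50 images × 20 repeats × 2 epochs = 2000 steps
+```
+In the end, if the **total steps are similar**, the results are similar too!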
+
+---
+
+## 📊 Field test results (community experience)
+
+| Repeats | Images | Epochs | Total steps | Result |
+|----------|----------|-------|---------|------|
+| 4 | 50 | 10 | 2000 | ⭐⭐⭐⭐ Balanced |
+| 10 | 50 | 5 | 2500 | ⭐⭐⭐⭐⭐ Strong learning |
+| 20 | 50 | 3 | 3000 | ⚠️ Overfitting risk |
+| 2 | 100 | 10 | 2000 | ⭐⭐⭐⭐ Natural |
+
+---
+
+## 🎯 Practical guide
+
+### **Recommended settings by situation**
+
+#### **📷 High-quality dataset (consistent character/style)**
+```
+Images: 50
+Repeats: 4~6
+Epochs: 8~10
+Total steps: 1600~3000
+
+e.g. 4_character_name
+```
+
+#### **🎨 Varied dataset (many poses/outfits/backgrounds)**
+```
+Images: 50
+Repeats: 8~12
+Epochs: 5~8
+Total steps: 2000~4800
+
+e.g. 10_character_name
+```
+
+#### **⚡ Quick test (quality check)**
+```
+Images: 30
+Repeats: 5
+Epochs: 5
+Total steps: 750
+
+e.g. 5_test_character
+```
+
+#### **🏆 Production quality (commercial/high quality)**
+```
+Images: 100+
+Repeats: 3~5
+Epochs: 10~15
+Total steps: 3000~7500
+
+e.g. 3_character_name
+```
+
+---
+
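+## 💡 **My recommendation: test incrementally**
+
+### **Step 1: Start low**
+```
+Folder: 4_character
+Epochs: 5
+→ check results at 1000 steps (50 images)
+```
+
+### **Step 2: Increase if needed**
+```
+If not satisfied:
+Folder: 8_character
+Epochs: 5
+→ check again at 2000 steps
+```
+
+### **Step 3: Find the sweet spot**
+```
+If it overfits: reduce the repeat count
+If it underfits: increase the repeat count
+```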
+
+---
+
+## 🔬 A (sort of) scientific approach
+
+### **Guide by total steps:**
+```
+1000~1500 steps: light style LoRA
+2000~3000 steps: character LoRA (typical)
+3000~5000 steps: detailed character
+5000+ steps: complex concept/multiple characters
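Review note: the guide encodes the repeat count in the dataset folder name (`4_character`, `2_karina`). The sketch below shows one way to read that prefix and reproduce the step counts from the repeat-count table. `parse_folder` and `folder_steps` are hypothetical helpers written for this note, and batch size 1 is assumed by default as in the guide's arithmetic.

```python
import math
import re


def parse_folder(name: str) -> tuple[int, str]:
    """Split an '<repeats>_<concept>' folder name into (repeats, concept)."""
    match = re.fullmatch(r"(\d+)_(.+)", name)
    if match is None:
        raise ValueError(f"folder name does not start with '<repeats>_': {name!r}")
    return int(match.group(1)), match.group(2)


def folder_steps(name: str, images: int, epochs: int, batch_size: int = 1) -> int:
    # steps = ceil(images x repeats / batch_size) x epochs
    repeats, _ = parse_folder(name)
    return math.ceil(images * repeats / batch_size) * epochs


if __name__ == "__main__":
    # Rows from the repeat-count table, all at 10 epochs.
    for folder, images in [("10_character", 10), ("5_character", 20),
                           ("4_character", 50), ("2_character", 100),
                           ("1_character", 200)]:
        print(folder, images, folder_steps(folder, images, epochs=10))
```

Only the leading digits before the first underscore are treated as the repeat count, so a name such as `4_character_name` keeps `character_name` as the concept.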
				`@ -0,0 +1 @@`
				`python generate-captions.py --dirs ./dataset/mainchar/2_karina --char "karina" --device 3`
				`@ -0,0 +1 @@`
				`accelerate launch --num_cpu_threads_per_process 8 train_network.py --config_file=config_5080.json`