- Remove _get_device_dtype() indirection, inline device/dtype at call sites
- Remove commented-out fallback blocks and try/finally wrappers
- Add modules/sharpfin to ruff and pylint excludes in pyproject.toml
- Fix import ordering in joytag.py and pixelart.py
Changes based on vladmandic and Disty0 feedback:
- Fix logging: use direct `from installer import log` instead of lazy _get_log()
- Remove unused is_available() function
- Remove defensive getattr() calls in _resolve_kernel/_resolve_linearize
- Simplify _get_device_dtype() to use devices module directly
- Refactor to_pil() with single Image.fromarray() call and explicit mode
- Add cross-platform fallback: sharpfin runs only on CUDA and falls back to
PIL/F.interpolate on other devices (CPU, MPS, OpenVINO)
- Replace lambdas with functools.partial in functional.py for torch.compile safety
- Add modules/sharpfin to pylint ignore-paths (vendored code)
- Remove superfluous SimpleNamespace import in cli/api-caption.py, use Map instead
- Drop _ prefix from internal helper functions in modules/api/caption.py
- Move DeepDanbooru model path to top-level models folder instead of nesting under CLIP
- Rename shadowing import in waifudiffusion batch to avoid F823/E0606
- Fix import order in cli/api-caption.py (stdlib before third-party)
- Rename local variable shadowing function name in cli/api-caption.py
- Remove unnecessary global statement in devices.bypass_sdpa_hijacks
- Add _load_blip_model helper with explicit cache_dir so downloads
go to hfcache_dir instead of default HF cache
- Pre-load BLIP model/processor before creating Interrogator config
to control download location and avoid redundant loads
- Set clip_model_path on config for CLIP model cache location
- Add cache_dir to Moondream model and tokenizer loading
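A minimal sketch of the lambda-to-functools.partial change listed above for functional.py. The kernel function and its `n` parameter are hypothetical stand-ins, not sharpfin's actual API:

```python
import functools
import math


def lanczos(x: float, n: int) -> float:
    """Hypothetical windowed-sinc kernel standing in for the real kernels."""
    if x == 0.0:
        return 1.0
    if abs(x) >= n:
        return 0.0
    px = math.pi * x
    return n * math.sin(px) * math.sin(px / n) / (px * px)


# Before: an anonymous closure with no inspectable bound arguments.
lanczos3_lambda = lambda x: lanczos(x, n=3)

# After: a partial object that exposes the wrapped function and its keywords.
lanczos3 = functools.partial(lanczos, n=3)
```

Both callables return the same values; the partial additionally exposes `lanczos3.func` and `lanczos3.keywords` for inspection.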
Move all caption/interrogate/tagger/VQA API code out of the monolithic
endpoints.py and models.py into a new self-contained modules/api/caption.py,
following the loras.py / nudenet.py self-registering pattern.
- Move 15 Pydantic models (ReqCaption, ResCaption, ReqVQA, ResVQA,
ReqTagger, ResTagger, dispatch union types, etc.) from models.py
- Move 11 handler functions from endpoints.py
- Deduplicate ~150 lines via shared _do_openclip, _do_tagger, _do_vqa
core functions called by both direct and dispatch endpoints
- Add register_api() that registers all 8 caption routes
- Add promptgen field to ResVLMPrompts (bug fix: handler returned it
but response model silently dropped it)
- Improve all endpoint docstrings and Field descriptions for API docs
- Add use_safetensors=True to all 16 model from_pretrained calls to
avoid downloading redundant .bin files alongside safetensors
- Add device property to JoyTag VisionModel so move_model can relocate
it to CUDA (fixes 'ViT object has no attribute device')
- Fix Pix2Struct dtype mismatch by casting float inputs to model dtype
while preserving integer tensor types
- Patch AutoConfig.register with exist_ok=True during Ovis loading to
handle duplicate aimv2 registration on model reload
- Detect Qwen VL fine-tune architecture from config model_type instead
of repo name, fixing ToriiGate and similar third-party fine-tunes
- Change UI default task from Short Caption to Normal Caption, and
preserve it on model switch instead of resetting to Use Prompt
- Add dual-prefill testing across 5 VQA test methods using a shared
_check_prefill helper
- Fix pre-existing ruff W605 in strip_think_xml_tags docstring
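A rough sketch of the Qwen VL detection change listed above, comparing name-based and config-based detection. The function names and the model_type strings in the table are assumptions for illustration and should be checked against the config.json of the target repo:

```python
from typing import Optional

# Assumed model_type values; verify against the actual model config.
QWEN_VL_TYPES = {
    "qwen2_vl": "Qwen2-VL",
    "qwen2_5_vl": "Qwen2.5-VL",
}


def detect_by_repo_name(repo: str) -> Optional[str]:
    """Old approach: guess from the repo name, which fails for renamed fine-tunes."""
    name = repo.lower()
    if "qwen2.5-vl" in name:
        return "Qwen2.5-VL"
    if "qwen2-vl" in name:
        return "Qwen2-VL"
    return None


def detect_by_config(config: dict) -> Optional[str]:
    """New approach: trust the model_type field recorded in the model config."""
    return QWEN_VL_TYPES.get(config.get("model_type", ""))
```

A fine-tune published under an unrelated name still carries the base architecture in its config, so only the second function identifies it.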
update_caption_params() was setting caption_max_length, chunk_size, and
flavor_intermediate_count on the Interrogator instance, but the library
reads them from self.config. The overrides were silently ignored.
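The failure mode described above can be reproduced with a toy example. The classes and default values below are hypothetical and only mimic the "reads from self.config" behavior, not the library's real code:

```python
from dataclasses import dataclass


@dataclass
class Config:
    caption_max_length: int = 32  # placeholder default
    chunk_size: int = 2048        # placeholder default


class Interrogator:
    def __init__(self, config: Config):
        self.config = config

    def effective_max_length(self) -> int:
        # Mirrors the behavior described above: only self.config is consulted.
        return self.config.caption_max_length


ci = Interrogator(Config())

# Ineffective: creates a new instance attribute that nothing reads.
ci.caption_max_length = 74
ineffective = ci.effective_max_length()

# Effective: update the config object the instance actually consults.
ci.config.caption_max_length = 74
effective = ci.effective_max_length()
```

Python raises no error on the first assignment, which is why the override was ignored silently.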
- Add parse_florence_detections() and format_florence_response() to
vqa_detection for handling Florence-2 detection output formats
- Add bypass_sdpa_hijacks() context manager to devices.py for models
incompatible with SageAttention or other SDPA hijacks
- Add OpenCLIP model offload support when caption_offload is enabled
- Remove caption_openclip_min_length from settings, API models, endpoints, and UI
(clip_interrogator library has no min_length support; parameter was never functional)
- Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts
(GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune)
- Add 'promptgen' category to /vqa/prompts API endpoint
- Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix
to prevent "Detect Gaze" matching as detect target="Gaze"
- Update test suite: remove min_length tests, fix min_flavors to use mode='best',
add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests,
split Florence/PromptGen test coverage
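The gaze-detection ordering fix listed above comes down to testing the specific command before the generic prefix. A minimal sketch, with a hypothetical parse_task() helper and return shape:

```python
def parse_task(question: str) -> tuple:
    """Classify a prompt as gaze detection, object detection, or plain caption."""
    q = question.strip()
    lowered = q.lower()
    # Specific command first: "detect gaze" also starts with "detect ".
    if lowered == "detect gaze":
        return ("gaze", None)
    # Generic prefix second: everything after "detect " is the target.
    if lowered.startswith("detect "):
        return ("detect", q[len("detect "):].strip())
    return ("caption", None)
```

With the two checks swapped, "Detect Gaze" would be returned as a detect task with target "Gaze".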
Move all caption-related modules from modules/interrogate/ to modules/caption/
for better naming consistency:
- Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger,
vqa, vqa_detection, waifudiffusion modules
- Add new caption.py dispatcher module
- Remove old interrogate.py (functionality moved to caption.py)
- Update cli/api-interrogate.py to use /sdapi/v1/tagger for DeepBooru
- Handle tagger response format (scores dict or tags string)
- Remove DeepBooru test from interrogate endpoint tests
- Update API model descriptions to reference tagger for anime tagging
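A sketch of how a client might handle the two tagger response shapes mentioned above. The field names "scores" and "tags" and the output formatting are assumptions for illustration:

```python
def format_tagger_response(data: dict) -> str:
    """Return a printable tag string from either response shape."""
    scores = data.get("scores")
    if isinstance(scores, dict) and scores:
        # Scores dict: list tags by descending confidence.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ", ".join(f"{tag} ({score:.2f})" for tag, score in ranked)
    # Plain tags string: pass it through unchanged.
    return str(data.get("tags", ""))
```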
DeepBooru/DeepDanbooru should only be accessed via the tagger endpoint.
The interrogate endpoint is now exclusively for OpenCLIP/BLIP.
- Remove DeepDanbooru handling from post_interrogate
- Update docstring to reference tagger endpoint for anime tagging
- Simplify code by removing if/else branching
Add prompt field to VQA endpoint and advanced settings to OpenCLIP endpoint
to achieve full parity between UI and API capabilities.
VLM endpoint changes:
- Add prompt field for custom text input (required for 'Use Prompt' task)
- Pass prompt to vqa.interrogate instead of hardcoded empty string
OpenCLIP endpoint changes:
- Add 7 optional per-request override fields: min_length, max_length,
chunk_size, min_flavors, max_flavors, flavor_count, num_beams
- Add get_clip_setting() helper for override support in openclip.py
- Apply overrides via update_interrogate_params() before interrogation
All new fields are optional with None defaults for backwards compatibility.
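One way the optional-override pattern above could work is sketched here. The signatures, the module-level dicts, and the default values are hypothetical and do not reproduce the code in openclip.py:

```python
from typing import Any, Optional

# Hypothetical stand-ins for the global settings and the per-request state.
defaults = {"max_length": 74, "chunk_size": 1024, "num_beams": 1}
overrides: dict = {}


def update_interrogate_params(**kwargs: Optional[Any]) -> None:
    """Keep only the overrides the request actually supplied (non-None)."""
    overrides.clear()
    overrides.update({k: v for k, v in kwargs.items() if v is not None})


def get_clip_setting(name: str) -> Any:
    """Return the per-request override if present, else the global default."""
    return overrides.get(name, defaults[name])
```

Because a field left at None never reaches `overrides`, requests that omit the new fields behave exactly as before.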
Update API model field descriptions to match the hints in locale_en.json
for consistency between UI and API documentation.
Updated models:
- ReqInterrogate: clip_model, blip_model, mode
- ReqVQA: model, question, system
- ReqTagger: model, threshold, character_threshold, max_tags,
include_rating, sort_alpha, use_spaces, escape_brackets,
exclude_tags, show_scores
Add comprehensive caption/interrogate API with documentation:
- GET /sdapi/v1/interrogate: List available interrogation models
- POST /sdapi/v1/interrogate: Interrogate with OpenCLIP/BLIP/DeepDanbooru
- POST /sdapi/v1/vqa: Caption with Vision-Language Models (VLM)
- GET /sdapi/v1/vqa: List available VLM models
- POST /sdapi/v1/vqa/batch: Batch caption multiple images
- POST /sdapi/v1/tagger: Tag images with WaifuDiffusion/DeepBooru
Updates:
- Add detailed docstrings with usage examples
- Fix analyze_image response parsing for Gradio update dicts
- Add request/response models for all endpoints
Add comprehensive tooltips to Caption tab UI elements in locale_en.json:
- Add new "llm" section for shared LLM/VLM parameters:
System prompt, Prefill, Top-K, Top-P, Temperature, Num Beams,
Use Samplers, Thinking Mode, Keep Thinking Trace, Keep Prefill
- Add new "caption" section for caption-specific settings:
VLM, OpenCLiP, Tagger tab labels and all their parameters
including thresholds, tag formatting, batch options
- Consolidate accordion labels in ui_caption.py:
"Caption: Advanced Options" and "Caption: Batch" shared across
VLM, OpenCLiP, and Tagger tabs (localized to "Advanced Options"
and "Batch" in UI)
- Remove duplicate entries from missing section
- Remove GFPGAN pip install from installer.py optional requirements
- Remove 'gfpgan' from modules_to_remove cleanup list in launch.py
- Remove --codeformer-models-path and --gfpgan-models-path CLI args
- Remove GFPGAN model directory migration from modelloader.py
- Remove codeformer, restoreformer, GFPGANv1.4, and GPEN-BFR ONNX
model URLs from the predefined list
- Remove the .fp16 ONNX restorer code path that bypassed detailer
processing to run face restoration directly
- Remove /sdapi/v1/face-restorers route from api.py
- Remove get_restorers() function from endpoints.py
- Remove gfpgan_visibility, codeformer_visibility, codeformer_weight
fields from ReqProcess model
- Remove GFPGAN and CodeFormer entries from run_extras() signature
and create_args_for_run dict in postprocessing.py
- Remove CodeFormer/GFPGAN import and setup from webui.py initialize()
- Remove face_restorers list, codeformer/gfpgan model path settings,
and face restore UI settings section from shared.py
- Remove restore_faces parameter from StableDiffusionProcessing
- Remove face_restoration import and restore_faces processing block
from processing.py
Remove all vendored face restoration code that is no longer maintained:
- modules/postprocess/codeformer_model.py, codeformer_arch.py, vqgan_arch.py
- modules/postprocess/gfpgan_model.py, restorer.py
- modules/face_restoration.py (base class and dispatcher)
- scripts/postprocessing_codeformer.py, postprocessing_gfpgan.py
- modules/facelib/ (vendored face detection/parsing library)
These were the only two backends registered in shared.face_restorers,
making the entire face restoration infrastructure dead code.
Nunchaku's SDXL UNet does not support offloading and raises
NotImplementedError when offload=True is passed. Skip the parameter
for SDXL and log a warning instead of crashing.
Filter out reference entries tagged "nunchaku" from Extra Networks
when the active backend is not CUDA, since Nunchaku requires NVIDIA
GPUs. Entries remain in shared.reference_models for programmatic
lookup but are not yielded to the UI.
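The filtering described above can be sketched as a generator that leaves the source list untouched. The entry layout (a "tags" list) and the backend strings are assumptions:

```python
from typing import Iterable, Iterator


def visible_references(entries: Iterable[dict], backend: str) -> Iterator[dict]:
    """Yield reference entries for the UI, hiding nunchaku ones off CUDA."""
    for entry in entries:
        tags = entry.get("tags") or []
        if "nunchaku" in tags and backend != "cuda":
            continue  # skipped for display only; the source list is not modified
        yield entry
```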
- Rename HuggingFace org from nunchaku-tech to nunchaku-ai across all
nunchaku model repos (flux, sdxl, sana, z-image, qwen, t5)
- Add per-torch-version nunchaku version mapping instead of single global
version, with robust torch version parsing
- Add 'Fill (Nunchaku)' and 'Depth (Nunchaku)' options to Flux Tools
dropdown, loading models with +nunchaku suffix for SVDQuant quantization
- Mark Fill and Depth nunchaku reference entries as hidden so they remain
available for check_nunchaku() lookup but don't appear in Extra Networks
- Filter hidden reference models in ui_extra_networks_checkpoints
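A sketch of the per-torch-version mapping with tolerant version parsing mentioned above. The helper names are hypothetical and the mapped values are placeholders, not the real nunchaku versions:

```python
import re
from typing import Optional

# Placeholder values only; the real mapping lives in the installer code.
NUNCHAKU_BY_TORCH = {
    (2, 7): "placeholder-for-2.7",
    (2, 8): "placeholder-for-2.8",
}


def parse_torch_version(version: str) -> Optional[tuple]:
    """Extract (major, minor), ignoring patch, dev, and local build suffixes."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nunchaku_version_for(torch_version: str) -> Optional[str]:
    """Look up the nunchaku version for a torch version string, if any."""
    key = parse_torch_version(torch_version)
    return NUNCHAKU_BY_TORCH.get(key) if key is not None else None
```

Returning None for unparseable or unmapped versions lets the caller skip the install instead of raising.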
Replace manual Model/TE checkboxes in Quantization Settings with a
dedicated "Nunchaku" tab in the Extra Networks menu where users can
directly select nunchaku-quantized model variants. Detection now uses
a +nunchaku path marker for disambiguation.
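A minimal sketch of marker-based detection, assuming the marker is a plain suffix on the model path. The helper name is hypothetical:

```python
MARKER = "+nunchaku"


def split_nunchaku_marker(path: str) -> tuple:
    """Return (path without marker, whether the marker was present)."""
    if path.endswith(MARKER):
        return path[: -len(MARKER)], True
    return path, False
```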
- Relax sd_detect to match 'anima' without requiring 'cosmos' in name
- Use hf_hub_download for custom pipeline.py and adapter modules
- Register custom modules in sys.modules for Diffusers trust_remote_code
- Pass trust_remote_code=True to from_pretrained
- Map AnimaTextToImage to 'cosmos' model type for TAESD preview support
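The sys.modules registration step above can be sketched with the standard library alone. The helper name is hypothetical, and in practice the file path would come from a download such as hf_hub_download:

```python
import importlib.util
import sys
from types import ModuleType


def load_module_from_file(name: str, path: str) -> ModuleType:
    """Import a standalone .py file and register it under the given name."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so later imports by name resolve to this module.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```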
Anima replaces the Cosmos T5-11B text encoder with Qwen3-0.6B + a
6-layer LLM adapter and uses CONST preconditioning instead of EDM.
- Add pipelines/model_anima.py loader with dynamic import of custom
AnimaTextToImagePipeline and AnimaLLMAdapter from model repo
- Register 'Anima' pipeline in shared_items.py
- Add name-based detection in sd_detect.py
- Fix list-format _class_name handling in guess_by_diffusers()
- Wire loader in sd_models.py load_diffuser_force()
- Skip noise_pred callback injection for Anima (uses velocity instead)
- Add output_type='np' override in processing_args.py
Hide all CLiP, VLM, and Tagger settings from Settings > Interrogate page
while keeping them in shared.opts for persistence. Caption Tab UI becomes
the single control point with change handlers that save directly to config.
Changes:
- Hide OpenCLiP, VLM, and Tagger settings with visible=False
- Add change handlers to save settings when UI controls change
- Rename "Booru Tags" tab to "Tagger", update choice labels
- Update interrogate.py to use unified tagger interface with all settings
Add DeepBooru as a model option alongside WD14 models in the Booru Tags
tab, with dynamic UI that disables inapplicable controls.
Changes:
- Create modules/interrogate/tagger.py as unified adapter module
- Add batch, load/unload, get_models functions to deepbooru.py
- Update ui_caption.py to use unified tagger interface
- Consolidate shared tagger settings in shared.py
- Add implementation plan for future settings consolidation
UI behavior:
- Model dropdown shows DeepBooru + all WD14 models
- Character threshold and include rating disabled for DeepBooru
- All controls re-enable when WD14 model selected
Add SmilingWolf's WD14/WaifuDiffusion tagger models for anime/illustration
tagging as a new "Booru Tags" tab in the Caption panel.
- Support 9 models (v2 and v3 variants) via HuggingFace
- Use the ONNX backend because the safetensors v3 variants showed
unacceptable accuracy loss
- Separate thresholds for general/character tags
- Batch processing with progress bar
- Consolidate debug env var to SD_INTERROGATE_DEBUG
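The separate general/character thresholds above could be applied along these lines. The data layout, the helper name, and the default threshold values are assumptions, not the tagger's actual interface:

```python
def select_tags(
    predictions: dict,
    general_threshold: float = 0.35,
    character_threshold: float = 0.85,
) -> list:
    """Keep tags whose score meets the threshold for their category.

    predictions maps tag -> (category, score); other categories are dropped.
    """
    thresholds = {"general": general_threshold, "character": character_threshold}
    kept = [
        (tag, score)
        for tag, (category, score) in predictions.items()
        if category in thresholds and score >= thresholds[category]
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in kept]
```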
The "Restore from metadata: skip params" setting previously required
exact metadata parameter names (e.g., "Batch-2" instead of "batch_size").
This was confusing because metadata names differ from Python variables
and UI labels.
Changes:
- Auto-populate param_aliases from component labels and elem_ids
- Expand user input with aliases in should_skip()
- Support normalized names so "Batch" skips both "Batch-1" and "Batch-2"
Users can now enter any of these formats (case-insensitive):
- Python variable names: batch_size, cfg_scale, clip_skip
- UI labels: Batch size, CFG scale, Clip skip
- Metadata names: Batch-2, CFG scale, Clip skip
- Normalized names: Batch (skips both Batch-1 and Batch-2)
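A sketch of the alias expansion and normalized-name matching described above. The alias table is a hand-written stand-in for the auto-populated one, and comma-separated user input is an assumption:

```python
import re

# Hypothetical aliases; the real table is built from component labels and elem_ids.
param_aliases = {
    "batch_size": "Batch-2",
    "batch size": "Batch-2",
    "batch_count": "Batch-1",
    "batch count": "Batch-1",
    "cfg_scale": "CFG scale",
}


def normalize(name: str) -> str:
    """Lowercase and drop a trailing -N suffix so 'Batch-2' becomes 'batch'."""
    return re.sub(r"-\d+$", "", name.strip().lower())


def should_skip(param: str, user_input: str) -> bool:
    """Check whether a metadata parameter matches any user-supplied skip entry."""
    entries = [e.strip().lower() for e in user_input.split(",") if e.strip()]
    expanded = set(entries)
    for entry in entries:
        alias = param_aliases.get(entry)
        if alias is not None:
            expanded.add(alias.lower())
    # Match the exact name (or an alias of it), or its normalized form.
    return param.strip().lower() in expanded or normalize(param) in entries
```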