- Remove _get_device_dtype() indirection, inline device/dtype at call sites
- Remove commented-out fallback blocks and try/finally wrappers
- Add modules/sharpfin to ruff and pylint excludes in pyproject.toml
- Fix import ordering in joytag.py and pixelart.py
Changes based on vladmandic and Disty0 feedback:
- Fix logging: use direct `from installer import log` instead of lazy _get_log()
- Remove unused is_available() function
- Remove defensive getattr() calls in _resolve_kernel/_resolve_linearize
- Simplify _get_device_dtype() to use devices module directly
- Refactor to_pil() with single Image.fromarray() call and explicit mode
- Add cross-platform fallback: sharpfin runs only on CUDA and falls back to
PIL/F.interpolate on other devices (CPU, MPS, OpenVINO)
- Replace lambdas with functools.partial in functional.py for torch.compile safety
- Add modules/sharpfin to pylint ignore-paths (vendored code)
- Remove superfluous SimpleNamespace import in cli/api-caption.py, use Map instead
- Drop _ prefix from internal helper functions in modules/api/caption.py
- Move DeepDanbooru model path to top-level models folder instead of nesting under CLIP
- Rename shadowing import in waifudiffusion batch to avoid F823/E0606
- Fix import order in cli/api-caption.py (stdlib before third-party)
- Rename local variable shadowing function name in cli/api-caption.py
- Remove unnecessary global statement in devices.bypass_sdpa_hijacks
- Add _load_blip_model helper with explicit cache_dir so downloads
go to hfcache_dir instead of default HF cache
- Pre-load BLIP model/processor before creating Interrogator config
to control download location and avoid redundant loads
- Set clip_model_path on config for CLIP model cache location
- Add cache_dir to Moondream model and tokenizer loading
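
The lambda-to-functools.partial replacement listed earlier for functional.py can be illustrated with a toy example. This is only a sketch: `window` and its `radius` parameter are hypothetical stand-ins, not functions from the vendored sharpfin code.

```python
from functools import partial


def window(x: float, radius: float) -> float:
    """Toy kernel weight: 1 inside the radius, 0 outside (illustrative only)."""
    return 1.0 if abs(x) <= radius else 0.0


# Lambda form: an anonymous closure around the call.
window_lambda = lambda x: window(x, radius=2.0)

# partial form: a callable that exposes the wrapped function and bound arguments.
window_partial = partial(window, radius=2.0)
```

Both callables return the same values; the partial additionally exposes `window_partial.func` and `window_partial.keywords` for inspection.
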
Move all caption/interrogate/tagger/VQA API code out of the monolithic
endpoints.py and models.py into a new self-contained modules/api/caption.py,
following the loras.py / nudenet.py self-registering pattern.
- Move 15 Pydantic models (ReqCaption, ResCaption, ReqVQA, ResVQA,
ReqTagger, ResTagger, dispatch union types, etc.) from models.py
- Move 11 handler functions from endpoints.py
- Deduplicate ~150 lines via shared _do_openclip, _do_tagger, _do_vqa
core functions called by both direct and dispatch endpoints
- Add register_api() that registers all 8 caption routes
- Add promptgen field to ResVLMPrompts (bug fix: handler returned it
but response model silently dropped it)
- Improve all endpoint docstrings and Field descriptions for API docs
- Add use_safetensors=True to all 16 model from_pretrained calls to
avoid downloading redundant .bin files alongside safetensors
- Add device property to JoyTag VisionModel so move_model can relocate
it to CUDA (fixes 'ViT object has no attribute device')
- Fix Pix2Struct dtype mismatch by casting float inputs to model dtype
while preserving integer tensor types
- Patch AutoConfig.register with exist_ok=True during Ovis loading to
handle duplicate aimv2 registration on model reload
- Detect Qwen VL fine-tune architecture from config model_type instead
of repo name, fixing ToriiGate and similar third-party fine-tunes
- Change UI default task from Short Caption to Normal Caption, and
preserve it on model switch instead of resetting to Use Prompt
- Add dual-prefill testing across 5 VQA test methods using a shared
_check_prefill helper
- Fix pre-existing ruff W605 in strip_think_xml_tags docstring
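
The switch from repo-name matching to config-based detection for Qwen VL fine-tunes might look roughly like the sketch below. The function names, the returned labels, and the example repo name are hypothetical; the `model_type` strings are assumed for illustration and should be checked against the target model's config.json.

```python
from typing import Optional

# model_type values assumed for illustration only.
QWEN_VL_TYPES = {"qwen2_vl": "Qwen2-VL", "qwen2_5_vl": "Qwen2.5-VL"}


def detect_by_repo_name(repo: str) -> Optional[str]:
    """Old approach: guess from the repo name, which fails for renamed fine-tunes."""
    name = repo.lower()
    if "qwen2.5-vl" in name:
        return "Qwen2.5-VL"
    if "qwen2-vl" in name:
        return "Qwen2-VL"
    return None


def detect_by_config(config: dict) -> Optional[str]:
    """New approach: trust the architecture recorded in the model config."""
    return QWEN_VL_TYPES.get(str(config.get("model_type", "")).lower())
```

A fine-tune published under an unrelated name is missed by the first function but recognized by the second, as long as its config keeps the base `model_type`.
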
update_caption_params() set caption_max_length, chunk_size, and
flavor_intermediate_count on the Interrogator instance, but the library
reads them from self.config, so the overrides were silently ignored.
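
The failure mode described here can be reproduced in a few lines. This is a minimal sketch with stand-in `Config` and `Interrogator` classes (hypothetical, not the clip_interrogator classes, and with made-up defaults), assuming only that the library reads its settings from `self.config`:

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    # Stand-in for the library's config object (hypothetical defaults).
    caption_max_length: int = 32
    chunk_size: int = 2048


@dataclass
class Interrogator:
    # Stand-in for the library class: it only ever reads self.config.
    config: Config = field(default_factory=Config)

    def effective_max_length(self) -> int:
        return self.config.caption_max_length


ci = Interrogator()

# Broken override: creates a new attribute on the instance that nothing reads.
ci.caption_max_length = 64
ignored = ci.effective_max_length()

# Working override: write to the config object the library actually consults.
ci.config.caption_max_length = 64
applied = ci.effective_max_length()
```

Python raises no error for the first assignment, which is why the bug went unnoticed: `ignored` still holds the default while `applied` reflects the override.
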
- Add parse_florence_detections() and format_florence_response() to
vqa_detection for handling Florence-2 detection output formats
- Add bypass_sdpa_hijacks() context manager to devices.py for models
incompatible with SageAttention or other SDPA hijacks
- Add OpenCLIP model offload support when caption_offload is enabled
- Remove caption_openclip_min_length from settings, API models, endpoints, and UI
(clip_interrogator library has no min_length support; parameter was never functional)
- Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts
(GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune)
- Add 'promptgen' category to /vqa/prompts API endpoint
- Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix
to prevent "Detect Gaze" matching as detect target="Gaze"
- Update test suite: remove min_length tests, fix min_flavors to use mode='best',
add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests,
split Florence/PromptGen test coverage
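
The gaze-detection ordering fix above comes down to testing the specific task before the generic prefix. A minimal sketch, assuming a simple string-based dispatcher (`parse_task` and its return values are hypothetical, not the code in vqa_detection):

```python
from typing import Optional


def parse_task(question: str) -> tuple[str, Optional[str]]:
    """Classify a VQA prompt; specific checks run before the generic prefix."""
    text = question.strip()
    lowered = text.lower()
    # Specific task first: "detect gaze" also starts with "detect ".
    if lowered == "detect gaze":
        return ("gaze", None)
    # Generic prefix last: everything after "detect " is the target.
    if lowered.startswith("detect "):
        return ("detect", text[len("detect "):].strip())
    return ("caption", None)
```

With the two checks swapped, "Detect Gaze" would be parsed as a generic detection with target "Gaze", which is the bug the commit describes.
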
Move all caption-related modules from modules/interrogate/ to modules/caption/
for better naming consistency:
- Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger,
vqa, vqa_detection, waifudiffusion modules
- Add new caption.py dispatcher module
- Remove old interrogate.py (functionality moved to caption.py)
- Update cli/api-interrogate.py to use /sdapi/v1/tagger for DeepBooru
- Handle tagger response format (scores dict or tags string)
- Remove DeepBooru test from interrogate endpoint tests
- Update API model descriptions to reference tagger for anime tagging
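
Handling both tagger response shapes on the client side could be sketched as follows. The `scores` and `tags` key names and the output format are assumptions made for illustration; the actual response model may differ.

```python
from typing import Any


def format_tagger_result(response: dict[str, Any]) -> str:
    """Return a printable tag string from either response shape."""
    scores = response.get("scores")
    if isinstance(scores, dict) and scores:
        # Scores shape: sort by confidence, highest first.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ", ".join(f"{tag} ({score:.2f})" for tag, score in ranked)
    # Tags shape: already a comma-separated string.
    return str(response.get("tags", ""))
```
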
DeepBooru/DeepDanbooru should only be accessed via the tagger endpoint.
The interrogate endpoint is now exclusively for OpenCLIP/BLIP.
- Remove DeepDanbooru handling from post_interrogate
- Update docstring to reference tagger endpoint for anime tagging
- Simplify code by removing if/else branching
Add prompt field to VQA endpoint and advanced settings to OpenCLIP endpoint
to achieve full parity between UI and API capabilities.
VLM endpoint changes:
- Add prompt field for custom text input (required for 'Use Prompt' task)
- Pass prompt to vqa.interrogate instead of hardcoded empty string
OpenCLIP endpoint changes:
- Add 7 optional per-request override fields: min_length, max_length,
chunk_size, min_flavors, max_flavors, flavor_count, num_beams
- Add get_clip_setting() helper for override support in openclip.py
- Apply overrides via update_interrogate_params() before interrogation
All new fields are optional with None defaults for backwards compatibility.
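
The "optional with None defaults" pattern can be sketched as below. The signature of `get_clip_setting()` and the `DEFAULTS` values are assumptions for illustration, not the helper as implemented in openclip.py.

```python
from typing import Any, Optional

# Hypothetical global defaults standing in for the application's settings store.
DEFAULTS = {"max_length": 74, "chunk_size": 1024, "num_beams": 1}


def get_clip_setting(name: str, overrides: Optional[dict[str, Any]] = None) -> Any:
    """Return the per-request override when it is set, else the global default."""
    if overrides is not None:
        value = overrides.get(name)
        # None means "not provided"; 0 or False are still valid overrides.
        if value is not None:
            return value
    return DEFAULTS[name]
```

Comparing against `None` rather than testing truthiness matters here: a request that explicitly sends `0` should override the default, while an omitted field should not.
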
Update API model field descriptions to match the hints in locale_en.json
for consistency between UI and API documentation.
Updated models:
- ReqInterrogate: clip_model, blip_model, mode
- ReqVQA: model, question, system
- ReqTagger: model, threshold, character_threshold, max_tags,
include_rating, sort_alpha, use_spaces, escape_brackets,
exclude_tags, show_scores
Add comprehensive caption/interrogate API with documentation:
- GET /sdapi/v1/interrogate: List available interrogation models
- POST /sdapi/v1/interrogate: Interrogate with OpenCLIP/BLIP/DeepDanbooru
- POST /sdapi/v1/vqa: Caption with Vision-Language Models (VLM)
- GET /sdapi/v1/vqa: List available VLM models
- POST /sdapi/v1/vqa/batch: Batch caption multiple images
- POST /sdapi/v1/tagger: Tag images with WaifuDiffusion/DeepBooru
Updates:
- Add detailed docstrings with usage examples
- Fix analyze_image response parsing for Gradio update dicts
- Add request/response models for all endpoints
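
A client call to the VQA endpoint might be assembled as in the sketch below, which builds the request without sending it. The base URL, the `image` field name, and its base64 encoding are assumptions; only the route and the `question` and `prompt` field names come from the notes in this log.

```python
import base64
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # assumption: local server address


def build_vqa_request(image_bytes: bytes, question: str, prompt: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST request for /sdapi/v1/vqa."""
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),  # assumed field name
        "question": question,
        "prompt": prompt,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/sdapi/v1/vqa",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder bytes; a real call would read an image file instead.
req = build_vqa_request(b"\x89PNG", question="Use Prompt", prompt="What is in this image?")
```

Passing `req` to `urllib.request.urlopen()` would send it to a running server.
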
Add comprehensive tooltips to Caption tab UI elements in locale_en.json:
- Add new "llm" section for shared LLM/VLM parameters:
System prompt, Prefill, Top-K, Top-P, Temperature, Num Beams,
Use Samplers, Thinking Mode, Keep Thinking Trace, Keep Prefill
- Add new "caption" section for caption-specific settings:
VLM, OpenCLiP, Tagger tab labels and all their parameters
including thresholds, tag formatting, batch options
- Consolidate accordion labels in ui_caption.py:
"Caption: Advanced Options" and "Caption: Batch" shared across
VLM, OpenCLiP, and Tagger tabs (localized to "Advanced Options"
and "Batch" in UI)
- Remove duplicate entries from missing section
- Remove GFPGAN pip install from installer.py optional requirements
- Remove 'gfpgan' from modules_to_remove cleanup list in launch.py
- Remove --codeformer-models-path and --gfpgan-models-path CLI args
- Remove GFPGAN model directory migration from modelloader.py
- Remove codeformer, restoreformer, GFPGANv1.4, and GPEN-BFR ONNX
model URLs from the predefined list
- Remove the .fp16 ONNX restorer code path that bypassed detailer
processing to run face restoration directly
- Remove /sdapi/v1/face-restorers route from api.py
- Remove get_restorers() function from endpoints.py
- Remove gfpgan_visibility, codeformer_visibility, codeformer_weight
fields from ReqProcess model
- Remove GFPGAN and CodeFormer entries from run_extras() signature
and create_args_for_run dict in postprocessing.py
- Remove CodeFormer/GFPGAN import and setup from webui.py initialize()
- Remove face_restorers list, codeformer/gfpgan model path settings,
and face restore UI settings section from shared.py
- Remove restore_faces parameter from StableDiffusionProcessing
- Remove face_restoration import and restore_faces processing block
from processing.py
Remove all vendored face restoration code that is no longer maintained:
- modules/postprocess/codeformer_model.py, codeformer_arch.py, vqgan_arch.py
- modules/postprocess/gfpgan_model.py, restorer.py
- modules/face_restoration.py (base class and dispatcher)
- scripts/postprocessing_codeformer.py, postprocessing_gfpgan.py
- modules/facelib/ (vendored face detection/parsing library)
CodeFormer and GFPGAN were the only two backends registered in shared.face_restorers,
so removing them left the entire face restoration infrastructure as dead code.
Nunchaku's SDXL UNet does not support offloading and raises
NotImplementedError when offload=True is passed. Skip the parameter
for SDXL and log a warning instead of crashing.
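
The skip-and-warn behavior can be sketched as a small keyword-argument builder. `build_load_kwargs`, the `model_type` strings, and the logger name are hypothetical; the real loader call is not shown in this log.

```python
import logging

log = logging.getLogger("nunchaku-sketch")


def build_load_kwargs(model_type: str, offload: bool) -> dict:
    """Build keyword arguments for a loader, omitting offload where unsupported."""
    kwargs = {}
    if offload:
        if model_type == "sdxl":
            # The SDXL UNet would raise NotImplementedError, so warn and skip.
            log.warning("offload is not supported for %s; ignoring", model_type)
        else:
            kwargs["offload"] = True
    return kwargs
```
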
Filter out reference entries tagged "nunchaku" from Extra Networks
when the active backend is not CUDA, since Nunchaku requires NVIDIA
GPUs. Entries remain in shared.reference_models for programmatic
lookup but are not yielded to the UI.
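
One way to express this filter is a generator that skips tagged entries at display time and leaves the source list untouched. The entry shape (a dict with a `tags` field) and the backend string `"cuda"` are assumptions for illustration.

```python
from typing import Iterable, Iterator


def visible_references(entries: Iterable[dict], backend: str) -> Iterator[dict]:
    """Yield the entries the UI should show for the active backend."""
    for entry in entries:
        tags = entry.get("tags", "")
        # Hide Nunchaku entries unless running on CUDA; the source list is untouched.
        if "nunchaku" in tags and backend != "cuda":
            continue
        yield entry
```

Because nothing is removed from the underlying collection, programmatic lookups still find the hidden entries.
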
- Rename HuggingFace org from nunchaku-tech to nunchaku-ai across all
nunchaku model repos (flux, sdxl, sana, z-image, qwen, t5)
- Add per-torch-version nunchaku version mapping instead of single global
version, with robust torch version parsing
- Add 'Fill (Nunchaku)' and 'Depth (Nunchaku)' options to Flux Tools
dropdown, loading models with +nunchaku suffix for SVDQuant quantization
- Mark Fill and Depth nunchaku reference entries as hidden so they remain
available for check_nunchaku() lookup but don't appear in Extra Networks
- Filter hidden reference models in ui_extra_networks_checkpoints
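
The per-torch-version mapping with tolerant version parsing might look like the following sketch. The version numbers in `NUNCHAKU_BY_TORCH` are placeholders, not the actual mapping, and both function names are hypothetical.

```python
import re
from typing import Optional

# Placeholder mapping of torch (major, minor) to a nunchaku release.
NUNCHAKU_BY_TORCH = {(2, 7): "1.0.0", (2, 8): "1.0.1"}


def parse_torch_version(version: str) -> Optional[tuple[int, int]]:
    """Extract (major, minor) from strings like '2.7.1+cu128' or '2.8.0.dev2025'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nunchaku_version_for(torch_version: str) -> Optional[str]:
    """Return the mapped nunchaku version, or None when there is no match."""
    key = parse_torch_version(torch_version)
    return NUNCHAKU_BY_TORCH.get(key) if key is not None else None
```

Matching only the leading `major.minor` digits keeps local build suffixes such as `+cu128` or `.dev` tags from breaking the lookup.
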
Replace manual Model/TE checkboxes in Quantization Settings with a
dedicated "Nunchaku" tab in the Extra Networks menu where users can
directly select nunchaku-quantized model variants. Detection now
uses a +nunchaku path marker for disambiguation.
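
Marker-based detection can be sketched as a suffix check that also returns the path with the marker stripped. `split_nunchaku_marker` is a hypothetical helper, and treating the marker as a trailing suffix is an assumption based on the "+nunchaku suffix" wording in these notes.

```python
NUNCHAKU_MARKER = "+nunchaku"


def split_nunchaku_marker(path: str) -> tuple[str, bool]:
    """Return (clean_path, wants_nunchaku) for a model reference."""
    if path.endswith(NUNCHAKU_MARKER):
        return path[: -len(NUNCHAKU_MARKER)], True
    return path, False
```
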