Commit Graph

33 Commits (master)

Author SHA1 Message Date
CalamitousFelicitousness c68ab0f75e fix(ui): constrain batch file list height in caption tab
Add max-height and overflow-y scroll to batch file upload components
so uploading many files doesn't push the entire UI down.
2026-03-07 02:50:23 +00:00
Vladimir Mandic e5c494f999 cleanup logger 2026-02-19 11:09:13 +01:00
Vladimir Mandic a3074baf8b unified logger 2026-02-19 09:46:42 +01:00
CalamitousFelicitousness e2cdbe47fa fix(caption): safetensors-only downloads, model load fixes, UI default, prefill tests
- Add use_safetensors=True to all 16 model from_pretrained calls to
  avoid downloading redundant .bin files alongside safetensors
- Add device property to JoyTag VisionModel so move_model can relocate
  it to CUDA (fixes 'ViT object has no attribute device')
- Fix Pix2Struct dtype mismatch by casting float inputs to model dtype
  while preserving integer tensor types
- Patch AutoConfig.register with exist_ok=True during Ovis loading to
  handle duplicate aimv2 registration on model reload
- Detect Qwen VL fine-tune architecture from config model_type instead
  of repo name, fixing ToriiGate and similar third-party fine-tunes
- Change UI default task from Short Caption to Normal Caption, and
  preserve it on model switch instead of resetting to Use Prompt
- Add dual-prefill testing across 5 VQA test methods using a shared
  _check_prefill helper
- Fix pre-existing ruff W605 in strip_think_xml_tags docstring
2026-02-11 02:48:11 +00:00
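Illustrative note on the `exist_ok=True` patch described above: a minimal sketch of temporarily forcing a registry to tolerate duplicate registration. `Registry` and `allow_reregistration` are hypothetical stand-ins, not the project's code; the commit applies the same idea to `AutoConfig.register` during Ovis loading.

```python
import contextlib
from unittest import mock


class Registry:
    """Hypothetical stand-in for a registry that rejects duplicates by default."""

    _entries = {}

    @classmethod
    def register(cls, name, value, exist_ok=False):
        if name in cls._entries and not exist_ok:
            raise ValueError(f"'{name}' is already registered")
        cls._entries[name] = value


@contextlib.contextmanager
def allow_reregistration(registry):
    """Force exist_ok=True on registry.register inside the with-block only."""
    original = registry.register  # bound classmethod, captured before patching

    def tolerant(name, value, exist_ok=True):
        # Ignore the caller's exist_ok and always allow overwriting.
        return original(name, value, exist_ok=True)

    # mock.patch.object restores the original attribute on exit, even on error.
    with mock.patch.object(registry, "register", staticmethod(tolerant)):
        yield
```

Wrapping only the model-loading call in `with allow_reregistration(Registry):` keeps the stricter duplicate check in place everywhere else.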
CalamitousFelicitousness bf7a72f12e fix(caption): remove dead min_length param, split Florence/PromptGen prompts, fix gaze detection
- Remove caption_openclip_min_length from settings, API models, endpoints, and UI
  (clip_interrogator library has no min_length support; parameter was never functional)
- Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts
  (GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune)
- Add 'promptgen' category to /vqa/prompts API endpoint
- Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix
  to prevent "Detect Gaze" matching as detect target="Gaze"
- Update test suite: remove min_length tests, fix min_flavors to use mode='best',
  add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests,
  split Florence/PromptGen test coverage
2026-02-11 02:48:11 +00:00
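Illustrative note on the gaze-detection fix above: a minimal sketch of why the specific check has to run before the generic prefix check. The function name `parse_task`, the returned tuples, and the exact prompt strings are assumptions for illustration only.

```python
from typing import Optional, Tuple


def parse_task(prompt: str) -> Tuple[str, Optional[str]]:
    """Map a prompt to (task, target); specific tasks are checked first."""
    text = prompt.strip()
    lowered = text.lower()
    # "detect gaze" also starts with "detect ", so it must be tested first;
    # otherwise it would be parsed as a detection with target "Gaze".
    if lowered == "detect gaze":
        return ("gaze", None)
    if lowered.startswith("detect "):
        return ("detect", text[len("detect "):].strip())
    return ("query", text)
```

With the two `if` statements swapped, `parse_task("Detect Gaze")` would return `("detect", "Gaze")`, which is the mismatch the commit describes.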
CalamitousFelicitousness 61b031ada5 refactor: update imports for caption module rename
Update all imports from modules.interrogate to modules.caption across:
- modules/shared.py, modules/shared_legacy.py
- modules/ui_caption.py, modules/ui_common.py
- modules/ui_control.py, modules/ui_control_helpers.py
- modules/ui_img2img.py, modules/ui_sections.py
- modules/ui_symbols.py, modules/ui_video_vlm.py
2026-02-11 02:47:41 +00:00
CalamitousFelicitousness 6b89cc8463 feat(ui): add tooltips/hints to Caption tab
Add comprehensive tooltips to Caption tab UI elements in locale_en.json:

- Add new "llm" section for shared LLM/VLM parameters:
  System prompt, Prefill, Top-K, Top-P, Temperature, Num Beams,
  Use Samplers, Thinking Mode, Keep Thinking Trace, Keep Prefill

- Add new "caption" section for caption-specific settings:
  VLM, OpenCLiP, Tagger tab labels and all their parameters
  including thresholds, tag formatting, batch options

- Consolidate accordion labels in ui_caption.py:
  "Caption: Advanced Options" and "Caption: Batch" shared across
  VLM, OpenCLiP, and Tagger tabs (localized to "Advanced Options"
  and "Batch" in UI)

- Remove duplicate entries from missing section
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness 6b10f0df4f refactor(caption): address PR review feedback
Rename WD14 module and settings to WaifuDiffusion:
- Rename wd14.py to waifudiffusion.py
- Rename WD14Tagger class to WaifuDiffusionTagger
- Rename WD14_MODELS constant to WAIFUDIFFUSION_MODELS
- Rename settings: wd14_model -> waifudiffusion_model,
  wd14_character_threshold -> waifudiffusion_character_threshold
- Update all log messages from "WD14" to "WaifuDiffusion"

Code quality improvements:
- Simplify threshold parameter defaulting using `or` operator
- Extract save_output logic into _save_tags_to_file() helper with
  isolated error handling to prevent single file failures from
  impacting entire batch
- Fix timing log format consistency (remove 's' suffix)
2026-01-21 11:56:07 +00:00
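Illustrative note on the code quality items above: a minimal sketch of `or`-based threshold defaulting and of a save helper whose errors are isolated per file. Only the helper name `_save_tags_to_file` comes from the commit; its signature, `resolve_threshold`, `tag_batch`, and the default value are assumptions.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)
DEFAULT_THRESHOLD = 0.35  # hypothetical default, not taken from the project


def resolve_threshold(value=None):
    # `or` falls back for None, but also for 0 and 0.0 (any falsy value).
    return value or DEFAULT_THRESHOLD


def _save_tags_to_file(image_path: Path, tags: str) -> bool:
    """Write tags next to the image; log and report failure instead of raising."""
    try:
        image_path.with_suffix(".txt").write_text(tags, encoding="utf-8")
        return True
    except OSError as e:
        log.error("Tagger: failed to save tags for %s: %s", image_path, e)
        return False


def tag_batch(image_paths, tagger, threshold=None):
    """Tag every image; one failed write does not abort the rest of the batch."""
    threshold = resolve_threshold(threshold)
    saved = 0
    for path in image_paths:
        tags = tagger(path, threshold)
        if _save_tags_to_file(Path(path), tags):
            saved += 1
    return saved
```

One design consequence of `value or DEFAULT_THRESHOLD` is that an explicit threshold of 0 is replaced by the default; an `is None` check would be needed if 0 had to be a valid setting.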
CalamitousFelicitousness becb19319d refactor(caption): unify tagger settings and reorganize Caption Tab UI
Consolidate WD14 and DeepBooru tagger settings into unified options:
- Merge wd14_general_threshold + deepbooru_score_threshold → tagger_threshold
- Merge wd14_include_rating + deepbooru_include_rating → tagger_include_rating
- Rename interrogate_score → tagger_show_scores
- Rename tagger_escape → tagger_escape_brackets
- Rename CLiP → OpenCLiP in caption type choices

UI reorganization:
- Add Interrogate tab to Caption Tab with default caption type selector
- Move interrogate_offload to Model Offloading section as "Offload caption models"
- Hide Interrogate settings section (all settings now in Caption Tab UI)
- Update locale_en.json for OpenCLiP naming

Code improvements:
- DeepBooru tag_multi() now accepts same parameters as WD14 for unified interface
- Fix setting references in interrogate.py for consolidated settings
- Add comprehensive tagger test suite (cli/test-tagger.py)
2026-01-21 11:56:07 +00:00
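Illustrative note on the settings consolidation above: a minimal sketch of migrating a saved config from the old keys to the unified ones. The key names come from the commit message; `migrate_settings` and the precedence rule (an existing new key wins, then the WD14 value, then the DeepBooru value) are assumptions, since the commit does not say how conflicting values are resolved.

```python
# (old key, new key) pairs; earlier entries take precedence (an assumption).
RENAMES = [
    ("wd14_general_threshold", "tagger_threshold"),
    ("deepbooru_score_threshold", "tagger_threshold"),
    ("wd14_include_rating", "tagger_include_rating"),
    ("deepbooru_include_rating", "tagger_include_rating"),
    ("interrogate_score", "tagger_show_scores"),
    ("tagger_escape", "tagger_escape_brackets"),
]


def migrate_settings(config: dict) -> dict:
    """Return a copy of config with old tagger keys folded into the new ones."""
    migrated = dict(config)
    for old, new in RENAMES:
        if old in migrated:
            value = migrated.pop(old)
            # Keep a value that is already stored under the new key.
            migrated.setdefault(new, value)
    return migrated
```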
CalamitousFelicitousness 656e86a962 refactor(caption): consolidate interrogate settings into Caption Tab UI
Hide all CLiP, VLM, and Tagger settings from Settings > Interrogate page
while keeping them in shared.opts for persistence. Caption Tab UI becomes
the single control point with change handlers that save directly to config.

Changes:
- Hide OpenCLiP, VLM, and Tagger settings with visible=False
- Add change handlers to save settings when UI controls change
- Rename "Booru Tags" tab to "Tagger", update choice labels
- Update interrogate.py to use unified tagger interface with all settings
2026-01-21 11:56:07 +00:00
CalamitousFelicitousness 09b8fe9761 feat(caption): integrate DeepBooru into unified Booru Tagger UI
Add DeepBooru as a model option alongside WD14 models in the Booru Tags
tab, with dynamic UI that disables inapplicable controls.

Changes:
- Create modules/interrogate/tagger.py as unified adapter module
- Add batch, load/unload, get_models functions to deepbooru.py
- Update ui_caption.py to use unified tagger interface
- Consolidate shared tagger settings in shared.py
- Add implementation plan for future settings consolidation

UI behavior:
- Model dropdown shows DeepBooru + all WD14 models
- Character threshold and include rating disabled for DeepBooru
- All controls re-enable when WD14 model selected
2026-01-21 11:56:07 +00:00
CalamitousFelicitousness db97c42320 feat(caption): add WD14 tagger with Booru Tags tab
Add SmilingWolf's WD14/WaifuDiffusion tagger models for anime/illustration
tagging as a new "Booru Tags" tab in the Caption panel.

- Support 9 models (v2 and v3 variants) via HuggingFace
- ONNX backend chosen due to safetensors v3 variants exhibiting
  unacceptable accuracy loss
- Separate thresholds for general/character tags
- Batch processing with progress bar
- Consolidate debug env var to SD_INTERROGATE_DEBUG
2026-01-21 11:56:07 +00:00
awsr 0faabffc14 Simplify options init/save/load 2026-01-10 13:27:38 -08:00

vladmandic a72b98848c cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-10 10:17:37 +01:00
CalamitousFelicitousness d277392103 feat(ui): caption tab label styling and CLIP analysis text output
Add clip_labels_text component for CLIP analysis results and standardize
label capitalization across VLM and CLiP sections for consistency.
2025-12-09 18:54:44 +00:00
CalamitousFelicitousness 5193285bc7 refactor(vqa): convert to class-based singleton
Refactor VQA module from module-level globals to a VQA class singleton
pattern with self-contained per-model loading methods.

Changes:
- Add VQA class with model/processor state and detection data storage
- Extract load methods for clean model pre-loading via UI
- Interrogate to return string only; store detection data on instance
- Add vqa_draw.py for bounding box/point annotation utilities
  (stub; further transfer of drawing functions to follow)
- Update moondream3.py to store detection data on VQA singleton
- Update endpoints.py and ui_caption.py for new return type
2025-12-05 20:53:18 +00:00
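Illustrative note on the singleton refactor above: a minimal sketch of a class that owns the model state, returns a plain string from `interrogate`, and keeps detection data on the instance. All attribute and function names other than `VQA` are assumptions, and the model loading and inference are placeholders.

```python
class VQA:
    """Holds model/processor state that previously lived in module globals."""

    def __init__(self):
        self.model = None
        self.processor = None
        self.model_name = None
        self.last_detection_data = None  # filled by detection-style tasks

    def load(self, name: str) -> None:
        """Pre-load a model; a no-op if the same model is already loaded."""
        if self.model_name == name:
            return
        self.model = object()  # placeholder for the real model object
        self.model_name = name

    def unload(self) -> None:
        self.model = None
        self.processor = None
        self.model_name = None

    def interrogate(self, question: str, image=None) -> str:
        """Return the answer text only; detection data is stored on self."""
        self.last_detection_data = {"boxes": []}  # placeholder result
        return f"answer to: {question}"


_instance = None


def get_instance() -> VQA:
    """Return the shared VQA instance, creating it on first use."""
    global _instance
    if _instance is None:
        _instance = VQA()
    return _instance
```

Callers such as an API endpoint or a UI handler would read `get_instance().last_detection_data` after `interrogate` returns, instead of unpacking a tuple.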
CalamitousFelicitousness 2b6226b62b feat(vqa): persist thinking mode and improve reasoning output formatting
- Add interrogate_vlm_thinking_mode setting to save checkbox state
- Update ui_caption to restore Thinking Mode preference on load
- Add blank line before 'Answer:' label for visual separation
- Remove '\n\n' replacement in clean() that stripped blank lines
- Fix Qwen reasoning detection when <think> tag is in prompt, not response
- Add reasoning icon to Moondream 2 and 3 model names
2025-12-05 00:00:25 +00:00
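Illustrative note on the reasoning-detection fix above: a minimal sketch of separating a thinking trace from the answer when the opening `<think>` tag was part of the prompt, so the response contains only the closing tag. The function names and the "Reasoning:" label are assumptions; the blank line before "Answer:" follows the commit message.

```python
import re


def split_reasoning(response: str, prompt: str = ""):
    """Return (reasoning, answer) extracted from a model response."""
    text = response
    # If the prompt already opened the tag, the response starts mid-trace
    # and carries only "</think>"; restore the opening tag before parsing.
    if "</think>" in text and "<think>" not in text and "<think>" in prompt:
        text = "<think>" + text
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


def format_output(reasoning: str, answer: str, keep_thinking: bool) -> str:
    """Optionally keep the trace, with a blank line before the answer label."""
    if keep_thinking and reasoning:
        return f"Reasoning:\n{reasoning}\n\nAnswer:\n{answer}"
    return answer
```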
CalamitousFelicitousness 506515b018 feat(vqa): add load/unload model buttons to Caption tab
- Add load_model() function to pre-load VLM into memory
- Add unload_model() function to free VLM from memory
- Add Load/Unload buttons to Caption tab UI
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness a90d85ddfd feat(ui): add dynamic task selection based on VLM model
- Rename "Predefined question" to "Task"
- Task dropdown updates choices when model changes
- Prompt placeholder updates based on selected task
- Model-specific tasks: Florence-2 gets detection tasks, Moondream gets point/detect
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness 4df6aa7944 fix(ui): set prefill text to empty by default 2025-12-05 00:00:25 +00:00
CalamitousFelicitousness 0d88fcd396 feat(ui): add prefill and thinking controls to Caption tab
Add minimal UI controls to expose new VQA functionality:
- Prefill Text input for guiding VLM responses
- Thinking Mode checkbox for reasoning models
- Keep Thinking Trace checkbox for output retention
- Keep Prefill checkbox for output retention
- Annotated Image output panel for detection visualization
- Updated button handlers to pass new parameters
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness 78711fb1d4 Merge branch 'dev' into patch-2 2025-10-01 20:58:58 +01:00
CalamitousFelicitousness 78820a14dc Allow VLM temp setting temperature to 0 2025-10-01 20:52:04 +01:00
Vladimir Mandic cd79f92dff add opts models_not_to_offload
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-09-19 11:21:54 -04:00
Vladimir Mandic 05dd0096c9 set default vqa model
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-09-04 08:38:29 -04:00
Vladimir Mandic b2dbef53e5 restyled all toolbuttons to be modernui native
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-08-31 15:01:50 -04:00
Vladimir Mandic 8473bae0fc 1000 papercuts
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-05-13 21:51:33 -04:00
Vladimir Mandic 9bf6838962 update video tab
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-03-20 14:39:38 -04:00
Vladimir Mandic dbfd59434f add gemma3
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-03-15 15:30:57 -04:00
Vladimir Mandic b6990151c4 caption tab modernui support
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-02-17 10:59:22 -05:00
Vladimir Mandic a4b3dc269e modernize clip interrogate
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-02-16 19:37:09 -05:00
Vladimir Mandic f3dd9b9646 vlm advanced settings and batch processing
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-02-15 14:34:28 -05:00
Vladimir Mandic e95bd93f67 caption ui redesign
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2025-02-15 12:57:19 -05:00