12419 Commits (78c58e0d70bb7a4c37a8ccb963b155ba3e4468d2)

Author SHA1 Message Date
vladmandic 78c58e0d70 update precommit
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 11:12:21 +01:00
vladmandic b4e5b563c6 update lint rules
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 10:47:07 +01:00
vladmandic 73a5d55022 cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 10:12:37 +01:00
vladmandic 8561da6f8c cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 10:02:41 +01:00
vladmandic 967974ade7 merge cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 09:57:37 +01:00
vladmandic 3ae9909b2a update sharpfin usage
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness dc8ecb0a64 refactor: address remaining PR #4640 review comments
- Remove _get_device_dtype() indirection, inline device/dtype at call sites
- Remove commented-out fallback blocks and try/finally wrappers
- Add modules/sharpfin to ruff and pylint excludes in pyproject.toml
- Fix import ordering in joytag.py and pixelart.py
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness 162651cbdb refactor: address PR #4640 review comments
Changes based on vladmandic and Disty0 feedback:

- Fix logging: use direct `from installer import log` instead of lazy _get_log()
- Remove unused is_available() function
- Remove defensive getattr() calls in _resolve_kernel/_resolve_linearize
- Simplify _get_device_dtype() to use devices module directly
- Refactor to_pil() with single Image.fromarray() call and explicit mode
- Add cross-platform fallback: sharpfin only runs on CUDA, falls back to
  PIL/F.interpolate for other devices (CPU, MPS, OpenVINO)
- Replace lambdas with functools.partial in functional.py for torch.compile safety
- Add modules/sharpfin to pylint ignore-paths (vendored code)
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness 76aa949a26 refactor: integrate sharpfin for high-quality image resize
Vendor sharpfin library (Apache 2.0) and add centralized wrapper
module (images_sharpfin.py) replacing torchvision tensor/PIL
conversion and resize operations throughout the codebase.

- Add modules/sharpfin/ vendored library with MKS2021, Lanczos3,
  Mitchell, Catmull-Rom kernels and optional Triton sparse acceleration
- Add modules/images_sharpfin.py wrapper with to_tensor(), to_pil(),
  pil_to_tensor(), normalize(), resize(), resize_tensor()
- Add resize_quality and resize_linearize_srgb settings
- Add MKS2021 and Lanczos3 upscaler entries
- Replace torchvision.transforms.functional imports across 18 files
- to_pil() auto-detects HWC/BHWC layout, adds .round() before uint8
- Sparse Triton path falls back to dense GPU on compilation failure
- Mixed-axis resize splits into two single-axis scale() calls
- Masks and non-sRGB data always use linearize=False
2026-02-11 09:57:37 +01:00
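Note on the `.round()` before uint8 item above: a minimal pure-Python sketch of why rounding before the integer cast matters. It uses plain floats rather than tensors, and both helper names are hypothetical, not taken from `images_sharpfin.py`.

```python
def to_uint8_truncate(x: float) -> int:
    # Mirrors a bare float -> integer cast: the fractional part is dropped.
    return int(x * 255.0)


def to_uint8_round(x: float) -> int:
    # Rounds to the nearest integer level first, then casts.
    return int(round(x * 255.0))


# 0.999 * 255 = 254.745: truncation loses a level, rounding does not.
truncated = to_uint8_truncate(0.999)
rounded = to_uint8_round(0.999)
```

Without the rounding step, near-white values can land one level darker than intended.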
Vladimir Mandic 2c4d0751d9
Merge pull request #4613 from CalamitousFelicitousness/feat/caption-improvements-v2_backup
Caption system overhaul V2
2026-02-11 09:33:29 +01:00
CalamitousFelicitousness 80014fac7c fix(caption): address PR review feedback
- Remove superfluous SimpleNamespace import in cli/api-caption.py, use Map instead
- Drop _ prefix from internal helper functions in modules/api/caption.py
- Move DeepDanbooru model path to top-level models folder instead of nesting under CLIP
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 139e331d80 style(caption): fix lint warnings across caption module
- Rename shadowing import in waifudiffusion batch to avoid F823/E0606
- Fix import order in cli/api-caption.py (stdlib before third-party)
- Rename local variable shadowing function name in cli/api-caption.py
- Remove unnecessary global statement in devices.bypass_sdpa_hijacks
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 8d67debdfd fix(caption): use cache_dir for BLIP and Moondream model downloads
- Add _load_blip_model helper with explicit cache_dir so downloads
  go to hfcache_dir instead of default HF cache
- Pre-load BLIP model/processor before creating Interrogator config
  to control download location and avoid redundant loads
- Set clip_model_path on config for CLIP model cache location
- Add cache_dir to Moondream model and tokenizer loading
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 6c20e49897 refactor(caption): extract caption API into standalone module
Move all caption/interrogate/tagger/VQA API code out of the monolithic
endpoints.py and models.py into a new self-contained modules/api/caption.py,
following the loras.py / nudenet.py self-registering pattern.

- Move 15 Pydantic models (ReqCaption, ResCaption, ReqVQA, ResVQA,
  ReqTagger, ResTagger, dispatch union types, etc.) from models.py
- Move 11 handler functions from endpoints.py
- Deduplicate ~150 lines via shared _do_openclip, _do_tagger, _do_vqa
  core functions called by both direct and dispatch endpoints
- Add register_api() that registers all 8 caption routes
- Add promptgen field to ResVLMPrompts (bug fix: handler returned it
  but response model silently dropped it)
- Improve all endpoint docstrings and Field descriptions for API docs
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness e2cdbe47fa fix(caption): safetensors-only downloads, model load fixes, UI default, prefill tests
- Add use_safetensors=True to all 16 model from_pretrained calls to
  avoid downloading redundant .bin files alongside safetensors
- Add device property to JoyTag VisionModel so move_model can relocate
  it to CUDA (fixes 'ViT object has no attribute device')
- Fix Pix2Struct dtype mismatch by casting float inputs to model dtype
  while preserving integer tensor types
- Patch AutoConfig.register with exist_ok=True during Ovis loading to
  handle duplicate aimv2 registration on model reload
- Detect Qwen VL fine-tune architecture from config model_type instead
  of repo name, fixing ToriiGate and similar third-party fine-tunes
- Change UI default task from Short Caption to Normal Caption, and
  preserve it on model switch instead of resetting to Use Prompt
- Add dual-prefill testing across 5 VQA test methods using a shared
  _check_prefill helper
- Fix pre-existing ruff W605 in strip_think_xml_tags docstring
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 57659ab642 fix(caption): set clip_interrogator params on config, not instance
update_caption_params() was setting caption_max_length, chunk_size, and
flavor_intermediate_count on the Interrogator instance, but the library
reads them from self.config. The overrides were silently ignored.
2026-02-11 02:48:11 +00:00
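Note on the fix above: a minimal sketch of why overrides set on the instance were ignored. `Config` and `Interrogator` here are simplified stand-ins, not the clip_interrogator classes, and the default values are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Field names follow the commit message; values are placeholders.
    caption_max_length: int = 32
    chunk_size: int = 2048
    flavor_intermediate_count: int = 2048


class Interrogator:
    # Stand-in that, like the library described above, reads from self.config.
    def __init__(self, config: Config):
        self.config = config

    def effective_max_length(self) -> int:
        return self.config.caption_max_length


ci = Interrogator(Config())

# Buggy pattern: creates a new attribute on the instance that nothing reads.
ci.caption_max_length = 74
before_fix = ci.effective_max_length()

# Fixed pattern: write to the config object that is actually consulted.
ci.config.caption_max_length = 74
after_fix = ci.effective_max_length()
```

Python accepts the first assignment without complaint, which is why the override could fail silently.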
CalamitousFelicitousness 17b03ed8e4 feat(caption): add Florence detection parsing, SDPA bypass, and offload support
- Add parse_florence_detections() and format_florence_response() to
  vqa_detection for handling Florence-2 detection output formats
- Add bypass_sdpa_hijacks() context manager to devices.py for models
  incompatible with SageAttention or other SDPA hijacks
- Add OpenCLIP model offload support when caption_offload is enabled
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 443a73b740 refactor(caption): code review fixes for offload, inference, and maintainability
Comprehensive review of modules/caption/ addressing memory management,
consistency, and code quality:

Inference correctness:
- Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers
- Remove redundant @torch.no_grad() decorator from joycaption predict()
- Remove dead dtype=torch.bfloat16 kwarg from Florence loader

Memory management:
- Bound moondream3 image cache with LRU eviction (max 8 entries)
- Replace fragile id(image) cache keys with content-based md5 hash
- Add devices.torch_gc() after model loading in deepseek
- Move deepbooru model to CPU before dropping reference on unload
- Add external handler delegation to VQA.unload() (moondream3,
  joycaption, joytag, deepseek)
- Protect batch offload mutation with try/finally

Code deduplication:
- Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM
- Extract save_tags_to_file() into tagger.py from deepbooru and
  waifudiffusion

Documentation and clarity:
- Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict)
- Document Florence task="task" as intentional design choice
- Add vendored-code comment to joytag.py
- Document openclip direct .to() usage vs sd_models.move_model
- Comment model.eval() calls that are required (trust_remote_code,
  custom loaders) vs removed where redundant (standard from_pretrained)

API robustness:
- Add HTTP 422 error response for VQA caption error strings in API
  endpoints (post_vqa, _dispatch_vlm)
2026-02-11 02:48:11 +00:00
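Note on the memory-management items above: a minimal sketch of a bounded LRU cache keyed by a content hash instead of `id(image)`. The `ImageCache` class and its methods are hypothetical and work on raw bytes; the actual moondream3 cache may be structured differently. Only the bound of 8 entries and the use of md5 come from the commit message.

```python
import hashlib
from collections import OrderedDict

MAX_ENTRIES = 8  # bound taken from the commit message


class ImageCache:
    def __init__(self, max_entries: int = MAX_ENTRIES):
        self.max_entries = max_entries
        self._items = OrderedDict()

    @staticmethod
    def key(image_bytes: bytes) -> str:
        # Content-based key: equal pixels give equal keys, unlike id(),
        # which can be reused once an object is garbage collected.
        return hashlib.md5(image_bytes).hexdigest()

    def get(self, image_bytes: bytes):
        k = self.key(image_bytes)
        if k not in self._items:
            return None
        self._items.move_to_end(k)  # mark as most recently used
        return self._items[k]

    def put(self, image_bytes: bytes, value) -> None:
        k = self.key(image_bytes)
        self._items[k] = value
        self._items.move_to_end(k)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._items)
```

A ninth distinct image pushes out the least recently used entry, so the cache never holds more than eight encoded images.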
CalamitousFelicitousness bf7a72f12e fix(caption): remove dead min_length param, split Florence/PromptGen prompts, fix gaze detection
- Remove caption_openclip_min_length from settings, API models, endpoints, and UI
  (clip_interrogator library has no min_length support; parameter was never functional)
- Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts
  (GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune)
- Add 'promptgen' category to /vqa/prompts API endpoint
- Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix
  to prevent "Detect Gaze" matching as detect target="Gaze"
- Update test suite: remove min_length tests, fix min_flavors to use mode='best',
  add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests,
  split Florence/PromptGen test coverage
2026-02-11 02:48:11 +00:00
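Note on the gaze detection fix above: a minimal sketch of the ordering problem. Both functions are hypothetical simplifications; the real handler's trigger strings and return values are assumptions here.

```python
def classify_buggy(question: str):
    text = question.strip()
    lowered = text.lower()
    # Generic prefix checked first: "Detect Gaze" is swallowed here.
    if lowered.startswith("detect "):
        return ("detect", text[len("detect "):])
    if lowered == "detect gaze":
        return ("gaze", None)
    return ("query", None)


def classify_fixed(question: str):
    text = question.strip()
    lowered = text.lower()
    # Specific trigger checked before the generic prefix.
    if lowered == "detect gaze":
        return ("gaze", None)
    if lowered.startswith("detect "):
        return ("detect", text[len("detect "):])
    return ("query", None)
```

With the generic check first, `classify_buggy("Detect Gaze")` is treated as object detection with target `"Gaze"`; the reordered version routes it to the gaze handler and leaves `"Detect cat"` unchanged.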
CalamitousFelicitousness fba942b25e feat(caption): add debug logging for Florence-2 handler
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 0c45e58e80 docs: update localization and README for caption module
- Update html/locale_en.json with caption-related strings
- Update README.md documentation
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 588222f2d1 test: update caption API tests
Update cli/test-caption-api.py:
- Update test structure for new caption API endpoints
- Fix Moondream gaze detection test prompt to use 'Detect Gaze'
  instead of 'Where is the person looking?' to match handler trigger
- Improve test result categorization and tracking
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness d78c5c1cd0 refactor: update CLI tools for caption module
- Rename cli/api-interrogate.py to cli/api-caption.py
- Update cli/options.py, cli/process.py for new module paths
- Update cli/test-tagger.py for caption module imports
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness f4b5abde68 refactor: update API for caption module
Update API endpoints and models for caption module rename:
- modules/api/api.py - update imports and endpoint handlers
- modules/api/endpoints.py - update endpoint definitions
- modules/api/models.py - update request/response models
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 61b031ada5 refactor: update imports for caption module rename
Update all imports from modules.interrogate to modules.caption across:
- modules/shared.py, modules/shared_legacy.py
- modules/ui_caption.py, modules/ui_common.py
- modules/ui_control.py, modules/ui_control_helpers.py
- modules/ui_img2img.py, modules/ui_sections.py
- modules/ui_symbols.py, modules/ui_video_vlm.py
2026-02-11 02:47:41 +00:00
CalamitousFelicitousness 5183ebec58 refactor: rename interrogate module to caption
Move all caption-related modules from modules/interrogate/ to modules/caption/
for better naming consistency:
- Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger,
  vqa, vqa_detection, waifudiffusion modules
- Add new caption.py dispatcher module
- Remove old interrogate.py (functionality moved to caption.py)
2026-02-11 02:47:41 +00:00
CalamitousFelicitousness 83fa8e39ba refactor(api): update cli tools for DeepBooru tagger migration
- Update cli/api-interrogate.py to use /sdapi/v1/tagger for DeepBooru
- Handle tagger response format (scores dict or tags string)
- Remove DeepBooru test from interrogate endpoint tests
- Update API model descriptions to reference tagger for anime tagging
2026-02-11 02:47:41 +00:00
CalamitousFelicitousness 7825f44581 refactor(api): remove DeepBooru from interrogate endpoint
DeepBooru/DeepDanbooru should only be accessed via the tagger endpoint.
The interrogate endpoint is now exclusively for OpenCLIP/BLIP.

- Remove DeepDanbooru handling from post_interrogate
- Update docstring to reference tagger endpoint for anime tagging
- Simplify code by removing if/else branching
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness 0559651b1b fix(vqa): fix infinite recursion and Florence-2 generation
- Fix get_keep_thinking() infinite recursion (was calling itself)
- Fix get_keep_prefill() infinite recursion (was calling itself)
- Fix Florence-2 to use beam search instead of sampling
  Sampling causes probability tensor errors with Florence-2
2026-02-11 02:47:40 +00:00
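Note on the recursion fix above: a minimal sketch of a getter that calls itself instead of reading the stored value. The `settings` dict and the function bodies are hypothetical; only the function name `get_keep_thinking` comes from the commit message.

```python
settings = {"keep_thinking": False}  # placeholder for the stored option


def get_keep_thinking_buggy():
    # Bug pattern: the getter calls itself, so it never returns.
    return get_keep_thinking_buggy()


def get_keep_thinking():
    # Fixed pattern: read the underlying value.
    return settings["keep_thinking"]


try:
    get_keep_thinking_buggy()
    recursed = False
except RecursionError:
    recursed = True
```

In CPython the buggy version raises `RecursionError` once the interpreter's recursion limit is reached, rather than hanging.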
CalamitousFelicitousness 3208067259 test(api): improve caption API test coverage and validation
Add model architecture coverage tests:
- VQA model family detection for 19 architectures
- Florence special prompts test (<OD>, <OCR>, <CAPTION>, etc.)
- Moondream detection features test
- VQA architecture capabilities test
- Tagger model types and WD version comparison tests

Improve test validation:
- Add is_meaningful_answer() to reject responses like "."
- Verify parameters have actual effect (not just accepted)
- Show actual output traces in PASS/FAIL messages
- Fix prefill tests to verify keep_prefill behavior

Add configurable timeout:
- Default timeout increased to 300s for slow models
- Add --timeout CLI argument for customization

Other improvements:
- Add JoyCaption to recognized model families
- Reduce BLIP models to avoid reloading large models
- Better detection result validation for annotated images
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness a04ba1e482 feat(api): add missing caption API parameters for UI parity
Add prompt field to VQA endpoint and advanced settings to OpenCLIP endpoint
to achieve full parity between UI and API capabilities.

VLM endpoint changes:
- Add prompt field for custom text input (required for 'Use Prompt' task)
- Pass prompt to vqa.interrogate instead of hardcoded empty string

OpenCLIP endpoint changes:
- Add 7 optional per-request override fields: min_length, max_length,
  chunk_size, min_flavors, max_flavors, flavor_count, num_beams
- Add get_clip_setting() helper for override support in openclip.py
- Apply overrides via update_interrogate_params() before interrogation

All new fields are optional with None defaults for backwards compatibility.
2026-02-11 02:47:40 +00:00
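Note on the override fields above: a minimal sketch of optional per-request overrides with `None` defaults. The `SETTINGS` values and this signature of `get_clip_setting()` are assumptions for illustration; the helper in openclip.py may look different.

```python
from typing import Optional

# Placeholder global settings; values are arbitrary.
SETTINGS = {"max_length": 74, "chunk_size": 1024, "num_beams": 1}


def get_clip_setting(name: str, overrides: Optional[dict] = None):
    # A missing or None override means "use the global setting".
    value = (overrides or {}).get(name)
    return SETTINGS[name] if value is None else value


# An old client sends no new fields; a new client overrides one of them.
old_request = {}
new_request = {"num_beams": 4, "max_length": None}

beams_old = get_clip_setting("num_beams", old_request)
beams_new = get_clip_setting("num_beams", new_request)
length_new = get_clip_setting("max_length", new_request)
```

Testing for `None` rather than truthiness keeps a deliberate `0` override from being mistaken for "not set".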
CalamitousFelicitousness 5fc46c042e docs(api): synchronize API descriptions with UI hints
Update API model field descriptions to match the hints in locale_en.json
for consistency between UI and API documentation.

Updated models:
- ReqInterrogate: clip_model, blip_model, mode
- ReqVQA: model, question, system
- ReqTagger: model, threshold, character_threshold, max_tags,
  include_rating, sort_alpha, use_spaces, escape_brackets,
  exclude_tags, show_scores
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness f431141d2f feat(api): add LLM generation parameters to VQA endpoint
Add optional LLM generation parameters to the VQA API request model,
allowing per-request override of settings:

- max_tokens, temperature, top_k, top_p, num_beams, do_sample
- thinking_mode, prefill, keep_thinking, keep_prefill

Changes:
- Add 10 new optional fields to ReqVQA model with descriptive docs
- Update get_kwargs() to support per-request overrides via singleton
- Add helper functions get_keep_thinking(), get_keep_prefill()
- Update post_vqa endpoint to pass generation kwargs
- Add _generation_overrides instance variable to VQA class
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness f3c4fae440 test(api): add caption API test suite
Comprehensive test script for all Caption API endpoints:
- GET/POST /sdapi/v1/interrogate (OpenCLIP/DeepBooru)
- POST /sdapi/v1/vqa (VLM captioning)
- GET /sdapi/v1/vqa/models, /sdapi/v1/vqa/prompts
- POST /sdapi/v1/tagger
- GET /sdapi/v1/tagger/models

Usage: python cli/test-caption-api.py [--url URL] [--image PATH]
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness ef797169a3 refactor(interrogate): use configurable clip_models_path
- Remove unused paths import from deepbooru.py and openclip.py
- Use shared.opts.clip_models_path instead of hardcoded paths
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness ec7934799e feat(api): add caption API endpoints and documentation
Add comprehensive caption/interrogate API with documentation:

- GET /sdapi/v1/interrogate: List available interrogation models
- POST /sdapi/v1/interrogate: Interrogate with OpenCLIP/BLIP/DeepDanbooru
- POST /sdapi/v1/vqa: Caption with Vision-Language Models (VLM)
- GET /sdapi/v1/vqa: List available VLM models
- POST /sdapi/v1/vqa/batch: Batch caption multiple images
- POST /sdapi/v1/tagger: Tag images with WaifuDiffusion/DeepBooru

Updates:
- Add detailed docstrings with usage examples
- Fix analyze_image response parsing for Gradio update dicts
- Add request/response models for all endpoints
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness 6b89cc8463 feat(ui): add tooltips/hints to Caption tab
Add comprehensive tooltips to Caption tab UI elements in locale_en.json:

- Add new "llm" section for shared LLM/VLM parameters:
  System prompt, Prefill, Top-K, Top-P, Temperature, Num Beams,
  Use Samplers, Thinking Mode, Keep Thinking Trace, Keep Prefill

- Add new "caption" section for caption-specific settings:
  VLM, OpenCLiP, Tagger tab labels and all their parameters
  including thresholds, tag formatting, batch options

- Consolidate accordion labels in ui_caption.py:
  "Caption: Advanced Options" and "Caption: Batch" shared across
  VLM, OpenCLiP, and Tagger tabs (localized to "Advanced Options"
  and "Batch" in UI)

- Remove duplicate entries from missing section
2026-02-11 02:47:40 +00:00
vladmandic 7eb9b1cc5c create tests folder
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-10 14:31:53 +01:00
vladmandic d602a093fb lint
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-10 13:54:13 +01:00
vladmandic bd61633e14 switch to pyproject.toml for tool config
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-10 13:51:51 +01:00
vladmandic 684d77d871 update diffusers
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-10 11:49:01 +01:00
vladmandic e907a0a573 update graphics
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-09 22:46:32 +01:00
vladmandic 363cb175aa allow different lora in hires
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-09 22:31:00 +01:00
vladmandic 42d8ad498e add ftfy
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-09 19:54:26 +01:00
Vladimir Mandic 4e7b5c0b70
Merge pull request #4638 from vladmandic/revert-4629-public-re-export
Revert "Mark public re-exports"
2026-02-09 18:30:46 +01:00
Vladimir Mandic e3ca883cbd
Revert "Mark public re-exports"
2026-02-09 18:30:18 +01:00
vladmandic 0d2e9fbf62 cleanup and update changelog
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-09 18:20:10 +01:00
Vladimir Mandic d0f9e25906
Merge pull request #4634 from CalamitousFelicitousness/nunchaku-reference
Nunchaku reference
2026-02-09 18:06:34 +01:00
Vladimir Mandic 480b58e994
Merge pull request #4636 from CalamitousFelicitousness/fix/installer-uv-clip
fix(installer): handle setuptools 82 removing pkg_resources and uv broken fallback
2026-02-09 17:52:02 +01:00
Vladimir Mandic b454fa9748
Merge pull request #4633 from awsr/patch-2
Linting rules: TCH -> TC
2026-02-09 17:50:58 +01:00