25 Commits (09e41b65d3749618c5ae5d06e0fa73847874ecf7)

Author SHA1 Message Date
vladmandic 756b4599be merge: modules/caption
Signed-off-by: vladmandic <mandic00@live.com>
2026-03-13 12:22:46 +01:00
CalamitousFelicitousness 2f30e466e1 fix(caption): tagger batch only processes first uploaded file
Align tagger batch file collection with the working VQA/OpenCLIP
pattern. The previous implementation used Path wrapping and resolve()
deduplication which broke multi-file uploads from the Gradio File
component. Now all four batch modes (VQA, OpenCLIP, WaifuDiffusion,
DeepBooru) use the same f.name file collection approach.
2026-03-07 02:26:48 +00:00
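
The f.name collection approach named in commit 2f30e466e1 might look roughly like the sketch below. It is an illustration only, not the repository's code: `collect_batch_files` is a hypothetical helper, and the uploaded items are assumed to be objects that expose their path through a `.name` attribute.

```python
from types import SimpleNamespace


def collect_batch_files(files):
    """Return the path of every uploaded file, in upload order.

    Hypothetical helper: `files` is assumed to be a list of objects with a
    `.name` attribute holding a file path, or None when nothing was uploaded.
    """
    if not files:
        return []
    # Take each path as-is: no Path wrapping, no resolve()-based deduplication.
    return [f.name for f in files]


# Stand-ins for uploaded file objects (assumption, for illustration only).
uploads = [SimpleNamespace(name="/tmp/a.png"), SimpleNamespace(name="/tmp/b.png")]
paths = collect_batch_files(uploads)
```

Keeping one collection helper for all batch modes means a multi-file upload is handled the same way regardless of which tagger or captioner consumes it.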
vladmandic 9df9ed1b05 update gemini models
Signed-off-by: vladmandic <mandic00@live.com>
2026-03-04 07:31:36 +01:00
vladmandic 1ddd0bf33a add gemini to prompt enhance
Signed-off-by: vladmandic <mandic00@live.com>
2026-03-02 11:01:17 +01:00
vladmandic 5ff73b61a4 add google gemini to captioning
Signed-off-by: vladmandic <mandic00@live.com>
2026-03-02 09:34:40 +01:00
Vladimir Mandic d65a2d1ebc ruff lint
2026-02-19 11:13:44 +01:00
Vladimir Mandic e5c494f999 cleanup logger
2026-02-19 11:09:13 +01:00
Vladimir Mandic a3074baf8b unified logger
2026-02-19 09:46:42 +01:00
Vladimir Mandic bfe014f5da modernize typing
2026-02-19 09:15:37 +01:00
vladmandic 88db926ecd remove clip as requirement
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-12 08:40:10 +01:00
vladmandic da1cf2f996 refactor image methods
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 12:29:00 +01:00
vladmandic 8561da6f8c cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 10:02:41 +01:00
vladmandic 967974ade7 merge cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness dc8ecb0a64 refactor: address remaining PR #4640 review comments
- Remove _get_device_dtype() indirection, inline device/dtype at call sites
- Remove commented-out fallback blocks and try/finally wrappers
- Add modules/sharpfin to ruff and pylint excludes in pyproject.toml
- Fix import ordering in joytag.py and pixelart.py
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness 76aa949a26 refactor: integrate sharpfin for high-quality image resize
Vendor sharpfin library (Apache 2.0) and add centralized wrapper
module (images_sharpfin.py) replacing torchvision tensor/PIL
conversion and resize operations throughout the codebase.

- Add modules/sharpfin/ vendored library with MKS2021, Lanczos3,
  Mitchell, Catmull-Rom kernels and optional Triton sparse acceleration
- Add modules/images_sharpfin.py wrapper with to_tensor(), to_pil(),
  pil_to_tensor(), normalize(), resize(), resize_tensor()
- Add resize_quality and resize_linearize_srgb settings
- Add MKS2021 and Lanczos3 upscaler entries
- Replace torchvision.transforms.functional imports across 18 files
- to_pil() auto-detects HWC/BHWC layout, adds .round() before uint8
- Sparse Triton path falls back to dense GPU on compilation failure
- Mixed-axis resize splits into two single-axis scale() calls
- Masks and non-sRGB data always use linearize=False
2026-02-11 09:57:37 +01:00
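
Why commit 76aa949a26 adds .round() before the uint8 conversion can be shown with a small pure-Python sketch. This is not the sharpfin or wrapper code: `to_uint8` is a hypothetical function, and plain lists of floats stand in for image tensors. The point is only that a bare integer cast truncates, which biases pixel values downward.

```python
def to_uint8(values, do_round=True):
    """Convert floats in [0, 1] to integers in [0, 255].

    Hypothetical helper: with do_round=False the fractional part is simply
    dropped, which is what a bare integer cast does.
    """
    out = []
    for v in values:
        scaled = min(max(v, 0.0), 1.0) * 255.0  # clamp, then scale
        out.append(int(round(scaled)) if do_round else int(scaled))
    return out


pixels = [0.0, 0.25, 0.999, 1.0]
truncated = to_uint8(pixels, do_round=False)  # [0, 63, 254, 255]
rounded = to_uint8(pixels, do_round=True)     # [0, 64, 255, 255]
```

Without rounding, a value such as 0.999 lands on 254 rather than 255, so near-white pixels come out slightly darker after each float-to-integer round trip.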
CalamitousFelicitousness 80014fac7c fix(caption): address PR review feedback
- Remove superfluous SimpleNamespace import in cli/api-caption.py, use Map instead
- Drop _ prefix from internal helper functions in modules/api/caption.py
- Move DeepDanbooru model path to top-level models folder instead of nesting under CLIP
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 139e331d80 style(caption): fix lint warnings across caption module
- Rename shadowing import in waifudiffusion batch to avoid F823/E0606
- Fix import order in cli/api-caption.py (stdlib before third-party)
- Rename local variable shadowing function name in cli/api-caption.py
- Remove unnecessary global statement in devices.bypass_sdpa_hijacks
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 8d67debdfd fix(caption): use cache_dir for BLIP and Moondream model downloads
- Add _load_blip_model helper with explicit cache_dir so downloads
  go to hfcache_dir instead of default HF cache
- Pre-load BLIP model/processor before creating Interrogator config
  to control download location and avoid redundant loads
- Set clip_model_path on config for CLIP model cache location
- Add cache_dir to Moondream model and tokenizer loading
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness e2cdbe47fa fix(caption): safetensors-only downloads, model load fixes, UI default, prefill tests
- Add use_safetensors=True to all 16 model from_pretrained calls to
  avoid downloading redundant .bin files alongside safetensors
- Add device property to JoyTag VisionModel so move_model can relocate
  it to CUDA (fixes 'ViT object has no attribute device')
- Fix Pix2Struct dtype mismatch by casting float inputs to model dtype
  while preserving integer tensor types
- Patch AutoConfig.register with exist_ok=True during Ovis loading to
  handle duplicate aimv2 registration on model reload
- Detect Qwen VL fine-tune architecture from config model_type instead
  of repo name, fixing ToriiGate and similar third-party fine-tunes
- Change UI default task from Short Caption to Normal Caption, and
  preserve it on model switch instead of resetting to Use Prompt
- Add dual-prefill testing across 5 VQA test methods using a shared
  _check_prefill helper
- Fix pre-existing ruff W605 in strip_think_xml_tags docstring
2026-02-11 02:48:11 +00:00
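
The config-based architecture detection mentioned in commit e2cdbe47fa could be sketched as below. Everything here is an assumption for illustration: both function names are hypothetical, the repo name is made up, the config is a plain dict standing in for a parsed config.json, and the "qwen2_vl" model_type value is assumed rather than taken from the commit.

```python
def arch_from_repo_name(repo):
    """Name-based guess (the approach the commit replaces, as described)."""
    return "qwen_vl" if "qwen" in repo.lower() else None


def arch_from_config(config):
    """Config-based detection: inspect model_type, ignore the repo name."""
    model_type = str(config.get("model_type", "")).lower()
    if model_type.startswith("qwen") and "vl" in model_type:
        return "qwen_vl"
    return None


# Hypothetical third-party fine-tune whose repo name never mentions Qwen.
repo = "someuser/ToriiGate-example"
config = {"model_type": "qwen2_vl"}  # assumed value, for illustration only

by_name = arch_from_repo_name(repo)    # None: the name gives no hint
by_config = arch_from_config(config)   # "qwen_vl": the config does
```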
CalamitousFelicitousness 57659ab642 fix(caption): set clip_interrogator params on config, not instance
update_caption_params() was setting caption_max_length, chunk_size, and
flavor_intermediate_count on the Interrogator instance, but the library
reads them from self.config. The overrides were silently ignored.
2026-02-11 02:48:11 +00:00
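
The failure mode fixed in commit 57659ab642 can be reproduced with a minimal sketch. `Config` and `Interrogator` below are hypothetical stand-ins, not the clip_interrogator classes, and the default value of 32 is arbitrary; the only property carried over from the commit message is that parameters are read from self.config.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the library's config object.
    caption_max_length: int = 32


class Interrogator:
    # Hypothetical stand-in: parameters are read from self.config,
    # never from attributes set directly on the instance.
    def __init__(self, config):
        self.config = config

    def effective_max_length(self):
        return self.config.caption_max_length


ci = Interrogator(Config())

# Buggy pattern: this creates a new, unused attribute on the instance.
ci.caption_max_length = 74
ignored = ci.effective_max_length()   # still 32

# Fixed pattern: write to the config that the code actually reads.
ci.config.caption_max_length = 74
applied = ci.effective_max_length()   # now 74
```

Python raises no error when an unknown attribute is assigned to an ordinary instance, which is why an override of this kind can be ignored silently.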
CalamitousFelicitousness 17b03ed8e4 feat(caption): add Florence detection parsing, SDPA bypass, and offload support
- Add parse_florence_detections() and format_florence_response() to
  vqa_detection for handling Florence-2 detection output formats
- Add bypass_sdpa_hijacks() context manager to devices.py for models
  incompatible with SageAttention or other SDPA hijacks
- Add OpenCLIP model offload support when caption_offload is enabled
2026-02-11 02:48:11 +00:00
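
A context manager in the spirit of bypass_sdpa_hijacks() from commit 17b03ed8e4 might follow the pattern below. This is a sketch under stated assumptions, not the code in devices.py: `bypass_hijack` and both attention functions are hypothetical, and a SimpleNamespace stands in for the module whose attention function was replaced.

```python
from contextlib import contextmanager
from types import SimpleNamespace


def original_sdpa(q, k, v):
    return "original"


def hijacked_sdpa(q, k, v):
    return "hijacked"


# Stand-in for a module whose attention function has been replaced.
functional = SimpleNamespace(scaled_dot_product_attention=hijacked_sdpa)


@contextmanager
def bypass_hijack(namespace, original):
    """Temporarily restore the original function, then put the hijack back."""
    current = namespace.scaled_dot_product_attention
    namespace.scaled_dot_product_attention = original
    try:
        yield
    finally:
        # Always reinstate the hijack, even if the body raised.
        namespace.scaled_dot_product_attention = current


with bypass_hijack(functional, original_sdpa):
    inside = functional.scaled_dot_product_attention(None, None, None)
after = functional.scaled_dot_product_attention(None, None, None)
```

The try/finally keeps the swap scoped to the incompatible model, so other models continue to use the hijacked path afterwards.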
CalamitousFelicitousness 443a73b740 refactor(caption): code review fixes for offload, inference, and maintainability
Comprehensive review of modules/caption/ addressing memory management,
consistency, and code quality:

Inference correctness:
- Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers
- Remove redundant @torch.no_grad() decorator from joycaption predict()
- Remove dead dtype=torch.bfloat16 kwarg from Florence loader

Memory management:
- Bound moondream3 image cache with LRU eviction (max 8 entries)
- Replace fragile id(image) cache keys with content-based md5 hash
- Add devices.torch_gc() after model loading in deepseek
- Move deepbooru model to CPU before dropping reference on unload
- Add external handler delegation to VQA.unload() (moondream3,
  joycaption, joytag, deepseek)
- Protect batch offload mutation with try/finally

Code deduplication:
- Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM
- Extract save_tags_to_file() into tagger.py from deepbooru and
  waifudiffusion

Documentation and clarity:
- Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict)
- Document Florence task="task" as intentional design choice
- Add vendored-code comment to joytag.py
- Document openclip direct .to() usage vs sd_models.move_model
- Comment model.eval() calls that are required (trust_remote_code,
  custom loaders) vs removed where redundant (standard from_pretrained)

API robustness:
- Add HTTP 422 error response for VQA caption error strings in API
  endpoints (post_vqa, _dispatch_vlm)
2026-02-11 02:48:11 +00:00
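
The bounded, content-keyed cache described under "Memory management" in commit 443a73b740 could be sketched as follows. `ImageCache` is a hypothetical class, raw bytes stand in for image data, and the cached values are placeholder strings; only the md5 keying and the limit of 8 entries come from the commit message.

```python
import hashlib
from collections import OrderedDict

MAX_ENTRIES = 8  # bound taken from the commit message


class ImageCache:
    """LRU cache keyed by an md5 hash of the image content (hypothetical)."""

    def __init__(self, max_entries=MAX_ENTRIES):
        self.max_entries = max_entries
        self._items = OrderedDict()

    @staticmethod
    def key(image_bytes):
        # Content-based key: unlike id(image), identical content always maps
        # to the same key and the key cannot be reused by a different object.
        return hashlib.md5(image_bytes).hexdigest()

    def get(self, image_bytes):
        k = self.key(image_bytes)
        if k not in self._items:
            return None
        self._items.move_to_end(k)  # mark as most recently used
        return self._items[k]

    def put(self, image_bytes, value):
        k = self.key(image_bytes)
        self._items[k] = value
        self._items.move_to_end(k)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._items)


cache = ImageCache()
for i in range(9):
    cache.put(bytes([i]), f"encoded-{i}")
```

After nine insertions the cache holds eight entries and the first one has been evicted. md5 serves here purely as a fast content fingerprint, not as a security measure.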
CalamitousFelicitousness bf7a72f12e fix(caption): remove dead min_length param, split Florence/PromptGen prompts, fix gaze detection
- Remove caption_openclip_min_length from settings, API models, endpoints, and UI
  (clip_interrogator library has no min_length support; parameter was never functional)
- Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts
  (GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune)
- Add 'promptgen' category to /vqa/prompts API endpoint
- Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix
  to prevent "Detect Gaze" matching as detect target="Gaze"
- Update test suite: remove min_length tests, fix min_flavors to use mode='best',
  add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests,
  split Florence/PromptGen test coverage
2026-02-11 02:48:11 +00:00
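
The ordering issue behind the gaze detection fix in commit bf7a72f12e can be illustrated with a small sketch. `parse_detect_task` and its return values are hypothetical and do not reflect the repository's actual parser; the sketch only shows why the specific check has to precede the generic 'detect ' prefix check.

```python
def parse_detect_task(prompt):
    """Return a (task, target) pair for a prompt (hypothetical parser)."""
    stripped = prompt.strip()
    text = stripped.lower()
    # Specific check first, so "Detect Gaze" is not swallowed by the prefix rule.
    if text == "detect gaze":
        return ("gaze", None)
    # Generic rule: everything after "detect " is the detection target.
    if text.startswith("detect "):
        return ("detect", stripped[len("detect "):])
    return ("caption", None)


gaze = parse_detect_task("Detect Gaze")   # ("gaze", None)
cat = parse_detect_task("Detect cat")     # ("detect", "cat")
```

With the two checks swapped, "Detect Gaze" would match the prefix rule and come back as a detection with target "Gaze", which is the behaviour the commit describes.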
CalamitousFelicitousness fba942b25e feat(caption): add debug logging for Florence-2 handler
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 5183ebec58 refactor: rename interrogate module to caption
Move all caption-related modules from modules/interrogate/ to modules/caption/
for better naming consistency:
- Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger,
  vqa, vqa_detection, waifudiffusion modules
- Add new caption.py dispatcher module
- Remove old interrogate.py (functionality moved to caption.py)
2026-02-11 02:47:41 +00:00