Commit Graph

9 Commits (fdc2f464579a9d036b1f87f21abc86e8e097ed26)

Author SHA1 Message Date
Vladimir Mandic d65a2d1ebc ruff lint 2026-02-19 11:13:44 +01:00
Vladimir Mandic e5c494f999 cleanup logger 2026-02-19 11:09:13 +01:00
Vladimir Mandic a3074baf8b unified logger 2026-02-19 09:46:42 +01:00
Vladimir Mandic bfe014f5da modernize typing 2026-02-19 09:15:37 +01:00
vladmandic 88db926ecd remove clip as requirement
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-12 08:40:10 +01:00
CalamitousFelicitousness 8d67debdfd fix(caption): use cache_dir for BLIP and Moondream model downloads
- Add _load_blip_model helper with explicit cache_dir so downloads
  go to hfcache_dir instead of default HF cache
- Pre-load BLIP model/processor before creating Interrogator config
  to control download location and avoid redundant loads
- Set clip_model_path on config for CLIP model cache location
- Add cache_dir to Moondream model and tokenizer loading
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 57659ab642 fix(caption): set clip_interrogator params on config, not instance
update_caption_params() was setting caption_max_length, chunk_size, and
flavor_intermediate_count on the Interrogator instance, but the library
reads them from self.config. The overrides were silently ignored.
2026-02-11 02:48:11 +00:00
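
Note on 57659ab642: the failure mode described above can be reproduced in a few lines. This is only a sketch. `Config` and `Interrogator` here are hypothetical stand-ins, not the real clip_interrogator classes, and the default values are made up. The only thing carried over from the commit message is that the library reads its settings from `self.config`.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the library's config object; defaults are made up.
    caption_max_length: int = 32
    chunk_size: int = 2048


class Interrogator:
    # Hypothetical stand-in: settings are read from self.config,
    # never from attributes set directly on the instance.
    def __init__(self, config: Config):
        self.config = config

    def effective_max_length(self) -> int:
        return self.config.caption_max_length


ci = Interrogator(Config())

# Assigning on the instance just creates a new attribute nobody reads.
ci.caption_max_length = 74
ignored = ci.effective_max_length()

# Assigning on the config changes the value the instance actually uses.
ci.config.caption_max_length = 74
applied = ci.effective_max_length()
```

Python raises no error for the first assignment, which is why an override of this kind can go unnoticed.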
CalamitousFelicitousness 443a73b740 refactor(caption): code review fixes for offload, inference, and maintainability
Comprehensive review of modules/caption/ addressing memory management,
consistency, and code quality:

Inference correctness:
- Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers
- Remove redundant @torch.no_grad() decorator from joycaption predict()
- Remove dead dtype=torch.bfloat16 kwarg from Florence loader

Memory management:
- Bound moondream3 image cache with LRU eviction (max 8 entries)
- Replace fragile id(image) cache keys with content-based md5 hash
- Add devices.torch_gc() after model loading in deepseek
- Move deepbooru model to CPU before dropping reference on unload
- Add external handler delegation to VQA.unload() (moondream3,
  joycaption, joytag, deepseek)
- Protect batch offload mutation with try/finally
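
A minimal sketch of the two cache changes listed above (LRU bound of 8 entries, content-based md5 keys). The class and function names are hypothetical and the image is represented as raw bytes for simplicity; the actual moondream3 code is not shown in this log. An `id(image)` key is fragile because CPython may reuse an id once the object is freed, and two objects with identical pixels get different ids.

```python
import hashlib
from collections import OrderedDict

MAX_ENTRIES = 8  # bound taken from the commit message


def content_key(data: bytes) -> str:
    # Key depends only on the content, not on object identity.
    return hashlib.md5(data).hexdigest()


class BoundedCache:
    # Hypothetical helper: a small LRU cache keyed by content hash.
    def __init__(self, max_entries: int = MAX_ENTRIES):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get(self, data: bytes):
        key = content_key(data)
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, data: bytes, value) -> None:
        key = content_key(data)
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._items)
```

With this shape, inserting a ninth distinct image evicts the least recently used entry, so memory held by cached encodings stays bounded.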

Code deduplication:
- Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM
- Extract save_tags_to_file() into tagger.py from deepbooru and
  waifudiffusion
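
For illustration, a shared helper of this kind might look like the sketch below. Only the name `strip_think_xml_tags` comes from the commit message. The body is an assumption: it guesses that the models emit reasoning wrapped in `<think>...</think>` and that the whole block should be dropped from the caption.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, possibly across lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL | re.IGNORECASE)


def strip_think_xml_tags(text: str) -> str:
    # Remove each think block together with its contents, then trim whitespace.
    return _THINK_BLOCK.sub("", text).strip()
```

Keeping the pattern in one place means the Qwen, Gemma, and SmolVLM handlers cannot drift apart in how they clean model output.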

Documentation and clarity:
- Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict)
- Document Florence task="task" as intentional design choice
- Add vendored-code comment to joytag.py
- Document openclip direct .to() usage vs sd_models.move_model
- Comment model.eval() calls that are required (trust_remote_code,
  custom loaders) vs removed where redundant (standard from_pretrained)

API robustness:
- Add HTTP 422 error response for VQA caption error strings in API
  endpoints (post_vqa, _dispatch_vlm)
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 5183ebec58 refactor: rename interrogate module to caption
Move all caption-related modules from modules/interrogate/ to modules/caption/
for better naming consistency:
- Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger,
  vqa, vqa_detection, waifudiffusion modules
- Add new caption.py dispatcher module
- Remove old interrogate.py (functionality moved to caption.py)
2026-02-11 02:47:41 +00:00