CalamitousFelicitousness
443a73b740
refactor(caption): code review fixes for offload, inference, and maintainability
Comprehensive review of modules/caption/ addressing memory management,
consistency, and code quality:
Inference correctness:
- Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers
- Remove redundant @torch.no_grad() decorator from joycaption predict()
- Remove dead dtype=torch.bfloat16 kwarg from Florence loader
Memory management:
- Bound moondream3 image cache with LRU eviction (max 8 entries)
- Replace fragile id(image) cache keys with content-based md5 hash
- Add devices.torch_gc() after model loading in deepseek
- Move deepbooru model to CPU before dropping reference on unload
- Add external handler delegation to VQA.unload() (moondream3,
  joycaption, joytag, deepseek)
- Protect batch offload mutation with try/finally
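A minimal sketch of the bounded cache and content-based key described above, assuming the image is available as raw bytes. The names `content_key` and `BoundedCache` are hypothetical and are not taken from modules/caption/; only the limit of 8 entries and the use of md5 come from this commit message.

```python
import hashlib
from collections import OrderedDict

MAX_ENTRIES = 8  # bound stated in the commit message


def content_key(data: bytes) -> str:
    """Derive the key from content, so equal images share a key.

    Unlike id(obj), the key does not depend on object identity, which
    can be reused after the original object is garbage-collected.
    """
    return hashlib.md5(data).hexdigest()


class BoundedCache:
    """Hypothetical LRU cache: evicts the least recently used entry."""

    def __init__(self, max_entries: int = MAX_ENTRIES):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._items)
```

With this shape, a ninth insertion evicts whichever entry was touched least recently, so memory held by cached encodings stays bounded.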
Code deduplication:
- Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM
- Extract save_tags_to_file() into tagger.py from deepbooru and
  waifudiffusion
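For illustration only, a shared helper of this kind might look like the sketch below. The function name matches the commit message, but the body is an assumption: it presumes the models emit reasoning wrapped in `<think>...</think>` tags and that the tags and their contents should be dropped from the caption. The actual implementation may differ.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>; DOTALL lets the
# block span multiple lines, and the match is non-greedy per block.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
_STRAY_TAG = re.compile(r"</?think>", re.IGNORECASE)


def strip_think_xml_tags(text: str) -> str:
    """Remove paired think blocks, then any unpaired leftover tags."""
    text = _THINK_BLOCK.sub("", text)
    text = _STRAY_TAG.sub("", text)
    return text.strip()
```

Keeping one helper for Qwen, Gemma, and SmolVLM means a change to the tag format only needs to be handled in one place.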
Documentation and clarity:
- Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict)
- Document Florence task="task" as intentional design choice
- Add vendored-code comment to joytag.py
- Document openclip direct .to() usage vs sd_models.move_model
- Comment the model.eval() calls that are required (trust_remote_code,
  custom loaders) and remove the redundant ones (standard from_pretrained)
API robustness:
- Return HTTP 422 from API endpoints (post_vqa, _dispatch_vlm) when VQA
  captioning yields an error string
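A framework-free sketch of the error mapping, under stated assumptions: the `"Error:"` prefix, the `caption_response` name, and the payload shapes are all hypothetical, since the commit message does not say how an error string is recognized or how the endpoints build their responses.

```python
HTTP_OK = 200
HTTP_UNPROCESSABLE = 422

# Assumption: caption handlers signal failure by returning a string with
# this prefix. The real marker is not given in the commit message.
ERROR_PREFIX = "Error:"


def caption_response(result: str):
    """Map a caption result to a (status, payload) pair."""
    text = result.strip()
    if text.startswith(ERROR_PREFIX):
        # Surface the failure as a client-visible error, not a 200 caption
        return HTTP_UNPROCESSABLE, {"detail": text}
    return HTTP_OK, {"caption": result}
```

The point of the change is that API clients can rely on the status code instead of parsing the caption text for failures.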
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness
5183ebec58
refactor: rename interrogate module to caption
Move all caption-related modules from modules/interrogate/ to modules/caption/
for better naming consistency:
- Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger,
  vqa, vqa_detection, waifudiffusion modules
- Add new caption.py dispatcher module
- Remove old interrogate.py (functionality moved to caption.py)
2026-02-11 02:47:41 +00:00