automatic

Commit Graph

Author	SHA1	Message	Date
vladmandic	756b4599be	merge: modules/caption Signed-off-by: vladmandic <mandic00@live.com>	2026-03-13 12:22:46 +01:00
Vladimir Mandic	e5c494f999	cleanup logger	2026-02-19 11:09:13 +01:00
Vladimir Mandic	a3074baf8b	unified logger	2026-02-19 09:46:42 +01:00
Vladimir Mandic	bfe014f5da	modernize typing	2026-02-19 09:15:37 +01:00
CalamitousFelicitousness	e2cdbe47fa	fix(caption): safetensors-only downloads, model load fixes, UI default, prefill tests - Add use_safetensors=True to all 16 model from_pretrained calls to avoid downloading redundant .bin files alongside safetensors - Add device property to JoyTag VisionModel so move_model can relocate it to CUDA (fixes 'ViT object has no attribute device') - Fix Pix2Struct dtype mismatch by casting float inputs to model dtype while preserving integer tensor types - Patch AutoConfig.register with exist_ok=True during Ovis loading to handle duplicate aimv2 registration on model reload - Detect Qwen VL fine-tune architecture from config model_type instead of repo name, fixing ToriiGate and similar third-party fine-tunes - Change UI default task from Short Caption to Normal Caption, and preserve it on model switch instead of resetting to Use Prompt - Add dual-prefill testing across 5 VQA test methods using a shared _check_prefill helper - Fix pre-existing ruff W605 in strip_think_xml_tags docstring	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	443a73b740	refactor(caption): code review fixes for offload, inference, and maintainability Comprehensive review of modules/caption/ addressing memory management, consistency, and code quality: Inference correctness: - Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers - Remove redundant @torch.no_grad() decorator from joycaption predict() - Remove dead dtype=torch.bfloat16 kwarg from Florence loader Memory management: - Bound moondream3 image cache with LRU eviction (max 8 entries) - Replace fragile id(image) cache keys with content-based md5 hash - Add devices.torch_gc() after model loading in deepseek - Move deepbooru model to CPU before dropping reference on unload - Add external handler delegation to VQA.unload() (moondream3, joycaption, joytag, deepseek) - Protect batch offload mutation with try/finally Code deduplication: - Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM - Extract save_tags_to_file() into tagger.py from deepbooru and waifudiffusion Documentation and clarity: - Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict) - Document Florence task="task" as intentional design choice - Add vendored-code comment to joytag.py - Document openclip direct .to() usage vs sd_models.move_model - Comment model.eval() calls that are required (trust_remote_code, custom loaders) vs removed where redundant (standard from_pretrained) API robustness: - Add HTTP 422 error response for VQA caption error strings in API endpoints (post_vqa, _dispatch_vlm)	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	5183ebec58	refactor: rename interrogate module to caption Move all caption-related modules from modules/interrogate/ to modules/caption/ for better naming consistency: - Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger, vqa, vqa_detection, waifudiffusion modules - Add new caption.py dispatcher module - Remove old interrogate.py (functionality moved to caption.py)	2026-02-11 02:47:41 +00:00

7 Commits (09e41b65d3749618c5ae5d06e0fa73847874ecf7)