automatic

Commit Graph

Author	SHA1	Message	Date
vladmandic	9df9ed1b05	update gemini models Signed-off-by: vladmandic <mandic00@live.com>	2026-03-04 07:31:36 +01:00
vladmandic	5ff73b61a4	add google gemini to captioning Signed-off-by: vladmandic <mandic00@live.com>	2026-03-02 09:34:40 +01:00
Vladimir Mandic	d65a2d1ebc	ruff lint	2026-02-19 11:13:44 +01:00
Vladimir Mandic	e5c494f999	cleanup logger	2026-02-19 11:09:13 +01:00
Vladimir Mandic	a3074baf8b	unified logger	2026-02-19 09:46:42 +01:00
Vladimir Mandic	bfe014f5da	modernize typing	2026-02-19 09:15:37 +01:00
CalamitousFelicitousness	8d67debdfd	fix(caption): use cache_dir for BLIP and Moondream model downloads - Add _load_blip_model helper with explicit cache_dir so downloads go to hfcache_dir instead of default HF cache - Pre-load BLIP model/processor before creating Interrogator config to control download location and avoid redundant loads - Set clip_model_path on config for CLIP model cache location - Add cache_dir to Moondream model and tokenizer loading	2026-02-11 02:50:06 +00:00
CalamitousFelicitousness	e2cdbe47fa	fix(caption): safetensors-only downloads, model load fixes, UI default, prefill tests - Add use_safetensors=True to all 16 model from_pretrained calls to avoid downloading redundant .bin files alongside safetensors - Add device property to JoyTag VisionModel so move_model can relocate it to CUDA (fixes 'ViT object has no attribute device') - Fix Pix2Struct dtype mismatch by casting float inputs to model dtype while preserving integer tensor types - Patch AutoConfig.register with exist_ok=True during Ovis loading to handle duplicate aimv2 registration on model reload - Detect Qwen VL fine-tune architecture from config model_type instead of repo name, fixing ToriiGate and similar third-party fine-tunes - Change UI default task from Short Caption to Normal Caption, and preserve it on model switch instead of resetting to Use Prompt - Add dual-prefill testing across 5 VQA test methods using a shared _check_prefill helper - Fix pre-existing ruff W605 in strip_think_xml_tags docstring	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	443a73b740	refactor(caption): code review fixes for offload, inference, and maintainability Comprehensive review of modules/caption/ addressing memory management, consistency, and code quality: Inference correctness: - Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers - Remove redundant @torch.no_grad() decorator from joycaption predict() - Remove dead dtype=torch.bfloat16 kwarg from Florence loader Memory management: - Bound moondream3 image cache with LRU eviction (max 8 entries) - Replace fragile id(image) cache keys with content-based md5 hash - Add devices.torch_gc() after model loading in deepseek - Move deepbooru model to CPU before dropping reference on unload - Add external handler delegation to VQA.unload() (moondream3, joycaption, joytag, deepseek) - Protect batch offload mutation with try/finally Code deduplication: - Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM - Extract save_tags_to_file() into tagger.py from deepbooru and waifudiffusion Documentation and clarity: - Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict) - Document Florence task="task" as intentional design choice - Add vendored-code comment to joytag.py - Document openclip direct .to() usage vs sd_models.move_model - Comment model.eval() calls that are required (trust_remote_code, custom loaders) vs removed where redundant (standard from_pretrained) API robustness: - Add HTTP 422 error response for VQA caption error strings in API endpoints (post_vqa, _dispatch_vlm)	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	bf7a72f12e	fix(caption): remove dead min_length param, split Florence/PromptGen prompts, fix gaze detection - Remove caption_openclip_min_length from settings, API models, endpoints, and UI (clip_interrogator library has no min_length support; parameter was never functional) - Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts (GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune) - Add 'promptgen' category to /vqa/prompts API endpoint - Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix to prevent "Detect Gaze" matching as detect target="Gaze" - Update test suite: remove min_length tests, fix min_flavors to use mode='best', add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests, split Florence/PromptGen test coverage	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	fba942b25e	feat(caption): add debug logging for Florence-2 handler	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	5183ebec58	refactor: rename interrogate module to caption Move all caption-related modules from modules/interrogate/ to modules/caption/ for better naming consistency: - Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger, vqa, vqa_detection, waifudiffusion modules - Add new caption.py dispatcher module - Remove old interrogate.py (functionality moved to caption.py)	2026-02-11 02:47:41 +00:00
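
Commit 443a73b740 mentions bounding the moondream3 image cache with LRU eviction (max 8 entries) and replacing `id(image)` keys with a content-based md5 hash. The log does not show the actual code, so the following is only a minimal sketch of that idea. The class name `ImageCache` and its methods are hypothetical and are not taken from the repository.

```python
import hashlib
from collections import OrderedDict

MAX_ENTRIES = 8  # bound quoted in the commit message


class ImageCache:
    """Hypothetical LRU cache keyed by an md5 digest of the image bytes."""

    def __init__(self, max_entries=MAX_ENTRIES):
        self.max_entries = max_entries
        self._items = OrderedDict()  # oldest entry first, newest last

    @staticmethod
    def key(data: bytes) -> str:
        # Content-based key: equal bytes give equal keys, unlike id(obj),
        # which can be reused once an object is garbage collected.
        return hashlib.md5(data).hexdigest()

    def get(self, data: bytes):
        k = self.key(data)
        if k not in self._items:
            return None
        self._items.move_to_end(k)  # mark as most recently used
        return self._items[k]

    def put(self, data: bytes, value) -> None:
        k = self.key(data)
        self._items[k] = value
        self._items.move_to_end(k)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._items)
```

Keying on content rather than object identity means two copies of the same image share one cache entry, at the cost of hashing the bytes on every lookup.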

12 Commits (0bdbf300ac347640e5b81dff2fa187bcfe05574c)
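
Commit bf7a72f12e describes a gaze-detection fix: the `DETECT_GAZE` check was moved ahead of the generic `detect ` prefix check so that "Detect Gaze" is no longer parsed as a detection with target "Gaze". The ordering issue can be illustrated with a small sketch. The function `classify_prompt` and its return values are hypothetical and do not reflect the repository's actual dispatcher.

```python
from typing import Optional, Tuple


def classify_prompt(prompt: str) -> Tuple[str, Optional[str]]:
    """Hypothetical dispatcher: return (task, target) for a prompt string."""
    text = prompt.strip()
    lowered = text.lower()
    # The specific case must be tested first; otherwise the generic
    # prefix branch below would capture it with target "Gaze".
    if lowered == "detect gaze":
        return ("gaze", None)
    if lowered.startswith("detect "):
        return ("detect", text[len("detect "):])
    return ("caption", None)
```

With the checks in the opposite order, `classify_prompt("Detect Gaze")` would fall into the prefix branch, which is the behavior the commit message says it fixed.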