automatic

Author	SHA1	Message	Date
vladmandic	756b4599be	merge: modules/caption Signed-off-by: vladmandic <mandic00@live.com>	2026-03-13 12:22:46 +01:00
CalamitousFelicitousness	2f30e466e1	fix(caption): tagger batch only processes first uploaded file. Align tagger batch file collection with the working VQA/OpenCLIP pattern. The previous implementation used Path wrapping and resolve() deduplication which broke multi-file uploads from the Gradio File component. Now all four batch modes (VQA, OpenCLIP, WaifuDiffusion, DeepBooru) use the same f.name file collection approach.	2026-03-07 02:26:48 +00:00
vladmandic	9df9ed1b05	update gemini models Signed-off-by: vladmandic <mandic00@live.com>	2026-03-04 07:31:36 +01:00
vladmandic	1ddd0bf33a	add gemini to prompt enhance Signed-off-by: vladmandic <mandic00@live.com>	2026-03-02 11:01:17 +01:00
vladmandic	5ff73b61a4	add google gemini to captioning Signed-off-by: vladmandic <mandic00@live.com>	2026-03-02 09:34:40 +01:00
Vladimir Mandic	d65a2d1ebc	ruff lint	2026-02-19 11:13:44 +01:00
Vladimir Mandic	e5c494f999	cleanup logger	2026-02-19 11:09:13 +01:00
Vladimir Mandic	a3074baf8b	unified logger	2026-02-19 09:46:42 +01:00
Vladimir Mandic	bfe014f5da	modernize typing	2026-02-19 09:15:37 +01:00
vladmandic	88db926ecd	remove clip as requirement Signed-off-by: vladmandic <mandic00@live.com>	2026-02-12 08:40:10 +01:00
vladmandic	da1cf2f996	refactor image methods Signed-off-by: vladmandic <mandic00@live.com>	2026-02-11 12:29:00 +01:00
vladmandic	8561da6f8c	cleanup Signed-off-by: vladmandic <mandic00@live.com>	2026-02-11 10:02:41 +01:00
vladmandic	967974ade7	merge cleanup Signed-off-by: vladmandic <mandic00@live.com>	2026-02-11 09:57:37 +01:00
CalamitousFelicitousness	dc8ecb0a64	refactor: address remaining PR #4640 review comments - Remove _get_device_dtype() indirection, inline device/dtype at call sites - Remove commented-out fallback blocks and try/finally wrappers - Add modules/sharpfin to ruff and pylint excludes in pyproject.toml - Fix import ordering in joytag.py and pixelart.py	2026-02-11 09:57:37 +01:00
CalamitousFelicitousness	76aa949a26	refactor: integrate sharpfin for high-quality image resize Vendor sharpfin library (Apache 2.0) and add centralized wrapper module (images_sharpfin.py) replacing torchvision tensor/PIL conversion and resize operations throughout the codebase. - Add modules/sharpfin/ vendored library with MKS2021, Lanczos3, Mitchell, Catmull-Rom kernels and optional Triton sparse acceleration - Add modules/images_sharpfin.py wrapper with to_tensor(), to_pil(), pil_to_tensor(), normalize(), resize(), resize_tensor() - Add resize_quality and resize_linearize_srgb settings - Add MKS2021 and Lanczos3 upscaler entries - Replace torchvision.transforms.functional imports across 18 files - to_pil() auto-detects HWC/BHWC layout, adds .round() before uint8 - Sparse Triton path falls back to dense GPU on compilation failure - Mixed-axis resize splits into two single-axis scale() calls - Masks and non-sRGB data always use linearize=False	2026-02-11 09:57:37 +01:00
CalamitousFelicitousness	80014fac7c	fix(caption): address PR review feedback - Remove superfluous SimpleNamespace import in cli/api-caption.py, use Map instead - Drop _ prefix from internal helper functions in modules/api/caption.py - Move DeepDanbooru model path to top-level models folder instead of nesting under CLIP	2026-02-11 02:50:06 +00:00
CalamitousFelicitousness	139e331d80	style(caption): fix lint warnings across caption module - Rename shadowing import in waifudiffusion batch to avoid F823/E0606 - Fix import order in cli/api-caption.py (stdlib before third-party) - Rename local variable shadowing function name in cli/api-caption.py - Remove unnecessary global statement in devices.bypass_sdpa_hijacks	2026-02-11 02:50:06 +00:00
CalamitousFelicitousness	8d67debdfd	fix(caption): use cache_dir for BLIP and Moondream model downloads - Add _load_blip_model helper with explicit cache_dir so downloads go to hfcache_dir instead of default HF cache - Pre-load BLIP model/processor before creating Interrogator config to control download location and avoid redundant loads - Set clip_model_path on config for CLIP model cache location - Add cache_dir to Moondream model and tokenizer loading	2026-02-11 02:50:06 +00:00
CalamitousFelicitousness	e2cdbe47fa	fix(caption): safetensors-only downloads, model load fixes, UI default, prefill tests - Add use_safetensors=True to all 16 model from_pretrained calls to avoid downloading redundant .bin files alongside safetensors - Add device property to JoyTag VisionModel so move_model can relocate it to CUDA (fixes 'ViT object has no attribute device') - Fix Pix2Struct dtype mismatch by casting float inputs to model dtype while preserving integer tensor types - Patch AutoConfig.register with exist_ok=True during Ovis loading to handle duplicate aimv2 registration on model reload - Detect Qwen VL fine-tune architecture from config model_type instead of repo name, fixing ToriiGate and similar third-party fine-tunes - Change UI default task from Short Caption to Normal Caption, and preserve it on model switch instead of resetting to Use Prompt - Add dual-prefill testing across 5 VQA test methods using a shared _check_prefill helper - Fix pre-existing ruff W605 in strip_think_xml_tags docstring	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	57659ab642	fix(caption): set clip_interrogator params on config, not instance. update_caption_params() was setting caption_max_length, chunk_size, and flavor_intermediate_count on the Interrogator instance, but the library reads them from self.config. The overrides were silently ignored.	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	17b03ed8e4	feat(caption): add Florence detection parsing, SDPA bypass, and offload support - Add parse_florence_detections() and format_florence_response() to vqa_detection for handling Florence-2 detection output formats - Add bypass_sdpa_hijacks() context manager to devices.py for models incompatible with SageAttention or other SDPA hijacks - Add OpenCLIP model offload support when caption_offload is enabled	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	443a73b740	refactor(caption): code review fixes for offload, inference, and maintainability. Comprehensive review of modules/caption/ addressing memory management, consistency, and code quality. Inference correctness: - Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers - Remove redundant @torch.no_grad() decorator from joycaption predict() - Remove dead dtype=torch.bfloat16 kwarg from Florence loader. Memory management: - Bound moondream3 image cache with LRU eviction (max 8 entries) - Replace fragile id(image) cache keys with content-based md5 hash - Add devices.torch_gc() after model loading in deepseek - Move deepbooru model to CPU before dropping reference on unload - Add external handler delegation to VQA.unload() (moondream3, joycaption, joytag, deepseek) - Protect batch offload mutation with try/finally. Code deduplication: - Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM - Extract save_tags_to_file() into tagger.py from deepbooru and waifudiffusion. Documentation and clarity: - Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict) - Document Florence task="task" as intentional design choice - Add vendored-code comment to joytag.py - Document openclip direct .to() usage vs sd_models.move_model - Comment model.eval() calls that are required (trust_remote_code, custom loaders) vs removed where redundant (standard from_pretrained). API robustness: - Add HTTP 422 error response for VQA caption error strings in API endpoints (post_vqa, _dispatch_vlm)	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	bf7a72f12e	fix(caption): remove dead min_length param, split Florence/PromptGen prompts, fix gaze detection - Remove caption_openclip_min_length from settings, API models, endpoints, and UI (clip_interrogator library has no min_length support; parameter was never functional) - Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts (GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune) - Add 'promptgen' category to /vqa/prompts API endpoint - Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix to prevent "Detect Gaze" matching as detect target="Gaze" - Update test suite: remove min_length tests, fix min_flavors to use mode='best', add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests, split Florence/PromptGen test coverage	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	fba942b25e	feat(caption): add debug logging for Florence-2 handler	2026-02-11 02:48:11 +00:00
CalamitousFelicitousness	5183ebec58	refactor: rename interrogate module to caption Move all caption-related modules from modules/interrogate/ to modules/caption/ for better naming consistency: - Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger, vqa, vqa_detection, waifudiffusion modules - Add new caption.py dispatcher module - Remove old interrogate.py (functionality moved to caption.py)	2026-02-11 02:47:41 +00:00

25 Commits (09e41b65d3749618c5ae5d06e0fa73847874ecf7)