Commit Graph

7603 Commits (3a65d561a70f60d2c67f607d2b00a944c7c427ed)

Author SHA1 Message Date
vladmandic 3a65d561a7 add google-veo-3.1
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-09 19:14:08 +01:00
vladmandic acca58f50c add kandinsky5
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-09 09:47:22 +01:00
vladmandic f91af19094 update video models
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-09 09:22:28 +01:00
Disty0 1c2a81ee2d Make SDNQDequantizer a dataclass 2025-12-08 22:29:45 +03:00
vladmandic 3f161b5532 lint moondream
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-08 18:16:00 +01:00
vladmandic 69f0d6bf5d lint
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-08 18:12:47 +01:00
Vladimir Mandic 5a1d60e1b9
Merge pull request #4448 from CalamitousFelicitousness/feat/vqa-prefill-thinking-moondream3
VQA Refactor
2025-12-08 17:43:48 +01:00
Disty0 d4e2cbb826 SDNQ fix torch.compile always being active 2025-12-08 18:15:08 +03:00
Disty0 3ae7ecdbad SDNQ fix quantization_device getting ignored on post load quant 2025-12-08 01:29:52 +03:00
Disty0 064b64c76c cleanup 2025-12-08 01:14:19 +03:00
Disty0 6e05a12a49 SDNQ post process pre-quants after load 2025-12-08 01:08:53 +03:00
Disty0 0835ca6f66 SDNQ add explicit model.quantization_method = QuantizationMethod.SDNQ 2025-12-08 00:46:40 +03:00
Disty0 7a6356f8eb SDNQ fix transformers v5 and check for torch._dynamo.config.disable 2025-12-08 00:36:15 +03:00
Disty0 4f90054bf7 SDNQ transformers v5 support 2025-12-07 21:37:41 +03:00
Vladimir Mandic 469962cc9c
Merge pull request #4453 from awsr/python-datetime-compat
Fix timestamp formatting for thumbnails
2025-12-07 06:49:38 +01:00
awsr f01e977695
Fix timestamp formatting for thumbnails 2025-12-06 18:34:15 -08:00
vladmandic 7bd04e0b5c add /detailers api endpoint
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-06 12:33:52 +01:00
CalamitousFelicitousness a51e1501d6 fix(vqa): no moondream3 compile during explicit load
- Initialize KV caches before moving model to device
- Disable flex_attention decoding to avoid torch.compile hang
- Remove unused compile step (controlled by cuda_compile setting)

flex_attention's create_block_mask triggers torch compilation,
which can hang the system when called during model preload.
2025-12-06 02:26:34 +00:00
CalamitousFelicitousness 7714f71994 feat(vqa): un/load support and extract detection
Make external VQA handlers (moondream3, joytag, joycaption, deepseek)
compatible with VQA load/unload mechanism for consistent model lifecycle.

- Add vqa_detection.py with shared detection helpers
- Add load and unload functions to all external handlers
- Replace device_map="auto" with sd_models.move_model in joycaption
- Update dispatcher and moondream handlers to use shared helpers
2025-12-05 23:52:02 +00:00
CalamitousFelicitousness 5193285bc7 refactor(vqa): convert to class-based singleton
Refactor VQA module from module-level globals to a VQA class singleton
pattern with self-contained per-model loading methods.

Changes:
- Add VQA class with model/processor state and detection data storage
- Extract load methods for clean model pre-loading via UI
- Make interrogate return a string only; store detection data on the instance
- Add vqa_draw.py for bounding box/point annotation utilities
  (stub; further transfer of drawing functions to follow)
- Update moondream3.py to store detection data on VQA singleton
- Update endpoints.py and ui_caption.py for new return type
2025-12-05 20:53:18 +00:00
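Note on the entry above: the move from module-level globals to a class-based singleton might look roughly like the sketch below. This is not the repository's code. The class name `VQASketch`, its attributes, and `get_instance` are hypothetical names chosen for illustration, and model loading is replaced by placeholder strings.

```python
from typing import Any, Optional


class VQASketch:
    """Hypothetical stand-in for a class-based VQA singleton (not the real class)."""

    _instance: Optional["VQASketch"] = None

    def __init__(self) -> None:
        # State that would otherwise live in module-level globals.
        self.model: Any = None
        self.processor: Any = None
        self.loaded: Optional[str] = None
        self.last_detection_data: Optional[dict] = None

    @classmethod
    def get_instance(cls) -> "VQASketch":
        # Create the shared instance on first use, then always return it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def load(self, name: str) -> None:
        # Placeholder for a per-model load method; skips work if already loaded.
        if self.loaded != name:
            self.model = f"<model:{name}>"
            self.processor = f"<processor:{name}>"
            self.loaded = name

    def interrogate(self, question: str) -> str:
        # Return text only; detection data is stored on the instance instead.
        self.last_detection_data = {"points": [], "boxes": []}
        return f"answer to: {question}"
```

Callers such as an API endpoint or UI handler would fetch the shared instance, read the returned string, and pull any detection data from the instance afterwards.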
Disty0 1cfb61809f cleanup 2025-12-05 18:40:49 +03:00
Disty0 5b86bef796 SDNQ add longcat keys 2025-12-05 18:37:20 +03:00
CalamitousFelicitousness d1b1d574a6 fix(vqa): add graceful error for empty "Use Prompt" task
Replace silent fallback to "Describe the image" with explicit error
when user selects "Use Prompt" but leaves the prompt field empty.
Follows the same pattern as missing image validation.
2025-12-05 01:48:07 +00:00
CalamitousFelicitousness a8a9e6d836 fix(vqa): separate Moondream 2 and 3 task prompts
Moondream 3 does not support gaze detection (detect_gaze method),
so "Detect Gaze" task is now only shown for Moondream 2.
2025-12-05 01:38:28 +00:00
CalamitousFelicitousness 195161c436 fix(settings): hide VLM prefill/thinking settings from Settings UI
These settings are accessible from the Caption tab and can be saved
as defaults via "Set UI defaults", so they don't need to appear in
Settings > Interrogate.
2025-12-05 00:54:24 +00:00
CalamitousFelicitousness 2b6226b62b feat(vqa): persist thinking mode and improve reasoning output formatting
- Add interrogate_vlm_thinking_mode setting to save checkbox state
- Update ui_caption to restore Thinking Mode preference on load
- Add blank line before 'Answer:' label for visual separation
- Remove '\n\n' replacement in clean() that stripped blank lines
- Fix Qwen reasoning detection when <think> tag is in prompt, not response
- Add reasoning icon to Moondream 2 and 3 model names
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness a4b5e84a13 feat(vqa): enhance Moondream 2 with reasoning mode, gaze detection, and annotations
- Add thinking_mode/reasoning parameter to enable reasoning mode
- Add Detect Gaze task with placeholder hint
- Parse point/detect results to return annotation data for visualization
- Handle keep_thinking setting: format as "Reasoning:\n...\nAnswer:\n..." or discard
- Add comprehensive debug logging throughout handler
2025-12-05 00:00:25 +00:00
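Note on the entry above: the keep_thinking handling ("Reasoning:\n...\nAnswer:\n..." or discard) can be illustrated with a minimal sketch. The function name `format_answer` and its parameters are assumptions made for this example, not the handler's actual signature.

```python
def format_answer(reasoning: str, answer: str, keep_thinking: bool) -> str:
    """Hypothetical helper: keep or discard the reasoning trace in the output."""
    answer = answer.strip()
    reasoning = reasoning.strip()
    if keep_thinking and reasoning:
        # Keep the trace, labelled as described in the commit message.
        return f"Reasoning:\n{reasoning}\nAnswer:\n{answer}"
    # Otherwise discard the trace and return only the answer.
    return answer
```

With `keep_thinking=False`, or when the model produced no reasoning, only the answer text is returned.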
CalamitousFelicitousness c75a09be83 fix(vqa): handle Moondream point and detect tasks
Add handlers for "Point at..." and "Detect..." tasks in moondream()
that were falling through to answer_question() and failing.
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness 506515b018 feat(vqa): add load/unload model buttons to Caption tab
- Add load_model() function to pre-load VLM into memory
- Add unload_model() function to free VLM from memory
- Add Load/Unload buttons to Caption tab UI
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness a90d85ddfd feat(ui): add dynamic task selection based on VLM model
- Rename "Predefined question" to "Task"
- Task dropdown updates choices when model changes
- Prompt placeholder updates based on selected task
- Model-specific tasks: Florence-2 gets detection tasks, Moondream gets point/detect
2025-12-05 00:00:25 +00:00
CalamitousFelicitousness 4df6aa7944 fix(ui): set prefill text to empty by default 2025-12-05 00:00:25 +00:00
CalamitousFelicitousness 0d88fcd396 feat(ui): add prefill and thinking controls to Caption tab
Add minimal UI controls to expose new VQA functionality:
- Prefill Text input for guiding VLM responses
- Thinking Mode checkbox for reasoning models
- Keep Thinking Trace checkbox for output retention
- Keep Prefill checkbox for output retention
- Annotated Image output panel for detection visualization
- Updated button handlers to pass new parameters
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness c2810dfee2 fix(api): update VQA API endpoint for tuple return format
Update interrogate API endpoint to handle the new (text, image)
tuple return format from VQA interrogate function.
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness 27fa48cc99 feat(vqa): major VQA handler refactor with prefill, thinking, and visualization
Comprehensive overhaul of the VQA interrogation system including:
- Prefill text support for guiding VLM responses
- Thinking mode support with tag cleanup/retention
- Dynamic prompt/task selection based on model type
- Bounding box visualization for detection results
- Debug infrastructure (SD_VQA_DEBUG env var)
- New model support: MiMo-VL, Nidum Gemma, Allura Gemma
- Model-specific prompt lists (Florence, Moondream)
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness 0a322c0faf feat(vqa): add Moondream 3 Preview handler
Add support for Moondream 3 Preview VLM with:
- Text query, caption, point, and detect capabilities
- Bounding box visualization for object detection
- Max pixels setting for resolution control
- Device offloading support
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness c024c0c9c6 feat(settings): add VLM prefill and thinking retention options
Add new VLM configuration options:
- interrogate_vlm_keep_prefill: Keep prefill text in output
- interrogate_vlm_keep_thinking: Keep reasoning trace in output

Also adjust defaults:
- Change interrogate_clip_flavor_count: 16 -> 1024 with updated range
- Change interrogate_vlm_prompt default to first item ("Use Prompt")
2025-12-05 00:00:24 +00:00
CalamitousFelicitousness 85cd222793 fix(vqa): sort CLIP analysis results and add text output
Improvements to the OpenCLIP interrogation:
- Sort all ranking dicts by similarity score (descending)
- Add format_category() helper for text formatting
- Add formatted text output for CLIP labels textbox
- Return additional text update in analyze_image()
2025-12-02 21:48:09 +00:00
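Note on the entry above: sorting a ranking dict by similarity score (descending) and formatting it as text could be sketched as follows. The commit names a `format_category()` helper, but its real signature is not shown in this log; `sort_by_score` and `format_category_sketch` below are hypothetical, and the output format is an assumption.

```python
from typing import Dict


def sort_by_score(ranking: Dict[str, float]) -> Dict[str, float]:
    """Return a new dict ordered by similarity score, highest first."""
    return dict(sorted(ranking.items(), key=lambda item: item[1], reverse=True))


def format_category_sketch(name: str, ranking: Dict[str, float]) -> str:
    """Hypothetical formatter: one line of 'label (score)' pairs per category."""
    ordered = sort_by_score(ranking)
    items = ", ".join(f"{label} ({score:.2f})" for label, score in ordered.items())
    return f"{name}: {items}"
```

This relies on Python dicts preserving insertion order, so the sorted order survives the round trip through `dict(...)`.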
CalamitousFelicitousness eb832a4850 fix(vqa): respect offload setting in JoyCaption, add max_pixels
Two fixes for the JoyCaption handler:
- Only offload model if shared.opts.interrogate_offload is True
- Add max_pixels=1024*1024 to AutoProcessor for consistent image handling
2025-12-02 21:46:09 +00:00
CalamitousFelicitousness 766cb49928 feat(ui): add vision and reasoning symbols, fix dropdown fonts
Add new Font Awesome symbols for model capability indicators:
- vision symbol (eye icon) for vision-capable VLM models
- reasoning symbol (lightbulb icon) for thinking/reasoning models

Also fix dropdown font styling by adding NotoSans font-family.
2025-12-02 21:43:13 +00:00
vladmandic d3a2f6c7ed fix loading local prequant models
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-02 20:53:19 +01:00
vladmandic 0ad40d2b8b lint
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-02 12:25:04 +01:00
vladmandic 39bced0987 Merge branch 'dev' of https://github.com/vladmandic/sdnext into dev 2025-12-02 10:40:31 +01:00
vladmandic 903d47f9e6 add zimage and f2 to lora overrides
Signed-off-by: vladmandic <mandic00@live.com>
2025-12-02 10:40:27 +01:00
Vladimir Mandic 3b4f909862
Merge pull request #4436 from CalamitousFelicitousness/runai-update
Update runai-model-streamer logging integration
2025-12-02 03:59:38 -05:00
Vladimir Mandic 1673380b94
Merge pull request #4430 from awsr/fix_show_progress
show_progress requires "full", "minimal", or "hidden"
2025-12-02 03:50:34 -05:00
Vladimir Mandic de3ebf470d
Merge pull request #4428 from awsr/revert-for-now
Revert changes that require at least Python version 3.12
2025-12-02 03:49:20 -05:00
CalamitousFelicitousness 55c089ae48 Update runai-model-streamer logging integration
- Remove stdout redirect monkeypatch (fixed in runai v0.15.1 via PR #97)
- Add RUNAI_STREAMER_LOG_LEVEL controlled by SD_LOAD_DEBUG
- Add one-time runai config log when hijack is activated
- Add `loader=runai|default` to model loading logs
- Remove per-file logging clutter from sd_hijack_safetensors.py
2025-12-02 02:01:51 +00:00
Disty0 7aa1bfdc70 Add get_modules_to_not_convert from transformers v5 2025-12-02 01:01:51 +03:00
Disty0 d9bc31e7da Cleanup 2025-11-29 01:46:04 +03:00
Disty0 01a0f6b356 Warn and disable quantized matmul if triton is not available 2025-11-29 01:34:54 +03:00