Commit Graph

12597 Commits (0e0b607cfaabf59167e3ab2a44d491b4e4e1b08e)

Author SHA1 Message Date
Vladimir Mandic 31ec4cb6e0
Merge pull request #4644 from awsr/patch-2
Add progress bar fallback if hashing fails
2026-02-12 12:46:29 +01:00
awsr e683884d5f
Move trigger back to correct location 2026-02-12 03:41:44 -08:00
awsr 9f9d67713d
Fix error message display 2026-02-12 03:41:00 -08:00
awsr 6971f3438c
Implement error method for gallery progress
Handle instances when Web Crypto API is not available.
2026-02-12 03:40:02 -08:00
Vladimir Mandic 63c5e493be add software version of sha256
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2026-02-12 11:34:28 +00:00
Vladimir Mandic c25c35ebb3 live preview ui sizing
Signed-off-by: Vladimir Mandic <mandic00@live.com>
2026-02-12 10:15:30 +00:00
vladmandic ae8a6257c4 typo
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-12 10:37:40 +01:00
vladmandic ccd3e2e489 cleanup settings
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-12 10:30:39 +01:00
vladmandic 88db926ecd remove clip as requirement
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-12 08:40:10 +01:00
vladmandic 3ee816888e reduce logging on prompt apply
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 18:37:35 +01:00
vladmandic cf5e1e0df2 cleanup convert
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 18:09:57 +01:00
vladmandic da1cf2f996 refactor image methods
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 12:29:00 +01:00
vladmandic 0ed64ec195 cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 11:34:40 +01:00
vladmandic 3a4efcc444 update pyproject.toml
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 11:28:01 +01:00
vladmandic 1b4f94660f cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 11:15:59 +01:00
Vladimir Mandic 41f206dec9
Merge pull request #4637 from CalamitousFelicitousness/refactor/remove-face-restoration
Refactor/remove face restoration
2026-02-11 11:12:34 +01:00
vladmandic 78c58e0d70 update precommit
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 11:12:21 +01:00
CalamitousFelicitousness 8563e2a853 refactor: restore codeformer_model.py to avoid merge conflicts 2026-02-11 09:59:55 +00:00
vladmandic b4e5b563c6 update lint rules
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 10:47:07 +01:00
vladmandic 73a5d55022 cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 10:12:37 +01:00
vladmandic 8561da6f8c cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 10:02:41 +01:00
vladmandic 967974ade7 merge cleanup
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 09:57:37 +01:00
vladmandic 3ae9909b2a update sharpfin usage
Signed-off-by: vladmandic <mandic00@live.com>
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness dc8ecb0a64 refactor: address remaining PR #4640 review comments
- Remove _get_device_dtype() indirection, inline device/dtype at call sites
- Remove commented-out fallback blocks and try/finally wrappers
- Add modules/sharpfin to ruff and pylint excludes in pyproject.toml
- Fix import ordering in joytag.py and pixelart.py
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness 162651cbdb refactor: address PR #4640 review comments
Changes based on vladmandic and Disty0 feedback:

- Fix logging: use direct `from installer import log` instead of lazy _get_log()
- Remove unused is_available() function
- Remove defensive getattr() calls in _resolve_kernel/_resolve_linearize
- Simplify _get_device_dtype() to use devices module directly
- Refactor to_pil() with single Image.fromarray() call and explicit mode
- Add cross-platform fallback: sharpfin only runs on CUDA, falls back to
  PIL/F.interpolate for other devices (CPU, MPS, OpenVINO)
- Replace lambdas with functools.partial in functional.py for torch.compile safety
- Add modules/sharpfin to pylint ignore-paths (vendored code)
2026-02-11 09:57:37 +01:00
CalamitousFelicitousness 76aa949a26 refactor: integrate sharpfin for high-quality image resize
Vendor sharpfin library (Apache 2.0) and add centralized wrapper
module (images_sharpfin.py) replacing torchvision tensor/PIL
conversion and resize operations throughout the codebase.

- Add modules/sharpfin/ vendored library with MKS2021, Lanczos3,
  Mitchell, Catmull-Rom kernels and optional Triton sparse acceleration
- Add modules/images_sharpfin.py wrapper with to_tensor(), to_pil(),
  pil_to_tensor(), normalize(), resize(), resize_tensor()
- Add resize_quality and resize_linearize_srgb settings
- Add MKS2021 and Lanczos3 upscaler entries
- Replace torchvision.transforms.functional imports across 18 files
- to_pil() auto-detects HWC/BHWC layout, adds .round() before uint8
- Sparse Triton path falls back to dense GPU on compilation failure
- Mixed-axis resize splits into two single-axis scale() calls
- Masks and non-sRGB data always use linearize=False
2026-02-11 09:57:37 +01:00
Vladimir Mandic 2c4d0751d9
Merge pull request #4613 from CalamitousFelicitousness/feat/caption-improvements-v2_backup
Caption system overhaul V2
2026-02-11 09:33:29 +01:00
CalamitousFelicitousness 80014fac7c fix(caption): address PR review feedback
- Remove superfluous SimpleNamespace import in cli/api-caption.py, use Map instead
- Drop _ prefix from internal helper functions in modules/api/caption.py
- Move DeepDanbooru model path to top-level models folder instead of nesting under CLIP
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 139e331d80 style(caption): fix lint warnings across caption module
- Rename shadowing import in waifudiffusion batch to avoid F823/E0606
- Fix import order in cli/api-caption.py (stdlib before third-party)
- Rename local variable shadowing function name in cli/api-caption.py
- Remove unnecessary global statement in devices.bypass_sdpa_hijacks
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 8d67debdfd fix(caption): use cache_dir for BLIP and Moondream model downloads
- Add _load_blip_model helper with explicit cache_dir so downloads
  go to hfcache_dir instead of default HF cache
- Pre-load BLIP model/processor before creating Interrogator config
  to control download location and avoid redundant loads
- Set clip_model_path on config for CLIP model cache location
- Add cache_dir to Moondream model and tokenizer loading
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness 6c20e49897 refactor(caption): extract caption API into standalone module
Move all caption/interrogate/tagger/VQA API code out of the monolithic
endpoints.py and models.py into a new self-contained modules/api/caption.py,
following the loras.py / nudenet.py self-registering pattern.

- Move 15 Pydantic models (ReqCaption, ResCaption, ReqVQA, ResVQA,
  ReqTagger, ResTagger, dispatch union types, etc.) from models.py
- Move 11 handler functions from endpoints.py
- Deduplicate ~150 lines via shared _do_openclip, _do_tagger, _do_vqa
  core functions called by both direct and dispatch endpoints
- Add register_api() that registers all 8 caption routes
- Add promptgen field to ResVLMPrompts (bug fix: handler returned it
  but response model silently dropped it)
- Improve all endpoint docstrings and Field descriptions for API docs
2026-02-11 02:50:06 +00:00
CalamitousFelicitousness e2cdbe47fa fix(caption): safetensors-only downloads, model load fixes, UI default, prefill tests
- Add use_safetensors=True to all 16 model from_pretrained calls to
  avoid downloading redundant .bin files alongside safetensors
- Add device property to JoyTag VisionModel so move_model can relocate
  it to CUDA (fixes 'ViT object has no attribute device')
- Fix Pix2Struct dtype mismatch by casting float inputs to model dtype
  while preserving integer tensor types
- Patch AutoConfig.register with exist_ok=True during Ovis loading to
  handle duplicate aimv2 registration on model reload
- Detect Qwen VL fine-tune architecture from config model_type instead
  of repo name, fixing ToriiGate and similar third-party fine-tunes
- Change UI default task from Short Caption to Normal Caption, and
  preserve it on model switch instead of resetting to Use Prompt
- Add dual-prefill testing across 5 VQA test methods using a shared
  _check_prefill helper
- Fix pre-existing ruff W605 in strip_think_xml_tags docstring
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 57659ab642 fix(caption): set clip_interrogator params on config, not instance
update_caption_params() was setting caption_max_length, chunk_size, and
flavor_intermediate_count on the Interrogator instance, but the library
reads them from self.config. The overrides were silently ignored.
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 17b03ed8e4 feat(caption): add Florence detection parsing, SDPA bypass, and offload support
- Add parse_florence_detections() and format_florence_response() to
  vqa_detection for handling Florence-2 detection output formats
- Add bypass_sdpa_hijacks() context manager to devices.py for models
  incompatible with SageAttention or other SDPA hijacks
- Add OpenCLIP model offload support when caption_offload is enabled
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 443a73b740 refactor(caption): code review fixes for offload, inference, and maintainability
Comprehensive review of modules/caption/ addressing memory management,
consistency, and code quality:

Inference correctness:
- Add devices.inference_context() to _qwen(), _smol(), _sa2() handlers
- Remove redundant @torch.no_grad() decorator from joycaption predict()
- Remove dead dtype=torch.bfloat16 kwarg from Florence loader

Memory management:
- Bound moondream3 image cache with LRU eviction (max 8 entries)
- Replace fragile id(image) cache keys with content-based md5 hash
- Add devices.torch_gc() after model loading in deepseek
- Move deepbooru model to CPU before dropping reference on unload
- Add external handler delegation to VQA.unload() (moondream3,
  joycaption, joytag, deepseek)
- Protect batch offload mutation with try/finally

Code deduplication:
- Extract strip_think_xml_tags() shared helper for Qwen/Gemma/SmolVLM
- Extract save_tags_to_file() into tagger.py from deepbooru and
  waifudiffusion

Documentation and clarity:
- Document deepseek global monkey-patches (LlamaFlashAttention2, attrdict)
- Document Florence task="task" as intentional design choice
- Add vendored-code comment to joytag.py
- Document openclip direct .to() usage vs sd_models.move_model
- Comment model.eval() calls that are required (trust_remote_code,
  custom loaders) vs removed where redundant (standard from_pretrained)

API robustness:
- Add HTTP 422 error response for VQA caption error strings in API
  endpoints (post_vqa, _dispatch_vlm)
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness bf7a72f12e fix(caption): remove dead min_length param, split Florence/PromptGen prompts, fix gaze detection
- Remove caption_openclip_min_length from settings, API models, endpoints, and UI
  (clip_interrogator library has no min_length support; parameter was never functional)
- Split vlm_prompts_florence into base Florence prompts and PromptGen-only prompts
  (GENERATE_TAGS, Analyze, Mixed Caption require MiaoshouAI PromptGen fine-tune)
- Add 'promptgen' category to /vqa/prompts API endpoint
- Fix gaze detection: move DETECT_GAZE check before generic 'detect ' prefix
  to prevent "Detect Gaze" matching as detect target="Gaze"
- Update test suite: remove min_length tests, fix min_flavors to use mode='best',
  add acceptance-only notes, fix thinking trace detection, improve bracket/OCR tests,
  split Florence/PromptGen test coverage
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness fba942b25e feat(caption): add debug logging for Florence-2 handler 2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 0c45e58e80 docs: update localization and README for caption module
- Update html/locale_en.json with caption-related strings
- Update README.md documentation
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 588222f2d1 test: update caption API tests
Update cli/test-caption-api.py:
- Update test structure for new caption API endpoints
- Fix Moondream gaze detection test prompt to use 'Detect Gaze'
  instead of 'Where is the person looking?' to match handler trigger
- Improve test result categorization and tracking
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness d78c5c1cd0 refactor: update CLI tools for caption module
- Rename cli/api-interrogate.py to cli/api-caption.py
- Update cli/options.py, cli/process.py for new module paths
- Update cli/test-tagger.py for caption module imports
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness f4b5abde68 refactor: update API for caption module
Update API endpoints and models for caption module rename:
- modules/api/api.py - update imports and endpoint handlers
- modules/api/endpoints.py - update endpoint definitions
- modules/api/models.py - update request/response models
2026-02-11 02:48:11 +00:00
CalamitousFelicitousness 61b031ada5 refactor: update imports for caption module rename
Update all imports from modules.interrogate to modules.caption across:
- modules/shared.py, modules/shared_legacy.py
- modules/ui_caption.py, modules/ui_common.py
- modules/ui_control.py, modules/ui_control_helpers.py
- modules/ui_img2img.py, modules/ui_sections.py
- modules/ui_symbols.py, modules/ui_video_vlm.py
2026-02-11 02:47:41 +00:00
CalamitousFelicitousness 5183ebec58 refactor: rename interrogate module to caption
Move all caption-related modules from modules/interrogate/ to modules/caption/
for better naming consistency:
- Rename deepbooru, deepseek, joycaption, joytag, moondream3, openclip, tagger,
  vqa, vqa_detection, waifudiffusion modules
- Add new caption.py dispatcher module
- Remove old interrogate.py (functionality moved to caption.py)
2026-02-11 02:47:41 +00:00
CalamitousFelicitousness 83fa8e39ba refactor(api): update cli tools for DeepBooru tagger migration
- Update cli/api-interrogate.py to use /sdapi/v1/tagger for DeepBooru
- Handle tagger response format (scores dict or tags string)
- Remove DeepBooru test from interrogate endpoint tests
- Update API model descriptions to reference tagger for anime tagging
2026-02-11 02:47:41 +00:00
CalamitousFelicitousness 7825f44581 refactor(api): remove DeepBooru from interrogate endpoint
DeepBooru/DeepDanbooru should only be accessed via the tagger endpoint.
The interrogate endpoint is now exclusively for OpenCLIP/BLIP.

- Remove DeepDanbooru handling from post_interrogate
- Update docstring to reference tagger endpoint for anime tagging
- Simplify code by removing if/else branching
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness 0559651b1b fix(vqa): fix infinite recursion and Florence-2 generation
- Fix get_keep_thinking() infinite recursion (was calling itself)
- Fix get_keep_prefill() infinite recursion (was calling itself)
- Fix Florence-2 to use beam search instead of sampling
  Sampling causes probability tensor errors with Florence-2
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness 3208067259 test(api): improve caption API test coverage and validation
Add model architecture coverage tests:
- VQA model family detection for 19 architectures
- Florence special prompts test (<OD>, <OCR>, <CAPTION>, etc.)
- Moondream detection features test
- VQA architecture capabilities test
- Tagger model types and WD version comparison tests

Improve test validation:
- Add is_meaningful_answer() to reject responses like "."
- Verify parameters have actual effect (not just accepted)
- Show actual output traces in PASS/FAIL messages
- Fix prefill tests to verify keep_prefill behavior

Add configurable timeout:
- Default timeout increased to 300s for slow models
- Add --timeout CLI argument for customization

Other improvements:
- Add JoyCaption to recognized model families
- Reduce BLIP models to avoid reloading large models
- Better detection result validation for annotated images
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness a04ba1e482 feat(api): add missing caption API parameters for UI parity
Add prompt field to VQA endpoint and advanced settings to OpenCLIP endpoint
to achieve full parity between UI and API capabilities.

VLM endpoint changes:
- Add prompt field for custom text input (required for 'Use Prompt' task)
- Pass prompt to vqa.interrogate instead of hardcoded empty string

OpenCLIP endpoint changes:
- Add 7 optional per-request override fields: min_length, max_length,
  chunk_size, min_flavors, max_flavors, flavor_count, num_beams
- Add get_clip_setting() helper for override support in openclip.py
- Apply overrides via update_interrogate_params() before interrogation

All new fields are optional with None defaults for backwards compatibility.
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness 5fc46c042e docs(api): synchronize API descriptions with UI hints
Update API model field descriptions to match the hints in locale_en.json
for consistency between UI and API documentation.

Updated models:
- ReqInterrogate: clip_model, blip_model, mode
- ReqVQA: model, question, system
- ReqTagger: model, threshold, character_threshold, max_tags,
  include_rating, sort_alpha, use_spaces, escape_brackets,
  exclude_tags, show_scores
2026-02-11 02:47:40 +00:00
CalamitousFelicitousness f431141d2f feat(api): add LLM generation parameters to VQA endpoint
Add optional LLM generation parameters to the VQA API request model,
allowing per-request override of settings:

- max_tokens, temperature, top_k, top_p, num_beams, do_sample
- thinking_mode, prefill, keep_thinking, keep_prefill

Changes:
- Add 10 new optional fields to ReqVQA model with descriptive docs
- Update get_kwargs() to support per-request overrides via singleton
- Add helper functions get_keep_thinking(), get_keep_prefill()
- Update post_vqa endpoint to pass generation kwargs
- Add _generation_overrides instance variable to VQA class
2026-02-11 02:47:40 +00:00