diff --git a/.pylintrc b/.pylintrc index a24604e3c..5bd9493dd 100644 --- a/.pylintrc +++ b/.pylintrc @@ -38,6 +38,7 @@ ignore-paths=/usr/lib/.*$, pipelines/flex2, pipelines/f_lite, pipelines/hidream, + pipelines/hdm, pipelines/meissonic, pipelines/omnigen2, pipelines/segmoe, diff --git a/.ruff.toml b/.ruff.toml index d6c39bb1a..f5b4d3c2b 100644 --- a/.ruff.toml +++ b/.ruff.toml @@ -20,6 +20,7 @@ exclude = [ "pipelines/meissonic", "pipelines/omnigen2", + "pipelines/hdm", "pipelines/segmoe", "scripts/lbm", diff --git a/CHANGELOG.md b/CHANGELOG.md index c6d5695b0..9bb328452 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,62 @@ # Change Log for SD.Next +## Update for 2025-08-30 + +- **Models** + - **Chroma** final versions: [Chroma1-HD](https://huggingface.co/lodestones/Chroma1-HD), [Chroma1-Base](https://huggingface.co/lodestones/Chroma1-Base) and [Chroma1-Flash](https://huggingface.co/lodestones/Chroma1-Flash) + - **Qwen-Image** [InstantX ControlNet Union](https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union) support + *note* qwen-image is already a very large model and controlnet adds 3.5GB on top of that so quantization and offloading are highly recommended! 
+ - [Nunchaku-Qwen-Image-Lightning](https://huggingface.co/nunchaku-tech/nunchaku-qwen-image)
+ if you have a compatible NVIDIA GPU, Nunchaku is the fastest quantization engine, currently available for Flux.1, SANA and Qwen-Image models
+ *note*: the release version of `nunchaku==0.3.2` does NOT include support, so you need to build [nunchaku](https://nunchaku.tech/docs/nunchaku/installation/installation.html) from source
+ - [HunyuanDiT ControlNet](https://huggingface.co/Tencent-Hunyuan/HYDiT-ControlNet-v1.2) Canny, Depth, Pose
+ - [KBlueLeaf/HDM-xut-340M-anime](https://huggingface.co/KBlueLeaf/HDM-xut-340M-anime)
+ highly experimental: HDM *Home-made-Diffusion-Model* is a project investigating a specialized training recipe/scheme for pretraining a T2I model at home, based on a super-light architecture
+ requires: generator=cpu, dtype=float16, offload=none
+ - updated [SD.Next Model Samples Gallery](https://vladmandic.github.io/sd-samples/compare.html)
+- **UI**
+ - default to **ModernUI**
+ standard UI is still available via *settings -> user interface -> theme type*
+ - mobile-friendly!
+ - make hints touch-friendly: hold touch to display hint
+ - improved image scaling in img2img and control interfaces
+ - add base model type to networks display, thanks @Artheriax
+ - additional hints to ui, thanks @Artheriax
+ - add video support to gallery, thanks @CalamitousFelicitousness
+ - additional artwork for reference models in networks, thanks @liutyi
+ - improve ui hints display
+ - restyled all toolbuttons to be modernui native
+ - reordered system settings
+ - configurable horizontal vs vertical panel layout
+ in settings -> user interface -> panel min width
+ *example*: if panel width is less than the specified value, layout switches to vertical
+ - configurable grid image size
+ in *settings -> user interface -> grid image size*
+- **Offloading**
+ - enable offload during pre-forward by default
+ - improve offloading of models with multiple DiTs
+ - improve offloading of models with implicit vae processing
+ - improve offloading of models with controlnet
+- **SDNQ**
+ - add quantized matmul support for all quantization types and group sizes
+- **Other**
+ - refactor reuse-seed and add functionality to all tabs
+- **Fixes**
+ - normalize path handling when deleting images
+ - remove samplers filtering
+ - fix hidden model tags in networks display
+ - fix networks reference models display on windows
+ - fix handling of pre-quantized `flux` models
+ - fix `wan`: use correct pipeline for i2v models
+ - fix `qwen-image` with hires
+ - fix `omnigen-2` failure
+ - fix `auraflow` quantization
+ - fix `kandinsky-3` noise
+ - fix `infiniteyou` pipeline offloading
+ - fix `skyreels-v2` image-to-video
+ - fix `flex2` img2img denoising strength
+ - fix segfault on startup with rocm 6.4.3 and torch 2.8
+

## Update for 2025-08-20

A quick service release with several important hotfixes, improved localization support and adding new **Qwen** model variants...
diff --git a/README.md b/README.md index 1095c1c20..e37f77479 100644 --- a/README.md +++ b/README.md @@ -43,12 +43,15 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
-*Main interface using **StandardUI***: -![screenshot-standardui](https://github.com/user-attachments/assets/cab47fe3-9adb-4d67-aea9-9ee738df5dcc) +**Desktop** interface +
+screenshot-modernui-desktop +
-*Main interface using **ModernUI***: - -![screenshot-modernui](https://github.com/user-attachments/assets/39e3bc9a-a9f7-4cda-ba33-7da8def08032) +**Mobile** interface +
+screenshot-modernui-mobile +
For screenshots and information on other available themes, see [Themes](https://vladmandic.github.io/sdnext-docs/Themes/)

diff --git a/TODO.md b/TODO.md
index 2b99c8c72..c480e4f67 100644
--- a/TODO.md
+++ b/TODO.md
@@ -7,19 +7,16 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma
 - Remote TE
 - Mobile ModernUI
 - [Canvas](https://konvajs.org/)
-- [Modular pipelines and guiders](https://github.com/huggingface/diffusers/issues/11915)
 - Refactor: Sampler options
 - Refactor: [GGUF](https://huggingface.co/docs/diffusers/main/en/quantization/gguf)
 - Feature: Diffusers [group offloading](https://github.com/vladmandic/sdnext/issues/4049)
-- Feature: Common repo for `T5` and `CLiP`
 - Feature: LoRA add OMI format support for SD35/FLUX.1
 - Video: Generic API support
 - Video: LTX TeaCache and others
 - Video: LTX API
 - Video: LTX PromptEnhance
 - Video: LTX Conditioning preprocess
-- [WanAI-2.1 VACE](https://huggingface.co/Wan-AI/Wan2.1-VACE-14B)(https://github.com/huggingface/diffusers/pull/11582)
 - [Cosmos-Predict2-Video](https://huggingface.co/nvidia/Cosmos-Predict2-2B-Video2World)(https://github.com/huggingface/diffusers/pull/11695)

### Blocked items

@@ -30,6 +27,8 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma

### Under Consideration

+- [X-Omni](https://github.com/X-Omni-Team/X-Omni/blob/main/README.md)
+- [DiffSynth Studio](https://github.com/modelscope/DiffSynth-Studio)
 - [IPAdapter negative guidance](https://github.com/huggingface/diffusers/discussions/7167)
 - [IPAdapter composition](https://huggingface.co/ostris/ip-composition-adapter)
 - [STG](https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#spatiotemporal-skip-guidance)
diff --git a/cli/localize.js b/cli/localize.js
index e1b0aba5b..da870d455 100755
--- a/cli/localize.js
+++ b/cli/localize.js
@@ -8,6 +8,7 @@ const { GoogleGenerativeAI } = require('@google/generative-ai');
 const api_key = process.env.GOOGLE_AI_API_KEY;
const model = 'gemini-2.5-flash'; const prompt = ` +// eslint-disable-next-line max-len Translate attached JSON from English to {language} using following rules: fields id, label and reload should be preserved from original, field localized should be a translated version of field label and field hint should be translated in-place. if field is less than 3 characters, do not translate it and keep it as is. Every JSON entry should have id, label, localized, reload and hint fields. Output should be pure JSON without any additional text. To better match translation, context of the text is related to Stable Diffusion and topic of Generative AI.`; const languages = { hr: 'Croatian', diff --git a/extensions-builtin/sdnext-modernui b/extensions-builtin/sdnext-modernui index da4ccd4aa..3a5df9fc0 160000 --- a/extensions-builtin/sdnext-modernui +++ b/extensions-builtin/sdnext-modernui @@ -1 +1 @@ -Subproject commit da4ccd4aa75e3b42937674ba23d406a02783df4f +Subproject commit 3a5df9fc03c1d61d7d70413f7a2f78a4b4552ae2 diff --git a/html/amethyst-nightfall.jpg b/html/amethyst-nightfall.jpg deleted file mode 100644 index 216f2f765..000000000 Binary files a/html/amethyst-nightfall.jpg and /dev/null differ diff --git a/html/black-orange.jpg b/html/black-orange.jpg deleted file mode 100644 index 90c4bca0a..000000000 Binary files a/html/black-orange.jpg and /dev/null differ diff --git a/html/black-teal.jpg b/html/black-teal.jpg deleted file mode 100644 index 06f604f86..000000000 Binary files a/html/black-teal.jpg and /dev/null differ diff --git a/html/emerald-paradise.jpg b/html/emerald-paradise.jpg deleted file mode 100644 index 73a4b7a5f..000000000 Binary files a/html/emerald-paradise.jpg and /dev/null differ diff --git a/html/gradio-base.jpg b/html/gradio-base.jpg deleted file mode 100644 index 97a29c107..000000000 Binary files a/html/gradio-base.jpg and /dev/null differ diff --git a/html/gradio-default.jpg b/html/gradio-default.jpg deleted file mode 100644 index bb9097be4..000000000 
Binary files a/html/gradio-default.jpg and /dev/null differ diff --git a/html/gradio-glass.jpg b/html/gradio-glass.jpg deleted file mode 100644 index 984a509bf..000000000 Binary files a/html/gradio-glass.jpg and /dev/null differ diff --git a/html/gradio-monochrome.jpg b/html/gradio-monochrome.jpg deleted file mode 100644 index ab2134fb4..000000000 Binary files a/html/gradio-monochrome.jpg and /dev/null differ diff --git a/html/gradio-soft.jpg b/html/gradio-soft.jpg deleted file mode 100644 index 1ba487a7c..000000000 Binary files a/html/gradio-soft.jpg and /dev/null differ diff --git a/html/image-update.svg b/html/image-update.svg deleted file mode 100644 index 3abf12df0..000000000 --- a/html/image-update.svg +++ /dev/null @@ -1,7 +0,0 @@ - - - - - - - diff --git a/html/invoked.jpg b/html/invoked.jpg deleted file mode 100644 index fd0c1a965..000000000 Binary files a/html/invoked.jpg and /dev/null differ diff --git a/html/light-teal.jpg b/html/light-teal.jpg deleted file mode 100644 index f09546a02..000000000 Binary files a/html/light-teal.jpg and /dev/null differ diff --git a/html/locale_en.json b/html/locale_en.json index 04dc20618..8113a811b 100644 --- a/html/locale_en.json +++ b/html/locale_en.json @@ -32,6 +32,7 @@ {"id":"","label":"","localized":"","reload":"","hint":"Sort by time, descending"} ], "main": [ + {"id":"","label":"SD.Next","localized":"","reload":"","hint":"SD.Next
All-in-one WebUI for AI generative image and video creation"}, {"id":"","label":"Prompt","localized":"","reload":"","hint":"Describe image you want to generate"}, {"id":"","label":"Start","localized":"","reload":"","hint":"Start"}, {"id":"","label":"End","localized":"","reload":"","hint":"End"}, @@ -41,10 +42,14 @@ {"id":"","label":"Text","localized":"","reload":"","hint":"Create image from text"}, {"id":"","label":"Image","localized":"","reload":"","hint":"Create image from image"}, {"id":"","label":"Control","localized":"","reload":"","hint":"Create image with full guidance"}, - {"id":"","label":"Process","localized":"","reload":"","hint":"Process existing image"}, + {"id":"","label":"Images","localized":"","reload":"","hint":"Create images
Unified interface
Supports T2I and I2I
With optional control guidance"}, + {"id":"","label":"T2I","localized":"","reload":"","hint":"Create image from text
Legacy interface that mimics original text-to-image interface and behavior"}, + {"id":"","label":"I2I","localized":"","reload":"","hint":"Create image from image
Legacy interface that mimics original image-to-image interface and behavior"}, + {"id":"","label":"Process","localized":"","reload":"","hint":"Process existing image
Can be used to upscale images, remove backgrounds, obfuscate NSFW content, apply various filters and effects"}, {"id":"","label":"Caption","localized":"","reload":"","hint":"Analyze existing images and create text descriptions"}, {"id":"","label":"Interrogate","localized":"","reload":"","hint":"Run interrogate to get description of your image"}, {"id":"","label":"Models","localized":"","reload":"","hint":"Download, convert or merge your models and manage models metadata"}, + {"id":"","label":"Sampler","localized":"","reload":"","hint":"Settings related to sampler and seed selection and configuration. Samplers guide the process of turning noise into an image over multiple steps."}, {"id":"","label":"Agent Scheduler","localized":"","reload":"","hint":"Enqueue your generate requests and run them in the background"}, {"id":"","label":"AgentScheduler","localized":"","reload":"","hint":"Enqueue your generate requests and run them in the background"}, {"id":"","label":"System","localized":"","reload":"","hint":"System settings and information"}, @@ -97,7 +102,7 @@ {"id":"","label":"Denoise","localized":"","reload":"","hint":"Denoising settings. Higher denoise means that more of existing image content is allowed to change during generate"}, {"id":"","label":"Mask","localized":"","reload":"","hint":"Image masking and mask options"}, {"id":"","label":"Input","localized":"","reload":"","hint":"Selection of input media"}, - {"id":"","label":"Video","localized":"","reload":"","hint":"Create video using guidance"}, + {"id":"","label":"Video","localized":"","reload":"","hint":"Create videos using different methods
Supports text-to-video, image-to-video, first-last-frame, etc."},
 {"id":"","label":"Control elements","localized":"","reload":"","hint":"Control elements are advanced models that can guide generation towards desired outcome"},
 {"id":"","label":"IP adapter","localized":"","reload":"","hint":"Guide generation towards desired outcome using IP adapters plugin models"},
 {"id":"","label":"IP adapters","localized":"","reload":"","hint":"IP adapters are plugin models that can guide generation towards desired outcome"},
@@ -192,7 +197,14 @@
 {"id":"","label":"Control Only","localized":"","reload":"","hint":"This uses only the Control input below as the source for any ControlNet or IP Adapter type tasks based on any of our various options."},
 {"id":"","label":"Init Image Same As Control","localized":"","reload":"","hint":"Will additionally treat any image placed into the Control input window as a source for img2img type tasks, an image to modify for example."},
 {"id":"","label":"Separate Init Image","localized":"","reload":"","hint":"Creates an additional window next to Control input labeled Init input, so you can have a separate image for both Control operations and an init source."},
- {"id":"","label":"Override settings","localized":"","reload":"","hint":"If generation parameters deviate from your system settings override settings populated with those settings to override your system configuration for this workflow"}
+ {"id":"","label":"Override settings","localized":"","reload":"","hint":"If generation parameters deviate from your system settings, override settings are populated with those values to override your system configuration for this workflow"},
+ {"id":"","label":"sigma method","localized":"","reload":"","hint":"Controls how noise levels (sigmas) are distributed across diffusion steps. 
Options:\n- default: the model default\n- karras: smoother noise schedule, higher quality with fewer steps\n- beta: based on beta schedule values\n- exponential: exponential decay of noise\n- lambdas: experimental, balances signal-to-noise\n- flowmatch: tuned for flow-matching models"},
+ {"id":"","label":"timestep spacing","localized":"","reload":"","hint":"Determines how timesteps are spaced across the diffusion process. Options:\n- default: the model default\n- leading: creates evenly spaced steps\n- linspace: includes the first and last steps and evenly selects the remaining intermediate steps\n- trailing: only includes the last step and evenly selects the remaining intermediate steps starting from the end"},
+ {"id":"","label":"beta schedule","localized":"","reload":"","hint":"Defines how beta (noise strength per step) grows. Options:\n- default: the model default\n- linear: evenly decays noise per step\n- scaled: squared version of linear, used only by Stable Diffusion\n- cosine: smoother decay, often better results with fewer steps\n- sigmoid: sharp transition, experimental"},
+ {"id":"","label":"prediction method","localized":"","reload":"","hint":"Defines what the model predicts at each step. Options:\n- default: the model default\n- epsilon: noise (most common for Stable Diffusion)\n- sample: direct denoised image prediction, also called x0 prediction\n- v_prediction: velocity prediction, used by CosXL and NoobAI VPred models\n- flow_prediction: used with newer flow-matching models like SD3 and Flux"},
+ {"id":"","label":"sampler order","localized":"","reload":"","hint":"Order of solver updates in the sampler. Higher order improves stability/accuracy but increases compute cost."},
+ {"id":"","label":"flow shift","localized":"","reload":"","hint":"Adjustment for flow-based samplers. 
Shifts noise distribution during generation, useful for fine-tuning balance between detail and consistency."}, + {"id":"","label":"resize mode","localized":"","reload":"","hint":"Defines how the input is resized or adapted in second-pass refinement:\n- none: no resizing, keep original resolution\n- fixed: force resize to target resolution (may distort)\n- crop: center-crop to fit target while keeping aspect ratio\n- fill: resize to fit and pad empty space with borders\n- outpaint: extend canvas beyond image borders\n- context aware: smart resize that blends or adapts surrounding areas"} ], "other": [ {"id":"","label":"Install","localized":"","reload":"","hint":"Install"}, @@ -358,15 +370,26 @@ {"id":"","label":"ONNX allow fallback to CPU","localized":"","reload":"","hint":"Allow fallback to CPU when selected execution provider failed"}, {"id":"","label":"ONNX cache converted models","localized":"","reload":"","hint":"Save the models that are converted to ONNX format as a cache. You can manage them in ONNX tab"}, {"id":"","label":"ONNX unload base model when processing refiner","localized":"","reload":"","hint":"Unload base model when the refiner is being converted/optimized/processed"}, - {"id":"","label":"Inference-mode","localized":"","reload":"","hint":"Use torch.inference_mode"}, - {"id":"","label":"no-grad","localized":"","reload":"","hint":"Use torch.no_grad"}, {"id":"","label":"Model compile precompile","localized":"","reload":"","hint":"Run model compile immediately on model load instead of first use"}, {"id":"","label":"Use zeros for prompt padding","localized":"","reload":"","hint":"Force full zero tensor when prompt is empty to remove any residual noise"}, {"id":"","label":"Include invisible watermark","localized":"","reload":"","hint":"Add invisible watermark to image by altering some pixel values"}, {"id":"","label":"invisible watermark string","localized":"","reload":"","hint":"Watermark string to add to image. 
Keep very short to avoid image corruption."}, {"id":"","label":"show log view","localized":"","reload":"","hint":"Show log view at the bottom of the main window"}, {"id":"","label":"Log view update period","localized":"","reload":"","hint":"Log view update period, in milliseconds"}, - {"id":"","label":"PAG layer names","localized":"","reload":"","hint":"Space separated list of layers
Available: d[0-5], m[0], u[0-8]
Default: m0"} + {"id":"","label":"PAG layer names","localized":"","reload":"","hint":"Space separated list of layers
Available: d[0-5], m[0], u[0-8]
Default: m0"}, + {"id":"","label":"prompt attention normalization","localized":"","reload":"","hint":"Balances prompt token weights to avoid overly strong/weak influence. Helps stabilize outputs."}, + {"id":"","label":"ck flash attention","localized":"","reload":"","hint":"Custom Flash Attention kernel. Very fast, but may be unstable or hardware-dependent."}, + {"id":"","label":"flash attention","localized":"","reload":"","hint":"Highly optimized attention algorithm. Greatly reduces VRAM use and speeds up inference, but can be non-deterministic."}, + {"id":"","label":"memory attention","localized":"","reload":"","hint":"Uses less VRAM by chunking attention computation. Slower but allows bigger batches or images."}, + {"id":"","label":"math attention","localized":"","reload":"","hint":"Fallback pure-math attention implementation. Stable and predictable but very slow."}, + {"id":"","label":"dynamic attention","localized":"","reload":"","hint":"Adjusts attention computation dynamically per step. Saves VRAM but slows generation."}, + {"id":"","label":"sage attention","localized":"","reload":"","hint":"Experimental attention optimization method. May improve speed but less tested and can cause bugs."}, + {"id":"","label":"batch matrix-matrix","localized":"","reload":"","hint":"Standard batched matrix multiplication for attention. Reliable but not VRAM-efficient."}, + {"id":"","label":"split attention","localized":"","reload":"","hint":"Splits attention layers into smaller chunks. Helps with very large images at the cost of slower inference."}, + {"id":"","label":"deterministic mode","localized":"","reload":"","hint":"Forces deterministic output across runs. Useful for reproducibility, but may disable some optimizations."}, + {"id":"","label":"no-grad","localized":"","reload":"","hint":"Disables gradient tracking with torch.no_grad. 
Reduces memory usage and speeds up inference."},
+ {"id":"","label":"Inference-mode","localized":"","reload":"","hint":"Like no-grad but stricter. Ensures model runs only in inference mode for safety and speed."},
+ {"id":"","label":"cudamallocasync","localized":"","reload":"","hint":"Uses CUDA async memory allocator. Improves performance and reduces VRAM fragmentation, but may cause instability on some GPUs."}
 ],
 "missing": [
 {"id":"","label":"1st stage","localized":"","reload":"","hint":"1st stage"},
@@ -455,7 +478,6 @@
 {"id":"","label":"batch interogate","localized":"","reload":"","hint":"batch interogate"},
 {"id":"","label":"batch interrogate","localized":"","reload":"","hint":"batch interrogate"},
 {"id":"","label":"batch mask directory","localized":"","reload":"","hint":"batch mask directory"},
- {"id":"","label":"batch matrix-matrix","localized":"","reload":"","hint":"batch matrix-matrix"},
 {"id":"","label":"batch mode uses sequential seeds","localized":"","reload":"","hint":"batch mode uses sequential seeds"},
 {"id":"","label":"batch output directory","localized":"","reload":"","hint":"batch output directory"},
 {"id":"","label":"batch uses original name","localized":"","reload":"","hint":"batch uses original name"},
@@ -466,7 +488,6 @@
 {"id":"","label":"beta block weight preset","localized":"","reload":"","hint":"beta block weight preset"},
 {"id":"","label":"beta end","localized":"","reload":"","hint":"beta end"},
 {"id":"","label":"beta ratio","localized":"","reload":"","hint":"beta ratio"},
- {"id":"","label":"beta schedule","localized":"","reload":"","hint":"beta schedule"},
 {"id":"","label":"beta start","localized":"","reload":"","hint":"beta start"},
 {"id":"","label":"bh1","localized":"","reload":"","hint":"bh1"},
 {"id":"","label":"bh2","localized":"","reload":"","hint":"bh2"},
@@ -495,7 +516,6 @@
 {"id":"","label":"chunk size","localized":"","reload":"","hint":"chunk size"},
 {"id":"","label":"civitai model type","localized":"","reload":"","hint":"civitai model 
type"}, {"id":"","label":"civitai token","localized":"","reload":"","hint":"civitai token"}, - {"id":"","label":"ck flash attention","localized":"","reload":"","hint":"ck flash attention"}, {"id":"","label":"ckpt","localized":"","reload":"","hint":"ckpt"}, {"id":"","label":"cleanup temporary folder on startup","localized":"","reload":"","hint":"cleanup temporary folder on startup"}, {"id":"","label":"clip model","localized":"","reload":"","hint":"clip model"}, @@ -563,7 +583,6 @@ {"id":"","label":"create zip archive","localized":"","reload":"","hint":"create zip archive"}, {"id":"","label":"cross-attention","localized":"","reload":"","hint":"cross-attention"}, {"id":"","label":"cudagraphs","localized":"","reload":"","hint":"cudagraphs"}, - {"id":"","label":"cudamallocasync","localized":"","reload":"","hint":"cudamallocasync"}, {"id":"","label":"custom pipeline","localized":"","reload":"","hint":"custom pipeline"}, {"id":"","label":"dark","localized":"","reload":"","hint":"dark"}, {"id":"","label":"dc solver","localized":"","reload":"","hint":"dc solver"}, @@ -591,7 +610,6 @@ {"id":"","label":"depth threshold","localized":"","reload":"","hint":"depth threshold"}, {"id":"","label":"description","localized":"","reload":"","hint":"description"}, {"id":"","label":"details","localized":"","reload":"","hint":"details"}, - {"id":"","label":"deterministic mode","localized":"","reload":"","hint":"deterministic mode"}, {"id":"","label":"device info","localized":"","reload":"","hint":"device info"}, {"id":"","label":"diffusers","localized":"","reload":"","hint":"diffusers"}, {"id":"","label":"dilate","localized":"","reload":"","hint":"dilate"}, @@ -635,7 +653,6 @@ {"id":"","label":"duration","localized":"","reload":"","hint":"duration"}, {"id":"","label":"dwpose","localized":"","reload":"","hint":"dwpose"}, {"id":"","label":"dynamic","localized":"","reload":"","hint":"dynamic"}, - {"id":"","label":"dynamic attention","localized":"","reload":"","hint":"dynamic attention"}, 
{"id":"","label":"dynamic attention slicing rate in gb","localized":"","reload":"","hint":"dynamic attention slicing rate in gb"}, {"id":"","label":"dynamic attention trigger rate in gb","localized":"","reload":"","hint":"dynamic attention trigger rate in gb"}, {"id":"","label":"edge","localized":"","reload":"","hint":"edge"}, @@ -682,9 +699,7 @@ {"id":"","label":"filename","localized":"","reload":"","hint":"filename"}, {"id":"","label":"first-block cache enabled","localized":"","reload":"","hint":"first-block cache enabled"}, {"id":"","label":"fixed unet precision","localized":"","reload":"","hint":"fixed unet precision"}, - {"id":"","label":"flash attention","localized":"","reload":"","hint":"flash attention"}, {"id":"","label":"flavors","localized":"","reload":"","hint":"flavors"}, - {"id":"","label":"flow shift","localized":"","reload":"","hint":"flow shift"}, {"id":"","label":"folder","localized":"","reload":"","hint":"folder"}, {"id":"","label":"folder for control generate","localized":"","reload":"","hint":"folder for control generate"}, {"id":"","label":"folder for control grids","localized":"","reload":"","hint":"folder for control grids"}, @@ -848,7 +863,6 @@ {"id":"","label":"mask only","localized":"","reload":"","hint":"mask only"}, {"id":"","label":"mask strength","localized":"","reload":"","hint":"mask strength"}, {"id":"","label":"masked","localized":"","reload":"","hint":"masked"}, - {"id":"","label":"math attention","localized":"","reload":"","hint":"math attention"}, {"id":"","label":"max faces","localized":"","reload":"","hint":"max faces"}, {"id":"","label":"max flavors","localized":"","reload":"","hint":"max flavors"}, {"id":"","label":"max guidance","localized":"","reload":"","hint":"max guidance"}, @@ -866,7 +880,6 @@ {"id":"","label":"medium","localized":"","reload":"","hint":"medium"}, {"id":"","label":"mediums","localized":"","reload":"","hint":"mediums"}, {"id":"","label":"memory","localized":"","reload":"","hint":"memory"}, - 
{"id":"","label":"memory attention","localized":"","reload":"","hint":"memory attention"}, {"id":"","label":"memory limit","localized":"","reload":"","hint":"memory limit"}, {"id":"","label":"memory optimization","localized":"","reload":"","hint":"memory optimization"}, {"id":"","label":"merge alpha","localized":"","reload":"","hint":"merge alpha"}, @@ -987,7 +1000,6 @@ {"id":"","label":"postprocessing operation order","localized":"","reload":"","hint":"postprocessing operation order"}, {"id":"","label":"power","localized":"","reload":"","hint":"power"}, {"id":"","label":"predefined question","localized":"","reload":"","hint":"predefined question"}, - {"id":"","label":"prediction method","localized":"","reload":"","hint":"prediction method"}, {"id":"","label":"preset","localized":"","reload":"","hint":"preset"}, {"id":"","label":"preset block merge","localized":"","reload":"","hint":"preset block merge"}, {"id":"","label":"preview","localized":"","reload":"","hint":"preview"}, @@ -998,7 +1010,6 @@ {"id":"","label":"processor move to cpu after use","localized":"","reload":"","hint":"processor move to cpu after use"}, {"id":"","label":"processor settings","localized":"","reload":"","hint":"processor settings"}, {"id":"","label":"processor unload after use","localized":"","reload":"","hint":"processor unload after use"}, - {"id":"","label":"prompt attention normalization","localized":"","reload":"","hint":"prompt attention normalization"}, {"id":"","label":"prompt ex","localized":"","reload":"","hint":"prompt ex"}, {"id":"","label":"prompt processor","localized":"","reload":"","hint":"prompt processor"}, {"id":"","label":"prompt strength","localized":"","reload":"","hint":"prompt strength"}, @@ -1035,13 +1046,12 @@ {"id":"","label":"reprocess face","localized":"","reload":"","hint":"reprocess face"}, {"id":"","label":"reprocess refine","localized":"","reload":"","hint":"reprocess refine"}, {"id":"","label":"request browser 
notifications","localized":"","reload":"","hint":"request browser notifications"}, - {"id":"","label":"rescale","localized":"","reload":"","hint":"rescale"}, + {"id":"","label":"rescale","localized":"","reload":"","hint":"rescale betas with zero terminal snr"}, {"id":"","label":"rescale betas with zero terminal snr","localized":"","reload":"","hint":"rescale betas with zero terminal snr"}, {"id":"","label":"reset anchors","localized":"","reload":"","hint":"reset anchors"}, {"id":"","label":"residual diff threshold","localized":"","reload":"","hint":"residual diff threshold"}, {"id":"","label":"resize background color","localized":"","reload":"","hint":"resize background color"}, {"id":"","label":"resize method","localized":"","reload":"","hint":"resize method"}, - {"id":"","label":"resize mode","localized":"","reload":"","hint":"resize mode"}, {"id":"","label":"resize scale","localized":"","reload":"","hint":"resize scale"}, {"id":"","label":"restart step","localized":"","reload":"","hint":"restart step"}, {"id":"","label":"restore faces: codeformer","localized":"","reload":"","hint":"restore faces: codeformer"}, @@ -1057,13 +1067,11 @@ {"id":"","label":"run benchmark","localized":"","reload":"","hint":"run benchmark"}, {"id":"","label":"sa solver","localized":"","reload":"","hint":"sa solver"}, {"id":"","label":"safetensors","localized":"","reload":"","hint":"safetensors"}, - {"id":"","label":"sage attention","localized":"","reload":"","hint":"sage attention"}, {"id":"","label":"same as primary","localized":"","reload":"","hint":"same as primary"}, {"id":"","label":"same latent","localized":"","reload":"","hint":"same latent"}, {"id":"","label":"sample","localized":"","reload":"","hint":"sample"}, {"id":"","label":"sampler","localized":"","reload":"","hint":"sampler"}, {"id":"","label":"sampler dynamic shift","localized":"","reload":"","hint":"sampler dynamic shift"}, - {"id":"","label":"sampler order","localized":"","reload":"","hint":"sampler order"}, 
{"id":"","label":"sampler shift","localized":"","reload":"","hint":"sampler shift"}, {"id":"","label":"sana: use complex human instructions","localized":"","reload":"","hint":"sana: use complex human instructions"}, {"id":"","label":"saturation","localized":"","reload":"","hint":"saturation"}, @@ -1129,7 +1137,6 @@ {"id":"","label":"sigma","localized":"","reload":"","hint":"sigma"}, {"id":"","label":"sigma churn","localized":"","reload":"","hint":"sigma churn"}, {"id":"","label":"sigma max","localized":"","reload":"","hint":"sigma max"}, - {"id":"","label":"sigma method","localized":"","reload":"","hint":"sigma method"}, {"id":"","label":"sigma min","localized":"","reload":"","hint":"sigma min"}, {"id":"","label":"sigma noise","localized":"","reload":"","hint":"sigma noise"}, {"id":"","label":"sigma tmin","localized":"","reload":"","hint":"sigma tmin"}, @@ -1148,7 +1155,6 @@ {"id":"","label":"spatial frequency","localized":"","reload":"","hint":"spatial frequency"}, {"id":"","label":"specify model revision","localized":"","reload":"","hint":"specify model revision"}, {"id":"","label":"specify model variant","localized":"","reload":"","hint":"specify model variant"}, - {"id":"","label":"split attention","localized":"","reload":"","hint":"split attention"}, {"id":"","label":"stable-fast","localized":"","reload":"","hint":"stable-fast"}, {"id":"","label":"standard","localized":"","reload":"","hint":"standard"}, {"id":"","label":"start","localized":"","reload":"","hint":"start"}, @@ -1209,7 +1215,6 @@ {"id":"","label":"timestep","localized":"","reload":"","hint":"timestep"}, {"id":"","label":"timestep skip end","localized":"","reload":"","hint":"timestep skip end"}, {"id":"","label":"timestep skip start","localized":"","reload":"","hint":"timestep skip start"}, - {"id":"","label":"timestep spacing","localized":"","reload":"","hint":"timestep spacing"}, {"id":"","label":"timesteps","localized":"","reload":"","hint":"timesteps"}, {"id":"","label":"timesteps 
override","localized":"","reload":"","hint":"timesteps override"}, {"id":"","label":"timesteps presets","localized":"","reload":"","hint":"timesteps presets"}, diff --git a/html/midnight-barbie.jpg b/html/midnight-barbie.jpg deleted file mode 100644 index c4182e8f1..000000000 Binary files a/html/midnight-barbie.jpg and /dev/null differ diff --git a/html/card-no-preview.png b/html/missing.png similarity index 100% rename from html/card-no-preview.png rename to html/missing.png diff --git a/html/orchid-dreams.jpg b/html/orchid-dreams.jpg deleted file mode 100644 index 8a62eeb0a..000000000 Binary files a/html/orchid-dreams.jpg and /dev/null differ diff --git a/html/reference.json b/html/reference.json index 5d634e68d..ae8c37e85 100644 --- a/html/reference.json +++ b/html/reference.json @@ -1,4 +1,3 @@ - { "Tempest-by-Vlad XL": { "path": "tempestByVlad_baseV01.safetensors@https://civitai.com/api/download/models/1301775", @@ -140,40 +139,47 @@ "extras": "sampler: Default, cfg_scale: 4.5" }, - "lodestones Chroma Unlocked HD": { + "lodestones Chroma1 HD": { "path": "lodestones/Chroma1-HD", "preview": "lodestones--Chroma-HD.jpg", - "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. The model is still training right now, and I’d love to hear your thoughts! Your input and feedback are really appreciated.", + "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. 
This is the high-res fine-tune of the Chroma1-Base at a 1024x1024 resolution.", "skip": true, - "extras": "sampler: Default, cfg_scale: 3.5" + "extras": "" }, - "lodestones Chroma Unlocked HD Annealed": { - "path": "vladmandic/chroma-unlocked-v50-annealed", - "preview": "lodestones--Chroma-annealed.jpg", - "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. The model is still training right now, and I’d love to hear your thoughts! Your input and feedback are really appreciated.", + "lodestones Chroma1 Base": { + "path": "lodestones/Chroma1-Base", + "preview": "lodestones--Chroma-Base.jpg", + "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. This is the core 512x512 model. It's a solid, all-around foundation for pretty much any creative project.", "skip": true, - "extras": "sampler: Default, cfg_scale: 3.5" + "extras": "" }, - "lodestones Chroma Unlocked HD Flash": { + "lodestones Chroma1 Flash": { "path": "lodestones/Chroma1-Flash", "preview": "lodestones--Chroma-flash.jpg", - "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. The model is still training right now, and I’d love to hear your thoughts! Your input and feedback are really appreciated.", + "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. 
A fine-tuned version of the Chroma1-Base made to find the best way to make these flow matching models faster.", "skip": true, - "extras": "sampler: Default, cfg_scale: 1.0" + "extras": "" }, - "lodestones Chroma Unlocked v48": { + "lodestones Chroma1 v50 Preview Annealed": { + "path": "vladmandic/chroma-unlocked-v50-annealed", + "preview": "lodestones--Chroma-annealed.jpg", + "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. Re-tweaked variant with extra noise added.", + "skip": true, + "extras": "" + }, + "lodestones Chroma1 v48 Preview": { "path": "vladmandic/chroma-unlocked-v48", "preview": "lodestones--Chroma.jpg", - "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. The model is still training right now, and I’d love to hear your thoughts! Your input and feedback are really appreciated.", + "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. Last raw version of Chroma before final finetuning.", "skip": true, - "extras": "sampler: Default, cfg_scale: 1.0" + "extras": "" }, - "lodestones Chroma Unlocked v48 Detail Calibrated": { + "lodestones Chroma1 v48 Preview Calibrated": { "path": "vladmandic/chroma-unlocked-v48-detail-calibrated", "preview": "lodestones--Chroma-detail.jpg", - "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. The model is still training right now, and I’d love to hear your thoughts! Your input and feedback are really appreciated.", + "desc": "Chroma is a 8.9B parameter model based on FLUX.1-schnell. 
It’s fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build on top of it—no corporate gatekeeping. Last raw version of Chroma before final finetuning but with some detail calibration.", "skip": true, - "extras": "sampler: Default, cfg_scale: 1.0" + "extras": "" }, "Qwen-Image": { @@ -185,12 +191,12 @@ }, "Qwen-Image-Edit": { "path": "Qwen/Qwen-Image-Edit", - "preview": "Qwen--Qwen-Image.jpg", + "preview": "Qwen--Qwen-Image-Edit.jpg", "desc": " Qwen-Image-Edit, the image editing version of Qwen-Image. Built upon our 20B Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image’s unique text rendering capabilities to image editing tasks, enabling precise text editing.", "skip": true, "extras": "" }, - "Qwen-Lightning": { + "Qwen-Image-Lightning": { "path": "vladmandic/Qwen-Lightning", "preview": "Qwen-Lightning.jpg", "desc": " Qwen-Lightning is step-distilled from Qwen-Image to allow for generation in 8 steps.", @@ -281,13 +287,13 @@ "NVLabs Sana 1.5 1.6B 1k": { "path": "Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers", "desc": "Sana is an efficient model with scaling of training-time and inference time techniques. SANA-1.5 delivers: efficient model growth from 1.6B Sana-1.0 model to 4.8B, achieving similar or better performance than training from scratch and saving 60% training cost; efficient model depth pruning, slimming any model size as you want; powerful VLM selection based inference scaling, smaller model+inference scaling > larger model.", - "preview": "Efficient-Large-Model--Sana15_1600M_1024px_diffusers.jpg", + "preview": "Efficient-Large-Model--SANA1.5_1.6B_1024px_diffusers.jpg", "skip": true }, "NVLabs Sana 1.5 4.8B 1k": { "path": "Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers", "desc": "Sana is an efficient model with scaling of training-time and inference time techniques. 
SANA-1.5 delivers: efficient model growth from 1.6B Sana-1.0 model to 4.8B, achieving similar or better performance than training from scratch and saving 60% training cost; efficient model depth pruning, slimming any model size as you want; powerful VLM selection based inference scaling, smaller model+inference scaling > larger model.", - "preview": "Efficient-Large-Model--Sana15_4800M_1024px_diffusers.jpg", + "preview": "Efficient-Large-Model--SANA1.5_4.8B_1024px_diffusers.jpg", "skip": true }, "NVLabs Sana 1.5 1.6B 1k Sprint": { @@ -299,25 +305,25 @@ "NVLabs Sana 1.0 1.6B 4k": { "path": "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers", "desc": "Sana is a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU.", - "preview": "Efficient-Large-Model--Sana15_1600M_4Kpx_diffusers.jpg", + "preview": "Efficient-Large-Model--Sana_1600M_4Kpx_BF16_diffusers.jpg", "skip": true }, "NVLabs Sana 1.0 1.6B 2k": { "path": "Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers", "desc": "Sana is a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU.", - "preview": "Efficient-Large-Model--Sana1_1600M_2Kpx_diffusers.jpg", + "preview": "Efficient-Large-Model--Sana_1600M_2Kpx_BF16_diffusers.jpg", "skip": true }, "NVLabs Sana 1.0 1.6B 1k": { "path": "Efficient-Large-Model/Sana_1600M_1024px_diffusers", "desc": "Sana is a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. 
Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU.", - "preview": "Efficient-Large-Model--Sana1_1600M_1024px_diffusers.jpg", + "preview": "Efficient-Large-Model--Sana_1600M_1024px_diffusers.jpg", "skip": true }, "NVLabs Sana 1.0 0.6B 0.5k": { "path": "Efficient-Large-Model/Sana_600M_512px_diffusers", "desc": "Sana is a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU.", - "preview": "Efficient-Large-Model--Sana1_600M_1024px_diffusers.jpg", + "preview": "Efficient-Large-Model--Sana_600M_512px_diffusers.jpg", "skip": true }, @@ -340,7 +346,6 @@ "preview": "Shitao--OmniGen-v1.jpg", "skip": true }, - "VectorSpaceLab OmniGen v2": { "path": "OmniGen2/OmniGen2", "desc": "OmniGen2 is a powerful and efficient unified multimodal model. Unlike OmniGen v1, OmniGen2 features two distinct decoding pathways for text and image modalities, utilizing unshared parameters and a decoupled image tokenizer.", @@ -462,7 +467,6 @@ "skip": true, "extras": "sampler: Default" }, - "AlphaVLLM Lumina 2": { "path": "Alpha-VLLM/Lumina-Image-2.0", "desc": "A Unified and Efficient Image Generative Model. Lumina-Image-2.0 is a 2 billion parameter flow-based diffusion transformer capable of generating images from text descriptions.", @@ -553,9 +557,10 @@ "extras": "sampler: Default" }, "Playground v2.5": { - "path": "playground-v2.5-1024px-aesthetic.fp16.safetensors@https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/resolve/main/playground-v2.5-1024px-aesthetic.fp16.safetensors?download=true", - "desc": "Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2. Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. 
Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.", + "path": "playgroundai/playground-v2.5-1024px-aesthetic", + "desc": "Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2. Playground v2.5 is the state-of-the-art open-source model in aesthetic quality.", "preview": "playgroundai--playground-v2_5-1024px-aesthetic.jpg", + "variant": "fp16", "extras": "sampler: DPM++ 2M EDM" }, @@ -604,6 +609,7 @@ "preview": "MeissonFlow--Meissonic.jpg", "skip": true }, + "aMUSEd 256": { "path": "huggingface/amused/amused-256", "skip": true, @@ -624,6 +630,7 @@ "preview": "warp-ai--wuerstchen.jpg", "extras": "sampler: Default, cfg_scale: 4.0, image_cfg_scale: 0.0" }, + "KOALA 700M": { "path": "huggingface/etri-vilab/koala-700m-llava-cap", "variant": "fp16", @@ -632,22 +639,34 @@ "preview": "etri-vilab--koala-700m-llava-cap.jpg", "extras": "sampler: Default" }, + + "HDM-XUT 340M Anime": { + "path": "KBlueLeaf/HDM-xut-340M-anime", + "skip": true, + "desc": "HDM (Home-made Diffusion Model) is a project to investigate a specialized training recipe/scheme for pretraining a T2I model at home, which requires the training setup to be executable on consumer-level hardware or cheap enough second-hand server hardware.", + "preview": "KBlueLeaf--HDM-xut-340M-anime.jpg", + "extras": "" + }, + "Tsinghua UniDiffuser": { "path": "thu-ml/unidiffuser-v1", "desc": "UniDiffuser is a unified diffusion framework to fit all distributions relevant to a set of multi-modal data in one transformer. UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead.\nSpecifically, UniDiffuser employs a variation of transformer, called U-ViT, which parameterizes the joint noise prediction network. 
Other components perform as encoders and decoders of different modalities, including a pretrained image autoencoder from Stable Diffusion, a pretrained image ViT-B/32 CLIP encoder, a pretrained text ViT-L CLIP encoder, and a GPT-2 text decoder finetuned by ourselves.", "preview": "thu-ml--unidiffuser-v1.jpg", "extras": "width: 512, height: 512, sampler: Default" }, + "SalesForce BLIP-Diffusion": { "path": "salesforce/blipdiffusion", "desc": "BLIP-Diffusion, a new subject-driven image generation model that supports multimodal control which consumes inputs of subject images and text prompts. Unlike other subject-driven generation models, BLIP-Diffusion introduces a new multimodal encoder which is pre-trained to provide subject representation.", "preview": "salesforce--blipdiffusion.jpg" }, + "InstaFlow 0.9B": { "path": "XCLiu/instaflow_0_9B_from_sd_1_5", "desc": "InstaFlow is an ultra-fast, one-step image generator that achieves image quality close to Stable Diffusion. This efficiency is made possible through a recent Rectified Flow technique, which trains probability flows with straight trajectories, hence inherently requiring only a single step for fast inference.", "preview": "XCLiu--instaflow_0_9B_from_sd_1_5.jpg" }, + "DeepFloyd IF Medium": { "path": "DeepFloyd/IF-I-M-v1.0", "desc": "DeepFloyd-IF is a pixel-based text-to-image triple-cascaded diffusion model, that can generate pictures with new state-of-the-art for photorealism and language understanding. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID-30K score of 6.66 on the COCO dataset. 
It is modular and composed of a frozen text model and three pixel cascaded diffusion modules, each designed to generate images of increasing resolution: 64x64, 256x256, and 1024x1024.", diff --git a/html/simple-dark.jpg b/html/simple-dark.jpg deleted file mode 100644 index 0aa0f2450..000000000 Binary files a/html/simple-dark.jpg and /dev/null differ diff --git a/html/simple-light.jpg b/html/simple-light.jpg deleted file mode 100644 index aa85547c3..000000000 Binary files a/html/simple-light.jpg and /dev/null differ diff --git a/html/timeless-beige.jpg b/html/timeless-beige.jpg deleted file mode 100644 index e8c591379..000000000 Binary files a/html/timeless-beige.jpg and /dev/null differ diff --git a/installer.py b/installer.py index 8b32ed927..d59908bf0 100644 --- a/installer.py +++ b/installer.py @@ -447,6 +447,8 @@ def git(arg: str, folder: str = None, ignore: bool = False, optional: bool = Fal stdout += ('\n' if len(stdout) > 0 else '') + result.stderr.decode(encoding="utf8", errors="ignore") stdout = stdout.strip() if result.returncode != 0 and not ignore: + if folder is None: + folder = 'root' if "couldn't find remote ref" in stdout: # not a git repo log.error(f'Git: folder="{folder}" could not identify repository') elif "no submodule mapping found" in stdout: @@ -601,7 +603,7 @@ def check_diffusers(): if args.skip_git: install('diffusers') return - sha = '4fcd0bc7ebb934a1559d0b516f09534ba22c8a0d' # diffusers commit hash + sha = '9b721db205729d5a6e97a72312c3a0f4534064f1' # diffusers commit hash pkg = pkg_resources.working_set.by_key.get('diffusers', None) minor = int(pkg.version.split('.')[1] if pkg is not None else -1) cur = opts.get('diffusers_version', '') if minor > -1 else '' @@ -622,18 +624,22 @@ def check_transformers(): t_start = time.time() if args.skip_all or args.skip_git or args.experimental: return - pkg = pkg_resources.working_set.by_key.get('transformers', None) + pkg_transformers = pkg_resources.working_set.by_key.get('transformers', None) + 
pkg_tokenizers = pkg_resources.working_set.by_key.get('tokenizers', None) if args.use_directml: - target = '4.52.4' + target_transformers = '4.52.4' + target_tokenizers = '0.21.4' else: - target = '4.55.2' - if (pkg is None) or ((pkg.version != target) and (not args.experimental)): - if pkg is None: - log.info(f'Transformers install: version={target}') + target_transformers = '4.56.0' + target_tokenizers = '0.22.0' + if (pkg_transformers is None) or (pkg_tokenizers is None) or (((pkg_transformers.version != target_transformers) or (pkg_tokenizers.version != target_tokenizers)) and (not args.experimental)): + if pkg_transformers is None: + log.info(f'Transformers install: version={target_transformers}') else: - log.info(f'Transformers update: current={pkg.version} target={target}') + log.info(f'Transformers update: current={pkg_transformers.version} target={target_transformers}') pip('uninstall --yes transformers', ignore=True, quiet=True, uv=False) - pip(f'install --upgrade transformers=={target}', ignore=False, quiet=True, uv=False) + pip(f'install --upgrade tokenizers=={target_tokenizers}', ignore=False, quiet=True, uv=False) + pip(f'install --upgrade transformers=={target_transformers}', ignore=False, quiet=True, uv=False) ts('transformers', t_start) @@ -768,10 +774,6 @@ def install_rocm_zluda(): # older rocm (5.7) uses torch 2.3 or older torch_command = os.environ.get('TORCH_COMMAND', f'torch torchvision --index-url https://download.pytorch.org/whl/rocm{rocm.version}') - if device is not None and rocm.version != "6.2" and rocm.get_blaslt_enabled(): - log.debug(f'ROCm hipBLASLt: arch={device.name} available={device.blaslt_supported}') - rocm.set_blaslt_enabled(device.blaslt_supported) - if device is None or os.environ.get("HSA_OVERRIDE_GFX_VERSION", None) is not None: log.info(f'ROCm: HSA_OVERRIDE_GFX_VERSION auto config skipped: device={device.name if device is not None else None} version={os.environ.get("HSA_OVERRIDE_GFX_VERSION", None)}') else: @@ -1271,21 
+1273,6 @@ def install_optional(): ts('optional', t_start) -def install_sentencepiece(): - if installed('sentencepiece', quiet=True): - pass - elif int(sys.version_info.minor) >= 13: - backup_cmake_policy = os.environ.get('CMAKE_POLICY_VERSION_MINIMUM', None) - backup_cxxflags = os.environ.get('CXXFLAGS', None) - os.environ.setdefault('CMAKE_POLICY_VERSION_MINIMUM', '3.5') - os.environ.setdefault('CXXFLAGS', '-include cstdint') - install('git+https://github.com/google/sentencepiece#subdirectory=python', 'sentencepiece') - os.environ.setdefault('CMAKE_POLICY_VERSION_MINIMUM', backup_cmake_policy) - os.environ.setdefault('CXXFLAGS', backup_cxxflags) - else: - install('sentencepiece', 'sentencepiece') - - def install_requirements(): t_start = time.time() if args.profile: diff --git a/javascript/base.css b/javascript/base.css index f6a7f7d09..cc2e22061 100644 --- a/javascript/base.css +++ b/javascript/base.css @@ -98,7 +98,7 @@ table.settings-value-table td { padding: 0.4em; border: 1px solid #ccc; max-widt .extra-network-cards .card:hover .overlay { background: rgba(0, 0, 0, 0.40); } .extra-network-cards .card:hover .preview { box-shadow: none; filter: grayscale(100%); } .extra-network-cards .card:hover .overlay { background: rgba(0, 0, 0, 0.40); } -.extra-network-cards .card .tags { margin: 4px; display: none; overflow-wrap: break-word; } +.extra-network-cards .card .tags { margin: 4px; display: none; overflow-wrap: anywhere; } .extra-network-cards .card .tag { padding: 2px; margin: 2px; background: var(--neutral-700); cursor: pointer; display: inline-block; } .extra-network-cards .card .actions > span { padding: 4px; } .extra-network-cards .card:hover .actions { display: block; } @@ -128,8 +128,8 @@ div:has(>#tab-browser-folders) { flex-grow: 0 !important; background-color: var( /* loader */ .splash { position: fixed; top: 0; left: 0; width: 100vw; height: 100vh; z-index: 1000; display: block; text-align: center; } -.motd { margin-top: 2em; color: 
var(--body-text-color-subdued); font-family: monospace; font-variant: all-petite-caps; } -.splash-img { margin: 10% auto 0 auto; width: 512px; background-repeat: no-repeat; height: 512px; animation: color 10s infinite alternate; max-width: 80vw; background-size: contain; } +.motd { margin-top: 2em; color: var(--body-text-color-subdued); font-family: monospace; font-variant: all-petite-caps; font-size: 1.2em; } +.splash-img { margin: 10% auto 0 auto; width: 512px; background-repeat: no-repeat; height: 512px; animation: hue 5s infinite alternate; max-width: 80vw; background-size: contain; } .loading { color: white; position: absolute; top: 20%; left: 50%; transform: translateX(-50%); } .loader { width: 300px; height: 300px; border: var(--spacing-md) solid transparent; border-radius: 50%; border-top: var(--spacing-md) solid var(--primary-600); animation: spin 4s linear infinite; position: relative; } .loader::before, .loader::after { content: ""; position: absolute; top: 6px; bottom: 6px; left: 6px; right: 6px; border-radius: 50%; border: var(--spacing-md) solid transparent; } @@ -137,4 +137,4 @@ div:has(>#tab-browser-folders) { flex-grow: 0 !important; background-color: var( .loader::after { border-top-color: var(--primary-300); animation: spin 1.5s linear infinite; } @keyframes move { from { background-position-x: 0, -40px; } to { background-position-x: 0, 40px; } } @keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } } -@keyframes color { from { filter: hue-rotate(0deg) } to { filter: hue-rotate(360deg) } } +@keyframes hue { from { filter: hue-rotate(0deg) } to { filter: hue-rotate(360deg) } } diff --git a/javascript/black-teal-reimagined.css b/javascript/black-teal-reimagined.css index 1e7d4dc0b..dc3d4d0ce 100644 --- a/javascript/black-teal-reimagined.css +++ b/javascript/black-teal-reimagined.css @@ -953,7 +953,7 @@ svg.feather.feather-image, } /* No Preview Card Styles */ -.extra-network-cards 
.card:has(>img[src*="card-no-preview.png"])::before { +.extra-network-cards .card:has(>img[src*="missing.png"])::before { content: ''; position: absolute; width: 100%; @@ -1007,11 +1007,11 @@ svg.feather.feather-image, } .splash-img { - margin: 0; + margin: 10% auto 0 auto; width: 512px; height: 512px; background-repeat: no-repeat; - animation: color 8s infinite alternate, move 3s infinite alternate; + animation: hue 5s infinite alternate; } .loading { diff --git a/javascript/civitai.js b/javascript/civitai.js index 31f1fc890..71e957380 100644 --- a/javascript/civitai.js +++ b/javascript/civitai.js @@ -107,7 +107,7 @@ async function modelCardClick(id) { downloads: data.downloads?.toString() || '', creator, desc: data.desc || 'no description available', - image: images.length > 0 ? images[0] : '/sdapi/v1/network/thumb?filename=html/card-no-preview.png', + image: images.length > 0 ? images[0] : '/sdapi/v1/network/thumb?filename=html/missing.png', versions: versionsHTML || '', }); el.innerHTML = modelHTML; diff --git a/javascript/extraNetworks.js b/javascript/extraNetworks.js index e13323307..5026ef6e2 100644 --- a/javascript/extraNetworks.js +++ b/javascript/extraNetworks.js @@ -14,8 +14,13 @@ const getENActiveTab = () => { else if (gradioApp().getElementById('extras_image')?.checkVisibility()) tabName = 'process'; else if (gradioApp().getElementById('interrogate_image')?.checkVisibility()) tabName = 'caption'; else if (gradioApp().getElementById('tab-gallery-search')?.checkVisibility()) tabName = 'gallery'; - if (tabName in ['process', 'caption', 'gallery']) tabName = lastTab; - else lastTab = tabName; + + if (['process', 'caption', 'gallery'].includes(tabName)) { + tabName = lastTab; + } else if (tabName !== '') { + lastTab = tabName; + } + if (tabName !== '') return tabName; // legacy method if (gradioApp().getElementById('tab_txt2img')?.style.display === 'block') tabName = 'txt2img'; @@ -277,8 +282,31 @@ function extraNetworksSearchButton(event) { const tabName = 
getENActiveTab(); const searchTextarea = gradioApp().querySelector(`#${tabName}_extra_search textarea`); const button = event.target; - searchTextarea.value = `${button.textContent.trim()}/`; - updateInput(searchTextarea); + if (searchTextarea) { + searchTextarea.value = `${button.textContent.trim()}/`; + updateInput(searchTextarea); + } else { + console.error(`Could not find the search textarea for the tab: ${tabName}`); + } +} + +function extraNetworksFilterVersion(event) { + // log('extraNetworksFilterVersion', event); + const version = event.target.textContent.trim(); + const activeTab = gradioApp().querySelector('.extra-networks-tab:not([style*="display: none"])'); + if (!activeTab) return; + const cardContainer = activeTab.querySelector('.extra-network-cards'); + if (!cardContainer) return; + if (cardContainer.dataset.activeVersion === version) { + cardContainer.dataset.activeVersion = ''; + cardContainer.querySelectorAll('.card').forEach((card) => card.style.display = ''); + } else { + cardContainer.dataset.activeVersion = version; + cardContainer.querySelectorAll('.card').forEach((card) => { + if (card.dataset.version === version) card.style.display = ''; + else card.style.display = 'none'; + }); + } } let desiredStyle = ''; diff --git a/javascript/sdnext.css b/javascript/sdnext.css index adbeb8d9d..5c29863ab 100644 --- a/javascript/sdnext.css +++ b/javascript/sdnext.css @@ -1197,7 +1197,7 @@ table.settings-value-table td { } .extra-networks .search textarea { - width: calc(120px / 1.1); + width: calc(140px / 1.1); resize: none; margin-right: 2px; } @@ -1233,7 +1233,7 @@ table.settings-value-table td { padding: 3px 3px 3px 12px; text-align: left; text-indent: -6px; - width: 120px; + width: 140px; width: 100%; } @@ -1249,7 +1249,7 @@ table.settings-value-table td { .extra-network-subdirs { background: var(--input-background-fill); border-radius: 4px; - min-width: max(15%, 120px); + min-width: max(15%, 140px); overflow-x: hidden; overflow-y: auto; 
padding-top: 0.5em; @@ -1376,7 +1376,17 @@ table.settings-value-table td { display: block; } -.extra-network-cards .card:has(>img[src*="card-no-preview.png"])::before { +.extra-network-cards .card:hover { + z-index: 100; + position: relative; +} + +.extra-network-cards .card:hover .tags { + display: block; + z-index: 101; /* Optional: ensure tags are above everything */ +} + +.extra-network-cards .card:has(>img[src*="missing.png"])::before { background-color: var(--data-color); content: ''; height: 100%; @@ -1461,7 +1471,6 @@ table.settings-value-table td { overflow-y: auto; } - .extra-details td:first-child { font-weight: bold; vertical-align: top; @@ -1471,6 +1480,29 @@ table.settings-value-table td { max-height: 50vh; } +.network-folder::before { + content: "󰉖 "; + margin-right: 0.8em; +} + +.network-reference { + filter: contrast(0.9); +} + +.network-reference::before { + content: "󰴊 "; + margin-right: 0.8em; +} + +.network-model { + opacity: 0.6; +} + +.network-model::before { + content: "󰴉 "; + margin-right: 0.8em; +} + .input-accordion-checkbox { display: none !important; } @@ -1716,6 +1748,13 @@ background: var(--background-color) width: max-content; } +#tab-gallery-files gallery-file { + /* Add a vertical gutter between items (left/right), matching existing small row spacing */ + display: inline-block; + margin-right: 0.2em; + vertical-align: top; /* keep rows aligned on the top edge */ +} + #tab-gallery-files { display: block; height: 75vh; @@ -1813,6 +1852,24 @@ div:has(>#tab-gallery-folders) { object-fit: contain; } +/* Gallery video preview matches image preview sizing and layout */ +#tab-gallery-video { + height: 63vh; +} + +/* Ensure the