| name | description | argument-hint |
|---|---|---|
| debug-model | Debug a broken SD.Next or Diffusers model integration. Use when a newly added or ported model fails to load, misdetects, crashes during prompt encoding or sampling, or produces incorrect outputs. | Describe the failing model, where it fails, the error message, and whether the model is upstream Diffusers, custom pipeline, or raw checkpoint based |
Debug SD.Next And Diffusers Model Port
Read the error, identify which integration layer is failing, isolate the smallest reproducible failure point, fix the root cause, and validate the fix without expanding scope.
When To Use
- A newly added SD.Next model type does not autodetect correctly
- The loader fails to instantiate a pipeline or component
- A custom pipeline imports but fails during `from_pretrained`
- Prompt encoding fails because of tokenizer, processor, or text encoder mismatch
- Sampling fails due to tensor shape, dtype, device, or scheduler issues
- The model loads but outputs corrupted images, wrong output type, or obviously incorrect results
Debugging Order
Always debug from the outside in.
- Detection and routing
- Loader arguments and component selection
- Checkpoint path and artifact layout
- Weight loading and key mapping
- Prompt encoding
- Sampling forward path
- Output postprocessing and SD.Next task integration
Do not start by rewriting the architecture if the failure is likely in detection, loader wiring, or output handling.
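One way to hold to the outside-in order is to run one narrow check per layer and stop at the first failure. The sketch below is hypothetical: the layer names mirror the list above, and the check callables are placeholders you would replace with real probes for your model.

```python
from typing import Callable, Optional

# Layer names follow the debugging order above; the checks themselves are
# hypothetical placeholders supplied by the caller.
LAYERS = [
    "detection",
    "loader_args",
    "checkpoint_path",
    "weight_loading",
    "prompt_encoding",
    "sampling",
    "output",
]


def first_failing_layer(
    checks: dict[str, Callable[[], None]],
) -> Optional[tuple[str, Exception]]:
    """Run checks from the outermost layer inward; stop at the first failure."""
    for name in LAYERS:
        check = checks.get(name)
        if check is None:
            continue  # no probe written for this layer yet
        try:
            check()
        except Exception as exc:  # report the first failure instead of masking it
            return name, exc
    return None
```

Because the loop returns on the first exception, a sampling error is never reported while a loader error is still unresolved.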
Files To Check First
- `.github/copilot-instructions.md`
- `.github/instructions/core.instructions.md`
- `modules/sd_detect.py`
- `modules/sd_models.py`
- `modules/modeldata.py`
- `pipelines/model_<name>.py`
- `pipelines/<model>/model.py`
- `pipelines/<model>/pipeline.py`
- `pipelines/generic.py`
If the port is based on a standalone script, compare the failing path against the original reference implementation and identify the first semantic divergence.
Failure Classification
1. Model Not Detected Or Misclassified
Check:
- Filename and repo-name heuristics in `modules/sd_detect.py`
- Loader dispatch branch in `modules/sd_models.py`
- Reverse pipeline classification in `modules/modeldata.py`
Typical symptoms:
- Wrong loader called
- Pipeline classified as a broader family such as `chroma` instead of a custom `zetachroma`
- Task switching behaves incorrectly because the loaded pipeline type is wrong
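The misclassification symptom above often comes down to branch order. The following is a simplified illustration, not the actual heuristics in `modules/sd_detect.py`: both functions and their substring tests are hypothetical, and they only show why a broad match placed first makes the specific branch unreachable.

```python
def detect_overbroad(name: str) -> str:
    """Broad family is tested first, so it swallows the custom variant."""
    lowered = name.lower()
    if "chroma" in lowered:
        return "chroma"
    if "zetachroma" in lowered:  # unreachable: "chroma" already matched above
        return "zetachroma"
    return "unknown"


def detect_specific_first(name: str) -> str:
    """Most specific pattern is tested first; the broad family is the fallback."""
    lowered = name.lower()
    if "zetachroma" in lowered:
        return "zetachroma"
    if "chroma" in lowered:
        return "chroma"
    return "unknown"
```

The same ordering concern applies to any reverse classification that matches on pipeline class names.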
2. Loader Fails Before Pipeline Construction
Check:
- `sd_models.path_to_repo(checkpoint_info)` output
- `generic.load_transformer(...)` and `generic.load_text_encoder(...)` arguments
- Duplicate kwargs such as `torch_dtype`
- Wrong class chosen for text encoder, tokenizer, or processor
- Whether the source is really a Diffusers repo or only a raw checkpoint
Typical symptoms:
- Missing subfolder errors
- `from_pretrained` argument mismatch
- Component class mismatch
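The duplicate-kwarg failure is easy to reproduce without loading anything. In this sketch, `fake_from_pretrained` is a hypothetical stand-in that only echoes its arguments, and dtypes are plain strings so the example runs without torch; the merge helper is one possible way to guarantee each keyword is passed once.

```python
def merge_load_args(defaults: dict, overrides: dict) -> dict:
    """Merge into one dict so each keyword appears once; overrides win."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def fake_from_pretrained(repo: str, **kwargs) -> dict:
    """Hypothetical stand-in for a loader call; it just echoes its arguments."""
    return {"repo": repo, **kwargs}


defaults = {"torch_dtype": "bfloat16", "cache_dir": "/tmp/models"}
overrides = {"torch_dtype": "float16"}

# Passing the keyword explicitly while also unpacking a dict that contains it
# raises a TypeError about multiple values for the same keyword argument.
duplicate_error = None
try:
    fake_from_pretrained("some/repo", torch_dtype="float16", **defaults)
except TypeError as exc:
    duplicate_error = str(exc)

# Merging first means the call receives exactly one torch_dtype.
result = fake_from_pretrained("some/repo", **merge_load_args(defaults, overrides))
```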
3. Raw Checkpoint Load Fails
Check:
- Checkpoint path resolution for local file, local directory, and Hub repo
- State dict load method
- Key remapping logic
- Config inference from tensor shapes
- Missing versus unexpected keys after `load_state_dict`
Typical symptoms:
- Key mismatch explosion
- Wrong inferred head counts, dimensions, or decoder settings
- Silent shape corruption caused by a bad remap
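Before touching model code, it can help to diff key names alone. The sketch below uses plain dictionaries and sets so it runs without torch; the prefix rules and key names are made up for illustration. The `(missing, unexpected)` pair mirrors the `missing_keys` and `unexpected_keys` that PyTorch's `load_state_dict` reports.

```python
def remap_keys(state_dict: dict, rules: list[tuple[str, str]]) -> dict:
    """Rename keys by prefix; the first matching rule wins."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in rules:
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in remapped:
            # Two source keys landing on one target would silently drop weights.
            raise KeyError(f"remap collision on {new_key!r}")
        remapped[new_key] = value
    return remapped


def diff_keys(expected: set[str], loaded: set[str]) -> tuple[list[str], list[str]]:
    """Return (missing, unexpected) key lists, sorted for stable output."""
    return sorted(expected - loaded), sorted(loaded - expected)
```

A short, explainable list on both sides usually points at a naming rule; hundreds of entries on both sides usually mean the remap prefix itself is wrong. Note that a name-only diff cannot catch a remap that assigns a tensor to the wrong key with a compatible shape.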
4. Prompt Encoding Fails
Check:
- Tokenizer or processor choice
- `trust_remote_code` requirements
- Chat template or custom prompt formatting
- Hidden state index selection
- Padding and batch alignment between positive and negative prompts
Typical symptoms:
- Tokenizer attribute errors
- Hidden state shape mismatch
- CFG failures when negative prompts do not match prompt batch length
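The batch-alignment check can be made explicit before any tokenizer is involved. This is a minimal sketch with a hypothetical helper name; it assumes the intended policy is "empty means blank negatives, a single negative is repeated, anything else must already match".

```python
def align_negative_prompts(prompts: list[str], negatives: list[str]) -> list[str]:
    """Return one negative prompt per positive prompt, or fail loudly."""
    if len(negatives) == len(prompts):
        return list(negatives)
    if len(negatives) == 0:
        return [""] * len(prompts)  # no negatives given: use empty strings
    if len(negatives) == 1:
        return negatives * len(prompts)  # broadcast the single negative
    raise ValueError(
        f"{len(negatives)} negative prompts cannot be aligned with "
        f"{len(prompts)} prompts"
    )
```

Raising on an unalignable count surfaces the problem at the encoding layer instead of as a shape error deep inside classifier-free guidance.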
5. Sampling Or Forward Pass Fails
Check:
- Input tensor shape and channel count
- Device and dtype alignment across all components
- Scheduler timesteps and expected timestep convention
- Classifier-free guidance concatenation and split logic
- Pixel-space versus latent-space assumptions
Typical symptoms:
- Shape mismatch in attention or decoder blocks
- Device mismatch between text encoder output and model tensors
- Images exploding to NaNs because timestep semantics are inverted
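The concatenation and split logic for classifier-free guidance can be checked on plain numbers. This sketch assumes the batch is ordered unconditional first, then conditional; that ordering is an assumption for illustration, and the real pipeline's convention should be confirmed against its reference implementation.

```python
def cfg_combine(batched_output: list[float], scale: float) -> list[float]:
    """Split an [uncond..., cond...] batch in half and apply guidance."""
    if len(batched_output) % 2 != 0:
        raise ValueError("CFG batch must contain an even number of entries")
    half = len(batched_output) // 2
    uncond, cond = batched_output[:half], batched_output[half:]
    # Standard guidance: start at uncond and move toward cond by `scale`.
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]
```

With `scale=1.0` the result equals the conditional half, which makes a convenient sanity check. If the halves are concatenated in the opposite order, the function still runs without error but guidance pushes away from the prompt, which matches the "no exception, wrong image" pattern.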
6. Output Is Wrong But No Exception Is Raised
Check:
- Whether the model predicts `x0`, noise, or velocity
- Whether the Euler or other sampler update matches the model objective
- Final scaling and clamp path
- `output_type` handling and `pipe.task_args`
- Whether a VAE is being applied incorrectly to direct pixel-space output
Typical symptoms:
- Black, gray, washed-out, or heavily clipped images
- Output with correct size but obviously broken semantics
- Correct tensors but wrong SD.Next display behavior because output type is mismatched
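A wrong final scaling step is a common source of dark or clipped images. The sketch below assumes the model emits pixel values in [-1, 1], which is an assumption to verify per model, and contrasts rescaling before clamping with clamping alone.

```python
def to_display_range(values: list[float]) -> list[float]:
    """Map assumed [-1, 1] model output to [0, 1], then clamp."""
    return [min(1.0, max(0.0, (v + 1.0) / 2.0)) for v in values]


def clamp_only(values: list[float]) -> list[float]:
    """Buggy variant: clamps without rescaling, crushing all negatives to 0."""
    return [min(1.0, max(0.0, v)) for v in values]


samples = [-1.0, 0.0, 1.0]
rescaled = to_display_range(samples)  # mid-gray input stays mid-gray
clipped = clamp_only(samples)         # mid-gray input becomes black
```

Comparing the two on a handful of known values is usually faster than judging a full generated image by eye.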
Minimal Debug Procedure
1. Reproduce Narrowly
Capture the smallest failing operation.
- Pure import failure
- Loader-only failure
- `from_pretrained` failure
- Prompt encode failure
- Single forward pass failure
- First sampler step failure
Prefer narrow Python checks before attempting a full generation run.
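The narrowest check of all is a bare import. A minimal sketch using only the standard library follows; the helper name is hypothetical, and the module name you pass is whatever dotted path your port lives at.

```python
import importlib
import traceback
from typing import Optional


def import_smoke_test(module_name: str) -> Optional[str]:
    """Import a module by dotted name; return the traceback text on failure."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Keep the full traceback: the first frame inside the ported module
        # usually names the missing dependency or bad relative import.
        return traceback.format_exc()
    return None
```

A `None` result only proves the module imports; it says nothing yet about the loader or `from_pretrained`, which are the next checks in the list.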
2. Compare Against Working Pattern
Find the closest working in-repo analogue and compare:
- Loader structure
- Registered module names
- Pipeline class name and module registration
- Prompt encoding path
- Output conversion path
3. Fix The Root Cause
Examples:
- Add the missing `modeldata` branch instead of patching downstream task handling
- Fix checkpoint remapping rather than forcing `strict=False` and ignoring real mismatches
- Correct the output path for pixel-space models instead of routing through a VAE
- Make config inference fail explicitly when ambiguous instead of guessing silently
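The last example, failing explicitly on ambiguous config inference, might look like the sketch below. The helper and the candidate head dimensions are hypothetical; the point is only that zero or multiple plausible answers raise instead of picking one.

```python
def infer_num_heads(hidden_dim: int, candidate_head_dims: tuple[int, ...] = (64, 128)) -> int:
    """Infer the attention head count from a hidden size, refusing to guess."""
    matches = [
        hidden_dim // head_dim
        for head_dim in candidate_head_dims
        if hidden_dim % head_dim == 0
    ]
    if len(matches) != 1:
        raise ValueError(
            f"cannot infer head count for hidden_dim={hidden_dim}: "
            f"{len(matches)} candidate head dims fit; set it explicitly"
        )
    return matches[0]
```

A hidden size that divides evenly by more than one candidate is exactly the case where a silent guess produces weights that load cleanly but attend incorrectly.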
4. Validate In Layers
After each meaningful fix, validate the narrowest relevant layer first.
- `compileall` or syntax check
- `ruff` on touched files
- Import smoke test
- Loader-only smoke test
- Full run only when the lower layers are stable
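For the syntax-check layer, an in-process stand-in for `compileall` is the built-in `compile()`, which parses and compiles source text without writing bytecode files. The helper name below is hypothetical.

```python
from typing import Optional


def syntax_error(source: str, filename: str = "<touched file>") -> Optional[str]:
    """Return a short 'file:line: message' string, or None if the source compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None
```

This only catches syntax problems; unresolved names and bad imports still need the lint and import smoke-test layers that follow it.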
Common Root Causes
- `modules/modeldata.py` not updated after adding a new custom pipeline family
- `modules/sd_detect.py` branch order causes overbroad detection to win first
- Loader passes duplicated keyword args like `torch_dtype`
- Shared text encoder assumptions do not match the actual model variant
- `from_pretrained` assumes `transformer/` or `text_encoder/` subfolders that do not exist
- Key remapping merges QKV in the wrong order
- CFG path concatenates embeddings or latents incorrectly
- Direct pixel-space models are postprocessed like latent-space diffusion outputs
- Negative prompts are not padded or repeated to match prompt batch shape
- Pipeline class naming collides with broader family checks in `modeldata`
Validation Checklist
When closing the task, report which of these were completed:
- Exact failing layer identified
- Root cause fixed
- Syntax check passed
- Focused lint passed
- Import or loader smoke test passed
- Real generation tested, or explicitly not tested
Example Request Shapes
- "The new model port fails in from_pretrained"
- "SD.Next detects my custom pipeline as the wrong model type"
- "The loader works but generation returns black images"
- "This standalone-script port loads weights but crashes in attention"