Update quant doc so it's not completely wrong. (#13381)
There is still more that needs to be fixed.
pull/13382/head
parent 31283d2892
commit 971932346a
@@ -139,9 +139,9 @@ Example:
 "_quantization_metadata": {
 "format_version": "1.0",
 "layers": {
-"model.layers.0.mlp.up_proj": "float8_e4m3fn",
-"model.layers.0.mlp.down_proj": "float8_e4m3fn",
-"model.layers.1.mlp.up_proj": "float8_e4m3fn"
+"model.layers.0.mlp.up_proj": {"format": "float8_e4m3fn"},
+"model.layers.0.mlp.down_proj": {"format": "float8_e4m3fn"},
+"model.layers.1.mlp.up_proj": {"format": "float8_e4m3fn"}
 }
 }
 }
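The hunk above changes each `layers` entry from a bare format string to an object with a `format` key. For metadata that was written following the old example, a conversion might look like the sketch below. It assumes the metadata has already been parsed into a Python dict; `upgrade_layer_entries` is a hypothetical helper named for illustration, not a function from the project.

```python
import json


def upgrade_layer_entries(metadata: dict) -> dict:
    """Return a copy of `metadata` in which every layer entry is an object.

    Entries that are bare strings become {"format": <string>}; entries that
    are already dicts are copied unchanged, so the call is idempotent.
    """
    layers = metadata.get("layers", {})
    upgraded = {
        name: {"format": entry} if isinstance(entry, str) else dict(entry)
        for name, entry in layers.items()
    }
    return {**metadata, "layers": upgraded}


# Metadata in the shape shown on the removed side of the diff.
old_metadata = {
    "format_version": "1.0",
    "layers": {
        "model.layers.0.mlp.up_proj": "float8_e4m3fn",
        "model.layers.0.mlp.down_proj": "float8_e4m3fn",
        "model.layers.1.mlp.up_proj": "float8_e4m3fn",
    },
}

new_metadata = upgrade_layer_entries(old_metadata)
print(json.dumps({"_quantization_metadata": new_metadata}, indent=2))
```

The helper builds a new dict rather than mutating its input, so the original metadata stays available for comparison.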
@@ -165,4 +165,4 @@ Activation quantization (e.g., for FP8 Tensor Core operations) requires `input_s
 3. **Compute scales**: Derive `input_scale` from collected statistics
 4. **Store in checkpoint**: Save `input_scale` parameters alongside weights
 
 The calibration dataset should be representative of your target use case. For diffusion models, this typically means a diverse set of prompts and generation parameters.
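Step 3 in the hunk above ("Derive `input_scale` from collected statistics") does not say which statistic is used. One common choice is per-tensor absolute-maximum scaling, sketched below in plain Python. The function name `compute_input_scale`, the absmax statistic, and the convention that activations are divided by the scale before casting are all assumptions made for illustration; the only fixed fact is that the largest finite `float8_e4m3fn` value is 448.

```python
# Largest finite value representable in float8_e4m3fn.
FP8_E4M3FN_MAX = 448.0


def compute_input_scale(calibration_batches):
    """Derive a per-tensor scale from calibration activations.

    `calibration_batches` is an iterable of iterables of floats, standing in
    for the activations seen at one layer input during calibration.
    Assumed convention: quantized = activation / scale, so the largest
    observed magnitude maps onto the largest representable FP8 value.
    """
    amax = 0.0
    for batch in calibration_batches:
        for value in batch:
            amax = max(amax, abs(value))
    if amax == 0.0:
        # No signal observed; fall back to a neutral scale to avoid dividing by zero later.
        return 1.0
    return amax / FP8_E4M3FN_MAX


# Two toy "batches" of activations; the largest magnitude is 896.0.
scale = compute_input_scale([[1.0, -896.0], [3.0, 0.25]])
print(scale)
```

A real calibration run would accumulate the statistic with tensor operations over many prompts and might prefer a percentile over the raw maximum to reduce sensitivity to outliers.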