post release jumbo patch

Signed-off-by: Vladimir Mandic <mandic00@live.com>
pull/3586/head
Vladimir Mandic 2024-11-22 12:55:18 -05:00
parent a0403e7d54
commit 62c53942e0
17 changed files with 189 additions and 153 deletions

View File

@ -1,5 +1,22 @@
# Change Log for SD.Next
## Update for 2024-11-22
- Model loader improvements:
  - detect model components on model load fail
  - Flux, SD35: force unload model
  - Flux: apply `bnb` quant when loading *unet/transformer*
  - Flux: all-in-one safetensors
    example: <https://civitai.com/models/646328?modelVersionId=1040235>
  - Flux: do not recast quants
- Sampler improvements
  - update DPM FlowMatch samplers
- Fixes:
  - update `diffusers`
  - fix README links
  - fix sdxl controlnet single-file loader
  - relax settings validator
## Update for 2024-11-21
### Highlights for 2024-11-21

View File

@ -56,7 +56,7 @@ For screenshots and information on other available themes, see [Themes Wiki](ht
## Model support
Additional models will be added as they become available and there is public interest in them
See [models overview](wiki/Models) for details on each model, including their architecture, complexity and other info
See [models overview](https://github.com/vladmandic/automatic/wiki/Models) for details on each model, including their architecture, complexity and other info
- [RunwayML Stable Diffusion](https://github.com/Stability-AI/stablediffusion/) 1.x and 2.x *(all variants)*
- [StabilityAI Stable Diffusion XL](https://github.com/Stability-AI/generative-models), [StabilityAI Stable Diffusion 3.0](https://stability.ai/news/stable-diffusion-3-medium) Medium, [StabilityAI Stable Diffusion 3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) Medium, Large, Large Turbo
@ -101,17 +101,17 @@ See [models overview](wiki/Models) for details on each model, including their ar
## Getting started
- Get started with **SD.Next** by following the [installation instructions](wiki/Installation)
- For more details, check out [advanced installation](wiki/Advanced-Install) guide
- List and explanation of [command line arguments](wiki/CLI-Arguments)
- Get started with **SD.Next** by following the [installation instructions](https://github.com/vladmandic/automatic/wiki/Installation)
- For more details, check out [advanced installation](https://github.com/vladmandic/automatic/wiki/Advanced-Install) guide
- List and explanation of [command line arguments](https://github.com/vladmandic/automatic/wiki/CLI-Arguments)
- Install walkthrough [video](https://www.youtube.com/watch?v=nWTnTyFTuAs)
> [!TIP]
> And for platform specific information, check out
> [WSL](wiki/WSL) | [Intel Arc](wiki/Intel-ARC) | [DirectML](wiki/DirectML) | [OpenVINO](wiki/OpenVINO) | [ONNX & Olive](wiki/ONNX-Runtime) | [ZLUDA](wiki/ZLUDA) | [AMD ROCm](wiki/AMD-ROCm) | [MacOS](wiki/MacOS-Python.md) | [nVidia](wiki/nVidia)
> [WSL](https://github.com/vladmandic/automatic/wiki/WSL) | [Intel Arc](https://github.com/vladmandic/automatic/wiki/Intel-ARC) | [DirectML](https://github.com/vladmandic/automatic/wiki/DirectML) | [OpenVINO](https://github.com/vladmandic/automatic/wiki/OpenVINO) | [ONNX & Olive](https://github.com/vladmandic/automatic/wiki/ONNX-Runtime) | [ZLUDA](https://github.com/vladmandic/automatic/wiki/ZLUDA) | [AMD ROCm](https://github.com/vladmandic/automatic/wiki/AMD-ROCm) | [MacOS](https://github.com/vladmandic/automatic/wiki/MacOS-Python.md) | [nVidia](https://github.com/vladmandic/automatic/wiki/nVidia)
> [!WARNING]
> If you run into issues, check out [troubleshooting](wiki/Troubleshooting) and [debugging](wiki/Debug) guides
> If you run into issues, check out [troubleshooting](https://github.com/vladmandic/automatic/wiki/Troubleshooting) and [debugging](https://github.com/vladmandic/automatic/wiki/Debug) guides
> [!TIP]
> All command line options can also be set via env variable

View File

@ -8,6 +8,7 @@ Main ToDo list can be found at [GitHub projects](https://github.com/users/vladma
- SD35 LoRA: <https://github.com/huggingface/diffusers/issues/9950>
- Flux IPAdapter: <https://github.com/huggingface/diffusers/issues/9825>
- Flux Fill/ControlNet/Redux: <https://github.com/huggingface/diffusers/pull/9985>
- Flux NF4: <https://github.com/huggingface/diffusers/issues/9996>
- SANA: <https://github.com/huggingface/diffusers/pull/9982>
## Other

View File

@ -38,6 +38,16 @@ def list_to_dict(flat_list):
return result_dict
def list_compact(flat_list):
result_list = []
for item in flat_list:
keys = item.split('.')
keys = '.'.join(keys[:2])
if keys not in result_list:
result_list.append(keys)
return result_list
def guess_dct(dct: dict):
# if has(dct, 'model.diffusion_model.input_blocks') and has(dct, 'model.diffusion_model.label_emb'):
# return 'sdxl'
@ -65,7 +75,9 @@ def read_keys(fn):
except Exception as e:
pprint(e)
dct = list_to_dict(keys)
lst = list_compact(keys)
pprint(f'file: {fn}')
pprint(lst)
pprint(remove_entries_after_depth(dct, 3))
pprint(remove_entries_after_depth(dct, 6))
guess = guess_dct(dct)
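The new `list_compact` helper collapses flat state-dict keys to their first two dot-separated segments, de-duplicated in order; a minimal standalone sketch of the expected behavior:

```python
def list_compact(flat_list):
    # keep only the first two dot-separated segments of each key, preserving order
    result_list = []
    for item in flat_list:
        keys = item.split('.')
        keys = '.'.join(keys[:2])
        if keys not in result_list:
            result_list.append(keys)
    return result_list

keys = [
    'model.diffusion_model.input_blocks.0.weight',
    'model.diffusion_model.output_blocks.0.weight',
    'first_stage_model.decoder.conv_in.weight',
]
print(list_compact(keys))  # → ['model.diffusion_model', 'first_stage_model.decoder']
```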

View File

@ -459,7 +459,7 @@ def check_python(supported_minors=[9, 10, 11, 12], reason=None):
def check_diffusers():
if args.skip_all or args.skip_requirements:
return
sha = 'cd6ca9df2987c000b28e13b19bd4eec3ef3c914b'
sha = 'b5fd6f13f5434d69d919cc8cedf0b11db664cf06'
pkg = pkg_resources.working_set.by_key.get('diffusers', None)
minor = int(pkg.version.split('.')[1] if pkg is not None else 0)
cur = opts.get('diffusers_version', '') if minor > 0 else ''

View File

@ -194,6 +194,7 @@ def load_transformer(file_path): # triggered by opts.sd_unet change
if _transformer is not None:
transformer = _transformer
else:
diffusers_load_config = model_quant.create_bnb_config(diffusers_load_config)
transformer = diffusers.FluxTransformer2DModel.from_single_file(file_path, **diffusers_load_config)
if transformer is None:
shared.log.error('Failed to load UNet model')
@ -213,6 +214,11 @@ def load_flux(checkpoint_info, diffusers_load_config): # triggered by opts.sd_ch
text_encoder_2 = None
vae = None
# unload current model
sd_models.unload_model_weights()
shared.sd_model = None
devices.torch_gc(force=True)
# load overrides if any
if shared.opts.sd_unet != 'None':
try:
@ -305,8 +311,21 @@ def load_flux(checkpoint_info, diffusers_load_config): # triggered by opts.sd_ch
repo_id = 'black-forest-labs/FLUX.1-dev' # workaround since sayakpaul model is missing model_index.json
for c in kwargs:
if kwargs[c].dtype == torch.float32 and devices.dtype != torch.float32:
shared.log.warning(f'Load model: type=FLUX component={c} dtype={kwargs[c].dtype} cast dtype={devices.dtype}')
shared.log.warning(f'Load model: type=FLUX component={c} dtype={kwargs[c].dtype} cast dtype={devices.dtype} recast')
kwargs[c] = kwargs[c].to(dtype=devices.dtype)
kwargs = model_quant.create_bnb_config(kwargs)
pipe = diffusers.FluxPipeline.from_pretrained(repo_id, cache_dir=shared.opts.diffusers_dir, **kwargs, **diffusers_load_config)
allow_bnb = 'gguf' not in (sd_unet.loaded_unet or '')
kwargs = model_quant.create_bnb_config(kwargs, allow_bnb)
if checkpoint_info.path.endswith('.safetensors') and os.path.isfile(checkpoint_info.path):
pipe = diffusers.FluxPipeline.from_single_file(checkpoint_info.path, cache_dir=shared.opts.diffusers_dir, **kwargs, **diffusers_load_config)
else:
pipe = diffusers.FluxPipeline.from_pretrained(repo_id, cache_dir=shared.opts.diffusers_dir, **kwargs, **diffusers_load_config)
# release memory
transformer = None
text_encoder_1 = None
text_encoder_2 = None
vae = None
devices.torch_gc()
return pipe

View File

@ -7,10 +7,10 @@ bnb = None
quanto = None
def create_bnb_config(kwargs = None):
def create_bnb_config(kwargs = None, allow_bnb: bool = True):
from modules import shared, devices
if len(shared.opts.bnb_quantization) > 0:
if 'Model' in shared.opts.bnb_quantization and 'transformer' not in (kwargs or {}):
if len(shared.opts.bnb_quantization) > 0 and allow_bnb:
if 'Model' in shared.opts.bnb_quantization:
load_bnb()
bnb_config = diffusers.BitsAndBytesConfig(
load_in_8bit=shared.opts.bnb_quantization_type in ['fp8'],
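The new `allow_bnb` parameter lets callers (e.g. the Flux loader when a GGUF transformer is already loaded) bypass bnb quantization. A hypothetical standalone sketch of the gate, using a plain dict in place of the real `diffusers.BitsAndBytesConfig` and `shared.opts`:

```python
# Hypothetical sketch: the real function reads shared.opts and attaches a
# diffusers.BitsAndBytesConfig; here the options are plain parameters.
def create_bnb_config(kwargs=None, allow_bnb=True, bnb_quantization=('Model',), quant_type='nf4'):
    kwargs = dict(kwargs or {})
    if len(bnb_quantization) > 0 and allow_bnb and 'Model' in bnb_quantization:
        kwargs['quantization_config'] = {
            'load_in_8bit': quant_type == 'fp8',
            'load_in_4bit': quant_type in ('nf4', 'fp4'),
        }
    return kwargs

# a gguf-loaded transformer disables bnb via allow_bnb=False
print(create_bnb_config({}, allow_bnb=False))  # → {}
```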

View File

@ -120,6 +120,11 @@ def load_sd3(checkpoint_info, cache_dir=None, config=None):
repo_id = sd_models.path_to_repo(checkpoint_info.name)
fn = checkpoint_info.path
# unload current model
sd_models.unload_model_weights()
shared.sd_model = None
devices.torch_gc(force=True)
kwargs = {}
kwargs = load_overrides(kwargs, cache_dir)
if fn is None or not os.path.exists(fn):
@ -152,5 +157,5 @@ def load_sd3(checkpoint_info, cache_dir=None, config=None):
config=config,
**kwargs,
)
devices.torch_gc(force=True)
devices.torch_gc()
return pipe

View File

@ -13,6 +13,16 @@ def remove_entries_after_depth(d, depth, current_depth=0):
return d
def list_compact(flat_list):
result_list = []
for item in flat_list:
keys = item.split('.')
keys = '.'.join(keys[:2])
if keys not in result_list:
result_list.append(keys)
return result_list
def list_to_dict(flat_list):
result_dict = {}
try:

View File

@ -9,11 +9,11 @@ import torch
import torchsde
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.utils import BaseOutput, logging
from diffusers.utils import BaseOutput
from diffusers.utils.torch_utils import randn_tensor
from diffusers.schedulers.scheduling_utils import SchedulerMixin
import scipy.stats
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
class BatchedBrownianTree:
"""A wrapper around torchsde.BrownianTree that enables batches of entropy."""
@ -101,39 +101,42 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
Args:
num_train_timesteps (`int`, defaults to 1000):
The number of diffusion steps to train the model.
beta_start (`float`, defaults to 0.0001):
The starting `beta` value of inference.
beta_end (`float`, defaults to 0.02):
The final `beta` value.
beta_schedule (`str`, defaults to `"scaled linear"`):
The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from `linear` or `scaled_linear`.
trained_betas (`np.ndarray`, *optional*):
Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`.
solver_order (`int`, defaults to 2):
The DPMSolver order which can be `2` or `3`. It is recommended to use `solver_order=2` for guided
sampling, and `solver_order=3` for unconditional sampling.
thresholding (`bool`, defaults to `False`):
Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such
as Stable Diffusion.
dynamic_thresholding_ratio (`float`, defaults to 0.995):
The ratio for the dynamic thresholding method. Valid only when `thresholding=True`.
sample_max_value (`float`, defaults to 1.0):
The threshold value for dynamic thresholding. Valid only when `thresholding=True`.
algorithm_type (`str`, defaults to `dpmsolver++2M`):
Algorithm type for the solver; can be `dpmsolver2`, `dpmsolver2A`, `dpmsolver++2M`, `dpmsolver++2S`, `dpmsolver++sde`, `dpmsolver++2Msde`,
or `dpmsolver++3Msde`.
solver_type (`str`, defaults to `midpoint`):
Solver type for the second-order solver; can be `midpoint` or `heun`. The solver type slightly affects the
sample quality, especially for a small number of steps. It is recommended to use `midpoint` solvers.
sigma_schedule (`str`, *optional*, defaults to None): Sigma schedule to compute the `sigmas`. Optionally, we use
sigma_schedule (`str`, *optional*, defaults to None (beta)): Sigma schedule to compute the `sigmas`. Optionally, we use
the schedule "karras" introduced in the EDM paper (https://arxiv.org/abs/2206.00364). Other acceptable values are
"exponential". The exponential schedule was incorporated in this model: https://huggingface.co/stabilityai/cosxl.
Other acceptable values are "lambdas". The uniform-logSNR for step sizes proposed by Lu's DPM-Solver in the
noise schedule during the sampling process. The sigmas and time steps are determined according to a sequence of `lambda(t)`.
use_noise_sampler for BrownianTreeNoiseSampler (only valid for `dpmsolver++2S`, `dpmsolver++sde`, `dpmsolver++2Msde`,
or `dpmsolver++3Msde`): A noise sampler backed by a torchsde increasing the stability of convergence. Default strategy
"betas" for step sizes in the noise schedule during the sampling process. Refer to [Beta
Sampling is All You Need](https://huggingface.co/papers/2407.12173) for more information.
use_noise_sampler for BrownianTreeNoiseSampler (only valid for `dpmsolver++2S`, `dpmsolver++sde`, `dpmsolver++2Msde`, or `dpmsolver++3Msde`.
A noise sampler backed by a torchsde increasing the stability of convergence. Default strategy
(random noise) has it jumping all over the place, but Brownian sampling is more stable. Utilizes the model generation seed provided.
midpoint_ratio (`float`, *optional*, range: 0.4 to 0.6, default=0.5): Only valid for (`dpmsolver++sde`, `dpmsolver++2S`).
Higher values may result in smoothing, more vivid colors and less noise at the expense of more detail and effect.
s_noise (`float`, *optional*, defaults to 1.0): Sigma noise strength: range 0 - 1.1 (only valid for `dpmsolver++2S`, `dpmsolver++sde`,
`dpmsolver++2Msde`, or `dpmsolver++3Msde`). The amount of additional noise to counteract loss of detail during sampling. A
reasonable range is [1.000, 1.011]. Defaults to 1.0 from the original implementation.
use_SD35_sigmas: (`bool` defaults to False for FLUX and True for SD3). Based on original interpretation of using beta values for determining sigmas.
use_beta_sigmas: (`bool` defaults to False for FLUX and True for SD3). Based on original interpretation of using beta values for determining sigmas.
use_dynamic_shifting (`bool` defaults to False for SD3 and True for FLUX). When `True`, shift is ignored.
shift (`float`, defaults to 3.0): The shift value for the timestep schedule for SD3 when not using dynamic shifting
The remaining args are specific to Flux's dynamic shifting based on resolution
shift (`float`, defaults to 3.0): The shift value for the timestep schedule for SD3 when not using dynamic shifting.
The remaining args are specific to Flux's dynamic shifting based on resolution.
"""
_compatibles = []
@ -143,10 +146,11 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
def __init__(
self,
num_train_timesteps: int = 1000,
beta_start: float = 0.00085,
beta_end: float = 0.012,
beta_schedule: str = "scaled linear",
trained_betas: Optional[Union[np.ndarray, List[float]]] = None,
solver_order: int = 2,
thresholding: Optional[bool] = False,
dynamic_thresholding_ratio: float = 0.995,
sample_max_value: Optional[float] = 1.0,
algorithm_type: str = "dpmsolver++2M",
solver_type: str = "midpoint",
sigma_schedule: Optional[str] = None,
@ -154,7 +158,7 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
midpoint_ratio: Optional[float] = 0.5,
s_noise: Optional[float] = 1.0,
use_noise_sampler: Optional[bool] = True,
use_SD35_sigmas: Optional[bool] = False,
use_beta_sigmas: Optional[bool] = False,
use_dynamic_shifting=False,
base_shift: Optional[float] = 0.5,
max_shift: Optional[float] = 1.15,
@ -168,6 +172,12 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
if solver_type not in ["midpoint", "heun"]:
raise NotImplementedError(f"{solver_type} is not implemented for {self.__class__}")
if sigma_schedule not in [None, "karras", "exponential", "lambdas", "betas"]:
raise NotImplementedError(f"{sigma_schedule} is not implemented for {self.__class__}")
if beta_schedule not in ["linear", "scaled linear"]:
raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}")
# setable values
timesteps = np.linspace(1, num_train_timesteps, num_train_timesteps, dtype=np.float32)[::-1].copy()
timesteps = torch.from_numpy(timesteps).to(dtype=torch.float32)
@ -186,8 +196,6 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
self._begin_index = None
self.sigmas = sigmas.to("cpu") # to avoid too much CPU/GPU communication
self.model_outputs = [None] * solver_order
self.sigma_min = self.sigmas[-1].item()
self.sigma_max = self.sigmas[0].item()
@property
def step_index(self):
@ -213,7 +221,7 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
"""
self._begin_index = begin_index
def time_shift(self, mu: float, sigma: float, t: torch.Tensor):
def time_shift(self, mu: float, sigma: float, t: torch.FloatTensor):
return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
def set_timesteps(self,
@ -235,25 +243,39 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
raise ValueError("you must pass a value for `mu` when `use_dynamic_shifting` is set to `True`")
if sigmas is None:
self.use_SD35_sigmas = True
self.use_beta_sigmas = True
self.num_inference_steps = num_inference_steps
sigmas1 = np.linspace(1.0, 1 / num_inference_steps, num_inference_steps, dtype=np.float64)
beta_start = 0.00085
beta_end = 0.012
betas = torch.linspace(beta_start**0.5, beta_end**0.5, self.config.num_train_timesteps, dtype=torch.float64) ** 2
beta_start = self.config.beta_start
beta_end = self.config.beta_end
if self.config.trained_betas is not None:
betas = torch.tensor(self.config.trained_betas, dtype=torch.float64)
elif self.config.beta_schedule == "linear":
betas = torch.linspace(beta_start, beta_end, self.config.num_train_timesteps, dtype=torch.float64)
elif self.config.beta_schedule == "scaled linear":
# this schedule is very specific to the latent diffusion model.
betas = torch.linspace(beta_start**0.5, beta_end**0.5, self.config.num_train_timesteps, dtype=torch.float64) ** 2
else:
raise NotImplementedError(f"{self.config.beta_schedule} is not implemented for {self.__class__}")
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
sigmas = np.array(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5)
del alphas_cumprod
del alphas
del betas
elif self.use_SD35_sigmas:
elif self.use_beta_sigmas:
num_inference_steps = len(sigmas)
self.num_inference_steps = num_inference_steps
sigmas1 = np.linspace(1.0, 1 / num_inference_steps, num_inference_steps, dtype=np.float64)
beta_start = 0.00085
beta_end = 0.012
betas = torch.linspace(beta_start**0.5, beta_end**0.5, self.config.num_train_timesteps, dtype=torch.float64) ** 2
beta_start = self.config.beta_start
beta_end = self.config.beta_end
if self.config.trained_betas is not None:
betas = torch.tensor(self.config.trained_betas, dtype=torch.float64)
elif self.config.beta_schedule == "linear":
betas = torch.linspace(beta_start, beta_end, self.config.num_train_timesteps, dtype=torch.float64)
elif self.config.beta_schedule == "scaled linear":
# this schedule is very specific to the latent diffusion model.
betas = torch.linspace(beta_start**0.5, beta_end**0.5, self.config.num_train_timesteps, dtype=torch.float64) ** 2
else:
raise NotImplementedError(f"{self.config.beta_schedule} is not implemented for {self.__class__}")
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)
sigmas = np.array(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5)
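The scaled-linear beta-to-sigma conversion shared by both branches above can be reproduced with NumPy alone; a minimal sketch using the scheduler's defaults (`beta_start=0.00085`, `beta_end=0.012`):

```python
import numpy as np

def scaled_linear_sigmas(beta_start=0.00085, beta_end=0.012, num_train_timesteps=1000):
    # "scaled linear" schedule: linspace over sqrt(beta), then squared
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps, dtype=np.float64) ** 2
    alphas_cumprod = np.cumprod(1.0 - betas)
    # sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t)
    return ((1.0 - alphas_cumprod) / alphas_cumprod) ** 0.5

sigmas = scaled_linear_sigmas()
print(sigmas[0], sigmas[-1])  # sigma grows monotonically with t
```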
@ -265,7 +287,7 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
self.num_inference_steps = num_inference_steps
if self.config.sigma_schedule == "exponential":
if self.use_SD35_sigmas:
if self.use_beta_sigmas:
sigmas = np.flip(sigmas).copy()
sigma_min = sigmas[-1]
sigma_max = sigmas[0]
@ -273,13 +295,12 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
OldRange = sigma_max - sigma_min
NewRange = 1.0 - sigma_min
sigmas = (((sigmas - sigma_min) * NewRange) / OldRange) + sigma_min
del sigmas1
else:
sigma_min = sigmas[-1]
sigma_max = sigmas[0]
sigmas = self._convert_to_exponential(sigma_min, sigma_max, num_inference_steps=num_inference_steps)
elif self.config.sigma_schedule == "karras":
if self.use_SD35_sigmas:
if self.use_beta_sigmas:
sigmas = np.flip(sigmas).copy()
sigma_min = sigmas[-1]
sigma_max = sigmas[0]
@ -287,14 +308,13 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
OldRange = sigma_max - sigma_min
NewRange = 1.0 - sigma_min
sigmas = (((sigmas - sigma_min) * NewRange) / OldRange) + sigma_min
del sigmas1
else:
sigma_min = sigmas[-1]
sigma_max = sigmas[0]
sigmas = self._convert_to_karras(sigma_min, sigma_max, num_inference_steps=num_inference_steps)
sigmas = torch.from_numpy(sigmas).to(dtype=torch.float64, device=device)
elif self.config.sigma_schedule == "lambdas":
if self.use_SD35_sigmas:
if self.use_beta_sigmas:
log_sigmas = np.log(sigmas)
lambdas = np.flip(log_sigmas.copy())
lambdas = self._convert_to_lu(in_lambdas=lambdas, num_inference_steps=num_inference_steps)
@ -304,7 +324,6 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
OldRange = sigma_max - sigma_min
NewRange = 1.0 - sigma_min
sigmas = (((sigmas - sigma_min) * NewRange) / OldRange) + sigma_min
del sigmas1
del lambdas
del log_sigmas
else:
@ -315,12 +334,25 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
del lambdas
del log_sigmas
sigmas = torch.from_numpy(sigmas).to(dtype=torch.float64, device=device)
else:
if self.use_SD35_sigmas:
elif self.config.sigma_schedule == "betas":
if self.use_beta_sigmas:
sigmas = np.flip(sigmas).copy()
sigma_min = sigmas[-1]
sigma_max = sigmas[0]
sigmas = self._convert_to_beta(sigma_min, sigma_max, num_inference_steps=num_inference_steps, device=device)
OldRange = sigma_max - sigma_min
NewRange = 1.0 - sigma_min
sigmas = (((sigmas - sigma_min) * NewRange) / OldRange) + sigma_min
else:
sigmas = np.flip(sigmas).copy()
sigma_min = sigmas[-1]
sigmas = np.linspace(1.0, sigma_min, num_inference_steps)
sigmas = torch.from_numpy(sigmas).to(dtype=torch.float64, device=device)
else:
if self.use_beta_sigmas:
sigmas = np.flip(sigmas).copy()
sigma_min = sigmas[-1]
sigmas = np.linspace(1.0, sigma_min, num_inference_steps)
del sigmas1
sigmas = torch.from_numpy(sigmas).to(dtype=torch.float64, device=device)
if self.config.use_dynamic_shifting:
@ -339,39 +371,19 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
self._step_index = None
self._begin_index = None
# Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler._threshold_sample
def _threshold_sample(self, sample: torch.Tensor) -> torch.Tensor:
"""
"Dynamic thresholding: At each sampling step we set s to a certain percentile absolute pixel value in xt0 (the
prediction of x_0 at timestep t), and if s > 1, then we threshold xt0 to the range [-s, s] and then divide by
s. Dynamic thresholding pushes saturated pixels (those near -1 and 1) inwards, thereby actively preventing
pixels from saturation at each step. We find that dynamic thresholding results in significantly better
photorealism as well as better image-text alignment, especially when using very large guidance weights."
https://arxiv.org/abs/2205.11487
"""
dtype = sample.dtype
batch_size, channels, *remaining_dims = sample.shape
if dtype not in (torch.float32, torch.float64):
sample = sample.float() # upcast for quantile calculation, and clamp not implemented for cpu half
# Flatten sample for doing quantile calculation along each image
sample = sample.reshape(batch_size, channels * np.prod(remaining_dims))
abs_sample = sample.abs() # "a certain percentile absolute pixel value"
s = torch.quantile(abs_sample, self.config.dynamic_thresholding_ratio, dim=1)
s = torch.clamp(
s, min=1, max=self.config.sample_max_value
) # When clamped to min=1, equivalent to standard clipping to [-1, 1]
s = s.unsqueeze(1) # (batch_size, 1) because clamp will broadcast along dim=0
sample = torch.clamp(sample, -s, s) / s # "we threshold xt0 to the range [-s, s] and then divide by s"
sample = sample.reshape(batch_size, channels, *remaining_dims)
sample = sample.to(dtype)
return sample
# Copied from diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler._convert_to_beta
def _convert_to_beta(self, sigma_min, sigma_max, num_inference_steps, device: Union[str, torch.device] = None, alpha: float = 0.6, beta: float = 0.6) -> torch.Tensor:
"""From "Beta Sampling is All You Need" [arXiv:2407.12173] (Lee et. al, 2024)"""
sigmas = torch.Tensor(
[
sigma_min + (ppf * (sigma_max - sigma_min))
for ppf in [
scipy.stats.beta.ppf(timestep, alpha, beta)
for timestep in 1 - np.linspace(0, 1, num_inference_steps).astype(np.float64)
]
]
).to(dtype=torch.float64, device=device)
return sigmas
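The `_convert_to_beta` helper above maps the Beta-distribution quantile function onto the sigma range, as in the "Beta Sampling is All You Need" paper. A standalone NumPy/SciPy sketch of the same computation:

```python
import numpy as np
import scipy.stats

def beta_sigmas(sigma_min, sigma_max, num_inference_steps, alpha=0.6, beta=0.6):
    # walk timesteps from 1 down to 0 and map them through the Beta(alpha, beta) ppf
    timesteps = 1.0 - np.linspace(0.0, 1.0, num_inference_steps, dtype=np.float64)
    ppf = scipy.stats.beta.ppf(timesteps, alpha, beta)
    return sigma_min + ppf * (sigma_max - sigma_min)

sigmas = beta_sigmas(0.03, 14.6, 10)
# first step starts at sigma_max (ppf(1) == 1), last ends at sigma_min (ppf(0) == 0)
```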
def _convert_to_lu(self, in_lambdas: torch.Tensor, num_inference_steps) -> torch.Tensor:
"""Constructs the noise schedule of Lu et al. (2022)."""
@ -399,51 +411,6 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
sigmas = torch.linspace(math.log(sigma_max), math.log(sigma_min), num_inference_steps).exp()
return sigmas
def convert_model_output(
self,
model_output: torch.Tensor,
sample: torch.Tensor = None,
*args,
**kwargs,
) -> torch.Tensor:
"""
Convert the model output to the corresponding type the DPMSolver/DPMSolver++ algorithm needs. DPM-Solver is
designed to discretize an integral of the noise prediction model, and DPM-Solver++ is designed to discretize an
integral of the data prediction model.
<Tip>
The algorithm and model type are decoupled. You can use either DPMSolver or DPMSolver++ for both noise
prediction and data prediction models.
</Tip>
Args:
model_output (`torch.Tensor`):
The direct output from the learned diffusion model.
sample (`torch.Tensor`):
A current instance of a sample created by the diffusion process.
Returns:
`torch.Tensor`:
The converted model output.
"""
timestep = args[0] if len(args) > 0 else kwargs.pop("timestep", None)
if sample is None:
if len(args) > 1:
sample = args[1]
else:
raise ValueError("missing `sample` as a required keyward argument")
# Flow Match needs to solve an integral of the data prediction model.
sigma = self.sigmas[self.step_index]
x0_pred = sample - sigma * model_output
if self.config.thresholding:
x0_pred = self._threshold_sample(x0_pred)
return x0_pred
def index_for_timestep(self, timestep, schedule_timesteps=None):
if schedule_timesteps is None:
schedule_timesteps = self.timesteps
@ -511,7 +478,9 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
if self.config.algorithm_type in ["dpmsolver2", "dpmsolver2A"]:
pass
else:
model_output = self.convert_model_output(model_output, sample=sample)
# Flow Match needs to solve an integral of the data prediction model.
sigma = self.sigmas[self.step_index]
model_output = sample - sigma * model_output
for i in range(self.config.solver_order - 1):
self.model_outputs[i] = self.model_outputs[i + 1]
self.model_outputs[-1] = model_output
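The inlined data prediction above replaces the removed `convert_model_output`; algebraically it is just `x0 = sample - sigma * eps`, so a perfect noise prediction recovers the clean sample exactly. A small NumPy sketch of that identity:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))      # clean data
noise = rng.standard_normal((4, 4))   # the model's (here, perfect) prediction
sigma = 0.7

sample = x0 + sigma * noise           # noisy sample at this sigma
x0_pred = sample - sigma * noise      # the inlined data-prediction step
assert np.allclose(x0_pred, x0)
```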
@ -830,7 +799,7 @@ class FlowMatchDPMSolverMultistepScheduler(SchedulerMixin, ConfigMixin):
return FlowMatchDPMSolverMultistepSchedulerOutput(prev_sample=prev_sample)
def scale_model_input(self, sample: torch.Tensor, *args, **kwargs) -> torch.Tensor:
def scale_model_input(self, sample: torch.FloatTensor, *args, **kwargs) -> torch.FloatTensor:
"""
Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
current timestep.

View File

@ -92,7 +92,7 @@ def detect_pipeline(f: str, op: str = 'model', warning=True, quiet=False):
guess = 'Stable Diffusion 3'
if 'flux' in f.lower():
guess = 'FLUX'
if size > 11000 and size < 20000:
if size > 11000 and size < 16000:
warn(f'Model detected as FLUX UNET model, but attempting to load a base model: {op}={f} size={size} MB')
# switch for specific variant
if guess == 'Stable Diffusion' and 'inpaint' in f.lower():
@ -112,8 +112,9 @@ def detect_pipeline(f: str, op: str = 'model', warning=True, quiet=False):
if keys is not None and len(keys) > 0:
modules = model_tools.list_to_dict(keys)
modules = model_tools.remove_entries_after_depth(modules, 3)
lst = model_tools.list_compact(keys)
t1 = time.time()
shared.log.debug(f'Autodetect modules: {modules} time={t1-t0:.2f}')
shared.log.debug(f'Autodetect: modules={modules} list={lst} time={t1-t0:.2f}')
except Exception as e:
shared.log.error(f'Autodetect {op}: file="{f}" {e}')
if debug_load:

View File

@ -504,6 +504,8 @@ def move_model(model, device=None, force=False):
module.to_empty(device=device)
elif 'enable_sequential_cpu_offload' in str(e0):
pass # ignore model move if sequential offload is enabled
elif 'Params4bit' in str(e0) or 'Params8bit' in str(e0):
pass # ignore model move if quantization is enabled
else:
raise e0
t1 = time.time()
@ -819,6 +821,7 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=No
if model_type is None:
shared.log.error(f'Load {op}: pipeline={shared.opts.diffusers_pipeline} not detected')
return
vae_file = None
if model_type.startswith('Stable Diffusion') and (op == 'model' or op == 'refiner'): # preload vae for sd models
vae_file, vae_source = sd_vae.resolve_vae(checkpoint_info.filename)
vae = sd_vae.load_vae_diffusers(checkpoint_info.path, vae_file, vae_source)
@ -897,7 +900,7 @@ def load_diffuser(checkpoint_info=None, already_loaded_state_dict=None, timer=No
set_diffuser_offload(sd_model, op)
if op == 'model' and not (os.path.isdir(checkpoint_info.path) or checkpoint_info.type == 'huggingface'):
if getattr(shared.sd_model, 'sd_checkpoint_info', None) is not None:
if getattr(shared.sd_model, 'sd_checkpoint_info', None) is not None and vae_file is not None:
sd_vae.apply_vae_config(shared.sd_model.sd_checkpoint_info.filename, vae_file, sd_model)
if op == 'refiner' and shared.opts.diffusers_move_refiner:
shared.log.debug('Moving refiner model to CPU')

View File

@ -80,13 +80,13 @@ config = {
'DPM++ Cosine': { 'solver_order': 2, 'sigma_schedule': "exponential", 'prediction_type': "v-prediction" },
'DPM SDE': { 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'use_beta_sigmas': False, 'noise_sampler_seed': None, 'timestep_spacing': 'linspace', 'steps_offset': 0, },
'DPM2 FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_SD35_sigmas': False, 'algorithm_type': 'dpmsolver2', 'use_noise_sampler': True },
'DPM2a FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_SD35_sigmas': False, 'algorithm_type': 'dpmsolver2A', 'use_noise_sampler': True },
-    'DPM2++ 2M FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_SD35_sigmas': False, 'algorithm_type': 'dpmsolver++2M', 'use_noise_sampler': True },
-    'DPM2++ 2S FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_SD35_sigmas': False, 'algorithm_type': 'dpmsolver++2S', 'use_noise_sampler': True },
-    'DPM2++ SDE FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_SD35_sigmas': False, 'algorithm_type': 'dpmsolver++sde', 'use_noise_sampler': True },
-    'DPM2++ 2M SDE FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_SD35_sigmas': False, 'algorithm_type': 'dpmsolver++2Msde', 'use_noise_sampler': True },
-    'DPM2++ 3M SDE FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 3, 'sigma_schedule': None, 'use_SD35_sigmas': False, 'algorithm_type': 'dpmsolver++3Msde', 'use_noise_sampler': True },
+    'DPM2 FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_beta_sigmas': False, 'algorithm_type': 'dpmsolver2', 'use_noise_sampler': True },
+    'DPM2a FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_beta_sigmas': False, 'algorithm_type': 'dpmsolver2A', 'use_noise_sampler': True },
+    'DPM2++ 2M FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_beta_sigmas': False, 'algorithm_type': 'dpmsolver++2M', 'use_noise_sampler': True },
+    'DPM2++ 2S FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_beta_sigmas': False, 'algorithm_type': 'dpmsolver++2S', 'use_noise_sampler': True },
+    'DPM2++ SDE FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_beta_sigmas': False, 'algorithm_type': 'dpmsolver++sde', 'use_noise_sampler': True },
+    'DPM2++ 2M SDE FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 2, 'sigma_schedule': None, 'use_beta_sigmas': False, 'algorithm_type': 'dpmsolver++2Msde', 'use_noise_sampler': True },
+    'DPM2++ 3M SDE FlowMatch': { 'shift': 1, 'use_dynamic_shifting': False, 'solver_order': 3, 'sigma_schedule': None, 'use_beta_sigmas': False, 'algorithm_type': 'dpmsolver++3Msde', 'use_noise_sampler': True },
     'Heun': { 'use_beta_sigmas': False, 'use_karras_sigmas': False, 'use_exponential_sigmas': False, 'timestep_spacing': 'linspace' },
     'Heun FlowMatch': { 'timestep_spacing': "linspace", 'shift': 1 },
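The table above is a name-to-config registry: each sampler name maps to a dict of scheduler constructor overrides. A minimal sketch of how such a registry can be consumed, with per-run overrides merged on top of the base entry (the names, keys, and `create_config` helper here are simplified stand-ins, not the actual SD.Next loader):

```python
# Hypothetical miniature of the sampler registry above.
SCHEDULER_CONFIGS = {
    'DPM2 FlowMatch': {'shift': 1, 'solver_order': 2, 'use_beta_sigmas': False},
    'DPM2++ 2M FlowMatch': {'shift': 1, 'solver_order': 2, 'use_beta_sigmas': False},
}

def create_config(name: str, **overrides) -> dict:
    """Look up the base config for a sampler name and merge per-run overrides."""
    base = SCHEDULER_CONFIGS.get(name)
    if base is None:
        raise KeyError(f'unknown sampler: {name}')
    return {**base, **overrides}  # overrides win; base entry stays untouched

cfg = create_config('DPM2 FlowMatch', use_beta_sigmas=True)
```

Merging into a fresh dict keeps the registry itself immutable across runs, which matters when the same process serves many generations.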
@@ -236,8 +236,8 @@ class DiffusionSampler:
         if 'use_dynamic_shifting' in self.config:
             if 'Flux' in model.__class__.__name__:
                 self.config['use_dynamic_shifting'] = shared.opts.schedulers_dynamic_shift
-        if 'use_SD35_sigmas' in self.config:
-            self.config['use_SD35_sigmas'] = 'StableDiffusion3' in model.__class__.__name__
+        if 'use_beta_sigmas' in self.config:
+            self.config['use_beta_sigmas'] = 'StableDiffusion3' in model.__class__.__name__
         if 'rescale_betas_zero_snr' in self.config:
             self.config['rescale_betas_zero_snr'] = shared.opts.schedulers_rescale_betas
         if 'timestep_spacing' in self.config and shared.opts.schedulers_timestep_spacing != 'default' and shared.opts.schedulers_timestep_spacing is not None:
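The hunk above gates scheduler config keys on the loaded pipeline's class name, so a key is only enabled for model families that support it. A simplified sketch of that gating pattern (the pipeline class and `apply_model_gates` helper are hypothetical stand-ins; the real code also consults `shared.opts`):

```python
class StableDiffusion3Pipeline:  # stand-in for the loaded diffusers pipeline
    pass

def apply_model_gates(config: dict, model) -> dict:
    """Toggle scheduler config keys based on the pipeline class name."""
    name = model.__class__.__name__
    if 'use_beta_sigmas' in config:
        config['use_beta_sigmas'] = 'StableDiffusion3' in name
    if 'use_dynamic_shifting' in config:
        config['use_dynamic_shifting'] = 'Flux' in name
    return config

cfg = apply_model_gates(
    {'use_beta_sigmas': False, 'use_dynamic_shifting': False},
    StableDiffusion3Pipeline(),
)
```

Because the check is only applied when the key already exists in the config, schedulers that never declared the option are left alone.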

@@ -722,7 +722,7 @@ options_templates.update(options_section(('saving-images', "Image Options"), {
     "image_watermark_image": OptionInfo('', "Image watermark file"),
 }))
-options_templates.update(options_section(('saving-paths', "Image Naming & Paths"), {
+options_templates.update(options_section(('saving-paths', "Image Paths"), {
     "saving_sep_images": OptionInfo("<h2>Save options</h2>", "", gr.HTML),
     "save_images_add_number": OptionInfo(True, "Numbered filenames", component_args=hide_dirs),
     "use_original_name_batch": OptionInfo(True, "Batch uses original name"),
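The settings above follow an `OptionInfo`/`options_section` pattern: each option carries a default and a label, and the section helper tags every option with its `(id, title)` pair. A minimal sketch of that pattern, assuming a stripped-down `OptionInfo` (the real SD.Next class carries more fields, such as the UI component and `component_args`):

```python
from dataclasses import dataclass

@dataclass
class OptionInfo:
    default: object
    label: str
    section: tuple = None  # filled in by options_section

def options_section(section: tuple, options: dict) -> dict:
    """Tag every option in the dict with its (id, title) section."""
    for opt in options.values():
        opt.section = section
    return options

opts = options_section(('saving-paths', 'Image Paths'), {
    'save_images_add_number': OptionInfo(True, 'Numbered filenames'),
})
```

Grouping by section this way lets the settings UI render each block under its title while the options themselves stay in one flat dict.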

@@ -1,5 +1,4 @@
 import os
-import sys
 import time
 import datetime
 from modules.errors import log
@@ -120,8 +119,8 @@ class State:
     def end(self, api=None):
         import modules.devices
         if self.time_start is None: # someone called end before begin
-            fn = f'{sys._getframe(2).f_code.co_name}:{sys._getframe(1).f_code.co_name}' # pylint: disable=protected-access
-            log.debug(f'Access state.end: {fn}') # pylint: disable=protected-access
+            # fn = f'{sys._getframe(2).f_code.co_name}:{sys._getframe(1).f_code.co_name}' # pylint: disable=protected-access
+            # log.debug(f'Access state.end: {fn}') # pylint: disable=protected-access
             self.time_start = time.time()
         if self.debug_output:
             log.debug(f'State end: {self.job} time={time.time() - self.time_start:.2f}')
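The guard above exists because `end()` can be reached before `begin()` ever ran, leaving `time_start` as `None`; back-filling it avoids a `TypeError` in the elapsed-time arithmetic. A simplified stand-in for that defensive pattern (not the full SD.Next `State` class):

```python
import time

class State:
    def __init__(self):
        self.time_start = None  # normally set by begin()

    def end(self) -> float:
        if self.time_start is None:  # end() called before begin()
            self.time_start = time.time()  # back-fill so the math below is safe
        return time.time() - self.time_start

s = State()
elapsed = s.end()  # no crash even though begin() never ran; elapsed is ~0
```

The trade-off is that an out-of-order call silently reports a near-zero duration instead of failing, which is why the original kept (now commented-out) caller introspection for debugging.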

@@ -276,7 +276,7 @@ def create_sampler_options(tabname):
     else: # shared.native
         with gr.Row(elem_classes=['flex-break']):
-            sampler_sigma = gr.Dropdown(label='Sigma method', elem_id=f"{tabname}_sampler_sigma", choices=['default', 'karras', 'beta', 'exponential', 'lambdas'], value=shared.opts.schedulers_sigma, type='value')
+            sampler_sigma = gr.Dropdown(label='Sigma method', elem_id=f"{tabname}_sampler_sigma", choices=['default', 'karras', 'betas', 'exponential', 'lambdas'], value=shared.opts.schedulers_sigma, type='value')
             sampler_spacing = gr.Dropdown(label='Timestep spacing', elem_id=f"{tabname}_sampler_spacing", choices=['default', 'linspace', 'leading', 'trailing'], value=shared.opts.schedulers_timestep_spacing, type='value')
         with gr.Row(elem_classes=['flex-break']):
             sampler_beta = gr.Dropdown(label='Beta schedule', elem_id=f"{tabname}_sampler_beta", choices=['default', 'linear', 'scaled', 'cosine'], value=shared.opts.schedulers_beta_schedule, type='value')

@@ -115,7 +115,7 @@ axis_options = [
     AxisOption("[Process] Server options", str, apply_options),
     AxisOptionTxt2Img("[Sampler] Name", str, apply_sampler, fmt=format_value_add_label, confirm=confirm_samplers, choices=lambda: [x.name for x in sd_samplers.samplers]),
     AxisOptionImg2Img("[Sampler] Name", str, apply_sampler, fmt=format_value_add_label, confirm=confirm_samplers, choices=lambda: [x.name for x in sd_samplers.samplers_for_img2img]),
-    AxisOption("[Sampler] Sigma method", str, apply_setting("schedulers_sigma"), choices=lambda: ['default', 'karras', 'beta', 'exponential', 'lambdas']),
+    AxisOption("[Sampler] Sigma method", str, apply_setting("schedulers_sigma"), choices=lambda: ['default', 'karras', 'betas', 'exponential', 'lambdas']),
     AxisOption("[Sampler] Timestep spacing", str, apply_setting("schedulers_timestep_spacing"), choices=lambda: ['default', 'linspace', 'leading', 'trailing']),
     AxisOption("[Sampler] Timestep range", int, apply_setting("schedulers_timesteps_range")),
     AxisOption("[Sampler] Solver order", int, apply_setting("schedulers_solver_order")),
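Each `apply_setting("key")` call above produces a closure that writes the axis value into the shared options object before a grid cell is generated. A minimal sketch of that closure pattern, with `SimpleNamespace` standing in for `shared.opts`:

```python
from types import SimpleNamespace

opts = SimpleNamespace(schedulers_sigma='default')  # stand-in for shared.opts

def apply_setting(key: str):
    """Return a setter closed over a specific settings key."""
    def fn(value):
        setattr(opts, key, value)
    return fn

# The XYZ grid calls the closure once per cell with that cell's axis value.
apply_setting('schedulers_sigma')('betas')
```

Binding the key at option-definition time keeps the axis table declarative: every row just pairs a label with a ready-made setter.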