See https://arxiv.org/abs/2303.06296
This adds an option to reparametrize the model weights using their spectral norm, so that the overall norm of each weight can't change during training. This helps stabilize training at high learning rates.
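A minimal pure-Python sketch of the reparametrization from the linked paper, W_hat = (gamma / sigma(W)) * W, where sigma(W) is the spectral norm (largest singular value) of W, estimated here by power iteration. The function names `spectral_norm` and `sigma_reparam`, the fixed iteration count, and the use of nested lists instead of tensors are illustrative assumptions. This is not the option's actual implementation.

```python
import math
import random


def matvec(w, x):
    # W x for W given as a list of rows (m x n) and x of length n
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matTvec(w, y):
    # W^T y for y of length m
    return [sum(w[i][j] * y[i] for i in range(len(w))) for j in range(len(w[0]))]


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def spectral_norm(w, iters=50, seed=0):
    """Estimate the largest singular value of w by power iteration."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(iters):
        u = normalize(matvec(w, v))   # left singular vector estimate
        v = normalize(matTvec(w, u))  # right singular vector estimate
    # ||W v|| with unit v converges to sigma_max(W)
    return math.sqrt(sum(x * x for x in matvec(w, v)))


def sigma_reparam(w, gamma=1.0, iters=50):
    """Hypothetical helper: rescale w so its spectral norm equals gamma."""
    sigma = spectral_norm(w, iters)
    return [[gamma * wij / sigma for wij in row] for row in w]
```

With gamma held fixed, every effective weight has spectral norm gamma no matter how the raw W is updated; in the paper gamma is a learnable scalar. In a real training loop sigma(W) would typically be tracked with a single power-iteration step per update, reusing the previous singular-vector estimate, rather than recomputed from scratch as above.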
DREAM (Diffusion Rectification and Estimation-Adaptive Models) adds an extra forward pass during training to make the model more robust to errors in early sampling steps. See http://arxiv.org/abs/2312.00210
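A minimal sketch of one DREAM training step for an epsilon-predicting model, written in pure Python with lists standing in for tensors. It assumes the forward process x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps and the adaptive weight lambda = sqrt(1 - alpha_bar) ** p described in the paper. The function names, the `model(x_t, t)` signature, and the default p = 1 are illustrative assumptions, not this option's actual code.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical epsilon-predictor: (noisy input x_t, timestep t) -> predicted noise
Model = Callable[[Vector, float], Vector]


def dream_targets(model: Model, x0: Vector, noise: Vector, t: float,
                  alpha_bar: float, p: float = 1.0) -> Tuple[Vector, Vector]:
    """Return the DREAM-adjusted (noisy input, target) pair for one sample."""
    sqrt_ab = math.sqrt(alpha_bar)
    sqrt_1m_ab = math.sqrt(1.0 - alpha_bar)
    # Standard forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
    x_t = [sqrt_ab * a + sqrt_1m_ab * e for a, e in zip(x0, noise)]
    # Extra forward pass: the model's own noise estimate (no gradient in a real framework)
    eps_hat = model(x_t, t)
    # Adaptive weight lambda = sqrt(1 - a_bar) ** p
    lam = sqrt_1m_ab ** p
    # Scaled prediction error, treated as a constant
    delta = [lam * (e - eh) for e, eh in zip(noise, eps_hat)]
    # Shift both the noisy input and the target by the same error term
    x_hat = [x + sqrt_1m_ab * d for x, d in zip(x_t, delta)]
    target = [e + d for e, d in zip(noise, delta)]
    return x_hat, target


def dream_loss(model: Model, x0: Vector, noise: Vector, t: float,
               alpha_bar: float, p: float = 1.0) -> float:
    x_hat, target = dream_targets(model, x0, noise, t, alpha_bar, p)
    # Second forward pass: the one that would carry gradients
    pred = model(x_hat, t)
    return sum((y - yh) ** 2 for y, yh in zip(target, pred)) / len(target)
```

The first call to `model` is the additional forward pass: its error is folded into both the input and the target, so the gradient-carrying pass sees inputs that roughly reflect the model's own mistakes, as it would during sampling. If the model already predicts the noise exactly, the step reduces to ordinary noise-prediction training. In a framework such as PyTorch the first pass would run under `torch.no_grad()`.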