Batch size
Increasing the batch size speeds up training, but it requires more VRAM.
A higher batch size might also need a higher learning rate.
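As a rough illustration of that last point (this linear-scaling idea is a common rule of thumb, not something this wiki prescribes), you can scale the learning rate with the batch size relative to a baseline you already know works:

```python
# Hypothetical rule of thumb, not the extension's behaviour: scale the
# learning rate roughly in proportion to the batch size, starting from a
# (learning rate, batch size) pair that is known to train well.
baseline_lr = 1e-6      # assumed working learning rate at batch size 1
baseline_batch = 1
new_batch = 4
new_lr = baseline_lr * new_batch / baseline_batch  # linear scaling heuristic
print(new_lr)  # 4e-06
```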
Grad Accumulation
A gradient accumulation size of 3 should, on paper, behave like a batch size of 3.
Grad 3 with batch 1 will run 3 batches of size 1, but only apply the weight update at the end of the 3rd iteration.
It runs at the same speed as batch size 1, but should give the training result of batch size 3.
So grad 3 batch 1 has an equivalent batch size of 3, training-wise.
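To make the mechanism concrete, here is a minimal PyTorch-style sketch of gradient accumulation with grad 3 and batch 1. The model, optimizer, and data are toy placeholders, not the extension's actual training loop:

```python
import torch
from torch import nn

# Toy stand-ins; in real training these would be the diffusion model and its data.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
data = [(torch.randn(1, 4), torch.randn(1, 1)) for _ in range(6)]  # 6 samples, batch size 1

accum_steps = 3  # "grad 3": accumulate gradients over 3 micro-batches

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y)
    # Scale the loss so the summed gradients match the mean over 3 samples,
    # i.e. what a real batch of size 3 would produce.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()        # weights are only updated every 3rd iteration
        optimizer.zero_grad()
```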
Equivalent Batch Size
Grad 3 batch 2 => equivalent batch size 6
Gradient accumulation lets you replicate the results of high batch sizes (think of a 48+ GB graphics card) in a low-VRAM environment.
The trade-off is speed.
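In other words, the equivalent batch size is just the product of the two settings; a trivial sketch:

```python
def equivalent_batch_size(batch_size: int, grad_steps: int) -> int:
    # Equivalent batch size = batch size x gradient accumulation steps.
    return batch_size * grad_steps

print(equivalent_batch_size(2, 3))  # 6 -> "grad 3 batch 2"
print(equivalent_batch_size(1, 3))  # 3 -> "grad 3 batch 1"
```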
!!You want the equivalent batch size to divide the number of training images with no remainder!!
For example, with 77 images and no class images,
your only options are:
| Batch Size | Grad Size | Equivalent |
|---|---|---|
| 1 | 1 | 1 |
| 1 | 7 | 7 |
| 1 | 11 | 11 |
| 1 | 77 | 77 |
| 7 | 1 | 7 |
| 7 | 11 | 77 |
| 11 | 1 | 11 |
| 11 | 7 | 77 |
| 77 | 1 | 77 |
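If you want to check the combinations for your own image count, a small script like this (the function name is just illustrative) reproduces the table above:

```python
def valid_combinations(num_images: int):
    """Yield (batch_size, grad_size, equivalent) triples whose equivalent
    batch size divides num_images with no remainder."""
    for batch in range(1, num_images + 1):
        for grad in range(1, num_images + 1):
            equivalent = batch * grad
            if equivalent <= num_images and num_images % equivalent == 0:
                yield batch, grad, equivalent

for combo in valid_combinations(77):
    print(combo)
# (1, 1, 1), (1, 7, 7), (1, 11, 11), (1, 77, 77), (7, 1, 7), (7, 11, 77), ...
```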
Batch size suggestions
If speed is the main focus and VRAM is plentiful, go for the highest batch size you are able to run (that leaves no remainder).
High Batch Size
Training at a high batch size (or equivalent) assimilates the features of the instance images more deeply.
This is good for styles, but it might result in weird generations.
Example: "elephant" at an equivalent batch size of 150 (15 × 10), trained at 1.5e-4 with no text encoder training, using 150 captioned images and no class images.

Low Batch Size
A low batch size (or equivalent) will produce images that usually maintain higher integrity.
This is good when training on pictures of yourself, or on a specific object.