Smart pre-processing, including automatic subject identification, caption subject swapping, and upscaling/facial restoration.

Stable Diffusion WebUI Smart Pre-Processing Extension

What is this?

As the name would imply, this is an extension for the Stable Diffusion WebUI by @Automatic1111.

What does this do?

It does a few things, actually.

For starters, it uses a combination of BLIP/CLIP and YOLOv5 to provide "smart cropping" for images. The primary subject of each image is identified, the center of that subject is determined, and then the application tries its best to crop the image so that as much of the subject as possible stays within the specified dimensions.
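
To illustrate the "center on the subject, then crop" idea, here's a minimal sketch. It assumes you already have the subject's bounding box in pixel coordinates and that the image is at least as large as the crop; the function name `subject_crop_box` is hypothetical and this is not the extension's actual code.

```python
def subject_crop_box(img_w, img_h, box, crop_w, crop_h):
    """Hypothetical helper: a crop_w x crop_h window centered on a subject box.

    box is (x0, y0, x1, y1) in pixels. Assumes the image is already at least
    crop_w x crop_h (resize it first otherwise).
    """
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop is larger than the image; resize the image first")
    # Center of the detected subject.
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    # Put the window's center on the subject's center...
    left = int(round(cx - crop_w / 2))
    top = int(round(cy - crop_h / 2))
    # ...then slide it back inside the image if it hangs over an edge.
    left = max(0, min(left, img_w - crop_w))
    top = max(0, min(top, img_h - crop_h))
    return left, top, left + crop_w, top + crop_h


# A 1024x768 image with a subject near the right edge, cropped to 512x512.
print(subject_crop_box(1024, 768, (700, 100, 900, 500), 512, 512))
```

Clamping the window instead of padding means a subject near an edge ends up off-center, but the crop never includes empty space.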

Second, it can store the generated image caption directly in the image filename, instead of creating a txt file alongside every image. You can still create a txt file, use existing captions, or skip captioning entirely.
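
Storing a caption in a filename means stripping characters that filesystems reject. A rough sketch of that step follows; the helper name `caption_to_filename`, the Windows-reserved character set, and the 200-character cap are all assumptions for illustration, not the extension's own rules.

```python
import re

# Characters Windows forbids in filenames (this also covers "/" on POSIX),
# plus ASCII control characters. Assumed rule set for this sketch.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def caption_to_filename(caption: str, ext: str = ".png", max_len: int = 200) -> str:
    """Hypothetical helper: turn a caption into a safe filename."""
    name = _ILLEGAL.sub("", caption)
    # Collapse runs of whitespace and drop trailing dots/spaces (invalid on Windows).
    name = re.sub(r"\s+", " ", name).strip().rstrip(". ")
    name = name[:max_len].rstrip(". ")
    return (name or "untitled") + ext


print(caption_to_filename("a photo of a dog: sitting / on grass."))
```

This sketch does not handle two images ending up with the same caption, which would need some kind of de-duplication (for example a numeric prefix).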

Third, I've provided face restoration and upscaling options for input images. You can select GFPGAN or CodeFormer for face restoration, and any of the upscalers from the "Extras" tab to refine, smooth, or add detail to your final output images.

Last, but not least, it offers a rudimentary way to swap the "class" of a captioned image with a specific keyword. So, if you're training a subject called "xyz" and "xyz" is a dog, you can easily swap "dog" (and "a dog") with "xyz" in your captions. Neato!
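
One way such a swap could work is a whole-word, case-insensitive regex replacement that handles "a dog" before the bare "dog". This is a sketch under that assumption; `swap_class` is a hypothetical name, and the extension's real matching rules may differ.

```python
import re


def swap_class(caption: str, class_word: str, keyword: str) -> str:
    """Hypothetical helper: replace a class word (and 'a/an <class>') with a keyword."""
    # Longer phrases first, so "a dog" becomes "xyz" rather than "a xyz".
    for phrase in (f"a {class_word}", f"an {class_word}", class_word):
        # \b keeps "dog" from matching inside words such as "dogwood".
        pattern = re.compile(rf"\b{re.escape(phrase)}\b", re.IGNORECASE)
        # A function replacement stops backslashes in the keyword being parsed as escapes.
        caption = pattern.sub(lambda _m: keyword, caption)
    return caption


print(swap_class("a dog sitting next to a dogwood tree", "dog", "xyz"))
```

Word boundaries are the main design choice here: without them, a naive `str.replace` would also rewrite words like "dogwood" or "hotdog".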

Smart Cropping

As I said above, smart cropping uses a combination of YOLOv5 object recognition and BLIP/CLIP (and DeepDanbooru) captioning to automatically determine the most prominent subject in a photo and crop the image to include as much of that subject as possible. You can also name a specific subject (dog/cat/woman/house) for the software to find, and skip the YOLOv5 detection entirely.
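
"Most prominent" needs some scoring rule when a detector returns several boxes. The sketch below uses box area weighted by detector confidence. That heuristic, the `Detection` tuple, and `pick_subject` are all assumptions for illustration and are not taken from the extension.

```python
from typing import NamedTuple, Optional, Sequence


class Detection(NamedTuple):
    """Hypothetical detection record: pixel box, confidence, class label."""
    x0: float
    y0: float
    x1: float
    y1: float
    confidence: float
    label: str


def pick_subject(dets: Sequence[Detection]) -> Optional[Detection]:
    """Return the detection with the largest confidence-weighted area, or None."""
    if not dets:
        return None  # nothing found: the caller falls back to a center crop
    return max(dets, key=lambda d: (d.x1 - d.x0) * (d.y1 - d.y0) * d.confidence)


dets = [
    Detection(0, 0, 100, 100, 0.9, "dog"),
    Detection(0, 0, 300, 300, 0.5, "person"),
]
print(pick_subject(dets).label)
```

Weighting by confidence keeps a large but doubtful box from always winning. Using area or confidence alone would be equally valid choices.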

If a subject is not found, the image will be downscaled and cropped from the center.
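
The fallback can be expressed as "scale until the image just covers the target, then take the middle". A minimal sketch follows, assuming the goal is to fully cover the target without letterboxing. `fallback_resize_and_crop` is a hypothetical name, and the function only computes sizes and a crop box; it does not touch pixels.

```python
def fallback_resize_and_crop(img_w, img_h, target_w, target_h):
    """Hypothetical helper: resized size plus a centered crop box (left, top, right, bottom)."""
    # Choose the scale so that both sides still cover the target.
    # For images larger than the target this is < 1, i.e. a downscale.
    scale = max(target_w / img_w, target_h / img_h)
    new_w = max(target_w, round(img_w * scale))
    new_h = max(target_h, round(img_h * scale))
    # Center the crop window inside the resized image.
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return (new_w, new_h), (left, top, left + target_w, top + target_h)


print(fallback_resize_and_crop(1024, 768, 512, 512))
```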

Smart Captioning

Captioning uses the same settings as your WebUI user preferences, with the additional options to save captions to a txt file or append them to the image filename.

Additionally, you can swap the generic "class" of the image with a specific subject keyword. This feature may not be perfect in all cases, but it should still go a long way in speeding up the captioning process.

You can also specify a maximum caption length, which will split the caption by spaces and append words until the maximum length is reached.
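
The word-by-word truncation described above might look like the sketch below. It assumes the limit is counted in characters and that a limit of 0 means "no limit". Both are assumptions, as is the name `truncate_caption`.

```python
def truncate_caption(caption: str, max_len: int) -> str:
    """Hypothetical helper: keep whole words until the next one would exceed max_len chars."""
    if max_len <= 0:
        return caption  # assumption: 0 (or less) disables the limit
    out = ""
    for word in caption.split():
        candidate = f"{out} {word}" if out else word
        if len(candidate) > max_len:
            break  # adding this word would go over the limit
        out = candidate
    return out


print(truncate_caption("a photo of a dog on a beach", 15))
```

Stopping on whole words avoids half-words like "a photo of a do". The trade-off is that a single word longer than the limit produces an empty caption.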

Post Processing

It's basically a simplified version of the "Extras" tab. The idea is that you can do facial restoration and/or use a model like SwinIR or LDSR to smooth or add detail to an image. If an image ends up upscaled beyond the target crop size, it is downscaled back to that size.
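
The "upscale for detail, then shrink back" step reduces to a size calculation. This sketch assumes the image should fit inside the target while keeping its aspect ratio. `fit_after_upscale` is a hypothetical name. The resulting size would then go to an image library's resize call, such as Pillow's `Image.resize`.

```python
def fit_after_upscale(up_w, up_h, target_w, target_h):
    """Hypothetical helper: size to resize an upscaled image back to, keeping aspect."""
    if up_w <= target_w and up_h <= target_h:
        return up_w, up_h  # no overshoot, leave it alone
    # Shrink by the factor that makes the larger overshoot fit exactly.
    scale = min(target_w / up_w, target_h / up_h)
    return round(up_w * scale), round(up_h * scale)


# A 512x512 crop upscaled 4x by a model, brought back to 512x512.
print(fit_after_upscale(2048, 2048, 512, 512))
```

The round trip is the point: the upscaler adds or sharpens detail at high resolution, and downsampling keeps most of that detail at the requested output size.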