It is a depth-aware extension that helps you create multiple complex subjects on a single image. It generates a background, then multiple foreground subjects, cuts out their backgrounds after a depth analysis, pastes them onto the background and finally does an img2img pass for a clean finish.

Extraltodeus af012cdcc6 Create FUNDING.yml		2022-11-24 18:33:53 +01:00

multi-subject-render

Generate multiple complex subjects all at once!

Made as a script for the AUTOMATIC1111/stable-diffusion-webui repository.

_{Kyaaaaaaaaaaaaaaaaa!}

Jump to examples!

💥 Installation 💥

Run this command from the webui directory :

git clone https://github.com/isl-org/MiDaS.git repositories/midas

Copy the two scripts from this repository (the one you're reading right now) into your scripts folder.

Alternatively, you can just paste the URL of this repository into the extension tab :

OR copy this repository into your extension folder :

You might need to restart the whole UI. Maybe twice.

The look

_{OK I know that's a big screenshot}

How the hell does this work?

First it creates your background image, then your foreground subjects, then does a depth analysis on them, cuts out their backgrounds, pastes them onto your background and then does an img2img pass for a smooth blend!

^{It will cut around that lady with scissors made of code.}
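To give a rough idea of what "scissors made of code" can mean, here is a minimal sketch of a depth cut. It is not the script's actual code. It assumes the depth map has already been normalized to 0..255 with nearer pixels being brighter, and the function name `depth_cut_alpha` is made up for the illustration.

```python
from typing import List


def depth_cut_alpha(depth: List[List[int]], threshold: int) -> List[List[int]]:
    """Build an alpha mask from a depth map (hypothetical helper).

    Assumption: depth values are in 0..255 and nearer pixels are brighter.
    Pixels nearer than the cutoff stay opaque (255), the rest become
    transparent (0).
    """
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be in 0..255")
    return [[255 if value > threshold else 0 for value in row] for row in depth]


# Tiny 3x4 "depth map": a subject (high values) in front of a far wall (low values).
depth_map = [
    [10, 200, 210, 12],
    [11, 220, 230, 15],
    [60, 240, 250, 70],
]

alpha = depth_cut_alpha(depth_map, threshold=90)
for row in alpha:
    print(row)
```

With a cutoff of 255 nothing is nearer than the cutoff, so the whole foreground ends up transparent. With a cutoff of 0 almost everything stays opaque, background included.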

Explanations of the different UI elements

I will only explain the not so obvious things because I spent enough time making that thing already.

First off, your usual UI will be for your initial background. So your normal prompt will probably be something like "a beach, sunny sky" etc.

For my example I decided to generate a bowling alley at 512x512 pixels :

Your foreground subjects will be described in that text box.
You can use wildcards.
If you only use the first line, that line will be used for every foreground subject that will be generated.
If you use multiple lines, each line will be used for each foreground subject.

_{Note : if you do that, you will need as many lines as foreground images generated.}
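The one-line versus multi-line rule can be sketched like this. It is only an illustration of the behaviour described above, not the script's code: `foreground_prompts` is a hypothetical name, and ignoring blank lines and extra lines is my own assumption.

```python
from typing import List


def foreground_prompts(text: str, count: int) -> List[str]:
    """Pick one prompt per foreground subject (hypothetical helper).

    A single line is reused for every subject. Several lines are used one
    per subject, so there must be at least `count` of them.
    """
    # Assumption: blank lines in the text box are ignored.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("at least one foreground prompt line is required")
    if len(lines) == 1:
        return lines * count
    if len(lines) < count:
        raise ValueError(f"need {count} prompt lines, got {len(lines)}")
    # Assumption: extra lines beyond `count` are simply dropped.
    return lines[:count]


print(foreground_prompts("a penguin", 3))
print(foreground_prompts("a penguin\na red penguin\na tall penguin", 3))
```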

For my example I made three penguins :

That's how much the seed will be incremented at each image. If you set it to 0 you will get the same foregrounds every time. Probably not what you want unless you use the Extra option in your main UI and "Variation strength".
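As a small illustration of the seed increment, here is a sketch under the assumption that the first foreground uses the base seed as-is and each following one adds the increment. The real script may start counting differently, and `foreground_seeds` is a hypothetical name.

```python
from typing import List


def foreground_seeds(base_seed: int, count: int, increment: int) -> List[int]:
    """Return one seed per foreground subject (hypothetical helper).

    Assumption: subject number i gets base_seed + i * increment.
    """
    return [base_seed + i * increment for i in range(count)]


print(foreground_seeds(1000, 3, 5))  # three different seeds
print(foreground_seeds(1000, 3, 0))  # same seed three times, so same foregrounds
```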

You can use a different sampler for the foregrounds, as well as a different CLIP value.

The final blend is there to either make a smooth pass over your collage or to make something more complex / add details to your combination.
You can use different settings and samplers for your final blend. Do as you wish. The CLIP value will be the one you've set in your settings tab, not the one for the foregrounds. So you can decide if you prefer one way or the other.

_{They are not really bowling because you need fingers for that. They're just here for trouble.}

An important part is to set the final blend width. Your initial background will be stretched to that size, so you don't really need to make it big initially. Your foreground subjects will be pasted onto your stretched background before the final blend. If it is not wide enough, you will end up with too many characters at the same spot.

The scary miscellaneous options :

The foreground distance from center multiplier will bring your characters closer together if you select a lower value, and push them further apart with a higher one. I usually stick between 0.8 and 1.0.
Foreground Y shift : the center character will always be at the same height. Then you multiply the value of that slider by the position of the foreground subject from the center. That gives you how many pixels lower they will be. Think about some super hero movie poster with the side characters slightly lower. That's what this slider does.
Foreground depth cut threshold is the scary one. At 0, the backgrounds of your foreground subjects will be opaque. At 255, the entire foreground will be transparent. The best values are between 50 and 60 for cartoon-like characters and between 90 and 100 for photorealistic subjects. Too much and they lose their heads, not enough and you get some rock that they were sitting on in your final blend.
Random superposition : the default is to have the center character in front. If you enable that, it might not be the case anymore. That's a cool option depending on what you want to do.
The center character will be behind the others : if you use the previous option, this one becomes useless.
Face correction is only for the final blend. If you want it on every foreground subject, set it in your main UI. I think it's best to enable both if you make photorealistic stuff.
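The placement options above can be pictured with the following sketch. It is a guess at the arithmetic, not the script's real code. The even spacing (canvas width divided by the number of subjects), the names `layout` and `paste_order`, and the way the "position from the center" is counted are all assumptions.

```python
import random
from typing import List, Optional, Tuple


def layout(count: int, canvas_width: int,
           distance_multiplier: float = 1.0,
           y_shift: float = 0.0) -> List[Tuple[float, float]]:
    """Return (x_center, pixels_lower) for each subject, left to right."""
    slot = canvas_width / count      # assumed even spacing between subjects
    middle = (count - 1) / 2         # index of the (possibly virtual) center
    positions = []
    for i in range(count):
        rank = i - middle            # signed position from the center
        x = canvas_width / 2 + rank * slot * distance_multiplier
        y = abs(rank) * y_shift      # the center stays put, the sides go lower
        positions.append((x, y))
    return positions


def paste_order(count: int, center_behind: bool = False,
                rng: Optional[random.Random] = None) -> List[int]:
    """Return subject indices in paste order; later ones end up in front."""
    if rng is not None:              # random superposition overrides the rest
        order = list(range(count))
        rng.shuffle(order)
        return order
    middle = (count - 1) / 2
    # Default: outermost subjects first, center last, so the center is in front.
    order = sorted(range(count), key=lambda i: abs(i - middle), reverse=True)
    if center_behind:
        order.reverse()              # center first, so it ends up behind
    return order


print(layout(3, 1536, distance_multiplier=1.0, y_shift=20))
print(paste_order(3))
print(paste_order(3, center_behind=True))
```

In this sketch a lower multiplier shrinks every horizontal offset toward the middle of the canvas, which is also why a final blend width that is too narrow piles the characters on top of each other.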

Tips and tricks :

Using (bokeh) and (F1.8:1.2) will make blurry backgrounds, which makes it easier for the depth analysis to do a clean cut of the backgrounds.
"wide angle" in your prompt will give you more chances to have characters that won't be cropped.
"skin details" or "detailed skin" raises the chances of having close-ups. I prefer to avoid them.
Not enough denoising/steps on your final blend will make it look like you used scissors on your mom's Vogue catalogue and pasted the ladies onto your dad's favorite Lord of the Rings poster. Don't do that.
Too much denoising/steps might make the characters all look the same. It's all about finding the right middle value for your needs.
Making your foreground subjects shorter than the final image might make them look cropped.

Known issues

Only the final blend is rendered to the UI. You have to save the images (in the settings, just don't uncheck that "save all images" checkbox and you're good).
There can be bugs.

Credits

Thanks to thygate for letting me blatantly copy-paste some of his functions for the depth analysis.

A few more examples

An attempt at recreating the "Distracted boyfriend" meme, without influencing the directions in which they are looking. 100% txt2img.

_{I messed up the order on the last one.}

_{Aren't they cute without oxygen?}

_{Of course you can make a harem just for yourself.}

_{MOAR KITTENS}

Now a few more groups of "super heroes" from the same batch as the first image here. Except maybe for the portraits.

Wrong settings examples

_{This is what too low denoising on the final blend looks like. Yuk!}

_{Same issue here. Looks like a funny kid collage. Grandma will love it because you typed your prompts with love and she knows it.}

_{Guess why I had to censor the lowest part. This is what too much denoising looks like. They all look the same.}