Merge branch 'pr/48'
commit 3e50472878

docs/MANUAL.md
@ -372,6 +372,130 @@ If the given `path` ends with the `*` wildcard, `[init_image]` will choose a ran

[init_image "C:/pictures/my_image.png"]
```

### [invert_mask]

Inverts the mask. Great in combination with `[txt2mask]` and `[instance2mask]`.

### [instance2mask]

Uses Mask R-CNN (an instance segmentation model) to predict instances. The found instances are masked. It differs from `[txt2mask]` in that it allows running the inpainting for each found instance individually, which is useful when using high-resolution inpainting. This shortcode only works in the img2img tab of the A1111 WebUI.

**Important:** If `per_instance` is used, it is assumed to be the last operator that changes the mask.

The supported classes of instances are:

- `person`
- `bicycle`
- `car`
- `motorcycle`
- `airplane`
- `bus`
- `train`
- `truck`
- `boat`
- `traffic light`
- `fire hydrant`
- `stop sign`
- `parking meter`
- `bench`
- `bird`
- `cat`
- `dog`
- `horse`
- `sheep`
- `cow`
- `elephant`
- `bear`
- `zebra`
- `giraffe`
- `backpack`
- `umbrella`
- `handbag`
- `tie`
- `suitcase`
- `frisbee`
- `skis`
- `snowboard`
- `sports ball`
- `kite`
- `baseball bat`
- `baseball glove`
- `skateboard`
- `surfboard`
- `tennis racket`
- `bottle`
- `wine glass`
- `cup`
- `fork`
- `knife`
- `spoon`
- `bowl`
- `banana`
- `apple`
- `sandwich`
- `orange`
- `broccoli`
- `carrot`
- `hot dog`
- `pizza`
- `donut`
- `cake`
- `chair`
- `couch`
- `potted plant`
- `bed`
- `dining table`
- `toilet`
- `tv`
- `laptop`
- `mouse`
- `remote`
- `keyboard`
- `cell phone`
- `microwave`
- `oven`
- `toaster`
- `sink`
- `refrigerator`
- `book`
- `clock`
- `vase`
- `scissors`
- `teddy bear`
- `hair drier`
- `toothbrush`

Supports the `mode` argument which determines how the instance mask will behave alongside a brush mask:

- `add` will overlay the two masks. This is the default value.
- `discard` will ignore the brush mask entirely.
- `subtract` will remove the brush mask region from the instance mask region.
- `refine` will limit the initial mask to the selected instances.

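For example, a hypothetical call that ignores any brush mask and masks only detected people would be:

```
[instance2mask mode=discard]person[/instance2mask]
```
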
Supports the optional `mask_precision` argument which determines the confidence threshold of the instance mask. Default is 0.5, max value is 1.0. Lowering this value means you may select more than you intend per instance (instances may overlap).

Supports the optional `instance_precision` argument which determines the classification threshold for instances to be masked. Reduce this if instances are not detected successfully. Default is 0.85, max value is 1.0. Lowering this value can lead to wrongly classified areas.

Supports the optional `padding` argument which increases the radius of the instance masks by a given number of pixels.

Supports the optional `smoothing` argument which refines the boundaries of the mask, allowing you to create a smoother selection. Default is 0. Try a value of 20 or greater if you find that your masks are blocky.

Supports the optional `select` argument which defines how many instances to mask. Default value is 0, which means all instances.

Supports the optional `select_mode` argument which specifies which instances are selected:

- `overlap` will select instances starting with the one that has the greatest absolute overlap with the brushed mask.
- `relative overlap` behaves like `overlap` but normalizes the overlap by the size of the instance.
- `greatest area` will select the largest instances (by pixel count) first.
- `random` will select instances in a random order.

Defaults to `overlap`.

Supports the optional `show` positional argument which appends the final masks to your generation output window, along with a combined instance segmentation image for debugging purposes.

Supports the optional `per_instance` positional argument which renders and appends the selected masks individually, leading to better results when full-resolution inpainting is used.

```
[instance2mask]clock[/instance2mask]
```
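A fuller call combining several of the optional arguments (all values here are illustrative) might look like this, masking the two largest detected dogs with padded, smoothed edges:

```
[instance2mask select=2 select_mode="greatest area" padding=10 smoothing=20]dog[/instance2mask]
```
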

### [support_multiple]

A helper shortcode that should be used when multiple init images, multiple masks, or `[instance2mask]` with `per_instance` are in play. Place this shortcode at the very end of the prompt so that it can gather the correct init images and masks. Note that this operator will change the `batch_size` and `batch_count` (`n_iter`).

### [txt2mask]

A port of [the script](https://github.com/ThereforeGames/txt2mask) by the same name, `[txt2mask]` allows you to create a region for inpainting based only on the text content (as opposed to the brush tool). This shortcode only works in the img2img tab of the A1111 WebUI.

@ -383,10 +507,16 @@ Supports the `mode` argument which determines how the text mask will behave alon

Supports the optional `precision` argument which determines the confidence threshold of the mask. Default is 100, max value is 255. Lowering this value means you may select more than you intend.

Supports the optional `neg_precision` argument which determines the confidence threshold of the negative mask. Default is 100, max value is 255. Lowering this value means you may select more than you intend.

Supports the optional `padding` argument which increases the radius of your selection by a given number of pixels.

Supports the optional `neg_padding` argument, which is the same as `padding` but for the negative prompts.

Supports the optional `smoothing` argument which refines the boundaries of the mask, allowing you to create a smoother selection. Default is 0. Try a value of 20 or greater if you find that your masks are blocky.

Supports the optional `neg_smoothing` argument, which is the same as `smoothing` but for the negative prompts.

Supports the optional `size_var` argument which causes the shortcode to calculate the region occupied by your mask selection as a percentage of the total canvas and store that value in the variable you specify. For example, with `[txt2mask size_var=test]face[/txt2mask]`, if "face" takes up 40% of the canvas, the `test` variable will be set to 0.4.

Supports the optional `negative_mask` argument which will subtract areas from the content mask.

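Putting a few of these together, a hypothetical call that masks a face while excluding glasses and storing the mask size might be:

```
[txt2mask precision=120 padding=6 negative_mask="glasses" size_var=face_size]face[/txt2mask]
```
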
@ -935,4 +1065,4 @@ Note that variables are automatically deleted at the end of each run - you do **

```
[set var_a=10 var_b="something"]
[unset var_a var_b]
```

@ -0,0 +1,181 @@
|
|||
from torchvision.transforms.functional import to_pil_image, to_tensor
|
||||
from torchvision.utils import draw_segmentation_masks
|
||||
import torch
|
||||
from torchvision.transforms.functional import to_pil_image, pil_to_tensor
|
||||
from modules.processing import process_images,Processed, StableDiffusionProcessingImg2Img
|
||||
|
||||
class Shortcode():
|
||||
def __init__(self,Unprompted):
|
||||
self.Unprompted = Unprompted
|
||||
self.image_mask = None
|
||||
self.image_masks = None
|
||||
self.show = False
|
||||
self.per_instance = False
|
||||
self.description = "Creates an image mask from instances of types specified by the content for use with inpainting."
|
||||
|
||||
	def run_block(self, pargs, kwargs, context, content):
		from torchvision.models.detection import maskrcnn_resnet50_fpn_v2, MaskRCNN_ResNet50_FPN_V2_Weights
		from kornia.morphology import dilation, erosion
		from kornia.filters import box_blur

		if "init_images" not in self.Unprompted.shortcode_user_vars:
			return

		device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

		self.show = "show" in pargs
		self.per_instance = "per_instance" in pargs

		brush_mask_mode = self.Unprompted.parse_advanced(kwargs["mode"], context) if "mode" in kwargs else "add"
		select_mode = self.Unprompted.parse_advanced(kwargs["select_mode"], context) if "select_mode" in kwargs else "overlap"

		smoothing = int(self.Unprompted.parse_advanced(kwargs["smoothing"], context)) if "smoothing" in kwargs else 20

		# Pad the mask by applying a dilation or erosion
		mask_padding = int(self.Unprompted.parse_advanced(kwargs["padding"], context) if "padding" in kwargs else 0)
		padding_dilation_kernel = None
		if mask_padding != 0:
			padding_dilation_kernel = torch.ones(abs(mask_padding), abs(mask_padding), device=device)

		prompts = content.split(self.Unprompted.Config.syntax.delimiter)

		mask_precision = min(1.0, float(self.Unprompted.parse_advanced(kwargs["mask_precision"], context) if "mask_precision" in kwargs else 0.5))
		instance_precision = min(1.0, float(self.Unprompted.parse_advanced(kwargs["instance_precision"], context) if "instance_precision" in kwargs else 0.85))
		num_instances = int(self.Unprompted.parse_advanced(kwargs["select"], context) if "select" in kwargs else 0)

		init_image = self.Unprompted.shortcode_user_vars["init_images"][0]

		masks = self.Unprompted.shortcode_user_vars.setdefault("image_mask", None)
		if masks is not None:
			masks = pil_to_tensor(self.Unprompted.shortcode_user_vars["image_mask"].convert('L').resize((512, 512))) > 0
		else:
			masks = torch.zeros(512, 512, dtype=torch.bool)

		weights = MaskRCNN_ResNet50_FPN_V2_Weights.DEFAULT
		transforms = weights.transforms()
		model = maskrcnn_resnet50_fpn_v2(weights=weights, progress=False).eval().to(device=device)

		image = transforms(init_image.resize((512, 512)))

		pred = model(image[None].to(device=device))[0]

		target_labels = [weights.meta["categories"].index(i) for i in prompts]
		wanted_masks = torch.tensor([label in target_labels for label in pred["labels"]], device=device)
		likely_masks = pred["scores"] > instance_precision
		instance_masks: torch.Tensor = pred["masks"][likely_masks & wanted_masks]

		instance_masks = instance_masks.float()

		if mask_padding > 0:
			instance_masks = dilation(instance_masks, kernel=padding_dilation_kernel)
		elif mask_padding < 0:
			instance_masks = erosion(instance_masks, kernel=padding_dilation_kernel)

		if smoothing > 0:
			instance_masks = box_blur(instance_masks, (smoothing, smoothing))

		instance_masks = instance_masks > mask_precision
		instance_masks = instance_masks.cpu()

		if num_instances > 0:
			if "overlap" in select_mode:
				# select the instances with the highest overlap with the brush mask
				mask_in_instance = masks[None].broadcast_to(instance_masks.shape).clone()
				# count only the parts of the brush mask that fall inside each instance mask
				mask_in_instance[~instance_masks] = 0
				overlap = mask_in_instance.count_nonzero(dim=[1, 2, 3])

				if select_mode == "relative overlap":
					overlap = overlap / instance_masks.count_nonzero(dim=[1, 2, 3])

				_, idx = torch.topk(overlap, k=num_instances)
				instance_masks = instance_masks[idx]
			elif select_mode == "greatest area":
				# select the instances with the largest masks (by pixel count)
				_, idx = torch.topk(instance_masks.count_nonzero(dim=[1, 2, 3]), k=num_instances)
				instance_masks = instance_masks[idx]
			elif select_mode == "random":
				idx = torch.randperm(len(instance_masks))[:num_instances]
				instance_masks = instance_masks[idx]

		if num_instances > 0:
			instance_masks = instance_masks.sum(dim=0)
		else:
			instance_masks = instance_masks.squeeze(dim=1)

		masks = masks.broadcast_to(instance_masks.shape).clone()
		if brush_mask_mode == "refine":
			refine_mask = instance_masks > 0
			masks[~refine_mask] = 0
		elif brush_mask_mode == "add":
			masks = (masks + instance_masks) > 0
		elif brush_mask_mode == "subtract":
			masks = (instance_masks > 0) & ~masks
		elif brush_mask_mode == "discard":
			masks = instance_masks > 0

		# remove empty masks
		masks = masks[masks.count_nonzero(dim=[1, 2]) != 0]

		if self.per_instance:
			# [support_multiple] will draw the other instances
			self.image_mask = to_pil_image(masks[0].float()).resize((init_image.width, init_image.height))

			# save instance masks for [support_multiple] to pick up
			self.Unprompted.shortcode_user_vars["image_masks"] = [to_pil_image(m.float()).resize((init_image.width, init_image.height)) for m in masks]
			self.image_masks = self.Unprompted.shortcode_user_vars["image_masks"]
		else:
			combined_mask = masks.sum(dim=0, keepdim=True) > 0
			self.image_mask = to_pil_image(combined_mask.float()).resize((init_image.width, init_image.height))
			# store instance masks for later segmentation drawing
			self.image_masks = [to_pil_image(m.float()).resize((init_image.width, init_image.height)) for m in masks]
			self.Unprompted.shortcode_user_vars["image_masks"] = [self.image_mask]

		self.Unprompted.shortcode_user_vars["mode"] = 1
		self.Unprompted.shortcode_user_vars["mask_mode"] = 1
		self.Unprompted.shortcode_user_vars["image_mask"] = self.image_mask
		self.Unprompted.shortcode_user_vars["mask_for_overlay"] = self.image_mask
		self.Unprompted.shortcode_user_vars["latent_mask"] = None  # fixes inpainting at full resolution

		if "save" in kwargs:
			self.image_mask.save(f"{self.Unprompted.parse_advanced(kwargs['save'], context)}.png")

		return ""

	def after(self, p: StableDiffusionProcessingImg2Img, processed: Processed):
		if self.image_masks and self.show:
			image = pil_to_tensor(p.init_images[-1])

			masks = torch.stack([pil_to_tensor(m) for m in self.image_masks]).squeeze(dim=1)
			image = draw_segmentation_masks(image, masks > 0, alpha=0.75)
			processed.images += self.image_masks + [to_pil_image(image)]
			self.image_masks = None

		return processed

	def ui(self, gr):
		gr.Radio(label="Mask blend mode 🡢 mode", choices=["add", "subtract", "discard", "refine"], value="add", interactive=True)
		gr.Checkbox(label="Show mask in output 🡢 show")
		gr.Checkbox(label="Run inpaint per instance found 🡢 per_instance")
		gr.Number(label="Precision of selected area 🡢 mask_precision", value=0.5, interactive=True)
		gr.Number(label="Padding radius in pixels 🡢 padding", value=0, interactive=True)
		gr.Number(label="Smoothing radius in pixels 🡢 smoothing", value=20, interactive=True)
		gr.Number(label="Precision of instance selection 🡢 instance_precision", value=0.85, interactive=True)
		gr.Number(label="Number of instances to select 🡢 select", value=0, interactive=True)
		gr.Radio(
			label="Instance selection mode 🡢 select_mode",
			choices=["overlap", "relative overlap", "random", "greatest area"],
			value="overlap",
			interactive=True
		)

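The `overlap` selection rule used above can be illustrated in isolation. This is a pure-Python sketch with hypothetical data (pixel coordinate sets instead of tensors), not the shortcode's actual tensor implementation:

```python
def select_by_overlap(brush_mask, instance_masks, k):
    # Rank instances by how many brushed pixels fall inside each instance mask,
    # then keep the indices of the k best-overlapping instances.
    overlaps = [len(brush_mask & inst) for inst in instance_masks]
    ranked = sorted(range(len(instance_masks)), key=lambda i: overlaps[i], reverse=True)
    return ranked[:k]

brush = {(0, 0), (0, 1), (1, 1)}
instances = [
    {(5, 5), (5, 6)},          # no overlap with the brush
    {(0, 0), (0, 1), (2, 2)},  # overlap of 2
    {(1, 1)},                  # overlap of 1
]
print(select_by_overlap(brush, instances, 2))  # → [1, 2]
```

The tensor version replaces the set intersection with `count_nonzero` over the masked region, but the ranking idea is the same.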
@ -0,0 +1,18 @@
class Shortcode():
	def __init__(self, Unprompted):
		self.Unprompted = Unprompted
		self.description = "Inverts the mask (great in combination with multiple txt2masks)"

	def run_atomic(self, pargs, kwargs, context):
		from PIL import ImageOps

		if "image_mask" in self.Unprompted.shortcode_user_vars:
			mask = self.Unprompted.shortcode_user_vars["image_mask"]
			mask = ImageOps.invert(mask.convert("L"))
			self.Unprompted.shortcode_user_vars["image_mask"] = mask

		return ""

	def ui(self, gr):
		pass

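For reference, `ImageOps.invert` on an `"L"` (8-bit grayscale) image simply maps each pixel value `v` to `255 - v`. In miniature, with plain ints instead of a PIL image:

```python
def invert_l(pixels):
    # Equivalent of PIL's ImageOps.invert for 8-bit grayscale pixel values.
    return [255 - v for v in pixels]

print(invert_l([0, 128, 255]))  # → [255, 127, 0]
```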
@ -0,0 +1,146 @@
from modules.processing import StableDiffusionProcessingImg2Img, Processed, process_images
from modules import images
from modules.shared import opts, state
from torchvision.transforms.functional import to_pil_image, pil_to_tensor
import torch

class Shortcode():
	def __init__(self, Unprompted):
		self.Unprompted = Unprompted
		self.init_images = []
		self.image_masks = []
		self.processing = False
		self.orginal_n_iter = None
		self.description = "Allows using multiple init_images or multiple masks"

	def run_atomic(self, pargs, kwargs, context):
		if self.processing:
			return ""

		had_init_image = False
		if "init_images" in self.Unprompted.shortcode_user_vars:
			self.init_images += self.Unprompted.shortcode_user_vars["init_images"]
			had_init_image = True

		if "image_masks" in self.Unprompted.shortcode_user_vars:
			self.image_masks += [self.Unprompted.shortcode_user_vars["image_masks"]]
		elif "image_mask" in self.Unprompted.shortcode_user_vars:
			self.image_masks += [[self.Unprompted.shortcode_user_vars["image_mask"]]]
		elif had_init_image:
			# each init_image has at least an empty mask
			self.image_masks += [[]]

		if "n_iter" in self.Unprompted.shortcode_user_vars:
			self.orginal_n_iter = self.Unprompted.shortcode_user_vars["n_iter"]
			self.Unprompted.shortcode_user_vars["n_iter"] = 0

		return ""

	def after(self, p: StableDiffusionProcessingImg2Img, processed: Processed):
		if not self.processing and self.orginal_n_iter is not None:
			self.processing = True

			try:
				mask_count = sum([len(masks) for masks in self.image_masks])
				if mask_count == 0:
					state.job_count = self.orginal_n_iter
				else:
					if len(self.init_images) == 1:
						state.job_count = mask_count * self.orginal_n_iter
					else:
						state.job_count = mask_count

				batched_init_imgs = [self.init_images[idx:idx + p.batch_size] for idx in range(0, len(self.init_images), p.batch_size)]
				batched_prompts = [p.all_prompts[idx:idx + p.batch_size] for idx in range(0, len(p.all_prompts), p.batch_size)]
				batched_neg_prompts = [p.all_negative_prompts[idx:idx + p.batch_size] for idx in range(0, len(p.all_negative_prompts), p.batch_size)]
				batched_masks = [self.image_masks[idx:idx + p.batch_size] for idx in range(0, len(self.image_masks), p.batch_size)]
				batched_seeds = [p.all_seeds[idx:idx + p.batch_size] for idx in range(0, len(p.all_seeds), p.batch_size)]

				create_grid = not p.do_not_save_grid
				save_imgs = not p.do_not_save_samples

				p.do_not_save_grid = True
				p.do_not_save_samples = True

				p.n_iter = 1
				if len(self.init_images) == 1:
					batched_init_imgs = [[self.init_images[0]] * p.batch_size] * self.orginal_n_iter
					batched_masks = [[self.image_masks[0]] * p.batch_size] * self.orginal_n_iter

				for init_imgs, prompts, neg_prompts, seeds, maskss in zip(batched_init_imgs, batched_prompts, batched_neg_prompts, batched_seeds, batched_masks):
					if sum([len(masks) for masks in maskss]) == 0:
						p.init_images = init_imgs
						p.all_prompts = prompts
						p.all_negative_prompts = neg_prompts
						p.all_seeds = seeds
						p.mask = None
						sub_processed = process_images(p)
						processed.images += sub_processed.images
					else:
						output_resolution = (init_imgs[0].width, init_imgs[0].height) if p.inpaint_full_res else (p.width, p.height)

						if len(self.init_images) == 1:
							imgs = torch.stack([pil_to_tensor(init_imgs[0].resize(output_resolution))] * p.batch_size).clone()

							for idx, mask in enumerate(maskss[0]):
								p.init_images = [to_pil_image(img) for img in imgs]
								p.image_mask = mask
								p.all_prompts = prompts
								p.all_negative_prompts = neg_prompts
								p.all_seeds = [seed + idx + 800 for seed in seeds]

								sub_processed = process_images(p)

								mask = mask.resize(output_resolution)
								mask = pil_to_tensor(mask) > 0
								mask = mask.broadcast_to(imgs.shape)

								imgs[mask] = torch.stack([pil_to_tensor(img) for img in sub_processed.images[:len(imgs)]])[mask]

							processed.images += [to_pil_image(img) for img in imgs]
						else:
							for init_img, prompt, neg_prompt, seed, masks in zip(init_imgs, prompts, neg_prompts, seeds, maskss):
								img = pil_to_tensor(init_img.resize(output_resolution))

								for idx, mask in enumerate(masks):
									p.batch_size = 1
									p.init_images = [to_pil_image(img)]
									p.image_mask = mask
									p.all_prompts = [prompt]
									p.all_negative_prompts = [neg_prompt]
									p.all_seeds = [seed + idx + 800]

									sub_processed = process_images(p)

									mask = mask.resize(output_resolution)
									mask = pil_to_tensor(mask) > 0
									mask = mask.broadcast_to(img.shape)

									img[mask] = pil_to_tensor(sub_processed.images[0])[mask]

								processed.images.append(to_pil_image(img))

				if opts.samples_save and save_imgs:
					for img, prompt, neg_prompt, seed in zip(processed.images, p.all_prompts, p.all_negative_prompts, p.all_seeds):
						images.save_image(img, p.outpath_samples, "", seed, prompt, opts.samples_format)

				if create_grid and len(processed.images) > 1:
					grid = images.image_grid(processed.images, p.batch_size * len(batched_init_imgs))
					if opts.return_grid:
						processed.images.insert(0, grid)
						processed.index_of_first_image = 1
					if opts.grid_save:
						images.save_image(grid, p.outpath_grids, "grid", p.all_seeds[0], p.all_prompts[0], opts.grid_format, short_filename=not opts.grid_extended_filename, p=p, grid=True)

			finally:
				self.processing = False
				self.init_images = []
				self.image_masks = []
				self.orginal_n_iter = None

	def ui(self, gr):
		pass

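The list-slicing pattern used repeatedly in `after` above (splitting flat lists of images, prompts, and seeds into `batch_size`-sized chunks) works like this in isolation:

```python
def batched(items, batch_size):
    # Consecutive chunks of at most batch_size items; the last chunk may be shorter.
    return [items[idx:idx + batch_size] for idx in range(0, len(items), batch_size)]

print(batched([1, 2, 3, 4, 5], 2))  # → [[1, 2], [3, 4], [5]]
```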
@ -1,9 +1,13 @@
|
|||
from torchvision.utils import draw_segmentation_masks
|
||||
from torchvision.transforms.functional import pil_to_tensor, to_pil_image
|
||||
|
||||
class Shortcode():
|
||||
def __init__(self,Unprompted):
|
||||
self.Unprompted = Unprompted
|
||||
self.image_mask = None
|
||||
self.show = False
|
||||
self.description = "Creates an image mask from the content for use with inpainting."
|
||||
|
||||
def run_block(self, pargs, kwargs, context, content):
|
||||
from lib_unprompted.stable_diffusion.clipseg.models.clipseg import CLIPDensePredT
|
||||
|
||||
|
|
@ -17,22 +21,37 @@ class Shortcode():
		from modules.images import flatten
		from modules.shared import opts

		if "init_images" not in self.Unprompted.shortcode_user_vars:
			return

		device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

		brush_mask_mode = self.Unprompted.parse_advanced(kwargs["mode"], context) if "mode" in kwargs else "add"
		self.show = "show" in pargs
		self.legacy_weights = "legacy_weights" in pargs

		smoothing = int(self.Unprompted.parse_advanced(kwargs["smoothing"], context)) if "smoothing" in kwargs else 20
		smoothing_kernel = None
		if smoothing > 0:
			smoothing_kernel = numpy.ones((smoothing, smoothing), numpy.float32) / (smoothing * smoothing)

		neg_smoothing = int(self.Unprompted.parse_advanced(kwargs["neg_smoothing"], context)) if "neg_smoothing" in kwargs else 20
		neg_smoothing_kernel = None
		if neg_smoothing > 0:
			neg_smoothing_kernel = numpy.ones((neg_smoothing, neg_smoothing), numpy.float32) / (neg_smoothing * neg_smoothing)

		# Pad the mask by applying a dilation or erosion
		mask_padding = int(self.Unprompted.parse_advanced(kwargs["padding"], context) if "padding" in kwargs else 0)
		neg_mask_padding = int(self.Unprompted.parse_advanced(kwargs["neg_padding"], context) if "neg_padding" in kwargs else 0)
		padding_dilation_kernel = None
		if mask_padding != 0:
			padding_dilation_kernel = numpy.ones((abs(mask_padding), abs(mask_padding)), numpy.uint8)

		neg_padding_dilation_kernel = None
		if neg_mask_padding != 0:
			neg_padding_dilation_kernel = numpy.ones((abs(neg_mask_padding), abs(neg_mask_padding)), numpy.uint8)

		prompts = content.split(self.Unprompted.Config.syntax.delimiter)
		prompt_parts = len(prompts)

@ -42,6 +61,7 @@ class Shortcode():
		else: negative_prompts = None

		mask_precision = min(255, int(self.Unprompted.parse_advanced(kwargs["precision"], context) if "precision" in kwargs else 100))
		neg_mask_precision = min(255, int(self.Unprompted.parse_advanced(kwargs["neg_precision"], context) if "neg_precision" in kwargs else 100))

		def overlay_mask_part(img_a, img_b, mode):
			if mode == "discard": img_a = ImageChops.darker(img_a, img_b)

@ -51,19 +71,10 @@ class Shortcode():
		def gray_to_pil(img):
			return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_GRAY2RGBA))

		def process_mask_parts(masks, mode, final_img=None, mask_precision=100, mask_padding=0, padding_dilation_kernel=None, smoothing_kernel=None):
			for i, mask in enumerate(masks):
				filename = f"mask_{mode}_{i}.png"
				plt.imsave(filename, torch.sigmoid(mask[0]))

				# TODO: Figure out how to convert the plot above to numpy instead of re-loading the image
				img = cv2.imread(filename)

@ -84,11 +95,10 @@ class Shortcode():
				final_img = bw_image
			return final_img

		def get_mask():
			# load model
			model = CLIPDensePredT(version='ViT-B/16', reduce_dim=64, complex_trans_conv=not self.legacy_weights)
			model.eval()
			model_dir = f"{self.Unprompted.base_dir}/lib/stable_diffusion/clipseg/weights"
			os.makedirs(model_dir, exist_ok=True)

@ -104,6 +114,7 @@ class Shortcode():
			# non-strict, because we only stored decoder weights (not CLIP weights)
			model.load_state_dict(torch.load(d64_file), strict=False)
			model = model.eval().to(device=device)

			transform = transforms.Compose([
				transforms.ToTensor(),

@ -115,8 +126,8 @@ class Shortcode():
		# predict
		with torch.no_grad():
			preds = model(img.repeat(prompt_parts, 1, 1, 1).to(device=device), prompts)[0].cpu()
			if negative_prompts: negative_preds = model(img.repeat(negative_prompt_parts, 1, 1, 1).to(device=device), negative_prompts)[0].cpu()

		if "image_mask" not in self.Unprompted.shortcode_user_vars: self.Unprompted.shortcode_user_vars["image_mask"] = None

@ -125,14 +136,14 @@ class Shortcode():
		else: final_img = None

		# process masking
		final_img = process_mask_parts(preds, "add", final_img, mask_precision, mask_padding, padding_dilation_kernel, smoothing_kernel)

		# process negative masking
		if brush_mask_mode == "subtract" and self.Unprompted.shortcode_user_vars["image_mask"] is not None:
			self.Unprompted.shortcode_user_vars["image_mask"] = ImageOps.invert(self.Unprompted.shortcode_user_vars["image_mask"])
			self.Unprompted.shortcode_user_vars["image_mask"] = self.Unprompted.shortcode_user_vars["image_mask"].convert("RGBA").resize((512, 512))
			final_img = overlay_mask_part(final_img, self.Unprompted.shortcode_user_vars["image_mask"], "discard")
		if negative_prompts: final_img = process_mask_parts(negative_preds, "discard", final_img, neg_mask_precision, neg_mask_padding, neg_padding_dilation_kernel, neg_smoothing_kernel)

		if "size_var" in kwargs:
			img_data = final_img.load()

@ -163,6 +174,9 @@ class Shortcode():
	def after(self, p=None, processed=None):
		if self.image_mask and self.show:
			processed.images.append(self.image_mask)

			overlayed_init_img = draw_segmentation_masks(pil_to_tensor(p.init_images[0]), pil_to_tensor(self.image_mask.convert("L")) > 0)
			processed.images.append(to_pil_image(overlayed_init_img))
			self.image_mask = None
			self.show = False
		return processed

@ -172,7 +186,10 @@ class Shortcode():
		gr.Checkbox(label="Show mask in output 🡢 show")
		gr.Checkbox(label="Use legacy weights 🡢 legacy_weights")
		gr.Number(label="Precision of selected area 🡢 precision", value=100, interactive=True)
		gr.Number(label="Precision of negative selected area 🡢 neg_precision", value=100, interactive=True)
		gr.Number(label="Padding radius in pixels 🡢 padding", value=0, interactive=True)
		gr.Number(label="Padding radius in pixels for negative mask 🡢 neg_padding", value=0, interactive=True)
		gr.Number(label="Smoothing radius in pixels 🡢 smoothing", value=20, interactive=True)
		gr.Number(label="Smoothing radius in pixels for negative mask 🡢 neg_smoothing", value=20, interactive=True)
		gr.Textbox(label="Negative mask prompt 🡢 negative_mask", max_lines=1)
		gr.Textbox(label="Save the mask size to the following variable 🡢 size_var", max_lines=1)