Segment Anything for Stable Diffusion WebUI. Automatically generate high-quality segmentation masks for images by clicking or by text prompt. This extension aims to connect WebUI and ControlNet with Segment Anything and GroundingDINO to enhance Stable Diffusion/ControlNet inpainting (both single image and batch process), enhance ControlNet semantic segmentation, automate image matting, and create LoRA/LyCORIS training sets.
 
 

Segment Anything for Stable Diffusion WebUI

This extension aims to help Stable Diffusion WebUI users use Segment Anything and GroundingDINO for Stable Diffusion inpainting.

News

  • 2023/04/12: [Feature] Mask expansion enabled. Thanks @jordan-barrett-jm for your great contribution!
  • 2023/04/14: [Feature] GroundingDINO support with full features released in the master branch! Check it out and use text prompts to automatically generate masks! You can also use the Batch Process tab to create a LoRA/LyCORIS training set! Note that when you first launch WebUI, you may need to wait some time for GroundingDINO to be built. Also make sure that your terminal has access to GitHub; otherwise you may need to install GroundingDINO manually. To revert to the previous version without GroundingDINO, cd to ${sd-webui-sam}/ and run git checkout 99a0fe5.

Plan

Thanks to the suggestions from GitHub issues, Reddit, and Bilibili that make this extension better.

  • [Developing] Support API as mentioned in #15
  • Support color inpainting as mentioned in #21
  • Support automatic mask generation for hierarchical image segmentation and SD animation
  • Support semantic segmentation for batch process, ControlNet segmentation and SD animation
  • Connect to ControlNet inpainting and segmentation
  • Support WebUI older commits (e.g. a9fed7c364061ae6efb37f797b6b522cb3cf7aa2)

Not all plans will necessarily be implemented; some ideas might not work out and be abandoned. Support for old commits has low priority, so I encourage you to update your WebUI as soon as you can.

Update your WebUI version

If you are unable to add a dot, see a list index out of range error on your terminal, or hit any other error, the most probable reason is that your WebUI is outdated (e.g. you are using commit a9fed7c364061ae6efb37f797b6b522cb3cf7aa2).

In most cases, updating your WebUI will solve your problem. Before you submit an issue, and before I release support for some old version of WebUI, I ask that you first check your WebUI version.

How to use

Step 1:

Download this extension to ${sd-webui}/extensions in whatever way you like (git clone or install from the UI).

Step 2:

Download a Segment Anything model from here to ${sd-webui}/models/sam. Do not change the model name; otherwise this extension may fail due to a bug inside Segment Anything.

For reference, vit_h is 2.56GB, vit_l is 1.25GB, and vit_b is 375MB. I tested vit_h on an NVIDIA 3090 Ti and it worked well. If you run into VRAM problems, switch to a smaller model.
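If you prefer to script the download, here is a minimal sketch. The URLs below are the official SAM checkpoint links from the facebookresearch/segment-anything repository at the time of writing; the WebUI path is an assumption you should adjust to your installation:

import urllib.request
from pathlib import Path

# Official SAM checkpoints -- keep the original file names, since the
# extension (and SAM itself) identifies models by name.
SAM_CHECKPOINTS = {
    "vit_h": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "vit_l": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth",
    "vit_b": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth",
}

def download_sam(model: str, sd_webui_root: str) -> None:
    url = SAM_CHECKPOINTS[model]
    dest = Path(sd_webui_root) / "models" / "sam" / url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)

download_sam("vit_b", "/path/to/stable-diffusion-webui")  # hypothetical path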

Step 3:

  • Launch webui and switch to img2img mode.

Single Image

  • Upload your image
  • Optionally add point prompts on the image: left click for a positive point prompt (black dot), right click for a negative point prompt (red dot), and left click any dot again to cancel it. You must add point prompts if you do not wish to use GroundingDINO.
  • Optionally check Enable GroundingDINO, select the GroundingDINO model you want, write a text prompt, and pick a box threshold. You must write a text prompt if you do not wish to use point prompts. Note that GroundingDINO models will be automatically downloaded from HuggingFace. If your terminal cannot reach HuggingFace, please download the model manually and put it under ${sd-webui-sam}/models/grounding-dino.
  • Optionally enable previewing the GroundingDINO bounding boxes and click Generate bounding box. You must write a text prompt to preview bounding boxes. Once you see the numbered boxes, uncheck the boxes you do not want. If you uncheck every box, you will have to add point prompts to generate masks.
  • Click the Preview Segmentation button. Due to a limitation of SAM, if there are multiple bounding boxes, your point prompts will not take effect when generating masks.
  • Choose your favorite segmentation and check Copy to Inpaint Upload.
  • Optionally check Expand Mask, specify the amount, then click Update Mask.
  • Click the Switch to Inpaint Upload button. There is no need to upload another image or mask; just leave them blank. Write your prompt, configure the rest, and click Generate.

Batch Process

  • Choose your SAM model, GroundingDINO model, text prompt, box threshold, and mask expansion amount. Enter the source and destination directories of your images. The source directory should contain only images.
  • Output per image lets you configure the number of masks generated per bounding box. I highly recommend choosing 3, since some masks may be weird.
  • Save mask lets you save the black & white mask; Save original image with mask and bounding box lets you save image+mask+bounding_box.
  • Click Start batch process and wait. If you see "Done" below this button, you are all set.

Demo

Point prompts demo

https://user-images.githubusercontent.com/63914308/230916163-af661008-5a50-496e-8b79-8be7f193f9e9.mp4

GroundingDINO demo

https://user-images.githubusercontent.com/63914308/232157480-757f6e70-673a-4023-b4ca-df074ed30436.mp4

Batch process image demo

[Configuration screenshot]

[Demo gallery: input image, output image, output mask, and output blend for two examples]

API Usage

We have added an API endpoint to allow for automated workflows.

The API utilizes both Segment Anything and GroundingDINO to return masks of all instances of whatever object is specified in the text prompt.

This is an extension of the existing Stable Diffusion Web UI API.

Two endpoints are exposed:

  • GET /sam-webui/heartbeat
  • POST /sam-webui/image-mask

The heartbeat endpoint can be used to ensure that the API is up.
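For example, a minimal liveness check. This is a sketch assuming WebUI runs at the default http://127.0.0.1:7860; adjust the host and port to your launch settings:

import requests

# Default local WebUI address -- change if you launched with --port or --listen.
base_url = "http://127.0.0.1:7860"
res = requests.get(f"{base_url}/sam-webui/heartbeat")
print(res.status_code)  # 200 means the API is up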

The image-mask endpoint accepts a payload that includes your base64-encoded image.

Below is an example of how to interface with the API using requests.

API Example Usage

import base64
import requests
from PIL import Image
from io import BytesIO

def image_to_base64(img_path: str) -> str:
    """Read an image file and return its base64-encoded contents."""
    with open(img_path, "rb") as img_file:
        img_base64 = base64.b64encode(img_file.read()).decode()
    return img_base64

# Endpoint on a default local WebUI install -- adjust host/port as needed.
url = "http://127.0.0.1:7860/sam-webui/image-mask"

payload = {
    "image": image_to_base64("IMAGE_FILE_PATH"),
    "prompt": "TEXT PROMPT",
    "box_threshold": 0.3,
}
res = requests.post(url, json=payload)

# The response is a list of dicts, each holding a base64-encoded mask image.
for dct in res.json():
    image_data = base64.b64decode(dct['image'])
    image = Image.open(BytesIO(image_data))
    image.show()
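If you are scripting a batch workflow, you may want to save the returned masks instead of displaying them. Continuing from the example above (OUTPUT_DIRECTORY is a placeholder):

import os

out_dir = "OUTPUT_DIRECTORY"
os.makedirs(out_dir, exist_ok=True)
# Each response entry decodes to one mask image; PIL re-encodes it as PNG.
for i, dct in enumerate(res.json()):
    mask = Image.open(BytesIO(base64.b64decode(dct['image'])))
    mask.save(os.path.join(out_dir, f"mask_{i}.png"))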

Contribute

Disclaimer: I have not thoroughly tested this extension, so there may be bugs. Bear with me while I fix them :)

If you encounter a bug, please submit an issue. At a minimum, please provide your WebUI version, your extension version, your browser version, any errors in your browser console log, and any errors in your terminal log, so that I can find a solution faster.

I welcome any contribution. Please submit a pull request if you want to contribute.

Star History

Star History Chart