Allows users to migrate existing workloads, including checkpoint merge, model training, and model inference, onto AWS

Xiujuan Li ae280fe82b upgrade		2024-09-10 12:25:26 +08:00
.github	fix oas for ja	2024-06-24 16:49:37 +08:00
aws_extension	upgrade api client version	2024-06-23 18:35:16 +08:00
build_scripts	upgrade	2024-09-10 12:25:26 +08:00
deployment	improved serve	2024-04-07 12:48:13 +08:00
docs	fix logs	2024-07-25 09:26:11 +08:00
infrastructure	make train get file from another bucket	2024-07-18 15:44:10 +08:00
javascript	improved validators	2024-04-18 08:40:33 +08:00
middleware_api	fix train path	2024-07-18 11:53:32 +08:00
scripts	merge proxy	2024-05-22 20:22:58 +08:00
test	Merge pull request #850 from awslabs/dependabot/pip/test/setuptools-70.0.0	2024-07-30 15:30:40 +08:00
update_scripts	removed unused ModelTable	2024-05-17 21:23:50 +08:00
workshop	Merge remote-tracking branch 'origin/dev' into dev_juan	2024-07-24 11:19:16 +08:00
.gitallowed	recovery ignores for some files	2024-02-22 14:31:19 +08:00
.gitignore	improved oas	2024-06-24 16:14:05 +08:00
.viperlightignore	improved oas	2024-06-24 16:14:05 +08:00
.viperlightrc	fix: workflow test	2023-07-03 13:18:38 +08:00
CHANGELOG.md	doc update: version and notice update	2023-06-20 09:02:07 +00:00
CODE_OF_CONDUCT.md	improved cdk	2024-03-30 22:23:37 +08:00
CONTRIBUTING.md	improved readme	2024-07-04 11:44:45 +08:00
LICENSE	initial push for extension (container exclude)	2023-05-05 15:23:36 +08:00
NOTICE	Initial commit	2023-05-04 00:23:42 -04:00
README.md	docs update: per new version	2024-07-14 11:58:57 +00:00
THIRD-PARTY-LICENSES.txt	initial push for extension (container exclude)	2023-05-05 15:23:36 +08:00
buildspec-private-repo.yml	update lambda packages version	2024-04-05 15:27:29 +08:00
buildspec.yml	improved oas	2024-06-24 16:14:05 +08:00
commit-id.sh	improved endpoint cache check	2024-04-14 16:15:54 +08:00
docker_image.sh	improved workflow delete check and delete folder	2024-06-11 12:04:16 +08:00
docker_reset.sh	update docker reset	2024-06-07 15:40:53 +08:00
docker_start.sh	improved serve	2024-07-16 15:47:58 +08:00
install.bat	update windows commit id	2024-04-12 07:54:38 +08:00
install.py	initial push for extension (container exclude)	2023-05-05 15:23:36 +08:00
install.sh	fixed download esd branch	2024-04-08 14:10:54 +08:00
utils.py	improved config	2024-03-19 01:30:46 +08:00
utils_cn.py	chore: remove db	2024-01-25 15:28:05 +08:00

README.md

Extension for Stable Diffusion on AWS

Extension for Stable Diffusion on AWS: Unlock the Power of image and video generation in the Cloud with Ease and Speed

This is a webUI extension that helps users migrate existing workloads (inference, training, etc.) from a local or standalone server to the AWS Cloud. Key features include:

Support for Stable Diffusion webUI inference, along with other extensions, through BYOC (bring your own container) in the cloud.
Support for LoRA model training through Kohya_ss in the cloud.
Support for ComfyUI inference, along with other extensions, in the cloud. This lets users conveniently release to the cloud templates that require stable, continuous inference. Users can also make simple modifications (e.g., prompt adjustments) to released templates in the cloud while keeping inference stable.

Architecture
Quick Start
API Reference
Version
License

Architecture

The diagram below presents the architecture you can automatically deploy using the solution's implementation guide and accompanying Amazon CloudFormation template.

Users in the WebUI console send requests to API Gateway, authenticating with an assigned API token. Note that the WebUI itself requires no Amazon Web Services credentials.
Amazon API Gateway routes each request, based on its URL prefix, to the corresponding Lambda function, which implements utility jobs (for example, model upload and checkpoint merge), model training, or model inference. Meanwhile, Amazon Lambda records operation metadata (for example, inference parameters and model name) in Amazon DynamoDB for subsequent queries and association.
For training, Amazon Step Functions is invoked to orchestrate the process, which uses Amazon SageMaker for training and SNS for training status notifications. For inference, Amazon Lambda invokes Amazon SageMaker to run asynchronous inference. Training data, models, and checkpoints are stored in an Amazon S3 bucket, separated by different prefixes.
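
The prefix-based routing described above can be pictured with a minimal sketch. The prefixes (`/utils/`, `/train/`, `/inference/`), the handler names, and the in-memory `metadata` list standing in for Amazon DynamoDB are all hypothetical and chosen only for illustration; the solution's actual routes are defined in its API Gateway configuration.

```python
from typing import Callable, Dict, List

# In-memory stand-in for the DynamoDB table holding operation metadata (assumption).
metadata: List[dict] = []


def handle_util(path: str) -> str:
    return f"util job: {path}"


def handle_training(path: str) -> str:
    return f"training job: {path}"


def handle_inference(path: str) -> str:
    return f"inference job: {path}"


# Hypothetical URL prefixes; the real ones may differ.
ROUTES: Dict[str, Callable[[str], str]] = {
    "/utils/": handle_util,
    "/train/": handle_training,
    "/inference/": handle_inference,
}


def route(path: str) -> str:
    """Dispatch to the handler whose prefix matches, then record metadata."""
    for prefix, handler in ROUTES.items():
        if path.startswith(prefix):
            result = handler(path)
            metadata.append({"path": path, "handler": handler.__name__})
            return result
    raise LookupError(f"no handler registered for {path}")
```

Calling `route("/train/job-1")` in this sketch dispatches to `handle_training` and appends one metadata record, mirroring how each Lambda function both does its work and logs the operation.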

Quick Start

The extension supports three key features. There are two deployment paths, depending on which key feature you'd like to deploy.

If you'd like to adopt SD webUI or Kohya in the cloud, please follow the instructions here.
If you'd like to adopt ComfyUI in the cloud, please follow the instructions here.

API Reference

To make invoking and debugging APIs more convenient for developers, we offer an API debugger. With this tool, you can view the complete set of APIs and their corresponding parameters for cloud-based inference images with a single click.

Click the button to refresh the inference history job list.
Pull down the inference job list, then find and select the job.
Click the API button on the right.

The complete API reference, with samples, can be found here.
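
As a rough illustration of what an authenticated call might look like, the sketch below builds (but does not send) a request with the standard library. It assumes the API token is passed in the `x-api-key` header, which is the header Amazon API Gateway uses for API keys. The base URL, the `/inferences` path, and the payload fields are hypothetical placeholders, so take the real values from the API reference.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, api_token: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request carrying the API token (not sent here)."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_token,  # assumed header name for the API token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical endpoint, path, token, and payload, for illustration only.
req = build_request(
    "https://api.example.com/prod",
    "/inferences",
    "my-api-token",
    {"prompt": "a cat"},
)
```

Passing `req` to `urllib.request.urlopen` would send it. No Amazon Web Services credentials appear anywhere in the request, consistent with the token-only authentication described in the Architecture section.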

Version

Check our wiki for the latest and historical versions.

License

This project is licensed under the Apache-2.0 License.

Source Code Structure

.
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── NOTICE
├── README.md
├── THIRD-PARTY-LICENSES.txt
├── build_scripts -- scripts to build the Docker images in the cloud
├── buildspec.yml -- buildspec file for CodeBuild; our code pipeline uses this buildspec to turn the CDK assets into CloudFormation templates
├── deployment    -- scripts to deploy the CloudFormation template
├── docs
├── infrastructure -- CDK project to deploy the middleware; all the middleware infrastructure code is in this directory
├── install.py -- install dependencies for the extension
├── install.sh -- script to set the webUI and extension to a specific version
├── javascript -- javascript code for the extension
├── middleware_api -- middleware API definition and Lambda code
├── sagemaker_entrypoint_json.py -- wrapper function for SageMaker
├── scripts -- extension related code for WebUI
└── utils.py -- wrapper functions for configuration options