starts to work

pull/1/head v1.0
Kahsolt 2022-11-10 22:02:00 +08:00
commit 5081935c39
6 changed files with 451 additions and 0 deletions

24
LICENSE Normal file

@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <https://unlicense.org>

75
README.md Normal file

@@ -0,0 +1,75 @@
# stable-diffusion-webui-prompt-travel

Extension script for AUTOMATIC1111/stable-diffusion-webui to travel between prompts in latent space.

----

This is the more human-sensible version of [stable-diffusion-webui-prompt-erosion](https://github.com/Kahsolt/stable-diffusion-webui-prompt-erosion):
instead of modifying the prompt at the text-character level, we now linearly interpolate between the hidden embedding vectors. 😀

⚠ Linear interpolation is still not the smoothest way to interpolate semantics; later work will explore whether interpolating along a probed gradient-descent direction works better (but I'll tinker with some other things first :lolipop:)

To be honest, I think this could be used to make slideshow fairy tales and picture books <del>or even doujinshi</del>…

A clever workflow: first manually search for two good-looking images that differ only in prompt, then try to travel between them 😀
### How it works

- generate images one by one (batch configs are ignored)
- gradually shift the digested prompt inputs between stages
- freeze all other settings (steps, sampler, CFG scale, random seed, etc.)
- force `subseed = None`
- gather the images into a video!
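The interpolation step above can be sketched with plain numpy (a minimal sketch; the real script operates on the webui's `ScheduledPromptConditioning` objects, and `lerp_cond` / `travel` are hypothetical names, not part of the extension's API):

```python
import numpy as np

def lerp_cond(cond_a: np.ndarray, cond_b: np.ndarray, alpha: float) -> np.ndarray:
    # alpha=0 returns cond_a, alpha=1 returns cond_b; shapes must match
    assert cond_a.shape == cond_b.shape, 'cannot interpolate different-shaped embeddings'
    return (1.0 - alpha) * cond_a + alpha * cond_b

def travel(cond_a: np.ndarray, cond_b: np.ndarray, n_steps: int) -> list:
    # one frame per alpha in [0/N, 1/N, ..., N/N]; both endpoint stages included
    return [lerp_cond(cond_a, cond_b, t / n_steps) for t in range(n_steps + 1)]
```

Because every other setting (seed, sampler, steps) is frozen, each interpolated conditioning yields a slightly different image, and the sequence reads as a smooth transition.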
DDIM:

![DDIM](img/ddim.gif)

Euler a:

![eular_a](img/eular_a.gif)
Enter the positive/negative prompts in the original prompt boxes; each line represents one stage.
In the extension panel at the bottom left, set the number of interpolated frames between stages and the output video frame rate.
```
[positive prompts]
(((masterpiece))), highres, ((boy)), child, cat ears, white hair, red eyes, yellow bell, red cloak, barefoot, angel, [flying], egyptian
((masterpiece)), highres, ((girl)), loli, cat ears, light blue hair, red eyes, magical wand, barefoot, [running]
[negative prompts]
(((nsfw))), ugly,duplicate,morbid,mutilated,tranny,trans,trannsexual,mutation,deformed,long neck,bad anatomy,bad proportions,extra arms,extra legs, disfigured,more than 2 nipples,malformed,mutated,hermaphrodite,out of frame,extra limbs,missing arms,missing legs,poorly drawn hands,poorty drawn face,mutation,poorly drawn,long body,multiple breasts,cloned face,gross proportions, mutated hands,bad hands,bad feet,long neck,missing limb,malformed limbs,malformed hands,fused fingers,too many fingers,extra fingers,missing fingers,extra digit,fewer digits,mutated hands and fingers,lowres,text,error,cropped,worst quality,low quality,normal quality,jpeg artifacts,signature,watermark,username,blurry,text font ufemale focus, poorly drawn, deformed, poorly drawn face, (extra leg:1.3), (extra fingers:1.2),out of frame
[steps]
45
```
### Options

- positive prompts: (list of strings)
- negative prompts: (list of strings)
  - each line is a prompt stage
  - if len(positive) != len(negative), the shorter list's last item is repeated to match the longer one
- steps: (int, or list of int)
  - travel from one stage to the next in n steps (i.e. the number of interpolated frames)
  - if a single int: a constant number of images between every two successive stages
  - if a list of ints: its length should match `len(stages) - 1`, e.g.: `12, 24, 36`
⚠ this feature does NOT support the **schedule** syntax (i.e.: `[prompt:prompt:number]`), because I don't know how to interpolate between different schedule plans :(
⚠ the token-count difference between any two prompts should not exceed `75`, because I also don't know how to interpolate between different-length tensors :)
### Installation

The easiest way to install it:

1. Go to the "Extensions" tab in the webui
2. Click on the "Install from URL" tab
3. Paste https://github.com/Kahsolt/stable-diffusion-webui-prompt-travel.git into "URL for extension's git repository" and click Install
4. (Optional) Restart the webui so that dependencies are installed; otherwise you won't be able to generate video files.

Manual install:

1. Copy the file from this repo's scripts folder into the scripts folder of https://github.com/AUTOMATIC1111/stable-diffusion-webui
2. Add `moviepy==1.0.3` to requirements_versions.txt
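The `moviepy` pin in step 2 is what enables the video export: after all frames are generated, the script assembles them into an mp4. A rough sketch of that final step, guarded the same way the script guards its import (`save_travel_video` is my name for illustration, not part of the extension):

```python
import numpy as np

try:
    from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
except ImportError:
    ImageSequenceClip = None  # video export silently disabled, as in the script

def save_travel_video(frames, path: str, fps: int = 10) -> bool:
    """Write a list of PIL images / ndarrays to an mp4; False if moviepy is absent."""
    if ImageSequenceClip is None:
        return False
    clip = ImageSequenceClip([np.asarray(f) for f in frames], fps=fps)
    clip.write_videofile(path, audio=False, logger=None)
    return True
```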
----
by Armit
2022/11/10

BIN
img/ddim.gif Normal file

Binary file not shown. (Size: 3.2 MiB)

BIN
img/eular_a.gif Normal file

Binary file not shown. (Size: 2.8 MiB)

4
install.py Normal file

@@ -0,0 +1,4 @@
import launch

if not launch.is_installed("moviepy"):
    launch.run_pip("install moviepy==1.0.3", "requirements for Prompt Travel")

348
scripts/prompt_travel.py Normal file

@@ -0,0 +1,348 @@
import os
import random
from copy import deepcopy

import gradio as gr
import numpy as np
try:
    from moviepy.video.io.ImageSequenceClip import ImageSequenceClip
except ImportError:
    print('moviepy python module not installed. Will not be able to generate video.')

import modules.scripts as scripts
from modules.processing import Processed, StableDiffusionProcessing
from modules.processing import *
from modules.prompt_parser import ScheduledPromptConditioning
from modules.shared import state

DEFAULT_STEPS = 10
DEFAULT_SAVE  = True
DEFAULT_FPS   = 10
DEFAULT_DEBUG = True

# ↓↓↓ the following is modified from 'modules/processing.py' ↓↓↓

import torch
import numpy as np
from PIL import Image
import random

import modules.sd_hijack
from modules import devices, prompt_parser, lowvram
from modules.sd_hijack import model_hijack
from modules.shared import opts, cmd_opts, state
import modules.shared as shared
import modules.face_restoration
import modules.images as images
import modules.styles
def process_images_inner_half_A(p: StableDiffusionProcessing) -> tuple:
    """this is the main loop that both txt2img and img2img use; it calls func_init once inside all the scopes and func_sample once per batch"""

    if type(p.prompt) == list:
        assert(len(p.prompt) > 0)
    else:
        assert p.prompt is not None

    with open(os.path.join(shared.script_path, "params.txt"), "w", encoding="utf8") as file:
        processed = Processed(p, [], p.seed, "")
        file.write(processed.infotext(p, 0))

    devices.torch_gc()

    seed = get_fixed_seed(p.seed)
    subseed = get_fixed_seed(p.subseed)

    modules.sd_hijack.model_hijack.apply_circular(p.tiling)
    modules.sd_hijack.model_hijack.clear_comments()

    shared.prompt_styles.apply_styles(p)

    if type(p.prompt) == list:
        p.all_prompts = p.prompt
    else:
        p.all_prompts = p.batch_size * 1 * [p.prompt]

    if type(seed) == list:
        p.all_seeds = seed
    else:
        p.all_seeds = [int(seed) + (x if p.subseed_strength == 0 else 0) for x in range(len(p.all_prompts))]

    if type(subseed) == list:
        p.all_subseeds = subseed
    else:
        p.all_subseeds = [int(subseed) + x for x in range(len(p.all_prompts))]

    if os.path.exists(cmd_opts.embeddings_dir) and not p.do_not_reload_embeddings:
        model_hijack.embedding_db.load_textual_inversion_embeddings()

    if p.scripts is not None:
        p.scripts.process(p)

    with torch.no_grad(), p.sd_model.ema_scope():
        with devices.autocast():
            p.init(p.all_prompts, p.all_seeds, p.all_subseeds)

        if state.job_count == -1:
            state.job_count = 1

        n = 0   # batch count for legacy compatibility
        prompts  = p.all_prompts [n * p.batch_size : (n + 1) * p.batch_size]
        seeds    = p.all_seeds   [n * p.batch_size : (n + 1) * p.batch_size]
        subseeds = p.all_subseeds[n * p.batch_size : (n + 1) * p.batch_size]

        if p.scripts is not None:
            p.scripts.process_batch(p, batch_number=n, prompts=prompts, seeds=seeds, subseeds=subseeds)

        with devices.autocast():
            uc = prompt_parser.get_learned_conditioning(shared.sd_model, len(prompts) * [p.negative_prompt], p.steps)
            c = prompt_parser.get_multicond_learned_conditioning(shared.sd_model, prompts, p.steps)

    return c, uc, prompts, seeds, subseeds
def process_images_inner_half_B(p: StableDiffusionProcessing, c, uc, prompts, seeds, subseeds):
    comments = {}
    infotexts = []
    output_images = []

    def infotext(iteration=0, position_in_batch=0):
        return create_infotext(p, p.all_prompts, p.all_seeds, p.all_subseeds, comments, iteration, position_in_batch)

    with torch.no_grad(), p.sd_model.ema_scope():
        with devices.autocast():
            samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)

        samples_ddim = samples_ddim.to(devices.dtype_vae)
        x_samples_ddim = decode_first_stage(p.sd_model, samples_ddim)
        x_samples_ddim = torch.clamp((x_samples_ddim + 1.0) / 2.0, min=0.0, max=1.0)

        del samples_ddim

        if shared.cmd_opts.lowvram or shared.cmd_opts.medvram:
            lowvram.send_everything_to_cpu()

        devices.torch_gc()

        if opts.filter_nsfw:
            import modules.safety as safety
            x_samples_ddim = modules.safety.censor_batch(x_samples_ddim)

        n = 0   # batch count for legacy compatibility
        for i, x_sample in enumerate(x_samples_ddim):
            x_sample = 255. * np.moveaxis(x_sample.cpu().numpy(), 0, 2)
            x_sample = x_sample.astype(np.uint8)

            if p.restore_faces:
                if opts.save and not p.do_not_save_samples and opts.save_images_before_face_restoration:
                    images.save_image(Image.fromarray(x_sample), p.outpath_samples, "", seeds[i], prompts[i], opts.samples_format, info=infotext(n, i), p=p, suffix="-before-face-restoration")

                devices.torch_gc()
                x_sample = modules.face_restoration.restore_faces(x_sample)
                devices.torch_gc()

            image = Image.fromarray(x_sample)

            if p.color_corrections is not None and i < len(p.color_corrections):
                if opts.save and not p.do_not_save_samples and opts.save_images_before_color_correction:
                    image_without_cc = apply_overlay(image, p.paste_to, i, p.overlay_images)
                    images.save_image(image_without_cc, p.outpath_samples, "", seeds[i], prompts[i], opts.samples_format, info=infotext(n, i), p=p, suffix="-before-color-correction")

                image = apply_color_correction(p.color_corrections[i], image)

            image = apply_overlay(image, p.paste_to, i, p.overlay_images)

            if opts.samples_save and not p.do_not_save_samples:
                images.save_image(image, p.outpath_samples, "", seeds[i], prompts[i], opts.samples_format, info=infotext(n, i), p=p)

            text = infotext(n, i)
            infotexts.append(text)
            if opts.enable_pnginfo:
                image.info["parameters"] = text
            output_images.append(image)

        del x_samples_ddim

        devices.torch_gc()

        state.nextjob()

    p.color_corrections = None

    devices.torch_gc()

    if len(model_hijack.comments) > 0:
        for comment in model_hijack.comments:
            comments[comment] = 1

    res = Processed(p, output_images, p.all_seeds[0], infotext() + "".join(["\n\n" + x for x in comments]), subseed=p.all_subseeds[0], all_prompts=p.all_prompts, all_seeds=p.all_seeds, all_subseeds=p.all_subseeds, index_of_first_image=0, infotexts=infotexts)

    if p.scripts is not None:
        p.scripts.postprocess(p, res)

    return res
# ↑↑↑ the above is modified from 'modules/processing.py' ↑↑↑
class Script(scripts.Script):

    def title(self):
        return 'Prompt Travel'

    def describe(self):
        return "Gradually travels from one prompt to another in the semantic latent space."

    def show(self, is_img2img):
        return True

    def ui(self, is_img2img):
        steps = gr.Textbox(label='Steps between prompts', value=lambda: DEFAULT_STEPS)
        video_save = gr.Checkbox(label='Save results as video', value=lambda: DEFAULT_SAVE)
        video_fps = gr.Number(label='Frames per second', value=lambda: DEFAULT_FPS)
        show_debug = gr.Checkbox(label='Show verbose debug info at console', value=lambda: DEFAULT_DEBUG)
        return [steps, video_save, video_fps, show_debug]

    @staticmethod
    def get_next_sequence_number(path):
        """
        Determines and returns the next sequence number to use when saving an image in the specified directory.
        The sequence starts at 0.
        """
        from pathlib import Path
        result = -1
        dir = Path(path)
        for file in dir.iterdir():
            if not file.is_dir(): continue
            try:
                num = int(file.name)
                if num > result: result = num
            except ValueError:
                pass
        return result + 1
    def run(self, p: StableDiffusionProcessing, steps: str, video_save: bool, video_fps: int, show_debug: bool):
        initial_info = None
        images = []

        prompt_pos = p.prompt         .strip()
        prompt_neg = p.negative_prompt.strip()
        if not prompt_pos:
            print('positive prompt should not be empty')
            return Processed(p, images, p.seed)

        # prepare prompts: pad the shorter list by repeating its last stage
        pos_prompts = prompt_pos.split('\n')
        neg_prompts = prompt_neg.split('\n')
        n_stages = max(len(pos_prompts), len(neg_prompts))
        while len(pos_prompts) < n_stages: pos_prompts.append(pos_prompts[-1])
        while len(neg_prompts) < n_stages: neg_prompts.append(neg_prompts[-1])

        steps = steps.strip()
        try:
            steps = [int(s.strip()) for s in steps.split(',')]
        except ValueError:
            print(f'cannot parse steps option: {steps}')
            return Processed(p, images, p.seed)

        if len(steps) == 1:
            steps = [steps[0]] * (n_stages - 1)
        elif len(steps) != n_stages - 1:
            print(f'stage count mismatch: you have {n_stages} prompt stages, but specified {len(steps)} steps; should be len(steps) == len(stages) - 1')
            return Processed(p, images, p.seed)
        count = sum(steps) + n_stages
        print(f'n_stages={n_stages}, steps={steps}')
        steps.insert(0, -1)     # fixup for the first stage

        # Custom travel subfolder saving
        travel_path = os.path.join(p.outpath_samples, 'prompt_travel')
        os.makedirs(travel_path, exist_ok=True)
        travel_number = Script.get_next_sequence_number(travel_path)
        travel_path = os.path.join(travel_path, f"{travel_number:05}")
        p.outpath_samples = travel_path
        os.makedirs(travel_path, exist_ok=True)

        # Force Batch Count and Batch Size to 1
        p.n_iter = 1
        p.batch_size = 1

        # Random unified constant seed
        if p.seed == -1: seed = random.randint(0, 2147483647)
        else: seed = p.seed
        print('seed:', seed)

        # Start job
        state.job_count = count
        print(f"Generating {count} images.")

        def weighted_sum(A, B, alpha, kind):
            # linearly interpolate the conditioning tensors of A and B
            C = deepcopy(A)
            if kind == 'pos':
                condA = A.batch[0][0].schedules[0].cond
                condB = B.batch[0][0].schedules[0].cond
                condC = (1 - alpha) * condA + alpha * condB
                end_at_step = C.batch[0][0].schedules[0].end_at_step
                C.batch[0][0].schedules[0] = ScheduledPromptConditioning(end_at_step, condC)
            if kind == 'neg':
                condA = A[0][0].cond
                condB = B[0][0].cond
                condC = (1 - alpha) * condA + alpha * condB
                end_at_step = C[0][0].end_at_step
                C[0][0] = ScheduledPromptConditioning(end_at_step, condC)
            return C

        # draw the first image
        if show_debug:
            print(f'[stage 1/{n_stages}]')
            print(f'  pos prompts: {pos_prompts[0]}')
            print(f'  neg prompts: {neg_prompts[0]}')
        p.prompt = pos_prompts[0]
        p.negative_prompt = neg_prompts[0]
        p.seed = seed
        p.subseed = None
        p.subseed_strength = 0.0
        from_pos_hidden, from_neg_hidden, prompts, seeds, subseeds = process_images_inner_half_A(p)
        proc = process_images_inner_half_B(p, from_pos_hidden, from_neg_hidden, prompts, seeds, subseeds)
        if initial_info is None: initial_info = proc.info
        images += proc.images

        # travel through every stage
        for i in range(1, n_stages):
            if state.interrupted: break

            if show_debug:
                print(f'[stage {i+1}/{n_stages}]')
                print(f'  pos prompts: {pos_prompts[i]}')
                print(f'  neg prompts: {neg_prompts[i]}')

            # only change the target prompts
            p.prompt = pos_prompts[i]
            p.negative_prompt = neg_prompts[i]
            p.seed = seed
            p.subseed = None
            p.subseed_strength = 0.0
            to_pos_hidden, to_neg_hidden, prompts, seeds, subseeds = process_images_inner_half_A(p)

            # draw the interpolated images
            n_inter = steps[i] + 1
            for t in range(1, n_inter + 1):
                if state.interrupted: break

                alpha = t / n_inter     # [1/N, 2/N, .. N/N=1], including the target stage
                inter_pos_hidden = weighted_sum(from_pos_hidden, to_pos_hidden, alpha, kind='pos')
                inter_neg_hidden = weighted_sum(from_neg_hidden, to_neg_hidden, alpha, kind='neg')
                proc = process_images_inner_half_B(p, inter_pos_hidden, inter_neg_hidden, prompts, seeds, subseeds)
                if initial_info is None: initial_info = proc.info
                images += proc.images

            # move to the next stage
            from_pos_hidden = to_pos_hidden
            from_neg_hidden = to_neg_hidden

        if video_save:
            try:
                clip = ImageSequenceClip([np.asarray(t) for t in images], fps=video_fps)
                clip.write_videofile(os.path.join(travel_path, f"travel-{travel_number:05}.mp4"), verbose=False, audio=False, logger=None)
            except Exception as e:
                print(f'video not saved: {e}')

        return Processed(p, images, p.seed, initial_info)