From bf34eed362b09621a521fdb1f358ad42f004d547 Mon Sep 17 00:00:00 2001 From: hnmr293 Date: Fri, 24 Mar 2023 13:10:57 +0900 Subject: [PATCH] update README, especially for English --- README.md | 47 ++++---- README_en.md | 322 ++++++++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 313 insertions(+), 56 deletions(-) diff --git a/README.md b/README.md index cb53e55..8f7c8b0 100644 --- a/README.md +++ b/README.md @@ -1,33 +1,29 @@ -[English ver. (WIP)](./README_en.md) +[English ver.](./README_en.md) # Dump U-Net ## 目次 - - - [Dump U-Net](#dump-u-net) - - [1. 目次](#1-%E7%9B%AE%E6%AC%A1) - - [2. これは何](#2-%E3%81%93%E3%82%8C%E3%81%AF%E4%BD%95) - - [3. できること](#3-%E3%81%A7%E3%81%8D%E3%82%8B%E3%81%93%E3%81%A8) - - [4. 特徴量の抽出](#4-%E7%89%B9%E5%BE%B4%E9%87%8F%E3%81%AE%E6%8A%BD%E5%87%BA) - - [4.1. U-Net の特徴量画像](#41-u-net-%E3%81%AE%E7%89%B9%E5%BE%B4%E9%87%8F%E7%94%BB%E5%83%8F) - - [4.1.1. 画面説明](#411-%E7%94%BB%E9%9D%A2%E8%AA%AC%E6%98%8E) - - [4.1.2. Colorization](#412-colorization) - - [4.1.3. Dump Setting](#413-dump-setting) - - [4.1.4. 抽出画像の例](#414-%E6%8A%BD%E5%87%BA%E7%94%BB%E5%83%8F%E3%81%AE%E4%BE%8B) - - [4.2. アテンション層の特徴量抽出](#42-%E3%82%A2%E3%83%86%E3%83%B3%E3%82%B7%E3%83%A7%E3%83%B3%E5%B1%A4%E3%81%AE%E7%89%B9%E5%BE%B4%E9%87%8F%E6%8A%BD%E5%87%BA) - - [4.2.1. 画面説明](#421-%E7%94%BB%E9%9D%A2%E8%AA%AC%E6%98%8E) - - [4.2.2. 例](#422-%E4%BE%8B) - - [5. ブロックごとのプロンプトの変更](#5-%E3%83%96%E3%83%AD%E3%83%83%E3%82%AF%E3%81%94%E3%81%A8%E3%81%AE%E3%83%97%E3%83%AD%E3%83%B3%E3%83%97%E3%83%88%E3%81%AE%E5%A4%89%E6%9B%B4) - - [5.1. 概要](#51-%E6%A6%82%E8%A6%81) - - [5.2. 画面説明](#52-%E7%94%BB%E9%9D%A2%E8%AA%AC%E6%98%8E) - - [5.3. 記法](#53-%E8%A8%98%E6%B3%95) - - [5.4. 例](#54-%E4%BE%8B) - - [5.5. Dynamic Prompts との併用](#55-dynamic-prompts-%E3%81%A8%E3%81%AE%E4%BD%B5%E7%94%A8) - - [6. 
TODO](#6-todo) - - + - [目次](#目次) + - [これは何](#これは何) + - [できること](#できること) + - [特徴量の抽出](#特徴量の抽出) + - [U-Net の特徴量画像](#u-net-の特徴量画像) + - [画面説明](#画面説明) + - [Colorization](#colorization) + - [Dump Setting](#dump-setting) + - [抽出画像の例](#抽出画像の例) + - [アテンション層の特徴量抽出](#アテンション層の特徴量抽出) + - [画面説明](#画面説明-1) + - [例](#例) + - [ブロックごとのプロンプトの変更](#ブロックごとのプロンプトの変更) + - [概要](#概要) + - [画面説明](#画面説明-2) + - [記法](#記法) + - [例](#例-1) + - [Dynamic Prompts との併用](#dynamic-prompts-との併用) + - [TODO](#todo) ## これは何 @@ -158,8 +154,6 @@ Seed: 1719471015 ### アテンション層の特徴量抽出 -現バージョンではクロスアテンション層の `Q*K` を出力する。 - #### 画面説明 ![](images/README_00_07.png) @@ -325,5 +319,4 @@ a excellent girl ## TODO -- K, VQK の可視化 - セルフアテンション層の可視化 \ No newline at end of file diff --git a/README_en.md b/README_en.md index 6f9835d..362a7b8 100644 --- a/README_en.md +++ b/README_en.md @@ -1,60 +1,324 @@ -An extension for [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that adds a custom script which let you to observe U-Net feature maps. 
+# Dump U-Net -# Example +## Table of contents -Model Output Image: +- [Dump U-Net](#dump-u-net) + - [Table of contents](#table-of-contents) + - [What is this](#what-is-this) + - [What it can do](#what-it-can-do) + - [Feature extraction](#feature-extraction) + - [Feature extraction from U-Net](#feature-extraction-from-u-net) + - [UI description](#ui-description) + - [Colorization](#colorization) + - [Dump Setting](#dump-setting) + - [Examples of extracted images](#examples-of-extracted-images) + - [Feature extraction from Attention layer](#feature-extraction-from-attention-layer) + - [UI description](#ui-description-1) + - [Examples](#examples) + - [Per-block Prompts](#per-block-prompts) + - [Overview](#overview) + - [UI description](#ui-description-2) + - [Notation](#notation) + - [Examples](#examples-1) + - [Use with Dynamic Prompts](#use-with-dynamic-prompts) + - [TODO](#todo) + +## What is this + +This is an extension for [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that adds a custom script that lets you observe U-Net feature maps. + +## What it can do + +This extension can + +1. visualize intermediate outputs of the model: the features of each U-Net block and of the attention layers. +2. apply per-block prompts: generate images while changing the prompt for each U-Net block. +3. visualize the feature differences produced by 2. + +## Feature extraction + +Use the image below as an example. ![Model Output Image](images/00.png) ``` Model: waifu-diffusion-v1-3-float16 (84692140) Prompt: a cute girl, pink hair +Sampling steps: 20 Sampling Method: DPM++ 2M Karras Size: 512x512 CFG Scale: 7 Seed: 1719471015 ``` -U-Net features: +### Feature extraction from U-Net -Let the feature value is `v`, larger `|v|` is white, and zero is black. +For example, the following images are generated. 
+ +Grayscale output `OUT11, steps 20, Black/White, Sigmoid(1,0)` +![](images/README_00_01_gray.png) + +Colored output `OUT11, steps 20, Custom, Sigmoid(1,0), H=(2+v)/3, S=1.0, V=0.5` +![](images/README_00_01_color.png) + +#### UI description + +![](images/README_00.png) + +
+
Extract U-Net features
+
If checked, U-Net feature extraction is enabled.
+
Layers
+
Specify the blocks to extract. Commas (lists) and hyphens (ranges) can be used. IN11, M00 and OUT00 are adjacent, so a range can span them.
+
Image saving steps
+
Specify the sampling steps at which features are extracted.
+
Colorization
+
Specify how to colorize the output images.
+
Dump Setting
+
Configure "binary-dump" settings.
+
Selected Layer Info
+
Details of the input/output of the blocks specified in the Layers section.
+
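The `Layers` field accepts single block names, comma-separated lists, hyphen ranges, and ranges with a `(+N)` step. As a rough illustration only, the expansion could be sketched as below. `BLOCKS` and `expand_layers` are hypothetical names chosen for this example, not the extension's actual parser, and the sketch assumes the block order IN00..IN11, M00, OUT00..OUT11.

```python
import re

# Assumed block order: IN11, M00 and OUT00 are adjacent.
BLOCKS = (
    [f"IN{i:02d}" for i in range(12)]
    + ["M00"]
    + [f"OUT{i:02d}" for i in range(12)]
)

# Matches "START-END" with an optional "(+N)" step suffix.
_RANGE = re.compile(r"^(\w+)\s*-\s*(\w+)\s*(?:\(\+(\d+)\))?$")


def expand_layers(spec: str) -> list[str]:
    """Expand a spec such as 'IN00, IN10-OUT01(+2)' into block names.

    Hypothetical helper for illustration; raises ValueError on unknown names.
    """
    result: list[str] = []
    for part in (p.strip() for p in spec.split(",")):
        if not part:
            continue
        m = _RANGE.match(part)
        if m:
            start = BLOCKS.index(m.group(1))
            end = BLOCKS.index(m.group(2))
            step = int(m.group(3) or 1)
            names = BLOCKS[start:end + 1:step]  # both ends included
        else:
            if part not in BLOCKS:
                raise ValueError(f"unknown block: {part}")
            names = [part]
        for name in names:
            if name not in result:  # keep first occurrence, preserve order
                result.append(name)
    return result


print(expand_layers("IN10-OUT01"))
```

Keeping the blocks in one ordered list makes the "IN11, M00 and OUT00 are connected" rule fall out of ordinary slicing, and the `(+N)` step becomes the slice stride.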
+ +In the `Layers` section you can use the grammar below: + +``` +single block: IN00 + You can use IN00, IN01, ..., IN11, M00, OUT00, OUT01, ..., OUT11. +multiple blocks: IN00, IN01, M00 + Comma separated block names. +range: IN00-OUT11 + Hyphen separated block names. + Both ends are included in the range. + IN11, M00 and OUT00 are connected. +range with steps: IN00-OUT11(+2) + `(+digits)` after the range defines steps. + `+1` is the same as a normal range. + `+2` means "every other block". + For instance, `IN00-OUT11(+2)` means: + IN00, IN02, IN04, IN06, IN08, IN10, + M00, + OUT01, OUT03, OUT05, OUT07, OUT09, OUT11 +``` + +#### Colorization + +![](images/README_00_02.png) + +
+
Colorize method
+
Specifies the colorization method.
+Let v be the feature value.
+White/Black shows white pixels for large |v| and black pixels for small |v|.
+Red/Blue shows red pixels for large (positive) v and blue pixels for small (negative) v.
+Custom computes the color from v. You can use RGB or HSL colorspace. +
+
Value transform
+
+Raw feature values cannot be used directly as colors. This setting selects how feature values are converted to pixel values.
+Auto [0,1] converts the values to [0,1] linearly, using the minimum and maximum of the given feature values.
+Auto [-1,1] does the same, but converts the values linearly to [-1,1].
+Linear first clamps the feature values to the range given by Clamp min./max., then converts them linearly to [0,1] when Colorize method is White/Black and to [-1,1] otherwise.
+
+Sigmoid applies a sigmoid function with the specified gain and x-offset. The output is in the range [0,1] when Colorize method is White/Black, and [-1,1] otherwise.
+ +
+
Color space
+
Write an expression that converts v, already transformed by Value transform, into a pixel value. v lies in [0,1] or [-1,1] depending on Colorize method and Value transform. The result is clipped to [0,1].
+The code is executed with the numpy module as its global environment. For example, abs(v) means numpy.abs(v).
+ +
+
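The value transforms and the custom color expression described above can be sketched with numpy. This is a minimal illustration under stated assumptions, not the extension's code: the function names are hypothetical, and the mapping to [-1,1] is assumed to be a simple rescaling `2*u - 1` of the [0,1] result, which the README does not spell out.

```python
import numpy as np


def auto_range(v, signed=False):
    """Map v linearly onto [0,1] (or [-1,1]) using its own min and max."""
    lo, hi = float(v.min()), float(v.max())
    if hi == lo:  # constant input: avoid division by zero
        u = np.zeros_like(v, dtype=float)
    else:
        u = (v - lo) / (hi - lo)
    return 2.0 * u - 1.0 if signed else u


def linear(v, clamp_min, clamp_max, signed=False):
    """Clamp to [clamp_min, clamp_max], then rescale linearly."""
    u = (np.clip(v, clamp_min, clamp_max) - clamp_min) / (clamp_max - clamp_min)
    return 2.0 * u - 1.0 if signed else u


def sigmoid(v, gain=1.0, offset=0.0, signed=False):
    """Logistic function with gain and x-offset (assumed form)."""
    s = 1.0 / (1.0 + np.exp(-gain * (v - offset)))
    return 2.0 * s - 1.0 if signed else s


def eval_channel(expr, v):
    """Evaluate a user expression with numpy names as globals, clip to [0,1]."""
    env = dict(vars(np))  # so that abs(v) resolves to numpy.abs(v)
    env["v"] = v
    return np.clip(eval(expr, env), 0.0, 1.0)


# Mirrors the "Sigmoid(1,0), H=(2+v)/3" example from the README.
features = np.array([-3.0, 0.0, 3.0])
v = sigmoid(features, gain=1.0, offset=0.0, signed=True)
hue = eval_channel("(2+v)/3", v)
print(hue)
```

Because `eval` runs arbitrary code, a sketch like this is only appropriate for expressions typed by the local user, which matches how the Color space field is described.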
+ +#### Dump Setting + +![](images/README_00_06.png) + +
+
Dump feature tensors to files
+
If checked, U-Net feature tensors are exported as files.
+
Output path
+
Specify the directory to output binaries. If it does not exist, it will be created.
+
+ +#### Examples of extracted images + +Images with `steps=1,5,10` from left to right. - IN00 (64x64, 320ch) +![IN00](images/IN00.jpg) -step 1 +- IN05 (32x32, 640ch) +![IN05](images/IN05.jpg) -![IN00 step1](images/IN00-step01.png) +- M00 (8x8, 1280ch) +![M00](images/M00.jpg) -step 10 - -![IN00 step10](images/IN00-step10.png) - -step 20 - -![IN00 step20](images/IN00-step20.png) - -- OUT02 (16x16, 1280ch) - -step 20 - -![OUT02 step20](images/OUT02-step20.png) +- OUT06 (32x32, 640ch) +![OUT06](images/OUT06.jpg) - OUT11 (64x64, 320ch) +![OUT11](images/OUT11.jpg) -step 1 +### Feature extraction from Attention layer -![OUT11 step1](images/OUT11-step01.png) +#### UI description -step 10 +![](images/README_00_07.png) -![OUT11 step10](images/OUT11-step10.png) +Same as [Feature extraction from U-Net](#feature-extraction-from-u-net). -step 20 +#### Examples -![OUT11 step20](images/OUT11-step20.png) +The horizontal axis represents the token position. A start token and an end token are inserted, so the 75 images in between show the influence of each prompt token. -Color map mode: +The vertical axis represents the heads of the attention layer. In the current model h=8, so there are 8 images along this axis. -Red means the value is positive, and blue means the value is negative. +From these images you can make observations such as "`pink hair` seems to take effect in this layer". -![OUT11 step20 cm](images/OUT11-step20-cm.png) +- IN01 +![Attention IN01](images/attn-IN01.png) + +- M00 +![Attention M00](images/attn-M00.png) + +- OUT10 +![Attention OUT10](images/attn-OUT10.png) + +## Per-block Prompts + +### Overview + +See the following article for details (in Japanese). 
+ +[Generating images with different prompts for each block in Stable Diffusion's U-Net (block-specific prompts)](https://note.com/kohya_ss/n/n93b7c01b0547) + +Example of Difference map +Example of Difference map +Example of Difference map +Example of Difference map + +``` +Model: waifu-diffusion-v1-3-float16 (84692140) +Prompt: a (~: IN00-OUT11: cute; M00: excellent :~) girl +Sampling Method: Euler a +Size: 512x512 +CFG Scale: 7 +Seed: 3292581281 +``` + +The above images are in order: + +- generated by `a cute girl`. +- with cute changed to excellent in IN00 +- with cute changed to excellent in IN05 +- with cute changed to excellent in M00 + +### UI description + +![](images/README_01.png) + +Same as [Feature extraction from U-Net](#feature-extraction-from-u-net) + +
+
Output difference map of U-Net features between with and without Layer Prompt
+
Adds to the outputs an image showing the difference between generation with per-block prompts disabled and enabled.
+
+ +### Notation + +Use the notation below in the prompt: + +``` +a (~: IN00-OUT11: cute ; M00: excellent :~) girl +``` + +In the above case, IN00-OUT11 (i.e. the whole generation process) use + +``` +a cute girl +``` + +but M00 uses + +``` +a excellent girl +``` + +You can specify per-block prompts with the grammar below: + +``` +(~: + block-spec:prompt; + block-spec:prompt; + ... + block-spec:prompt; +:~) +``` + +You may insert spaces after `(~:`, before `:~)`, before `:`, and after `;`. Note that the text between `:` and `;` is inserted into the result as is, including spaces. The semicolon after the last prompt may be omitted. + +The block specification (`block-spec` above) is as follows. Generally, it is the same as in X/Y plot. If ranges overlap, the later one takes precedence. + +``` +single block: IN00 + You can use IN00, IN01, ..., IN11, M00, OUT00, OUT01, ..., OUT11. +multiple blocks: IN00, IN01, M00 + Comma separated block names. +range: IN00-OUT11 + Hyphen separated block names. + Both ends are included in the range. + IN11, M00 and OUT00 are connected. +range with steps: IN00-OUT11(+2) + `(+digits)` after the range defines steps. + `+1` is the same as a normal range. + `+2` means "every other block". + For instance, `IN00-OUT11(+2)` means: + IN00, IN02, IN04, IN06, IN08, IN10, + M00, + OUT01, OUT03, OUT05, OUT07, OUT09, OUT11 +otherwise: _ (underscore) + This is a special symbol and has the lowest precedence. + If no other block spec matches, the prompt defined here is used. +``` + +### Examples + +A few examples. + +``` +1: (~: IN00: A ; IN01: B :~) +2: (~: IN00: A ; IN01: B ; IN02: C :~) +3: (~: IN00: A ; IN01: B ; IN02: C ; _ : D :~) +4: (~: IN00,IN01: A ; M00 : B :~) +5: (~: IN00-OUT11: A ; M00 : B :~) +``` + +1: use A in IN00, B in IN01, and nothing in other blocks. +2: use A in IN00, B in IN01, C in IN02 and nothing in other blocks. +3: use A in IN00, B in IN01, C in IN02 and D in other blocks. +4: use A in IN00 and IN01, B in M00, and nothing in other blocks. 
+5: use A from IN00 to OUT11 (all blocks), but B in M00. + +### Use with Dynamic Prompts + +For experiments, [Dynamic Prompts](https://github.com/adieyal/sd-dynamic-prompts) is useful. + +For instance, if you want to see the effect of changing the prompt in only one block, enable Jinja Template in Dynamic Prompts and input the following prompt: + +``` +{% for layer in [ "IN00", "IN01", "IN02", "IN03", "IN04", "IN05", "IN06", "IN07", "IN08", "IN09", "IN10", "IN11", "M00", "OUT00", "OUT01", "OUT02", "OUT03", "OUT04", "OUT05", "OUT06", "OUT07", "OUT08", "OUT09", "OUT10", "OUT11" ] %} + {% prompt %}a cute school girl, pink hair, wide shot, (~:{{layer}}:bad anatomy:~){% endprompt %} +{% endfor %} +``` + +to check the effect of `bad anatomy` in each block. + +Actual examples are here (in Japanese). + +[Test adding prompts to one specific block with prompts by block](https://gist.github.com/hnmr293/7f240aa5b74c0f5a27a9764fdd9672e2) + +## TODO + +- visualize self-attention layer +- \ No newline at end of file