update README, especially for English

2023-03-24 13:10:57 +09:00 · 2023-03-24 13:10:57 +09:00 · bf34eed362
parent 7ce3991695
commit bf34eed362
2 changed files with 313 additions and 56 deletions
--- a/README.md
+++ b/README.md
@ -1,33 +1,29 @@
-[English ver. (WIP)](./README_en.md)
+[English ver.](./README_en.md)

 # Dump U-Net

 ## 目次

-<!-- TOC -->
-
 - [Dump U-Net](#dump-u-net)
-    - [1. 目次](#1-%E7%9B%AE%E6%AC%A1)
-    - [2. これは何](#2-%E3%81%93%E3%82%8C%E3%81%AF%E4%BD%95)
-    - [3. できること](#3-%E3%81%A7%E3%81%8D%E3%82%8B%E3%81%93%E3%81%A8)
-    - [4. 特徴量の抽出](#4-%E7%89%B9%E5%BE%B4%E9%87%8F%E3%81%AE%E6%8A%BD%E5%87%BA)
-        - [4.1. U-Net の特徴量画像](#41-u-net-%E3%81%AE%E7%89%B9%E5%BE%B4%E9%87%8F%E7%94%BB%E5%83%8F)
-            - [4.1.1. 画面説明](#411-%E7%94%BB%E9%9D%A2%E8%AA%AC%E6%98%8E)
-            - [4.1.2. Colorization](#412-colorization)
-            - [4.1.3. Dump Setting](#413-dump-setting)
-            - [4.1.4. 抽出画像の例](#414-%E6%8A%BD%E5%87%BA%E7%94%BB%E5%83%8F%E3%81%AE%E4%BE%8B)
-        - [4.2. アテンション層の特徴量抽出](#42-%E3%82%A2%E3%83%86%E3%83%B3%E3%82%B7%E3%83%A7%E3%83%B3%E5%B1%A4%E3%81%AE%E7%89%B9%E5%BE%B4%E9%87%8F%E6%8A%BD%E5%87%BA)
-            - [4.2.1. 画面説明](#421-%E7%94%BB%E9%9D%A2%E8%AA%AC%E6%98%8E)
-            - [4.2.2. 例](#422-%E4%BE%8B)
-    - [5. ブロックごとのプロンプトの変更](#5-%E3%83%96%E3%83%AD%E3%83%83%E3%82%AF%E3%81%94%E3%81%A8%E3%81%AE%E3%83%97%E3%83%AD%E3%83%B3%E3%83%97%E3%83%88%E3%81%AE%E5%A4%89%E6%9B%B4)
-        - [5.1. 概要](#51-%E6%A6%82%E8%A6%81)
-        - [5.2. 画面説明](#52-%E7%94%BB%E9%9D%A2%E8%AA%AC%E6%98%8E)
-        - [5.3. 記法](#53-%E8%A8%98%E6%B3%95)
-        - [5.4. 例](#54-%E4%BE%8B)
-        - [5.5. Dynamic Prompts との併用](#55-dynamic-prompts-%E3%81%A8%E3%81%AE%E4%BD%B5%E7%94%A8)
-    - [6. TODO](#6-todo)
-
-<!-- /TOC -->
+  - [目次](#目次)
+  - [これは何](#これは何)
+  - [できること](#できること)
+  - [特徴量の抽出](#特徴量の抽出)
+    - [U-Net の特徴量画像](#u-net-の特徴量画像)
+      - [画面説明](#画面説明)
+      - [Colorization](#colorization)
+      - [Dump Setting](#dump-setting)
+      - [抽出画像の例](#抽出画像の例)
+    - [アテンション層の特徴量抽出](#アテンション層の特徴量抽出)
+      - [画面説明](#画面説明-1)
+      - [例](#例)
+  - [ブロックごとのプロンプトの変更](#ブロックごとのプロンプトの変更)
+    - [概要](#概要)
+    - [画面説明](#画面説明-2)
+    - [記法](#記法)
+    - [例](#例-1)
+    - [Dynamic Prompts との併用](#dynamic-prompts-との併用)
+  - [TODO](#todo)

 ## これは何

@ -158,8 +154,6 @@ Seed: 1719471015

 ### アテンション層の特徴量抽出

-現バージョンではクロスアテンション層の `Q*K` を出力する。
-
 #### 画面説明

 ![](images/README_00_07.png)
@ -325,5 +319,4 @@ a  excellent  girl

 ## TODO

- K, VQK の可視化
 - セルフアテンション層の可視化
--- a/README_en.md
+++ b/README_en.md
@ -1,60 +1,324 @@
-An extension for [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that adds a custom script which let you to observe U-Net feature maps.
+# Dump U-Net

-# Example
+## Table of contents

-Model Output Image:
+- [Dump U-Net](#dump-u-net)
+  - [Table of contents](#table-of-contents)
+  - [What is this](#what-is-this)
+  - [What this can do](#what-this-can-do)
+  - [Feature extraction](#feature-extraction)
+    - [Feature extraction from U-Net](#feature-extraction-from-u-net)
+      - [UI description](#ui-description)
+      - [Colorization](#colorization)
+      - [Dump Setting](#dump-setting)
+      - [Examples of extracted images](#examples-of-extracted-images)
+    - [Feature extraction from Attention layer](#feature-extraction-from-attention-layer)
+      - [UI description](#ui-description-1)
+      - [Examples](#examples)
+  - [Per-block Prompts](#per-block-prompts)
+    - [Overview](#overview)
+    - [UI description](#ui-description-2)
+    - [Notation](#notation)
+    - [Examples](#examples-1)
+    - [Use with Dynamic Prompts](#use-with-dynamic-prompts)
+  - [TODO](#todo)
+
+## What is this
+
+This is an extension for [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that adds a custom script that lets you observe U-Net feature maps.
+
+## What this can do
+
+This extension can
+
+1. visualize intermediate outputs of the model: the features of each U-Net block and of the attention layers.
+2. generate images while changing the prompt for each U-Net block (per-block prompts).
+3. visualize the difference in features caused by 2.
+
+## Feature extraction
+
+The following sections use the image below as an example.

 ![Model Output Image](images/00.png)

 ```
 Model: waifu-diffusion-v1-3-float16 (84692140)
 Prompt: a cute girl, pink hair
+Sampling steps: 20
 Sampling Method: DPM++ 2M Karras
 Size: 512x512
 CFG Scale: 7
 Seed: 1719471015
 ```

-U-Net features:
+### Feature extraction from U-Net

-Let the feature value is `v`, larger `|v|` is white, and zero is black.
+For example, the following images are generated.
+
+Grayscale output `OUT11, steps 20, Black/White, Sigmoid(1,0)`
+![](images/README_00_01_gray.png)
+
+Colored output `OUT11, steps 20, Custom, Sigmoid(1,0), H=(2+v)/3, S=1.0, V=0.5`
+![](images/README_00_01_color.png)
+
+#### UI description
+
+![](images/README_00.png)
+
+<dl>
+<dt>Extract U-Net features</dt>
+<dd>If checked, U-Net feature extraction is enabled.</dd>
+<dt>Layers</dt>
+<dd>Specify the blocks to extract. Commas and hyphens can be used as delimiters. <code>IN11</code>, <code>M00</code> and <code>OUT00</code> are treated as adjacent.</dd>
+<dt>Image saving steps</dt>
+<dd>Specify the steps at which features are extracted.</dd>
+<dt>Colorization</dt>
+<dd>Specify how to colorize the output images.</dd>
+<dt>Dump Setting</dt>
+<dd>Configure "binary-dump" settings.</dd>
+<dt>Selected Layer Info</dt>
+<dd>Details of the input/output of the blocks specified in the <code>Layers</code> section.</dd>
+</dl>
+
+In the `Layers` section you can use the grammar below:
+
+```
+single block: IN00
+    You can use IN00, IN01, ..., IN11, M00, OUT00, OUT01, ..., OUT11.
+multiple blocks: IN00, IN01, M00
+    Comma separated block names.
+range: IN00-OUT11
+    Hyphen separated block names.
+    Edges are included in the range.
+    IN11, M00 and OUT00 are connected.
+range with steps: IN00-OUT11(+2)
+    `(+digits)` after the range defines steps.
+    `+1` is the same as a normal range.
+    `+2` means "every other block".
+    For instance, `IN00-OUT11(+2)` means:
+      IN00, IN02, IN04, IN06, IN08, IN10,
+      M00,
+      OUT01, OUT03, OUT05, OUT07, OUT09, OUT11
+```
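The range grammar above can be sketched in Python as follows. This is a minimal illustration, not the extension's actual parser: the names `BLOCKS`, `RANGE_RE` and `expand_spec` are hypothetical, and the block order (IN00 to IN11, then M00, then OUT00 to OUT11) is taken from the description above.

```python
import re

# Block order as described above: IN11, M00 and OUT00 are connected.
BLOCKS = (
    [f"IN{i:02d}" for i in range(12)]
    + ["M00"]
    + [f"OUT{i:02d}" for i in range(12)]
)

# "A-B" optionally followed by "(+N)". Hypothetical regex, not the extension's own.
RANGE_RE = re.compile(r"(\w+)\s*-\s*(\w+)\s*(?:\(\+(\d+)\))?")


def expand_spec(spec: str) -> list[str]:
    """Expand a spec such as 'IN00, M00' or 'IN00-OUT11(+2)' into block names."""
    result = []
    for part in spec.split(","):
        part = part.strip()
        m = RANGE_RE.fullmatch(part)
        if m:
            # Both edges are included in the range.
            start = BLOCKS.index(m.group(1))
            end = BLOCKS.index(m.group(2))
            step = int(m.group(3) or 1)
            result.extend(BLOCKS[start:end + 1:step])
        elif part in BLOCKS:
            result.append(part)
        else:
            raise ValueError(f"unknown block: {part!r}")
    return result


print(expand_spec("IN00-OUT11(+2)"))
```

Treating all 25 blocks as one ordered list makes "connected" ranges such as `IN11-OUT00` fall out of ordinary slicing.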
+
+#### Colorization
+
+![](images/README_00_02.png)
+
+<dl>
+<dt>Colorize method</dt>
+<dd>Specifies the colorization method.<br/>
+Let <code>v</code> be the feature value.<br/>
+<code>White/Black</code> shows a white pixel for large <code>|v|</code> and a black pixel for small <code>|v|</code>.<br/>
+<code>Red/Blue</code> shows a red pixel for large (positive) <code>v</code> and a blue pixel for small (negative) <code>v</code>.<br/>
+<code>Custom</code> computes the color from <code>v</code>. You can use the RGB or HSL color space.
+</dd>
+<dt>Value transform</dt>
+<dd>
+Feature values are not suitable to be used as-is to specify colors. This section specifies the conversion method from feature values to pixel values.<br/>
+<code>Auto [0,1]</code> converts the value to <code>[0,1]</code> linearly using the minimum and maximum values of given feature values.<br/>
+<code>Auto [-1,1]</code> converts the value to <code>[-1,1]</code> in the same way.<br/>
+<code>Linear</code> first clamps feature values to the specified <code>Clamp min./max.</code> range, then linearly converts them to <code>[0,1]</code> when <code>Colorize method</code> is <code>White/Black</code> and to <code>[-1,1]</code> otherwise.<br/>
+<img src="images/README_00_03.png"/><br/>
+<code>Sigmoid</code> is a <a href="https://en.wikipedia.org/wiki/Sigmoid_function" target="_blank">sigmoid function</a> with specified gain and x-offset. The output is in range <code>[0,1]</code> when <code>Colorize method</code> is <code>White/Black</code>, and <code>[-1,1]</code> otherwise.<br/>
+<img src="images/README_00_04.png"/>
+</dd>
+<dt>Color space</dt>
+<dd>Write code that converts <code>v</code>, as transformed by <code>Value transform</code>, to the pixel value. <code>v</code> is given in <code>[0,1]</code> or <code>[-1,1]</code> according to <code>Colorize method</code> and <code>Value transform</code>. The result is clipped to <code>[0,1]</code>.<br/>
+The code is executed with <code>numpy</code> module as the global environment. For example, <code>abs(v)</code> means <code>numpy.abs(v)</code>.<br/>
+<img src="images/README_00_05.png"/>
+</dd>
+</dl>
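The value transforms described above might look roughly like the sketch below. The exact formulas used by the extension are not given here, so the function names, the signature of each function, and the sigmoid form `1 / (1 + exp(-gain * (v - offset)))` are all assumptions made for illustration.

```python
import math


def auto_unit(values):
    """Auto [0,1]: rescale linearly using the min and max of the given values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all values are equal
    return [(v - lo) / (hi - lo) for v in values]


def linear(values, clamp_min, clamp_max, signed=False):
    """Linear: clamp to [clamp_min, clamp_max], then map to [0,1] or [-1,1]."""
    out = []
    for v in values:
        v = min(max(v, clamp_min), clamp_max)
        t = (v - clamp_min) / (clamp_max - clamp_min)
        out.append(2.0 * t - 1.0 if signed else t)
    return out


def sigmoid(values, gain=1.0, offset=0.0, signed=False):
    """Sigmoid with gain and x-offset (assumed form), output in [0,1] or [-1,1]."""
    out = []
    for v in values:
        x = gain * (v - offset)
        # Numerically stable evaluation for large |x|.
        if x >= 0:
            s = 1.0 / (1.0 + math.exp(-x))
        else:
            e = math.exp(x)
            s = e / (1.0 + e)
        out.append(2.0 * s - 1.0 if signed else s)
    return out


print(sigmoid([-2.0, 0.0, 2.0], gain=1.0, offset=0.0))
```

Here `signed=False` corresponds to the `White/Black` case and `signed=True` to the other colorize methods, following the ranges stated above.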
+
+#### Dump Setting
+
+![](images/README_00_06.png)
+
+<dl>
+<dt>Dump feature tensors to files</dt>
+<dd>If checked, U-Net feature tensors are exported as files.</dd>
+<dt>Output path</dt>
+<dd>Specify the directory to output binaries. If it does not exist, it will be created.</dd>
+</dl>
+
+#### Examples of extracted images
+
+Images with `steps=1,5,10` from left to right.

 - IN00 (64x64, 320ch)
+![IN00](images/IN00.jpg)

-step 1
+- IN05 (32x32, 640ch)
+![IN05](images/IN05.jpg)

-![IN00 step1](images/IN00-step01.png)
+- M00 (8x8, 1280ch)
+![M00](images/M00.jpg)

-step 10
-
-![IN00 step10](images/IN00-step10.png)
-
-step 20
-
-![IN00 step20](images/IN00-step20.png)
-
- OUT02 (16x16, 1280ch)
-
-step 20
-
-![OUT02 step20](images/OUT02-step20.png)
+- OUT06 (32x32, 640ch)
+![OUT06](images/OUT06.jpg)

 - OUT11 (64x64, 320ch)
+![OUT11](images/OUT11.jpg)

-step 1
+### Feature extraction from Attention layer

-![OUT11 step1](images/OUT11-step01.png)
+#### UI description

-step 10
+![](images/README_00_07.png)

-![OUT11 step10](images/OUT11-step10.png)
+Same as [Feature extraction from U-Net](#feature-extraction-from-u-net).

-step 20
+#### Examples

-![OUT11 step20](images/OUT11-step20.png)
+The horizontal axis represents the token position. The beginning token and ending token are inserted, so the 75 images in between represent the influence of each token.

-Color map mode:
+The vertical axis represents the heads of the attention layer. In the current model, <code>h=8</code>, so there are 8 images along the vertical axis.

-Red means the value is positive, and blue means the value is negative.
+From these images you can make observations such as "`pink hair` seems to be taking effect in this layer".

-![OUT11 step20 cm](images/OUT11-step20-cm.png)
+- IN01
+![Attention IN01](images/attn-IN01.png)
+
+- M00
+![Attention M00](images/attn-M00.png)
+
+- OUT10
+![Attention OUT10](images/attn-OUT10.png)
+
+## Per-block Prompts
+
+### Overview
+
+See the following article for details (in Japanese).
+
+[Generating images with different prompts for each block in Stable Diffusion's U-Net (block-specific prompts)](https://note.com/kohya_ss/n/n93b7c01b0547)
+
+<img src="images/README_02a.png" alt="Example of Difference map" width="256" height="256"/>
+<img src="images/README_02_IN00.png" alt="Example of Difference map" width="256" height="256"/>
+<img src="images/README_02_IN05.png" alt="Example of Difference map" width="256" height="256"/>
+<img src="images/README_02_M00.png" alt="Example of Difference map" width="256" height="256"/>
+
+```
+Model: waifu-diffusion-v1-3-float16 (84692140)
+Prompt: a (~: IN00-OUT11: cute; M00: excellent :~) girl
+Sampling Method: Euler a
+Size: 512x512
+CFG Scale: 7
+Seed: 3292581281
+```
+
+The images above are, in order:
+
+- generated with `a cute girl`
+- with `cute` changed to `excellent` in IN00
+- with `cute` changed to `excellent` in IN05
+- with `cute` changed to `excellent` in M00
+
+### UI description
+
+![](images/README_01.png)
+
+Same as [Feature extraction from U-Net](#feature-extraction-from-u-net), with the following addition.
+
+<dl>
+<dt>Output difference map of U-Net features between with and without Layer Prompt</dt>
+<dd>Adds to the output an image showing the difference in U-Net features between per-block prompts disabled and enabled.</dd>
+</dl>
+
+### Notation
+
+Use the notation below in the prompt:
+
+```
+a (~: IN00-OUT11: cute ; M00: excellent :~) girl
+```
+
+In the case above, IN00-OUT11 (i.e. the whole generation process) uses
+
+```
+a  cute  girl
+```
+
+but M00 uses
+
+```
+a  excellent  girl
+```
+
+You can specify per-block prompts with the grammar below:
+
+```
+(~:
+    block-spec:prompt;
+    block-spec:prompt;
+    ...
+    block-spec:prompt;
+:~)
+```
+
+Spaces may be inserted after `(~:`, before `:~)`, before `:`, and after `;`. Note that the text between `:` and `;` is reflected in the result as is, including its spaces. The semicolon after the last prompt may be omitted.
+
+The block specification (`block-spec` above) is as follows. Generally, it is the same as in X/Y plot. If ranges overlap, the later one takes precedence.
+
+```
+single block: IN00
+    You can use IN00, IN01, ..., IN11, M00, OUT00, OUT01, ..., OUT11.
+multiple blocks: IN00, IN01, M00
+    Comma separated block names.
+range: IN00-OUT11
+    Hyphen separated block names.
+    Edges are included in the range.
+    IN11, M00 and OUT00 are connected.
+range with steps: IN00-OUT11(+2)
+    `(+digits)` after the range defines steps.
+    `+1` is the same as a normal range.
+    `+2` means "every other block".
+    For instance, `IN00-OUT11(+2)` means:
+      IN00, IN02, IN04, IN06, IN08, IN10,
+      M00,
+      OUT01, OUT03, OUT05, OUT07, OUT09, OUT11
+otherwise: _ (underscore)
+    This is a special symbol and has the lowest precedence.
+    If no other block spec matches, the prompt defined here will be used.
+```
+
+### Examples
+
+A few examples.
+
+```
+1: (~: IN00: A ; IN01: B :~)
+2: (~: IN00: A ; IN01: B ; IN02: C :~)
+3: (~: IN00: A ; IN01: B ; IN02: C ; _ : D :~)
+4: (~: IN00,IN01: A ; M00 : B :~)
+5: (~: IN00-OUT11: A ; M00 : B :~)
+```
+
+1: use A in IN00, B in IN01, and nothing in other blocks.
+2: use A in IN00, B in IN01, C in IN02, and nothing in other blocks.
+3: use A in IN00, B in IN01, C in IN02, and D in other blocks.
+4: use A in IN00 and IN01, B in M00, and nothing in other blocks.
+5: use A from IN00 to OUT11 (all blocks), but B in M00.
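The precedence rules above (later entries win, `_` is the fallback, unmatched blocks get nothing) can be sketched in Python. This is an illustration under assumptions, not the extension's implementation: `resolve`, `expand_spec`, `BLOCKS` and the regular expressions are hypothetical names, and whitespace handling follows the notation section (the block spec is stripped, the prompt text is kept as is).

```python
import re

BLOCKS = (
    [f"IN{i:02d}" for i in range(12)]
    + ["M00"]
    + [f"OUT{i:02d}" for i in range(12)]
)
RANGE_RE = re.compile(r"(\w+)\s*-\s*(\w+)\s*(?:\(\+(\d+)\))?")
PER_BLOCK_RE = re.compile(r"\(~:(.*?):~\)", re.DOTALL)


def expand_spec(spec):
    """Expand 'IN00,IN01' or 'IN00-OUT11(+2)' into a list of block names."""
    names = []
    for part in spec.split(","):
        part = part.strip()
        m = RANGE_RE.fullmatch(part)
        if m:
            start, end = BLOCKS.index(m.group(1)), BLOCKS.index(m.group(2))
            names.extend(BLOCKS[start:end + 1:int(m.group(3) or 1)])
        else:
            names.append(part)
    return names


def resolve(prompt, block):
    """Return the prompt text that the given block would see."""
    def pick(match):
        chosen, fallback = None, ""
        for entry in match.group(1).split(";"):
            if ":" not in entry:
                continue  # e.g. nothing after an optional trailing semicolon
            spec, text = entry.split(":", 1)
            spec = spec.strip()
            if spec == "_":
                fallback = text  # lowest precedence
            elif block in expand_spec(spec):
                chosen = text  # later entries override earlier ones
        return fallback if chosen is None else chosen

    return PER_BLOCK_RE.sub(pick, prompt)


example = "a (~: IN00-OUT11: cute ; M00: excellent :~) girl"
print(repr(resolve(example, "IN05")))
print(repr(resolve(example, "M00")))
```

Resolving the prompt once per block keeps the precedence logic in one place, so each example above can be checked by calling `resolve` with the block of interest.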
+
+### Use with Dynamic Prompts
+
+For experiments, [Dynamic Prompts](https://github.com/adieyal/sd-dynamic-prompts) is useful.
+
+For instance, if you want to see the effect of changing the prompt in only one block, enable Jinja Template in Dynamic Prompts and input the following prompt:
+
+```
+{% for layer in [ "IN00", "IN01", "IN02", "IN03", "IN04", "IN05", "IN06", "IN07", "IN08", "IN09", "IN10", "IN11", "M00", "OUT00", "OUT01", "OUT02", "OUT03", "OUT04", "OUT05", "OUT06", "OUT07", "OUT08", "OUT09", "OUT10", "OUT11" ] %}
+  {% prompt %}a cute school girl, pink hair, wide shot, (~:{{layer}}:bad anatomy:~){% endprompt %}
+{% endfor %}
+```
+
+to check the effect of `bad anatomy` in each block.
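For reference, the same list of prompts can be produced with plain Python, which may help when checking what the template expands to. This is only a sketch: `BLOCKS`, `TEMPLATE` and `prompts` are hypothetical names, and the script does not interact with Dynamic Prompts or the web UI.

```python
# All 25 block names in order: IN00..IN11, M00, OUT00..OUT11.
BLOCKS = (
    [f"IN{i:02d}" for i in range(12)]
    + ["M00"]
    + [f"OUT{i:02d}" for i in range(12)]
)

# Same prompt as in the Jinja template above, with {layer} as the placeholder.
TEMPLATE = "a cute school girl, pink hair, wide shot, (~:{layer}:bad anatomy:~)"

# One prompt per block, each applying `bad anatomy` to that block only.
prompts = [TEMPLATE.format(layer=block) for block in BLOCKS]

for p in prompts:
    print(p)
```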
+
+Actual examples are here (in Japanese).
+
+[Test adding prompts to one specific block with prompts by block](https://gist.github.com/hnmr293/7f240aa5b74c0f5a27a9764fdd9672e2)
+
+## TODO
+
+- visualize self-attention layer
+-