pull/239/head
ThereforeGames 2024-03-06 19:39:11 -05:00
parent 2d4a184597
commit 0718bbabd8
93 changed files with 26901 additions and 148 deletions

View File

@ -80,7 +80,7 @@
{
"canny":"controlnet11Models_canny",
"depth":"controlnet11Models_depth",
"normalmap":"control_v11p_sd15_normalbae",
"normalmap":"controlnet11Models_normal",
"openpose":"controlnet11Models_openpose",
"mlsd":"",
"lineart":"controlnet11Models_animeline",

View File

@ -9,4 +9,4 @@ Unprompted is a powerful templating language written in Python. Unlike most temp
Software created by [Therefore Games](https://therefore.games). If you like my work, you can [sponsor the project on ☕ GitHub](https://github.com/sponsors/ThereforeGames) or [support me on Patreon](https://patreon.com/thereforegames). Thank you!
*Compatible with Python v3.10.6 and WebUI v1.6.0.*
*Compatible with Python v3.10.6 and WebUI v1.7.0.*

View File

@ -1,6 +1,28 @@
# Unprompted Announcements
Stay informed on the latest Unprompted news and updates.
<details><summary>Spice It Up - 6 March 2024</summary>
Hi folks,
I have just released Unprompted v10.7.0, which includes two notable features:
First, the **Magic Spice template** that aims to "beautify" your Stable Diffusion results using techniques from [Fooocus](https://github.com/lllyasviel/Fooocus) and elsewhere.
It can, for example, run a GPT-2 model to expand your prompt, automatically apply optimized Loras and embeddings, and even fix issues with image contrast. Here are some before/after examples using the `allspice_v1` preset:
![magic_spice_demo]([base_dir]/images/posts/magic_spice_demo.jpg)
This update also adds the `[autotone]` shortcode, which implements the Photoshop algorithm of the same name. It adjusts the black point of an image to enhance contrast, which is particularly useful when working with a low CFG scale or Loras that present gamma problems. Simply include `[after][autotone][/after]` in your prompts to engage the feature:
![autotone_demo]([base_dir]/images/posts/autotone_demo.png)
Finally, v10.7.0 addresses a few bugs and improves compatibility with the Forge WebUI.
Thank you for enjoying Unprompted.
</details>
<details><summary>Cool Autumn Update - 11 October 2023</summary>
Hi folks,

View File

@ -3,7 +3,42 @@ All notable changes to this project will be documented in this file.
For more details on new features, please check the [Manual](./MANUAL.md).
<details open><summary>10.6.0 - 1 December 2023</summary>
<details open><summary>10.7.0 - 6 March 2024</summary>
### Added
- New shortcode `[autotone]`: Adjusts the black point of the image to enhance contrast (should be placed inside an `[after]` block)
- New free template Magic Spice v0.0.1: Produces high-quality images regardless of the simplicity of your prompt, using ideas from Fooocus and elsewhere
- `[faceswap]`: Now supports the `gender_bonus` kwarg to boost the facial similarity score when source and target genders match (compatible with insightface pipeline only)
- `[faceswap]`: Now supports the `age_influence` kwarg to penalize the facial similarity score based on the age difference between source and target faces (compatible with insightface pipeline only)
- `[faceswap]`: Now supports the `prefer_gpu` kwarg to run inference on the video card if possible
- `[faceswap]`: The `make_embedding` option will now save gender and age values into the blended embedding
- `[faceswap]`: The insightface analyser is now properly cached, improving inference time significantly
- `[gpt]`: Now supports the `instruction` kwarg to help steer models that are capable of following instruction-response format prompts
- Added a customized `insightface_cuda` package that swaps hardcoded CPU references to CUDA equivalents
- Wizard UI now supports `_lines` and `_max_lines` to specify the number of rows in a textbox UI element
- Unprompted now detects if you're using the Forge WebUI
- New txt2img preset `restart_fast_v1`
- New txt2img preset `dpm_lightning_8step_v1`: Uses the new Lightning sampler and Lora in Forge WebUI for super fast SDXL inference
- New helper function `str_to_rgb()`
- Facelift template banner image
### Changed
- `[gpt]`: The default GPT-2 model is now `LykosAI/GPT-Prompt-Expansion-Fooocus-v2`
- `[gpt]`: Renamed the `cache` parg to `unload` to match the naming convention of other shortcodes
- Facelift template now defaults to the `fast_v1` preset
### Fixed
- The `wizard_generate_shortcode()` and `wizard_generate_template()` methods will no longer escape special HTML characters in the prompt
- `[after]`: Fixed compatibility issue with Forge WebUI
- `[faceswap]`: The `export_embedding` parg will now bypass the cache to avoid errors
- The `get_local_file_dir()` method now uses the `unprompted_dir` variable in case Unprompted is not in the usual `extensions` directory
### Removed
- Developer presets
</details>
<details><summary>10.6.0 - 1 December 2023</summary>
### Added
- New settings `Config.ui.wizard_shortcodes`, `Config.ui.wizard_templates`, `Config.ui.wizard_capture`: Allows you to disable certain Wizard tabs in order to improve WebUI performance

View File

@ -18,6 +18,8 @@ In the meantime, you can improve performance by disabling Wizard tabs you do not
To achieve compatibility between Unprompted and ControlNet, you must manually rename the `unprompted` extension folder to `_unprompted`. This is due to [a limitation in the Automatic1111 extension framework](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/8011) whereby priority is determined alphabetically.
Additionally, if you're using the Forge WebUI, you should move `_unprompted` to `extensions-builtin/_unprompted` so that it can execute ahead of Forge's native ControlNet extension.
</details>
<details><summary>Compatibility with other extensions</summary>
@ -554,6 +556,8 @@ The `[set]` block supports `_show_label` which lets you toggle visibility of the
The `[set]` block supports `_info` which is descriptive text that will appear near the UI element.
The `[set]` block supports `_lines` and `_max_lines` to specify the number of rows shown in a `textbox` element.
Supports the `[wizard]` shortcode which will group the inner `[set]` blocks into a group UI element, the type of which is defined by the first parg: `accordion`, `row`, or `column`.
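For instance, a minimal sketch of an accordion group (the variable names and `_label` values are illustrative assumptions, not taken from the official templates):
```
[wizard accordion]
	[set my_prompt _label="Prompt" _info="Main subject of the image" _lines=2 _max_lines=5]a photo of a cat[/set]
	[set my_style _label="Style" _show_label=0]oil painting[/set]
[/wizard]
```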
</details>
@ -750,6 +754,22 @@ RESULT: spelling is very difficult sometimes, okay!!!
</details>
<details><summary>[autotone]</summary>
Adjusts the black point of a given image for enhanced contrast. The algorithm produces results that are virtually identical to the **Image > Auto Tone** feature in Photoshop.
Supports the `file` kwarg which is the filepath to an image to modify. Defaults to the Stable Diffusion output.
Supports the `show` parg which will append the original image to the output window.
Supports the `out` kwarg which is a location to save the modified image to.
```
[after][autotone][/after]
```
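A fuller sketch combining the documented kwargs (the filepaths are placeholders):
```
[after][autotone file="outputs/txt2img/my_image.png" show out="outputs/txt2img/my_image_autotoned.png"][/after]
```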
</details>
<details><summary>[bypass]</summary>
Allows you to disable the execution of specific shortcodes for the remainder of the run. It is similar to `[override]`, but for shortcodes instead of variables. Particularly useful for debugging purposes.
@ -1131,23 +1151,25 @@ My name is [get name]
<details><summary>[gpt]</summary>
Processes the content with a given GPT model. This is similar to the "Magic Prompts" feature of Dynamic Prompts, if you're familiar with that.
Processes the content with a given GPT-2 model. This is similar to the "Magic Prompts" feature of Dynamic Prompts, if you're familiar with that.
This shortcode requires the "transformers" package, which is included with the WebUI by default, but you may need to install the package manually if you're using Unprompted as a standalone program.
You can leave the content blank for a completely randomized prompt.
Supports the `model` kwarg which can accept a pretrained model identifier from the HuggingFace hub. Defaults to `Gustavosta/MagicPrompt-Stable-Diffusion`. The first time you use a new model, it will be downloaded to the `unprompted/models/gpt` folder.
Supports the `model` kwarg which can accept a pretrained model identifier from the HuggingFace hub. Defaults to `LykosAI/GPT-Prompt-Expansion-Fooocus-v2`. The first time you use a new model, it will be downloaded to the `unprompted/models/gpt` folder.
Please see the Wizard UI for a list of suggested models.
Supports the `task` kwarg which determines the behavior of the transformers pipeline module. Defaults to `text-generation`. You can set this to `summarization` if you want to shorten your prompts a la Midjourney.
Supports the `instruction` kwarg which is a string to be prepended to the prompt. This text will be excluded from the final result. Example: `[gpt instruction="Generate a list of animals"]cat,[/gpt]` may return `cat, dog, bird, horse, cow`.
Supports the `max_length` kwarg which is the maximum number of words to be returned by the shortcode. Defaults to 50.
Supports the `min_length` kwarg which is the minimum number of words to be returned by the shortcode. Defaults to 1.
Supports the `cache` parg to keep the model and tokenizer in memory between runs.
Supports the `unload` parg to prevent keeping the model and tokenizer in memory between runs.
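Putting these together, a hedged sketch (the instruction text and length values are illustrative; the model shown is the documented default):
```
[gpt model="LykosAI/GPT-Prompt-Expansion-Fooocus-v2" instruction="Expand this prompt with vivid detail:" max_length=40 min_length=10 unload]a castle at dusk[/gpt]
```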
</details>
@ -1689,7 +1711,7 @@ All of your kwargs are sent as URL parameters to the API (with the exception of
Supports shorthand syntax with pargs, where the first parg is `types` (e.g. LORA or TextualInversion), the second parg is `query` (model name search terms), the third parg is `_weight` (optional, defaults to 1.0), and the fourth parg (also optional) is the `_file`. For example: `[civitai lora EasyNegative 0.5]`.
The `query` value is used as the filename to look for on your filesystem. You can typically search Civitai for a direct model filename (e.g. `query="kkw-new-neg-v1.4"` will return the 'New Negative' model). However, if this isn't working for whatever reason, you can override the filesystem search with the `_file` kwarg: `[civitai query="New Negative" _file="kkw-new-neg-v1.4"]` - but consider this a last resort!
The `query` value is used as the filename to look for on your filesystem. You can typically search Civitai for a direct model filename (e.g. `query="kkw-new-neg-v1.4"` will return the 'New Negative' model). However, if this isn't working for whatever reason, you can override the filesystem search with the `_file` kwarg: `[civitai query="New Negative" _file="kkw-new-neg-v1.4"]`.
This shortcode will auto-correct the case-sensitivity of `types` to the API's expected format. The API is a bit inconsistent in this regard (e.g. lora = `LORA`, controlnet = `Controlnet`, aestheticgradient = `AestheticGradient`...) but Unprompted will handle it for you. Here are the other edge cases that Unprompted will catch:
@ -1734,11 +1756,15 @@ The `insightface` pipeline is currently the most developed option as it supports
- It supports the `minimum_similarity` kwarg to bypass the faceswap if no one in the target picture bears resemblance to the new face. This kwarg takes a float value, although I haven't determined the upper and lower boundaries yet. A greater value means "more similar" and the range appears to be something like -10 to 300.
- It supports the `export_embedding` parg which takes the average of all input faces and exports it to a safetensors embedding file. This file represents a composite face that can be used in lieu of individual images.
- It supports the `embedding_path` kwarg which is the path to use in conjunction with `export_embedding`. Defaults to `unprompted/user/faces/blended_faces.safetensors`.
- It supports the `gender_bonus` kwarg to boost the facial similarity score when source and target genders match.
- It supports the `age_influence` kwarg to penalize the facial similarity score based on the age difference between source and target faces.
Supports the `visibility` kwarg which is the alpha value with which to blend the result back into the original image. Defaults to 1.0.
Supports the `unload` kwarg which allows you to free some or all of the faceswap components after inference. Useful for low memory devices, but will increase inference time. You can pass the following as a delimited string with `Config.syntax.delimiter`: `model`, `face`, `all`.
Supports the `prefer_gpu` kwarg to run on the video card whenever possible.
It is recommended to follow this shortcode with `[restore_faces]` in order to improve the resolution of the swapped result. Or, use the included Facelift template as an all-in-one solution.
Additional pipelines may be supported in the future. Attempts were made to implement support for SimSwap; however, this proved challenging due to multiple dependency conflicts.
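To illustrate, a hedged sketch that combines the new kwargs (the filepath and numeric values are placeholders; sensible ranges for `gender_bonus` and `age_influence` are not documented):
```
[faceswap gender_bonus=50 age_influence=0.5 prefer_gpu=1 minimum_similarity=100 visibility=0.9 unload="model"]user/faces/my_face.jpg[/faceswap][restore_faces]
```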

Binary file not shown.

After

Width:  |  Height:  |  Size: 988 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 666 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 29 KiB

View File

@ -9,6 +9,7 @@ pil_resampling_dict["Hamming"] = 5
pil_resampling_dict["Bicubic"] = 3
pil_resampling_dict["Lanczos"] = 1
def strip_str(string, chop):
"""Removes substring `chop` from the beginning or end of given `string`"""
while True:
@ -23,10 +24,12 @@ def strip_str(string, chop):
break
return string
def sigmoid(x):
import math
return 1 / (1 + math.exp(-x))
def is_equal(var_a, var_b):
"""Checks if two variables equal each other, taking care to account for datatypes."""
if (is_float(var_a)): var_a = float(var_a)
@ -57,13 +60,15 @@ def is_int(value):
except:
return False
def ensure(var,datatype):
def ensure(var, datatype):
"""Ensures that a variable is a given datatype"""
if isinstance(var, datatype): return var
else:
if datatype == list: return [var]
return datatype(var)
def autocast(var):
"""Converts a variable between string, int, and float depending on how it's formatted"""
original_var = var
@ -74,23 +79,36 @@ def autocast(var):
elif (is_int(var)): var = int(var)
return (var)
def pil_to_cv2(img):
import cv2, numpy
return cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)
def cv2_to_pil(img):
import cv2
from PIL import Image
return Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
def str_to_rgb(color_string):
"""Converts a color string to a tuple of RGB values"""
if color_string[0].isdigit():
return tuple(map(int, color_string.split(',')))
elif color_string.startswith("#"):
return tuple(bytes.fromhex(color_string[1:]))
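# Illustrative sketch of the expected results (both input forms yield a tuple of ints):
# str_to_rgb("255,128,0") -> (255, 128, 0)
# str_to_rgb("#ff8000") -> (255, 128, 0)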
def get_logger(logger=None):
if not logger:
try:
import logging
logger = logging.getLogger("Unprompted").info
except: logger = print
except:
logger = print
return logger
def download_file(filename, url, logger=None, overwrite=False, headers=None):
import os, requests
@ -101,7 +119,7 @@ def download_file(filename, url, logger=None, overwrite=False, headers=None):
os.makedirs(os.path.dirname(os.path.abspath(filename)), exist_ok=True)
log(f"Downloading file into: {filename}...")
response = requests.get(url, stream=True,headers=headers)
response = requests.get(url, stream=True, headers=headers)
if response.status_code != 200:
log(f"Problematic status code received: {response.status_code}")
return False
@ -112,6 +130,7 @@ def download_file(filename, url, logger=None, overwrite=False, headers=None):
fout.write(block)
return True
def import_file(full_name, path):
"""Allows importing of modules from full filepath, not sure why Python requires a helper function for this in 2023"""
from importlib import util
@ -129,6 +148,7 @@ def list_set(this_list, index, value, null_value=False):
this_list.append(null_value)
this_list[index] = value
def str_with_ext(path, default_ext=".json"):
import os
if os.path.exists(path) or default_ext in path:
@ -150,6 +170,7 @@ def create_load_json(file_path, default_data={}, encoding="utf8"):
return data
def unsharp_mask(image, amount=1.0, kernel_size=(5, 5), sigma=1.0, threshold=0):
"""Return a sharpened version of the image, using an unsharp mask."""
import numpy, cv2
@ -165,10 +186,11 @@ def unsharp_mask(image, amount=1.0, kernel_size=(5, 5), sigma=1.0, threshold=0):
numpy.copyto(sharpened, image, where=low_contrast_mask)
return Image.fromarray(sharpened)
# Helper class that converts kwargs to attribute notation
# Many libraries expect to be fed options with argparse,
# which is not so straightforward inside of an A1111 extension
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__ = self
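# Usage sketch: AttrDict exposes dict keys as attributes,
# e.g. opts = AttrDict(prefer_gpu=True); opts.prefer_gpu == opts["prefer_gpu"]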

View File

@ -0,0 +1,21 @@
# coding: utf-8
# pylint: disable=wrong-import-position
"""InsightFace: A Face Analysis Toolkit."""
from __future__ import absolute_import
try:
#import mxnet as mx
import onnxruntime
except ImportError:
raise ImportError(
"Unable to import dependency onnxruntime. "
)
__version__ = '0.7.3'
from . import model_zoo
from . import utils
from . import app
from . import data
from . import thirdparty

View File

@ -0,0 +1,2 @@
from .face_analysis import *
from .mask_renderer import *

View File

@ -0,0 +1,49 @@
import numpy as np
from numpy.linalg import norm as l2norm
#from easydict import EasyDict
class Face(dict):
def __init__(self, d=None, **kwargs):
if d is None:
d = {}
if kwargs:
d.update(**kwargs)
for k, v in d.items():
setattr(self, k, v)
# Class attributes
#for k in self.__class__.__dict__.keys():
# if not (k.startswith('__') and k.endswith('__')) and not k in ('update', 'pop'):
# setattr(self, k, getattr(self, k))
def __setattr__(self, name, value):
if isinstance(value, (list, tuple)):
value = [self.__class__(x)
if isinstance(x, dict) else x for x in value]
elif isinstance(value, dict) and not isinstance(value, self.__class__):
value = self.__class__(value)
super(Face, self).__setattr__(name, value)
super(Face, self).__setitem__(name, value)
__setitem__ = __setattr__
def __getattr__(self, name):
return None
@property
def embedding_norm(self):
if self.embedding is None:
return None
return l2norm(self.embedding)
@property
def normed_embedding(self):
if self.embedding is None:
return None
return self.embedding / self.embedding_norm
@property
def sex(self):
if self.gender is None:
return None
return 'M' if self.gender==1 else 'F'

View File

@ -0,0 +1,109 @@
# -*- coding: utf-8 -*-
# @Organization : insightface.ai
# @Author : Jia Guo
# @Time : 2021-05-04
# @Function :
from __future__ import division
import glob
import os.path as osp
import numpy as np
import onnxruntime
from numpy.linalg import norm
from ..model_zoo import model_zoo
from ..utils import DEFAULT_MP_NAME, ensure_available
from .common import Face
__all__ = ['FaceAnalysis']
class FaceAnalysis:
def __init__(self, name=DEFAULT_MP_NAME, root='~/.insightface', allowed_modules=None, **kwargs):
onnxruntime.set_default_logger_severity(3)
self.models = {}
self.model_dir = ensure_available('models', name, root=root)
onnx_files = glob.glob(osp.join(self.model_dir, '*.onnx'))
onnx_files = sorted(onnx_files)
for onnx_file in onnx_files:
model = model_zoo.get_model(onnx_file, **kwargs)
if model is None:
print('model not recognized:', onnx_file)
elif allowed_modules is not None and model.taskname not in allowed_modules:
print('model ignore:', onnx_file, model.taskname)
del model
elif model.taskname not in self.models and (allowed_modules is None or model.taskname in allowed_modules):
print('find model:', onnx_file, model.taskname, model.input_shape, model.input_mean, model.input_std)
self.models[model.taskname] = model
else:
print('duplicated model task type, ignore:', onnx_file, model.taskname)
del model
assert 'detection' in self.models
self.det_model = self.models['detection']
def prepare(self, ctx_id, det_thresh=0.5, det_size=(640, 640)):
self.det_thresh = det_thresh
assert det_size is not None
print('set det-size:', det_size)
self.det_size = det_size
for taskname, model in self.models.items():
if taskname=='detection':
model.prepare(ctx_id, input_size=det_size, det_thresh=det_thresh)
else:
model.prepare(ctx_id)
def get(self, img, max_num=0):
bboxes, kpss = self.det_model.detect(img,
max_num=max_num,
metric='default')
if bboxes.shape[0] == 0:
return []
ret = []
for i in range(bboxes.shape[0]):
bbox = bboxes[i, 0:4]
det_score = bboxes[i, 4]
kps = None
if kpss is not None:
kps = kpss[i]
face = Face(bbox=bbox, kps=kps, det_score=det_score)
for taskname, model in self.models.items():
if taskname=='detection':
continue
model.get(img, face)
ret.append(face)
return ret
def draw_on(self, img, faces):
import cv2
dimg = img.copy()
for i in range(len(faces)):
face = faces[i]
box = face.bbox.astype(int)
color = (0, 0, 255)
cv2.rectangle(dimg, (box[0], box[1]), (box[2], box[3]), color, 2)
if face.kps is not None:
kps = face.kps.astype(int)
#print(landmark.shape)
for l in range(kps.shape[0]):
color = (0, 0, 255)
if l == 0 or l == 3:
color = (0, 255, 0)
cv2.circle(dimg, (kps[l][0], kps[l][1]), 1, color,
2)
if face.gender is not None and face.age is not None:
cv2.putText(dimg,'%s,%d'%(face.sex,face.age), (box[0]-1, box[1]-4),cv2.FONT_HERSHEY_COMPLEX,0.7,(0,255,0),1)
#for key, value in face.items():
# if key.startswith('landmark_3d'):
# print(key, value.shape)
# print(value[0:10,:])
# lmk = np.round(value).astype(np.int)
# for l in range(lmk.shape[0]):
# color = (255, 0, 0)
# cv2.circle(dimg, (lmk[l][0], lmk[l][1]), 1, color,
# 2)
return dimg
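# Usage sketch (assumes the default 'buffalo_l' model pack is present under ~/.insightface):
# app = FaceAnalysis(name='buffalo_l')
# app.prepare(ctx_id=0, det_size=(640, 640))
# faces = app.get(cv2.imread('photo.jpg'))  # list of Face objects with bbox, kps, embedding, gender, age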

View File

@ -0,0 +1,232 @@
import os, sys, datetime
import numpy as np
import os.path as osp
import albumentations as A
from albumentations.core.transforms_interface import ImageOnlyTransform
from .face_analysis import FaceAnalysis
from ..utils import get_model_dir
from ..thirdparty import face3d
from ..data import get_image as ins_get_image
from ..utils import DEFAULT_MP_NAME
import cv2
class MaskRenderer:
def __init__(self, name=DEFAULT_MP_NAME, root='~/.insightface', insfa=None):
#if insfa is None, enter render_only mode
self.mp_name = name
self.root = root
self.insfa = insfa
model_dir = get_model_dir(name, root)
bfm_file = osp.join(model_dir, 'BFM.mat')
assert osp.exists(bfm_file), 'should contain BFM.mat in your model directory'
self.bfm = face3d.morphable_model.MorphabelModel(bfm_file)
self.index_ind = self.bfm.kpt_ind
bfm_uv_file = osp.join(model_dir, 'BFM_UV.mat')
assert osp.exists(bfm_uv_file), 'should contain BFM_UV.mat in your model directory'
uv_coords = face3d.morphable_model.load.load_uv_coords(bfm_uv_file)
self.uv_size = (224,224)
self.mask_stxr = 0.1
self.mask_styr = 0.33
self.mask_etxr = 0.9
self.mask_etyr = 0.7
self.tex_h , self.tex_w, self.tex_c = self.uv_size[1] , self.uv_size[0],3
texcoord = np.zeros_like(uv_coords)
texcoord[:, 0] = uv_coords[:, 0] * (self.tex_h - 1)
texcoord[:, 1] = uv_coords[:, 1] * (self.tex_w - 1)
texcoord[:, 1] = self.tex_w - texcoord[:, 1] - 1
self.texcoord = np.hstack((texcoord, np.zeros((texcoord.shape[0], 1))))
self.X_ind = self.bfm.kpt_ind
self.mask_image_names = ['mask_white', 'mask_blue', 'mask_black', 'mask_green']
self.mask_aug_probs = [0.4, 0.4, 0.1, 0.1]
#self.mask_images = []
#self.mask_images_rgb = []
#for image_name in mask_image_names:
# mask_image = ins_get_image(image_name)
# self.mask_images.append(mask_image)
# mask_image_rgb = mask_image[:,:,::-1]
# self.mask_images_rgb.append(mask_image_rgb)
def prepare(self, ctx_id=0, det_thresh=0.5, det_size=(128, 128)):
self.pre_ctx_id = ctx_id
self.pre_det_thresh = det_thresh
self.pre_det_size = det_size
def transform(self, shape3D, R):
s = 1.0
shape3D[:2, :] = shape3D[:2, :]
shape3D = s * np.dot(R, shape3D)
return shape3D
def preprocess(self, vertices, w, h):
R1 = face3d.mesh.transform.angle2matrix([0, 180, 180])
t = np.array([-w // 2, -h // 2, 0])
vertices = vertices.T
vertices += t
vertices = self.transform(vertices.T, R1).T
return vertices
def project_to_2d(self,vertices,s,angles,t):
transformed_vertices = self.bfm.transform(vertices, s, angles, t)
projected_vertices = transformed_vertices.copy() # using standard camera & orthographic projection
return projected_vertices[self.bfm.kpt_ind, :2]
def params_to_vertices(self,params , H , W):
fitted_sp, fitted_ep, fitted_s, fitted_angles, fitted_t = params
fitted_vertices = self.bfm.generate_vertices(fitted_sp, fitted_ep)
transformed_vertices = self.bfm.transform(fitted_vertices, fitted_s, fitted_angles,
fitted_t)
transformed_vertices = self.preprocess(transformed_vertices.T, W, H)
image_vertices = face3d.mesh.transform.to_image(transformed_vertices, H, W)
return image_vertices
def draw_lmk(self, face_image):
faces = self.insfa.get(face_image, max_num=1)
if len(faces)==0:
return face_image
return self.insfa.draw_on(face_image, faces)
def build_params(self, face_image):
#landmark = self.if3d68_handler.get(face_image)
#if landmark is None:
# return None #face not found
if self.insfa is None:
self.insfa = FaceAnalysis(name=self.mp_name, root=self.root, allowed_modules=['detection', 'landmark_3d_68'])
self.insfa.prepare(ctx_id=self.pre_ctx_id, det_thresh=self.pre_det_thresh, det_size=self.pre_det_size)
faces = self.insfa.get(face_image, max_num=1)
if len(faces)==0:
return None
landmark = faces[0].landmark_3d_68[:,:2]
fitted_sp, fitted_ep, fitted_s, fitted_angles, fitted_t = self.bfm.fit(landmark, self.X_ind, max_iter = 3)
return [fitted_sp, fitted_ep, fitted_s, fitted_angles, fitted_t]
def generate_mask_uv(self,mask, positions):
uv_size = (self.uv_size[1], self.uv_size[0], 3)
h, w, c = uv_size
uv = np.zeros(shape=(self.uv_size[1],self.uv_size[0], 3), dtype=np.uint8)
stxr, styr = positions[0], positions[1]
etxr, etyr = positions[2], positions[3]
stx, sty = int(w * stxr), int(h * styr)
etx, ety = int(w * etxr), int(h * etyr)
height = ety - sty
width = etx - stx
mask = cv2.resize(mask, (width, height))
uv[sty:ety, stx:etx] = mask
return uv
def render_mask(self,face_image, mask_image, params, input_is_rgb=False, auto_blend = True, positions=[0.1, 0.33, 0.9, 0.7]):
if isinstance(mask_image, str):
to_rgb = True if input_is_rgb else False
mask_image = ins_get_image(mask_image, to_rgb=to_rgb)
uv_mask_image = self.generate_mask_uv(mask_image, positions)
h,w,c = face_image.shape
image_vertices = self.params_to_vertices(params ,h,w)
output = (1-face3d.mesh.render.render_texture(image_vertices, self.bfm.full_triangles , uv_mask_image, self.texcoord, self.bfm.full_triangles, h , w ))*255
output = output.astype(np.uint8)
if auto_blend:
mask_bd = (output==255).astype(np.uint8)
final = face_image*mask_bd + (1-mask_bd)*output
return final
return output
#def mask_augmentation(self, face_image, label, input_is_rgb=False, p=0.1):
# if np.random.random()<p:
# assert isinstance(label, (list, np.ndarray)), 'make sure the rec dataset includes mask params'
# assert len(label)==237 or len(lable)==235, 'make sure the rec dataset includes mask params'
# if len(label)==237:
# if label[1]<0.0: #invalid label for mask aug
# return face_image
# label = label[2:]
# params = self.decode_params(label)
# mask_image_name = np.random.choice(self.mask_image_names, p=self.mask_aug_probs)
# pos = np.random.uniform(0.33, 0.5)
# face_image = self.render_mask(face_image, mask_image_name, params, input_is_rgb=input_is_rgb, positions=[0.1, pos, 0.9, 0.7])
# return face_image
@staticmethod
def encode_params(params):
p0 = list(params[0])
p1 = list(params[1])
p2 = [float(params[2])]
p3 = list(params[3])
p4 = list(params[4])
return p0+p1+p2+p3+p4
@staticmethod
def decode_params(params):
p0 = params[0:199]
p0 = np.array(p0, dtype=np.float32).reshape( (-1, 1))
p1 = params[199:228]
p1 = np.array(p1, dtype=np.float32).reshape( (-1, 1))
p2 = params[228]
p3 = tuple(params[229:232])
p4 = params[232:235]
p4 = np.array(p4, dtype=np.float32).reshape( (-1, 1))
return p0, p1, p2, p3, p4
class MaskAugmentation(ImageOnlyTransform):
def __init__(
self,
mask_names=['mask_white', 'mask_blue', 'mask_black', 'mask_green'],
mask_probs=[0.4,0.4,0.1,0.1],
h_low = 0.33,
h_high = 0.35,
always_apply=False,
p=1.0,
):
super(MaskAugmentation, self).__init__(always_apply, p)
self.renderer = MaskRenderer()
assert len(mask_names)>0
assert len(mask_names)==len(mask_probs)
self.mask_names = mask_names
self.mask_probs = mask_probs
self.h_low = h_low
self.h_high = h_high
#self.hlabel = None
def apply(self, image, hlabel, mask_name, h_pos, **params):
#print(params.keys())
#hlabel = params.get('hlabel')
assert len(hlabel)==237 or len(hlabel)==235, 'make sure the rec dataset includes mask params'
if len(hlabel)==237:
if hlabel[1]<0.0:
return image
hlabel = hlabel[2:]
#print(len(hlabel))
mask_params = self.renderer.decode_params(hlabel)
image = self.renderer.render_mask(image, mask_name, mask_params, input_is_rgb=True, positions=[0.1, h_pos, 0.9, 0.7])
return image
@property
def targets_as_params(self):
return ["image", "hlabel"]
def get_params_dependent_on_targets(self, params):
hlabel = params['hlabel']
mask_name = np.random.choice(self.mask_names, p=self.mask_probs)
h_pos = np.random.uniform(self.h_low, self.h_high)
return {'hlabel': hlabel, 'mask_name': mask_name, 'h_pos': h_pos}
def get_transform_init_args_names(self):
#return ("hlabel", 'mask_names', 'mask_probs', 'h_low', 'h_high')
return ('mask_names', 'mask_probs', 'h_low', 'h_high')
if __name__ == "__main__":
tool = MaskRenderer('antelope')
tool.prepare(det_size=(128,128))
image = cv2.imread("Tom_Hanks_54745.png")
params = tool.build_params(image)
#out = tool.draw_lmk(image)
#cv2.imwrite('output_lmk.jpg', out)
#mask_image = cv2.imread("masks/mask1.jpg")
#mask_image = cv2.imread("masks/black-mask.png")
#mask_image = cv2.imread("masks/mask2.jpg")
mask_out = tool.render_mask(image, 'mask_blue', params)# use single thread to test the time cost
cv2.imwrite('output_mask.jpg', mask_out)

View File

@ -0,0 +1,13 @@
from abc import ABC, abstractmethod
from argparse import ArgumentParser
class BaseInsightFaceCLICommand(ABC):
@staticmethod
@abstractmethod
def register_subcommand(parser: ArgumentParser):
raise NotImplementedError()
@abstractmethod
def run(self):
raise NotImplementedError()

View File

@ -0,0 +1,29 @@
#!/usr/bin/env python
from argparse import ArgumentParser
from .model_download import ModelDownloadCommand
from .rec_add_mask_param import RecAddMaskParamCommand
def main():
parser = ArgumentParser("InsightFace CLI tool", usage="insightface-cli <command> [<args>]")
commands_parser = parser.add_subparsers(help="insightface-cli command-line helpers")
# Register commands
ModelDownloadCommand.register_subcommand(commands_parser)
RecAddMaskParamCommand.register_subcommand(commands_parser)
args = parser.parse_args()
if not hasattr(args, "func"):
parser.print_help()
exit(1)
# Run
service = args.func(args)
service.run()
if __name__ == "__main__":
main()

View File

@ -0,0 +1,36 @@
from argparse import ArgumentParser
from . import BaseInsightFaceCLICommand
import os
import os.path as osp
import zipfile
import glob
from ..utils import download
def model_download_command_factory(args):
return ModelDownloadCommand(args.model, args.root, args.force)
class ModelDownloadCommand(BaseInsightFaceCLICommand):
#_url_format = '{repo_url}models/{file_name}.zip'
@staticmethod
def register_subcommand(parser: ArgumentParser):
download_parser = parser.add_parser("model.download")
download_parser.add_argument(
"--root", type=str, default='~/.insightface', help="Path to location to store the models"
)
download_parser.add_argument(
"--force", action="store_true", help="Force the model to be download even if already in root-dir"
)
download_parser.add_argument("model", type=str, help="Name of the model to download")
download_parser.set_defaults(func=model_download_command_factory)
def __init__(self, model: str, root: str, force: bool):
self._model = model
self._root = root
self._force = force
def run(self):
download('models', self._model, force=self._force, root=self._root)

View File

@ -0,0 +1,94 @@
import numbers
import os
from argparse import ArgumentParser, Namespace
import mxnet as mx
import numpy as np
from ..app import MaskRenderer
from ..data.rec_builder import RecBuilder
from . import BaseInsightFaceCLICommand
def rec_add_mask_param_command_factory(args: Namespace):
return RecAddMaskParamCommand(
args.input, args.output
)
class RecAddMaskParamCommand(BaseInsightFaceCLICommand):
@staticmethod
def register_subcommand(parser: ArgumentParser):
_parser = parser.add_parser("rec.addmaskparam")
_parser.add_argument("input", type=str, help="input rec")
_parser.add_argument("output", type=str, help="output rec, with mask param")
_parser.set_defaults(func=rec_add_mask_param_command_factory)
def __init__(
self,
input: str,
output: str,
):
self._input = input
self._output = output
def run(self):
tool = MaskRenderer()
tool.prepare(ctx_id=0, det_size=(128,128))
root_dir = self._input
path_imgrec = os.path.join(root_dir, 'train.rec')
path_imgidx = os.path.join(root_dir, 'train.idx')
imgrec = mx.recordio.MXIndexedRecordIO(path_imgidx, path_imgrec, 'r')
save_path = self._output
wrec=RecBuilder(path=save_path)
s = imgrec.read_idx(0)
header, _ = mx.recordio.unpack(s)
if header.flag > 0:
if len(header.label)==2:
imgidx = np.array(range(1, int(header.label[0])))
else:
imgidx = np.array(list(imgrec.keys))
else:
imgidx = np.array(list(imgrec.keys))
stat = [0, 0]
print('total:', len(imgidx))
for iid, idx in enumerate(imgidx):
#if iid==500000:
# break
if iid%1000==0:
print('processing:', iid)
s = imgrec.read_idx(idx)
header, img = mx.recordio.unpack(s)
label = header.label
if not isinstance(label, numbers.Number):
label = label[0]
sample = mx.image.imdecode(img).asnumpy()
bgr = sample[:,:,::-1]
params = tool.build_params(bgr)
#if iid<10:
# mask_out = tool.render_mask(bgr, 'mask_blue', params)
# cv2.imwrite('maskout_%d.jpg'%iid, mask_out)
stat[1] += 1
if params is None:
wlabel = [label] + [-1.0]*236
stat[0] += 1
else:
#print(0, params[0].shape, params[0].dtype)
#print(1, params[1].shape, params[1].dtype)
#print(2, params[2])
#print(3, len(params[3]), params[3][0].__class__)
#print(4, params[4].shape, params[4].dtype)
mask_label = tool.encode_params(params)
wlabel = [label, 0.0]+mask_label # 237 including idlabel, total mask params size is 235
if iid==0:
print('param size:', len(mask_label), len(wlabel), label)
assert len(wlabel)==237
wrec.add_image(img, wlabel)
#print(len(params))
wrec.close()
print('finished on', self._output, ', failed:', stat[0])

View File

@ -0,0 +1,2 @@
from .image import get_image
from .pickle_object import get_object

View File

@ -0,0 +1,27 @@
import cv2
import os
import os.path as osp
from pathlib import Path
class ImageCache:
data = {}
def get_image(name, to_rgb=False):
key = (name, to_rgb)
if key in ImageCache.data:
return ImageCache.data[key]
images_dir = osp.join(Path(__file__).parent.absolute(), 'images')
ext_names = ['.jpg', '.png', '.jpeg']
image_file = None
for ext_name in ext_names:
_image_file = osp.join(images_dir, "%s%s"%(name, ext_name))
if osp.exists(_image_file):
image_file = _image_file
break
assert image_file is not None, '%s not found'%name
img = cv2.imread(image_file)
if to_rgb:
img = img[:,:,::-1]
ImageCache.data[key] = img
return img

Binary file not shown.

After

Width:  |  Height:  |  Size: 12 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 21 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 44 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 77 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 126 KiB

View File

@ -0,0 +1,17 @@
import cv2
import os
import os.path as osp
from pathlib import Path
import pickle
def get_object(name):
objects_dir = osp.join(Path(__file__).parent.absolute(), 'objects')
if not name.endswith('.pkl'):
name = name+".pkl"
filepath = osp.join(objects_dir, name)
if not osp.exists(filepath):
return None
with open(filepath, 'rb') as f:
obj = pickle.load(f)
return obj

View File

@ -0,0 +1,71 @@
import pickle
import numpy as np
import os
import os.path as osp
import sys
import mxnet as mx
class RecBuilder():
def __init__(self, path, image_size=(112, 112)):
self.path = path
self.image_size = image_size
self.widx = 0
self.wlabel = 0
self.max_label = -1
assert not osp.exists(path), '%s exists' % path
os.makedirs(path)
self.writer = mx.recordio.MXIndexedRecordIO(os.path.join(path, 'train.idx'),
os.path.join(path, 'train.rec'),
'w')
self.meta = []
def add(self, imgs):
#!!! img should be BGR!!!!
#assert label >= 0
#assert label > self.last_label
assert len(imgs) > 0
label = self.wlabel
for img in imgs:
idx = self.widx
image_meta = {'image_index': idx, 'image_classes': [label]}
header = mx.recordio.IRHeader(0, label, idx, 0)
if isinstance(img, np.ndarray):
s = mx.recordio.pack_img(header,img,quality=95,img_fmt='.jpg')
else:
s = mx.recordio.pack(header, img)
self.writer.write_idx(idx, s)
self.meta.append(image_meta)
self.widx += 1
self.max_label = label
self.wlabel += 1
def add_image(self, img, label):
#!!! img should be BGR!!!!
#assert label >= 0
#assert label > self.last_label
idx = self.widx
header = mx.recordio.IRHeader(0, label, idx, 0)
if isinstance(label, list):
idlabel = label[0]
else:
idlabel = label
image_meta = {'image_index': idx, 'image_classes': [idlabel]}
if isinstance(img, np.ndarray):
s = mx.recordio.pack_img(header,img,quality=95,img_fmt='.jpg')
else:
s = mx.recordio.pack(header, img)
self.writer.write_idx(idx, s)
self.meta.append(image_meta)
self.widx += 1
self.max_label = max(self.max_label, idlabel)
def close(self):
with open(osp.join(self.path, 'train.meta'), 'wb') as pfile:
pickle.dump(self.meta, pfile, protocol=pickle.HIGHEST_PROTOCOL)
print('stat:', self.widx, self.wlabel)
with open(os.path.join(self.path, 'property'), 'w') as f:
f.write("%d,%d,%d\n" % (self.max_label+1, self.image_size[0], self.image_size[1]))
f.write("%d\n" % (self.widx))

View File

@ -0,0 +1,6 @@
from .model_zoo import get_model
from .arcface_onnx import ArcFaceONNX
from .retinaface import RetinaFace
from .scrfd import SCRFD
from .landmark import Landmark
from .attribute import Attribute

View File

@ -0,0 +1,89 @@
# -*- coding: utf-8 -*-
# @Organization : insightface.ai
# @Author : Jia Guo
# @Time : 2021-05-04
# @Function :
from __future__ import division
import numpy as np
import cv2
import onnx
import onnxruntime
from ..utils import face_align
__all__ = [
'ArcFaceONNX',
]
class ArcFaceONNX:
def __init__(self, model_file=None, session=None):
assert model_file is not None
self.model_file = model_file
self.session = session
self.taskname = 'recognition'
find_sub = False
find_mul = False
model = onnx.load(self.model_file)
graph = model.graph
for nid, node in enumerate(graph.node[:8]):
#print(nid, node.name)
if node.name.startswith('Sub') or node.name.startswith('_minus'):
find_sub = True
if node.name.startswith('Mul') or node.name.startswith('_mul'):
find_mul = True
if find_sub and find_mul:
#mxnet arcface model
input_mean = 0.0
input_std = 1.0
else:
input_mean = 127.5
input_std = 127.5
self.input_mean = input_mean
self.input_std = input_std
#print('input mean and std:', self.input_mean, self.input_std)
if self.session is None:
self.session = onnxruntime.InferenceSession(self.model_file, None)
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1])
self.input_shape = input_shape
outputs = self.session.get_outputs()
output_names = []
for out in outputs:
output_names.append(out.name)
self.input_name = input_name
self.output_names = output_names
assert len(self.output_names) == 1
self.output_shape = outputs[0].shape
def prepare(self, ctx_id, **kwargs):
if ctx_id < 0:
self.session.set_providers(['CUDAExecutionProvider'])
def get(self, img, face):
aimg = face_align.norm_crop(img, landmark=face.kps, image_size=self.input_size[0])
face.embedding = self.get_feat(aimg).flatten()
return face.embedding
def compute_sim(self, feat1, feat2):
from numpy.linalg import norm
feat1 = feat1.ravel()
feat2 = feat2.ravel()
sim = np.dot(feat1, feat2) / (norm(feat1) * norm(feat2))
return sim
def get_feat(self, imgs):
if not isinstance(imgs, list):
imgs = [imgs]
input_size = self.input_size
blob = cv2.dnn.blobFromImages(imgs, 1.0 / self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
return net_out
def forward(self, batch_data):
blob = (batch_data - self.input_mean) / self.input_std
net_out = self.session.run(self.output_names, {self.input_name: blob})[0]
return net_out
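# Usage sketch (the model filename is illustrative): cosine similarity of two face embeddings,
# rec = ArcFaceONNX('models/w600k_r50.onnx')
# sim = rec.compute_sim(rec.get(img1, face1), rec.get(img2, face2))  # roughly -1..1; higher means more similar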

View File

@ -0,0 +1,92 @@
# -*- coding: utf-8 -*-
# @Organization : insightface.ai
# @Author : Jia Guo
# @Time : 2021-06-19
# @Function :
from __future__ import division
import numpy as np
import cv2
import onnx
import onnxruntime
from ..utils import face_align
__all__ = [
'Attribute',
]
class Attribute:
def __init__(self, model_file=None, session=None):
assert model_file is not None
self.model_file = model_file
self.session = session
find_sub = False
find_mul = False
model = onnx.load(self.model_file)
graph = model.graph
for nid, node in enumerate(graph.node[:8]):
#print(nid, node.name)
if node.name.startswith('Sub') or node.name.startswith('_minus'):
find_sub = True
if node.name.startswith('Mul') or node.name.startswith('_mul'):
find_mul = True
if nid < 3 and node.name == 'bn_data':
find_sub = True
find_mul = True
if find_sub and find_mul:
#mxnet arcface model
input_mean = 0.0
input_std = 1.0
else:
input_mean = 127.5
input_std = 128.0
self.input_mean = input_mean
self.input_std = input_std
#print('input mean and std:', model_file, self.input_mean, self.input_std)
if self.session is None:
self.session = onnxruntime.InferenceSession(self.model_file, None)
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1])
self.input_shape = input_shape
outputs = self.session.get_outputs()
output_names = []
for out in outputs:
output_names.append(out.name)
self.input_name = input_name
self.output_names = output_names
assert len(self.output_names) == 1
output_shape = outputs[0].shape
#print('init output_shape:', output_shape)
if output_shape[1] == 3:
self.taskname = 'genderage'
else:
self.taskname = 'attribute_%d' % output_shape[1]
def prepare(self, ctx_id, **kwargs):
if ctx_id < 0:
self.session.set_providers(['CUDAExecutionProvider'])
def get(self, img, face):
bbox = face.bbox
w, h = (bbox[2] - bbox[0]), (bbox[3] - bbox[1])
center = (bbox[2] + bbox[0]) / 2, (bbox[3] + bbox[1]) / 2
rotate = 0
_scale = self.input_size[0] / (max(w, h) * 1.5)
#print('param:', img.shape, bbox, center, self.input_size, _scale, rotate)
aimg, M = face_align.transform(img, center, self.input_size[0], _scale, rotate)
input_size = tuple(aimg.shape[0:2][::-1])
#assert input_size==self.input_size
blob = cv2.dnn.blobFromImage(aimg, 1.0 / self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
pred = self.session.run(self.output_names, {self.input_name: blob})[0][0]
if self.taskname == 'genderage':
assert len(pred) == 3
gender = np.argmax(pred[:2])
age = int(np.round(pred[2] * 100))
face['gender'] = gender
face['age'] = age
return gender, age
else:
return pred
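# Usage sketch: for a 'genderage' model, get() annotates the Face object in place,
# gender, age = attr_model.get(img, face)  # also sets face['gender'] (1 = male) and face['age']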

View File

@ -0,0 +1,105 @@
import time
import numpy as np
import onnxruntime
import cv2
import onnx
from onnx import numpy_helper
from ..utils import face_align
class INSwapper():
def __init__(self, model_file=None, session=None):
self.model_file = model_file
self.session = session
model = onnx.load(self.model_file)
graph = model.graph
self.emap = numpy_helper.to_array(graph.initializer[-1])
self.input_mean = 0.0
self.input_std = 255.0
#print('input mean and std:', model_file, self.input_mean, self.input_std)
if self.session is None:
self.session = onnxruntime.InferenceSession(self.model_file, None)
inputs = self.session.get_inputs()
self.input_names = []
for inp in inputs:
self.input_names.append(inp.name)
outputs = self.session.get_outputs()
output_names = []
for out in outputs:
output_names.append(out.name)
self.output_names = output_names
assert len(self.output_names)==1
output_shape = outputs[0].shape
input_cfg = inputs[0]
input_shape = input_cfg.shape
self.input_shape = input_shape
print('inswapper-shape:', self.input_shape)
self.input_size = tuple(input_shape[2:4][::-1])
def forward(self, img, latent):
img = (img - self.input_mean) / self.input_std
pred = self.session.run(self.output_names, {self.input_names[0]: img, self.input_names[1]: latent})[0]
return pred
def get(self, img, target_face, source_face, paste_back=True):
aimg, M = face_align.norm_crop2(img, target_face.kps, self.input_size[0])
blob = cv2.dnn.blobFromImage(aimg, 1.0 / self.input_std, self.input_size,
(self.input_mean, self.input_mean, self.input_mean), swapRB=True)
latent = source_face.normed_embedding.reshape((1,-1))
latent = np.dot(latent, self.emap)
latent /= np.linalg.norm(latent)
pred = self.session.run(self.output_names, {self.input_names[0]: blob, self.input_names[1]: latent})[0]
#print(latent.shape, latent.dtype, pred.shape)
img_fake = pred.transpose((0,2,3,1))[0]
bgr_fake = np.clip(255 * img_fake, 0, 255).astype(np.uint8)[:,:,::-1]
if not paste_back:
return bgr_fake, M
else:
target_img = img
fake_diff = bgr_fake.astype(np.float32) - aimg.astype(np.float32)
fake_diff = np.abs(fake_diff).mean(axis=2)
fake_diff[:2,:] = 0
fake_diff[-2:,:] = 0
fake_diff[:,:2] = 0
fake_diff[:,-2:] = 0
IM = cv2.invertAffineTransform(M)
img_white = np.full((aimg.shape[0],aimg.shape[1]), 255, dtype=np.float32)
bgr_fake = cv2.warpAffine(bgr_fake, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
img_white = cv2.warpAffine(img_white, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
fake_diff = cv2.warpAffine(fake_diff, IM, (target_img.shape[1], target_img.shape[0]), borderValue=0.0)
img_white[img_white>20] = 255
fthresh = 10
fake_diff[fake_diff<fthresh] = 0
fake_diff[fake_diff>=fthresh] = 255
img_mask = img_white
mask_h_inds, mask_w_inds = np.where(img_mask==255)
mask_h = np.max(mask_h_inds) - np.min(mask_h_inds)
mask_w = np.max(mask_w_inds) - np.min(mask_w_inds)
mask_size = int(np.sqrt(mask_h*mask_w))
k = max(mask_size//10, 10)
#k = max(mask_size//20, 6)
#k = 6
kernel = np.ones((k,k),np.uint8)
img_mask = cv2.erode(img_mask,kernel,iterations = 1)
kernel = np.ones((2,2),np.uint8)
fake_diff = cv2.dilate(fake_diff,kernel,iterations = 1)
k = max(mask_size//20, 5)
#k = 3
#k = 3
kernel_size = (k, k)
blur_size = tuple(2*i+1 for i in kernel_size)
img_mask = cv2.GaussianBlur(img_mask, blur_size, 0)
k = 5
kernel_size = (k, k)
blur_size = tuple(2*i+1 for i in kernel_size)
fake_diff = cv2.GaussianBlur(fake_diff, blur_size, 0)
img_mask /= 255
fake_diff /= 255
#img_mask = fake_diff
img_mask = np.reshape(img_mask, [img_mask.shape[0],img_mask.shape[1],1])
fake_merged = img_mask * bgr_fake + (1-img_mask) * target_img.astype(np.float32)
fake_merged = fake_merged.astype(np.uint8)
return fake_merged
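# Usage sketch (the model path is illustrative):
# swapper = INSwapper('models/inswapper_128.onnx')
# result = swapper.get(img, target_face, source_face, paste_back=True)  # source face blended into img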

View File

@ -0,0 +1,112 @@
# -*- coding: utf-8 -*-
# @Organization : insightface.ai
# @Author : Jia Guo
# @Time : 2021-05-04
# @Function :
from __future__ import division
import numpy as np
import cv2
import onnx
import onnxruntime
from ..utils import face_align
from ..utils import transform
from ..data import get_object
__all__ = [
'Landmark',
]
class Landmark:
def __init__(self, model_file=None, session=None):
assert model_file is not None
self.model_file = model_file
self.session = session
find_sub = False
find_mul = False
model = onnx.load(self.model_file)
graph = model.graph
for nid, node in enumerate(graph.node[:8]):
#print(nid, node.name)
if node.name.startswith('Sub') or node.name.startswith('_minus'):
find_sub = True
if node.name.startswith('Mul') or node.name.startswith('_mul'):
find_mul = True
if nid < 3 and node.name == 'bn_data':
find_sub = True
find_mul = True
if find_sub and find_mul:
#mxnet arcface model
input_mean = 0.0
input_std = 1.0
else:
input_mean = 127.5
input_std = 128.0
self.input_mean = input_mean
self.input_std = input_std
#print('input mean and std:', model_file, self.input_mean, self.input_std)
if self.session is None:
self.session = onnxruntime.InferenceSession(self.model_file, None)
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
input_name = input_cfg.name
self.input_size = tuple(input_shape[2:4][::-1])
self.input_shape = input_shape
outputs = self.session.get_outputs()
output_names = []
for out in outputs:
output_names.append(out.name)
self.input_name = input_name
self.output_names = output_names
assert len(self.output_names) == 1
output_shape = outputs[0].shape
self.require_pose = False
#print('init output_shape:', output_shape)
if output_shape[1] == 3309:
self.lmk_dim = 3
self.lmk_num = 68
self.mean_lmk = get_object('meanshape_68.pkl')
self.require_pose = True
else:
self.lmk_dim = 2
self.lmk_num = output_shape[1] // self.lmk_dim
self.taskname = 'landmark_%dd_%d' % (self.lmk_dim, self.lmk_num)
def prepare(self, ctx_id, **kwargs):
if ctx_id < 0:
self.session.set_providers(['CUDAExecutionProvider'])
def get(self, img, face):
bbox = face.bbox
w, h = (bbox[2] - bbox[0]), (bbox[3] - bbox[1])
center = (bbox[2] + bbox[0]) / 2, (bbox[3] + bbox[1]) / 2
rotate = 0
_scale = self.input_size[0] / (max(w, h) * 1.5)
#print('param:', img.shape, bbox, center, self.input_size, _scale, rotate)
aimg, M = face_align.transform(img, center, self.input_size[0], _scale, rotate)
input_size = tuple(aimg.shape[0:2][::-1])
#assert input_size==self.input_size
blob = cv2.dnn.blobFromImage(aimg, 1.0 / self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
pred = self.session.run(self.output_names, {self.input_name: blob})[0][0]
if pred.shape[0] >= 3000:
pred = pred.reshape((-1, 3))
else:
pred = pred.reshape((-1, 2))
if self.lmk_num < pred.shape[0]:
pred = pred[self.lmk_num * -1:, :]
pred[:, 0:2] += 1
pred[:, 0:2] *= (self.input_size[0] // 2)
if pred.shape[1] == 3:
pred[:, 2] *= (self.input_size[0] // 2)
IM = cv2.invertAffineTransform(M)
pred = face_align.trans_points(pred, IM)
face[self.taskname] = pred
if self.require_pose:
P = transform.estimate_affine_matrix_3d23d(self.mean_lmk, pred)
s, R, t = transform.P2sRt(P)
rx, ry, rz = transform.matrix2angle(R)
pose = np.array([rx, ry, rz], dtype=np.float32)
face['pose'] = pose #pitch, yaw, roll
return pred

View File

@ -0,0 +1,103 @@
"""
This code file mainly comes from https://github.com/dmlc/gluon-cv/blob/master/gluoncv/model_zoo/model_store.py
"""
from __future__ import print_function
__all__ = ['get_model_file']
import os
import zipfile
import glob
from ..utils import download, check_sha1
_model_sha1 = {
name: checksum
for checksum, name in [
('95be21b58e29e9c1237f229dae534bd854009ce0', 'arcface_r100_v1'),
('', 'arcface_mfn_v1'),
('39fd1e087a2a2ed70a154ac01fecaa86c315d01b', 'retinaface_r50_v1'),
('2c9de8116d1f448fd1d4661f90308faae34c990a', 'retinaface_mnet025_v1'),
('0db1d07921d005e6c9a5b38e059452fc5645e5a4', 'retinaface_mnet025_v2'),
('7dd8111652b7aac2490c5dcddeb268e53ac643e6', 'genderage_v1'),
]
}
base_repo_url = 'https://insightface.ai/files/'
_url_format = '{repo_url}models/{file_name}.zip'
def short_hash(name):
if name not in _model_sha1:
raise ValueError(
'Pretrained model for {name} is not available.'.format(name=name))
return _model_sha1[name][:8]
def find_params_file(dir_path):
if not os.path.exists(dir_path):
return None
paths = glob.glob("%s/*.params" % dir_path)
if len(paths) == 0:
return None
paths = sorted(paths)
return paths[-1]
def get_model_file(name, root=os.path.join('~', '.insightface', 'models')):
r"""Return location for the pretrained on local file system.
This function will download from online model zoo when model cannot be found or has mismatch.
The root directory will be created if it doesn't exist.
Parameters
----------
name : str
Name of the model.
root : str, default '~/.insightface/models'
Location for keeping the model parameters.
Returns
-------
file_path
Path to the requested pretrained model file.
"""
file_name = name
root = os.path.expanduser(root)
dir_path = os.path.join(root, name)
file_path = find_params_file(dir_path)
#file_path = os.path.join(root, file_name + '.params')
sha1_hash = _model_sha1[name]
if file_path is not None:
if check_sha1(file_path, sha1_hash):
return file_path
else:
print(
'Mismatch in the content of model file detected. Downloading again.'
)
else:
print('Model file is not found. Downloading.')
if not os.path.exists(root):
os.makedirs(root)
if not os.path.exists(dir_path):
os.makedirs(dir_path)
zip_file_path = os.path.join(root, file_name + '.zip')
repo_url = base_repo_url
if repo_url[-1] != '/':
repo_url = repo_url + '/'
download(_url_format.format(repo_url=repo_url, file_name=file_name),
path=zip_file_path,
overwrite=True)
with zipfile.ZipFile(zip_file_path) as zf:
zf.extractall(dir_path)
os.remove(zip_file_path)
file_path = find_params_file(dir_path)
if check_sha1(file_path, sha1_hash):
return file_path
else:
raise ValueError(
'Downloaded file has different hash. Please try again.')

View File

@ -0,0 +1,102 @@
# -*- coding: utf-8 -*-
# @Organization : insightface.ai
# @Author : Jia Guo
# @Time : 2021-05-04
# @Function :
import os
import os.path as osp
import glob
import onnxruntime
from .arcface_onnx import *
from .retinaface import *
#from .scrfd import *
from .landmark import *
from .attribute import Attribute
from .inswapper import INSwapper
from ..utils import download_onnx
__all__ = ['get_model']
class PickableInferenceSession(onnxruntime.InferenceSession):
# This is a wrapper to make the current InferenceSession class pickable.
def __init__(self, model_path, **kwargs):
super().__init__(model_path, **kwargs)
self.model_path = model_path
def __getstate__(self):
return {'model_path': self.model_path}
def __setstate__(self, values):
model_path = values['model_path']
self.__init__(model_path)
class ModelRouter:
def __init__(self, onnx_file):
self.onnx_file = onnx_file
def get_model(self, **kwargs):
session = PickableInferenceSession(self.onnx_file, **kwargs)
print(f'Applied providers: {session._providers}, with options: {session._provider_options}')
inputs = session.get_inputs()
input_cfg = inputs[0]
input_shape = input_cfg.shape
outputs = session.get_outputs()
if len(outputs) >= 5:
return RetinaFace(model_file=self.onnx_file, session=session)
elif input_shape[2] == 192 and input_shape[3] == 192:
return Landmark(model_file=self.onnx_file, session=session)
elif input_shape[2] == 96 and input_shape[3] == 96:
return Attribute(model_file=self.onnx_file, session=session)
elif len(inputs) == 2 and input_shape[2] == 128 and input_shape[3] == 128:
return INSwapper(model_file=self.onnx_file, session=session)
elif input_shape[2] == input_shape[3] and input_shape[2] >= 112 and input_shape[2] % 16 == 0:
return ArcFaceONNX(model_file=self.onnx_file, session=session)
else:
#raise RuntimeError('error on model routing')
return None
def find_onnx_file(dir_path):
if not os.path.exists(dir_path):
return None
paths = glob.glob("%s/*.onnx" % dir_path)
if len(paths) == 0:
return None
paths = sorted(paths)
return paths[-1]
def get_default_providers():
return ['CUDAExecutionProvider']
def get_default_provider_options():
return None
def get_model(name, **kwargs):
root = kwargs.get('root', '~/.insightface')
root = os.path.expanduser(root)
model_root = osp.join(root, 'models')
allow_download = kwargs.get('download', False)
download_zip = kwargs.get('download_zip', False)
if not name.endswith('.onnx'):
model_dir = os.path.join(model_root, name)
model_file = find_onnx_file(model_dir)
if model_file is None:
return None
else:
model_file = name
if not osp.exists(model_file) and allow_download:
model_file = download_onnx('models', model_file, root=root, download_zip=download_zip)
assert osp.exists(model_file), 'model_file %s should exist' % model_file
assert osp.isfile(model_file), 'model_file %s should be a file' % model_file
router = ModelRouter(model_file)
providers = kwargs.get('providers', get_default_providers())
provider_options = kwargs.get('provider_options', get_default_provider_options())
model = router.get_model(providers=providers, provider_options=provider_options)
return model
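# Usage sketch (the path is illustrative): routes an .onnx file to the matching wrapper class above,
# model = get_model('models/inswapper_128.onnx', providers=get_default_providers())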

View File

@ -0,0 +1,299 @@
# -*- coding: utf-8 -*-
# @Organization : insightface.ai
# @Author : Jia Guo
# @Time : 2021-09-18
# @Function :
from __future__ import division
import datetime
import numpy as np
import onnx
import onnxruntime
import os
import os.path as osp
import cv2
import sys
def softmax(z):
assert len(z.shape) == 2
s = np.max(z, axis=1)
s = s[:, np.newaxis] # necessary step to do broadcasting
e_x = np.exp(z - s)
div = np.sum(e_x, axis=1)
div = div[:, np.newaxis] # ditto
return e_x / div
def distance2bbox(points, distance, max_shape=None):
"""Decode distance prediction to bounding box.
Args:
points (Tensor): Shape (n, 2), [x, y].
distance (Tensor): Distance from the given point to 4
boundaries (left, top, right, bottom).
max_shape (tuple): Shape of the image.
Returns:
Tensor: Decoded bboxes.
"""
x1 = points[:, 0] - distance[:, 0]
y1 = points[:, 1] - distance[:, 1]
x2 = points[:, 0] + distance[:, 2]
y2 = points[:, 1] + distance[:, 3]
if max_shape is not None:
x1 = x1.clamp(min=0, max=max_shape[1])
y1 = y1.clamp(min=0, max=max_shape[0])
x2 = x2.clamp(min=0, max=max_shape[1])
y2 = y2.clamp(min=0, max=max_shape[0])
return np.stack([x1, y1, x2, y2], axis=-1)
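# Editor's note, worked example: for an anchor point at (100, 100) with
# predicted distances (10, 20, 30, 40) to the left/top/right/bottom
# boundaries, the decoded box is (100-10, 100-20, 100+30, 100+40)
# = (90, 80, 130, 140).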
def distance2kps(points, distance, max_shape=None):
"""Decode distance prediction to bounding box.
Args:
points (Tensor): Shape (n, 2), [x, y].
distance (Tensor): Distance from the given point to 4
boundaries (left, top, right, bottom).
max_shape (tuple): Shape of the image.
Returns:
Tensor: Decoded bboxes.
"""
preds = []
for i in range(0, distance.shape[1], 2):
px = points[:, i % 2] + distance[:, i]
py = points[:, i % 2 + 1] + distance[:, i + 1]
if max_shape is not None:
px = px.clamp(min=0, max=max_shape[1])
py = py.clamp(min=0, max=max_shape[0])
preds.append(px)
preds.append(py)
return np.stack(preds, axis=-1)
class RetinaFace:
def __init__(self, model_file=None, session=None):
import onnxruntime
self.model_file = model_file
self.session = session
self.taskname = 'detection'
if self.session is None:
assert self.model_file is not None
assert osp.exists(self.model_file)
self.session = onnxruntime.InferenceSession(self.model_file, None)
self.center_cache = {}
self.nms_thresh = 0.4
self.det_thresh = 0.5
self._init_vars()
def _init_vars(self):
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
#print(input_shape)
if isinstance(input_shape[2], str):
self.input_size = None
else:
self.input_size = tuple(input_shape[2:4][::-1])
#print('image_size:', self.image_size)
input_name = input_cfg.name
self.input_shape = input_shape
outputs = self.session.get_outputs()
output_names = []
for o in outputs:
output_names.append(o.name)
self.input_name = input_name
self.output_names = output_names
self.input_mean = 127.5
self.input_std = 128.0
#print(self.output_names)
#assert len(outputs)==10 or len(outputs)==15
self.use_kps = False
self._anchor_ratio = 1.0
self._num_anchors = 1
if len(outputs) == 6:
self.fmc = 3
self._feat_stride_fpn = [8, 16, 32]
self._num_anchors = 2
elif len(outputs) == 9:
self.fmc = 3
self._feat_stride_fpn = [8, 16, 32]
self._num_anchors = 2
self.use_kps = True
elif len(outputs) == 10:
self.fmc = 5
self._feat_stride_fpn = [8, 16, 32, 64, 128]
self._num_anchors = 1
elif len(outputs) == 15:
self.fmc = 5
self._feat_stride_fpn = [8, 16, 32, 64, 128]
self._num_anchors = 1
self.use_kps = True
def prepare(self, ctx_id, **kwargs):
if ctx_id < 0:
self.session.set_providers(['CPUExecutionProvider'])
nms_thresh = kwargs.get('nms_thresh', None)
if nms_thresh is not None:
self.nms_thresh = nms_thresh
det_thresh = kwargs.get('det_thresh', None)
if det_thresh is not None:
self.det_thresh = det_thresh
input_size = kwargs.get('input_size', None)
if input_size is not None:
if self.input_size is not None:
print('warning: det_size is already set in detection model, ignore')
else:
self.input_size = input_size
def forward(self, img, threshold):
scores_list = []
bboxes_list = []
kpss_list = []
input_size = tuple(img.shape[0:2][::-1])
blob = cv2.dnn.blobFromImage(img, 1.0 / self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
net_outs = self.session.run(self.output_names, {self.input_name: blob})
input_height = blob.shape[2]
input_width = blob.shape[3]
fmc = self.fmc
for idx, stride in enumerate(self._feat_stride_fpn):
scores = net_outs[idx]
bbox_preds = net_outs[idx + fmc]
bbox_preds = bbox_preds * stride
if self.use_kps:
kps_preds = net_outs[idx + fmc * 2] * stride
height = input_height // stride
width = input_width // stride
K = height * width
key = (height, width, stride)
if key in self.center_cache:
anchor_centers = self.center_cache[key]
else:
#solution-1, c style:
#anchor_centers = np.zeros( (height, width, 2), dtype=np.float32 )
#for i in range(height):
# anchor_centers[i, :, 1] = i
#for i in range(width):
# anchor_centers[:, i, 0] = i
#solution-2:
#ax = np.arange(width, dtype=np.float32)
#ay = np.arange(height, dtype=np.float32)
#xv, yv = np.meshgrid(np.arange(width), np.arange(height))
#anchor_centers = np.stack([xv, yv], axis=-1).astype(np.float32)
#solution-3:
anchor_centers = np.stack(np.mgrid[:height, :width][::-1], axis=-1).astype(np.float32)
#print(anchor_centers.shape)
anchor_centers = (anchor_centers * stride).reshape((-1, 2))
if self._num_anchors > 1:
anchor_centers = np.stack([anchor_centers] * self._num_anchors, axis=1).reshape((-1, 2))
if len(self.center_cache) < 100:
self.center_cache[key] = anchor_centers
pos_inds = np.where(scores >= threshold)[0]
bboxes = distance2bbox(anchor_centers, bbox_preds)
pos_scores = scores[pos_inds]
pos_bboxes = bboxes[pos_inds]
scores_list.append(pos_scores)
bboxes_list.append(pos_bboxes)
if self.use_kps:
kpss = distance2kps(anchor_centers, kps_preds)
#kpss = kps_preds
kpss = kpss.reshape((kpss.shape[0], -1, 2))
pos_kpss = kpss[pos_inds]
kpss_list.append(pos_kpss)
return scores_list, bboxes_list, kpss_list
def detect(self, img, input_size=None, max_num=0, metric='default'):
assert input_size is not None or self.input_size is not None
input_size = self.input_size if input_size is None else input_size
im_ratio = float(img.shape[0]) / img.shape[1]
model_ratio = float(input_size[1]) / input_size[0]
if im_ratio > model_ratio:
new_height = input_size[1]
new_width = int(new_height / im_ratio)
else:
new_width = input_size[0]
new_height = int(new_width * im_ratio)
det_scale = float(new_height) / img.shape[0]
resized_img = cv2.resize(img, (new_width, new_height))
det_img = np.zeros((input_size[1], input_size[0], 3), dtype=np.uint8)
det_img[:new_height, :new_width, :] = resized_img
scores_list, bboxes_list, kpss_list = self.forward(det_img, self.det_thresh)
scores = np.vstack(scores_list)
scores_ravel = scores.ravel()
order = scores_ravel.argsort()[::-1]
bboxes = np.vstack(bboxes_list) / det_scale
if self.use_kps:
kpss = np.vstack(kpss_list) / det_scale
pre_det = np.hstack((bboxes, scores)).astype(np.float32, copy=False)
pre_det = pre_det[order, :]
keep = self.nms(pre_det)
det = pre_det[keep, :]
if self.use_kps:
kpss = kpss[order, :, :]
kpss = kpss[keep, :, :]
else:
kpss = None
if max_num > 0 and det.shape[0] > max_num:
area = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
img_center = img.shape[0] // 2, img.shape[1] // 2
offsets = np.vstack([(det[:, 0] + det[:, 2]) / 2 - img_center[1], (det[:, 1] + det[:, 3]) / 2 - img_center[0]])
offset_dist_squared = np.sum(np.power(offsets, 2.0), 0)
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * 2.0 # some extra weight on the centering
bindex = np.argsort(values)[::-1]
bindex = bindex[0:max_num]
det = det[bindex, :]
if kpss is not None:
kpss = kpss[bindex, :]
return det, kpss
def nms(self, dets):
thresh = self.nms_thresh
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
def get_retinaface(name, download=False, root='~/.insightface/models', **kwargs):
if not download:
assert os.path.exists(name)
return RetinaFace(name)
else:
from .model_store import get_model_file
_file = get_model_file("retinaface_%s" % name, root=root)
return RetinaFace(_file)
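A minimal detection sketch using the class above (file paths are hypothetical):
import cv2
detector = RetinaFace(model_file='det_10g.onnx')  # hypothetical local model
detector.prepare(ctx_id=0, input_size=(640, 640))
img = cv2.imread('portrait.jpg')                  # hypothetical input image
dets, kpss = detector.detect(img)
# dets: [n, 5] boxes as x1, y1, x2, y2, score; kpss: [n, 5, 2] landmarks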

View File

@@ -0,0 +1,347 @@
# -*- coding: utf-8 -*-
# @Organization : insightface.ai
# @Author : Jia Guo
# @Time : 2021-05-04
# @Function :
from __future__ import division
import datetime
import numpy as np
import onnx
import onnxruntime
import os
import os.path as osp
import cv2
import sys
def softmax(z):
assert len(z.shape) == 2
s = np.max(z, axis=1)
s = s[:, np.newaxis] # necessary step to do broadcasting
e_x = np.exp(z - s)
div = np.sum(e_x, axis=1)
div = div[:, np.newaxis] # ditto
return e_x / div
def distance2bbox(points, distance, max_shape=None):
"""Decode distance prediction to bounding box.
Args:
points (Tensor): Shape (n, 2), [x, y].
distance (Tensor): Distance from the given point to 4
boundaries (left, top, right, bottom).
max_shape (tuple): Shape of the image.
Returns:
Tensor: Decoded bboxes.
"""
x1 = points[:, 0] - distance[:, 0]
y1 = points[:, 1] - distance[:, 1]
x2 = points[:, 0] + distance[:, 2]
y2 = points[:, 1] + distance[:, 3]
if max_shape is not None:
x1 = x1.clamp(min=0, max=max_shape[1])
y1 = y1.clamp(min=0, max=max_shape[0])
x2 = x2.clamp(min=0, max=max_shape[1])
y2 = y2.clamp(min=0, max=max_shape[0])
return np.stack([x1, y1, x2, y2], axis=-1)
def distance2kps(points, distance, max_shape=None):
"""Decode distance prediction to bounding box.
Args:
points (Tensor): Shape (n, 2), [x, y].
distance (Tensor): Distance from the given point to 4
boundaries (left, top, right, bottom).
max_shape (tuple): Shape of the image.
Returns:
Tensor: Decoded bboxes.
"""
preds = []
for i in range(0, distance.shape[1], 2):
px = points[:, i % 2] + distance[:, i]
py = points[:, i % 2 + 1] + distance[:, i + 1]
if max_shape is not None:
px = px.clamp(min=0, max=max_shape[1])
py = py.clamp(min=0, max=max_shape[0])
preds.append(px)
preds.append(py)
return np.stack(preds, axis=-1)
class SCRFD:
def __init__(self, model_file=None, session=None):
import onnxruntime
self.model_file = model_file
self.session = session
self.taskname = 'detection'
self.batched = False
if self.session is None:
assert self.model_file is not None
assert osp.exists(self.model_file)
self.session = onnxruntime.InferenceSession(self.model_file, None)
self.center_cache = {}
self.nms_thresh = 0.4
self.det_thresh = 0.5
self._init_vars()
def _init_vars(self):
input_cfg = self.session.get_inputs()[0]
input_shape = input_cfg.shape
#print(input_shape)
if isinstance(input_shape[2], str):
self.input_size = None
else:
self.input_size = tuple(input_shape[2:4][::-1])
#print('image_size:', self.image_size)
input_name = input_cfg.name
self.input_shape = input_shape
outputs = self.session.get_outputs()
if len(outputs[0].shape) == 3:
self.batched = True
output_names = []
for o in outputs:
output_names.append(o.name)
self.input_name = input_name
self.output_names = output_names
self.input_mean = 127.5
self.input_std = 128.0
#print(self.output_names)
#assert len(outputs)==10 or len(outputs)==15
self.use_kps = False
self._anchor_ratio = 1.0
self._num_anchors = 1
if len(outputs) == 6:
self.fmc = 3
self._feat_stride_fpn = [8, 16, 32]
self._num_anchors = 2
elif len(outputs) == 9:
self.fmc = 3
self._feat_stride_fpn = [8, 16, 32]
self._num_anchors = 2
self.use_kps = True
elif len(outputs) == 10:
self.fmc = 5
self._feat_stride_fpn = [8, 16, 32, 64, 128]
self._num_anchors = 1
elif len(outputs) == 15:
self.fmc = 5
self._feat_stride_fpn = [8, 16, 32, 64, 128]
self._num_anchors = 1
self.use_kps = True
def prepare(self, ctx_id, **kwargs):
if ctx_id < 0:
self.session.set_providers(['CPUExecutionProvider'])
nms_thresh = kwargs.get('nms_thresh', None)
if nms_thresh is not None:
self.nms_thresh = nms_thresh
det_thresh = kwargs.get('det_thresh', None)
if det_thresh is not None:
self.det_thresh = det_thresh
input_size = kwargs.get('input_size', None)
if input_size is not None:
if self.input_size is not None:
print('warning: det_size is already set in scrfd model, ignore')
else:
self.input_size = input_size
def forward(self, img, threshold):
scores_list = []
bboxes_list = []
kpss_list = []
input_size = tuple(img.shape[0:2][::-1])
blob = cv2.dnn.blobFromImage(img, 1.0 / self.input_std, input_size, (self.input_mean, self.input_mean, self.input_mean), swapRB=True)
net_outs = self.session.run(self.output_names, {self.input_name: blob})
input_height = blob.shape[2]
input_width = blob.shape[3]
fmc = self.fmc
for idx, stride in enumerate(self._feat_stride_fpn):
# If model support batch dim, take first output
if self.batched:
scores = net_outs[idx][0]
bbox_preds = net_outs[idx + fmc][0]
bbox_preds = bbox_preds * stride
if self.use_kps:
kps_preds = net_outs[idx + fmc * 2][0] * stride
# If model doesn't support batching take output as is
else:
scores = net_outs[idx]
bbox_preds = net_outs[idx + fmc]
bbox_preds = bbox_preds * stride
if self.use_kps:
kps_preds = net_outs[idx + fmc * 2] * stride
height = input_height // stride
width = input_width // stride
K = height * width
key = (height, width, stride)
if key in self.center_cache:
anchor_centers = self.center_cache[key]
else:
#solution-1, c style:
#anchor_centers = np.zeros( (height, width, 2), dtype=np.float32 )
#for i in range(height):
# anchor_centers[i, :, 1] = i
#for i in range(width):
# anchor_centers[:, i, 0] = i
#solution-2:
#ax = np.arange(width, dtype=np.float32)
#ay = np.arange(height, dtype=np.float32)
#xv, yv = np.meshgrid(np.arange(width), np.arange(height))
#anchor_centers = np.stack([xv, yv], axis=-1).astype(np.float32)
#solution-3:
anchor_centers = np.stack(np.mgrid[:height, :width][::-1], axis=-1).astype(np.float32)
#print(anchor_centers.shape)
anchor_centers = (anchor_centers * stride).reshape((-1, 2))
if self._num_anchors > 1:
anchor_centers = np.stack([anchor_centers] * self._num_anchors, axis=1).reshape((-1, 2))
if len(self.center_cache) < 100:
self.center_cache[key] = anchor_centers
pos_inds = np.where(scores >= threshold)[0]
bboxes = distance2bbox(anchor_centers, bbox_preds)
pos_scores = scores[pos_inds]
pos_bboxes = bboxes[pos_inds]
scores_list.append(pos_scores)
bboxes_list.append(pos_bboxes)
if self.use_kps:
kpss = distance2kps(anchor_centers, kps_preds)
#kpss = kps_preds
kpss = kpss.reshape((kpss.shape[0], -1, 2))
pos_kpss = kpss[pos_inds]
kpss_list.append(pos_kpss)
return scores_list, bboxes_list, kpss_list
def detect(self, img, input_size=None, max_num=0, metric='default'):
assert input_size is not None or self.input_size is not None
input_size = self.input_size if input_size is None else input_size
im_ratio = float(img.shape[0]) / img.shape[1]
model_ratio = float(input_size[1]) / input_size[0]
if im_ratio > model_ratio:
new_height = input_size[1]
new_width = int(new_height / im_ratio)
else:
new_width = input_size[0]
new_height = int(new_width * im_ratio)
det_scale = float(new_height) / img.shape[0]
resized_img = cv2.resize(img, (new_width, new_height))
det_img = np.zeros((input_size[1], input_size[0], 3), dtype=np.uint8)
det_img[:new_height, :new_width, :] = resized_img
scores_list, bboxes_list, kpss_list = self.forward(det_img, self.det_thresh)
scores = np.vstack(scores_list)
scores_ravel = scores.ravel()
order = scores_ravel.argsort()[::-1]
bboxes = np.vstack(bboxes_list) / det_scale
if self.use_kps:
kpss = np.vstack(kpss_list) / det_scale
pre_det = np.hstack((bboxes, scores)).astype(np.float32, copy=False)
pre_det = pre_det[order, :]
keep = self.nms(pre_det)
det = pre_det[keep, :]
if self.use_kps:
kpss = kpss[order, :, :]
kpss = kpss[keep, :, :]
else:
kpss = None
if max_num > 0 and det.shape[0] > max_num:
area = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])
img_center = img.shape[0] // 2, img.shape[1] // 2
offsets = np.vstack([(det[:, 0] + det[:, 2]) / 2 - img_center[1], (det[:, 1] + det[:, 3]) / 2 - img_center[0]])
offset_dist_squared = np.sum(np.power(offsets, 2.0), 0)
if metric == 'max':
values = area
else:
values = area - offset_dist_squared * 2.0 # some extra weight on the centering
bindex = np.argsort(values)[::-1]
bindex = bindex[0:max_num]
det = det[bindex, :]
if kpss is not None:
kpss = kpss[bindex, :]
return det, kpss
def nms(self, dets):
thresh = self.nms_thresh
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
def get_scrfd(name, download=False, root='~/.insightface/models', **kwargs):
if not download:
assert os.path.exists(name)
return SCRFD(name)
else:
from .model_store import get_model_file
_file = get_model_file("scrfd_%s" % name, root=root)
return SCRFD(_file)
def scrfd_2p5gkps(**kwargs):
return get_scrfd("2p5gkps", download=True, **kwargs)
if __name__ == '__main__':
import glob
detector = SCRFD(model_file='./det.onnx')
detector.prepare(-1)
img_paths = ['tests/data/t1.jpg']
for img_path in img_paths:
img = cv2.imread(img_path)
for _ in range(1):
ta = datetime.datetime.now()
#bboxes, kpss = detector.detect(img, 0.5, input_size = (640, 640))
bboxes, kpss = detector.detect(img, 0.5)
tb = datetime.datetime.now()
print('all cost:', (tb - ta).total_seconds() * 1000)
print(img_path, bboxes.shape)
if kpss is not None:
print(kpss.shape)
for i in range(bboxes.shape[0]):
bbox = bboxes[i]
x1, y1, x2, y2, score = bbox.astype(np.int32)
cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)
if kpss is not None:
kps = kpss[i]
for kp in kps:
kp = kp.astype(np.int32)
cv2.circle(img, tuple(kp), 1, (0, 0, 255), 2)
filename = img_path.split('/')[-1]
print('output:', filename)
cv2.imwrite('./outputs/%s' % filename, img)

View File

View File

@@ -0,0 +1,4 @@
#import mesh
#import morphable_model
from . import mesh
from . import morphable_model

View File

@@ -0,0 +1,15 @@
#from __future__ import absolute_import
#from cython import mesh_core_cython
#import io
#import vis
#import transform
#import light
#import render
# from .cython import mesh_core_cython
# from . import io
# from . import vis
# from . import transform
# from . import light
# from . import render

View File

@@ -0,0 +1,375 @@
/*
functions that can not be optimized by vectorization in python.
1. rasterization.(need process each triangle)
2. normal of each vertex.(use one-ring, need process each vertex)
3. write obj(seems that it can be vectorized? anyway, writing it in c++ is simple, so also add function here. --> however, why is writing it in c++ still slow?)
Author: Yao Feng
Mail: yaofeng1995@gmail.com
*/
#include "mesh_core.h"
/* Judge whether the point is in the triangle
Method:
http://blackpawn.com/texts/pointinpoly/
Args:
point: [x, y]
tri_points: three vertices(2d points) of a triangle. 2 coords x 3 vertices
Returns:
bool: true for in triangle
*/
bool isPointInTri(point p, point p0, point p1, point p2)
{
// vectors
point v0, v1, v2;
v0 = p2 - p0;
v1 = p1 - p0;
v2 = p - p0;
// dot products
float dot00 = v0.dot(v0); //v0.x * v0.x + v0.y * v0.y //np.dot(v0.T, v0)
float dot01 = v0.dot(v1); //v0.x * v1.x + v0.y * v1.y //np.dot(v0.T, v1)
float dot02 = v0.dot(v2); //v0.x * v2.x + v0.y * v2.y //np.dot(v0.T, v2)
float dot11 = v1.dot(v1); //v1.x * v1.x + v1.y * v1.y //np.dot(v1.T, v1)
float dot12 = v1.dot(v2); //v1.x * v2.x + v1.y * v2.y//np.dot(v1.T, v2)
// barycentric coordinates
float inverDeno;
if(dot00*dot11 - dot01*dot01 == 0)
inverDeno = 0;
else
inverDeno = 1/(dot00*dot11 - dot01*dot01);
float u = (dot11*dot02 - dot01*dot12)*inverDeno;
float v = (dot00*dot12 - dot01*dot02)*inverDeno;
// check if point in triangle
return (u >= 0) && (v >= 0) && (u + v < 1);
}
void get_point_weight(float* weight, point p, point p0, point p1, point p2)
{
// vectors
point v0, v1, v2;
v0 = p2 - p0;
v1 = p1 - p0;
v2 = p - p0;
// dot products
float dot00 = v0.dot(v0); //v0.x * v0.x + v0.y * v0.y //np.dot(v0.T, v0)
float dot01 = v0.dot(v1); //v0.x * v1.x + v0.y * v1.y //np.dot(v0.T, v1)
float dot02 = v0.dot(v2); //v0.x * v2.x + v0.y * v2.y //np.dot(v0.T, v2)
float dot11 = v1.dot(v1); //v1.x * v1.x + v1.y * v1.y //np.dot(v1.T, v1)
float dot12 = v1.dot(v2); //v1.x * v2.x + v1.y * v2.y//np.dot(v1.T, v2)
// barycentric coordinates
float inverDeno;
if(dot00*dot11 - dot01*dot01 == 0)
inverDeno = 0;
else
inverDeno = 1/(dot00*dot11 - dot01*dot01);
float u = (dot11*dot02 - dot01*dot12)*inverDeno;
float v = (dot00*dot12 - dot01*dot02)*inverDeno;
// weight
weight[0] = 1 - u - v;
weight[1] = v;
weight[2] = u;
}
void _get_normal_core(
float* normal, float* tri_normal, int* triangles,
int ntri)
{
int i, j;
int tri_p0_ind, tri_p1_ind, tri_p2_ind;
for(i = 0; i < ntri; i++)
{
tri_p0_ind = triangles[3*i];
tri_p1_ind = triangles[3*i + 1];
tri_p2_ind = triangles[3*i + 2];
for(j = 0; j < 3; j++)
{
normal[3*tri_p0_ind + j] = normal[3*tri_p0_ind + j] + tri_normal[3*i + j];
normal[3*tri_p1_ind + j] = normal[3*tri_p1_ind + j] + tri_normal[3*i + j];
normal[3*tri_p2_ind + j] = normal[3*tri_p2_ind + j] + tri_normal[3*i + j];
}
}
}
void _rasterize_triangles_core(
float* vertices, int* triangles,
float* depth_buffer, int* triangle_buffer, float* barycentric_weight,
int nver, int ntri,
int h, int w)
{
int i;
int x, y, k;
int tri_p0_ind, tri_p1_ind, tri_p2_ind;
point p0, p1, p2, p;
int x_min, x_max, y_min, y_max;
float p_depth, p0_depth, p1_depth, p2_depth;
float weight[3];
for(i = 0; i < ntri; i++)
{
tri_p0_ind = triangles[3*i];
tri_p1_ind = triangles[3*i + 1];
tri_p2_ind = triangles[3*i + 2];
p0.x = vertices[3*tri_p0_ind]; p0.y = vertices[3*tri_p0_ind + 1]; p0_depth = vertices[3*tri_p0_ind + 2];
p1.x = vertices[3*tri_p1_ind]; p1.y = vertices[3*tri_p1_ind + 1]; p1_depth = vertices[3*tri_p1_ind + 2];
p2.x = vertices[3*tri_p2_ind]; p2.y = vertices[3*tri_p2_ind + 1]; p2_depth = vertices[3*tri_p2_ind + 2];
x_min = max((int)ceil(min(p0.x, min(p1.x, p2.x))), 0);
x_max = min((int)floor(max(p0.x, max(p1.x, p2.x))), w - 1);
y_min = max((int)ceil(min(p0.y, min(p1.y, p2.y))), 0);
y_max = min((int)floor(max(p0.y, max(p1.y, p2.y))), h - 1);
if(x_max < x_min || y_max < y_min)
{
continue;
}
for(y = y_min; y <= y_max; y++) //h
{
for(x = x_min; x <= x_max; x++) //w
{
p.x = x; p.y = y;
if(p.x < 2 || p.x > w - 3 || p.y < 2 || p.y > h - 3 || isPointInTri(p, p0, p1, p2))
{
get_point_weight(weight, p, p0, p1, p2);
p_depth = weight[0]*p0_depth + weight[1]*p1_depth + weight[2]*p2_depth;
if((p_depth > depth_buffer[y*w + x]))
{
depth_buffer[y*w + x] = p_depth;
triangle_buffer[y*w + x] = i;
for(k = 0; k < 3; k++)
{
barycentric_weight[y*w*3 + x*3 + k] = weight[k];
}
}
}
}
}
}
}
void _render_colors_core(
float* image, float* vertices, int* triangles,
float* colors,
float* depth_buffer,
int nver, int ntri,
int h, int w, int c)
{
int i;
int x, y, k;
int tri_p0_ind, tri_p1_ind, tri_p2_ind;
point p0, p1, p2, p;
int x_min, x_max, y_min, y_max;
float p_depth, p0_depth, p1_depth, p2_depth;
float p_color, p0_color, p1_color, p2_color;
float weight[3];
for(i = 0; i < ntri; i++)
{
tri_p0_ind = triangles[3*i];
tri_p1_ind = triangles[3*i + 1];
tri_p2_ind = triangles[3*i + 2];
p0.x = vertices[3*tri_p0_ind]; p0.y = vertices[3*tri_p0_ind + 1]; p0_depth = vertices[3*tri_p0_ind + 2];
p1.x = vertices[3*tri_p1_ind]; p1.y = vertices[3*tri_p1_ind + 1]; p1_depth = vertices[3*tri_p1_ind + 2];
p2.x = vertices[3*tri_p2_ind]; p2.y = vertices[3*tri_p2_ind + 1]; p2_depth = vertices[3*tri_p2_ind + 2];
x_min = max((int)ceil(min(p0.x, min(p1.x, p2.x))), 0);
x_max = min((int)floor(max(p0.x, max(p1.x, p2.x))), w - 1);
y_min = max((int)ceil(min(p0.y, min(p1.y, p2.y))), 0);
y_max = min((int)floor(max(p0.y, max(p1.y, p2.y))), h - 1);
if(x_max < x_min || y_max < y_min)
{
continue;
}
for(y = y_min; y <= y_max; y++) //h
{
for(x = x_min; x <= x_max; x++) //w
{
p.x = x; p.y = y;
if(p.x < 2 || p.x > w - 3 || p.y < 2 || p.y > h - 3 || isPointInTri(p, p0, p1, p2))
{
get_point_weight(weight, p, p0, p1, p2);
p_depth = weight[0]*p0_depth + weight[1]*p1_depth + weight[2]*p2_depth;
if((p_depth > depth_buffer[y*w + x]))
{
for(k = 0; k < c; k++) // c
{
p0_color = colors[c*tri_p0_ind + k];
p1_color = colors[c*tri_p1_ind + k];
p2_color = colors[c*tri_p2_ind + k];
p_color = weight[0]*p0_color + weight[1]*p1_color + weight[2]*p2_color;
image[y*w*c + x*c + k] = p_color;
}
depth_buffer[y*w + x] = p_depth;
}
}
}
}
}
}
void _render_texture_core(
float* image, float* vertices, int* triangles,
float* texture, float* tex_coords, int* tex_triangles,
float* depth_buffer,
int nver, int tex_nver, int ntri,
int h, int w, int c,
int tex_h, int tex_w, int tex_c,
int mapping_type)
{
int i;
int x, y, k;
int tri_p0_ind, tri_p1_ind, tri_p2_ind;
int tex_tri_p0_ind, tex_tri_p1_ind, tex_tri_p2_ind;
point p0, p1, p2, p;
point tex_p0, tex_p1, tex_p2, tex_p;
int x_min, x_max, y_min, y_max;
float weight[3];
float p_depth, p0_depth, p1_depth, p2_depth;
float xd, yd;
float ul, ur, dl, dr;
for(i = 0; i < ntri; i++)
{
// mesh
tri_p0_ind = triangles[3*i];
tri_p1_ind = triangles[3*i + 1];
tri_p2_ind = triangles[3*i + 2];
p0.x = vertices[3*tri_p0_ind]; p0.y = vertices[3*tri_p0_ind + 1]; p0_depth = vertices[3*tri_p0_ind + 2];
p1.x = vertices[3*tri_p1_ind]; p1.y = vertices[3*tri_p1_ind + 1]; p1_depth = vertices[3*tri_p1_ind + 2];
p2.x = vertices[3*tri_p2_ind]; p2.y = vertices[3*tri_p2_ind + 1]; p2_depth = vertices[3*tri_p2_ind + 2];
// texture
tex_tri_p0_ind = tex_triangles[3*i];
tex_tri_p1_ind = tex_triangles[3*i + 1];
tex_tri_p2_ind = tex_triangles[3*i + 2];
tex_p0.x = tex_coords[3*tex_tri_p0_ind]; tex_p0.y = tex_coords[3*tex_tri_p0_ind + 1];
tex_p1.x = tex_coords[3*tex_tri_p1_ind]; tex_p1.y = tex_coords[3*tex_tri_p1_ind + 1];
tex_p2.x = tex_coords[3*tex_tri_p2_ind]; tex_p2.y = tex_coords[3*tex_tri_p2_ind + 1];
x_min = max((int)ceil(min(p0.x, min(p1.x, p2.x))), 0);
x_max = min((int)floor(max(p0.x, max(p1.x, p2.x))), w - 1);
y_min = max((int)ceil(min(p0.y, min(p1.y, p2.y))), 0);
y_max = min((int)floor(max(p0.y, max(p1.y, p2.y))), h - 1);
if(x_max < x_min || y_max < y_min)
{
continue;
}
for(y = y_min; y <= y_max; y++) //h
{
for(x = x_min; x <= x_max; x++) //w
{
p.x = x; p.y = y;
if(p.x < 2 || p.x > w - 3 || p.y < 2 || p.y > h - 3 || isPointInTri(p, p0, p1, p2))
{
get_point_weight(weight, p, p0, p1, p2);
p_depth = weight[0]*p0_depth + weight[1]*p1_depth + weight[2]*p2_depth;
if((p_depth > depth_buffer[y*w + x]))
{
// -- color from texture
// cal weight in mesh tri
get_point_weight(weight, p, p0, p1, p2);
// cal coord in texture
tex_p = tex_p0*weight[0] + tex_p1*weight[1] + tex_p2*weight[2];
tex_p.x = max(min(tex_p.x, float(tex_w - 1)), float(0));
tex_p.y = max(min(tex_p.y, float(tex_h - 1)), float(0));
yd = tex_p.y - floor(tex_p.y);
xd = tex_p.x - floor(tex_p.x);
for(k = 0; k < c; k++)
{
if(mapping_type==0)// nearest
{
image[y*w*c + x*c + k] = texture[int(round(tex_p.y))*tex_w*tex_c + int(round(tex_p.x))*tex_c + k];
}
else//bilinear interp
{
ul = texture[(int)floor(tex_p.y)*tex_w*tex_c + (int)floor(tex_p.x)*tex_c + k];
ur = texture[(int)floor(tex_p.y)*tex_w*tex_c + (int)ceil(tex_p.x)*tex_c + k];
dl = texture[(int)ceil(tex_p.y)*tex_w*tex_c + (int)floor(tex_p.x)*tex_c + k];
dr = texture[(int)ceil(tex_p.y)*tex_w*tex_c + (int)ceil(tex_p.x)*tex_c + k];
image[y*w*c + x*c + k] = ul*(1-xd)*(1-yd) + ur*xd*(1-yd) + dl*(1-xd)*yd + dr*xd*yd;
}
}
depth_buffer[y*w + x] = p_depth;
}
}
}
}
}
}
// ------------------------------------------------- write
// obj write
// Ref: https://github.com/patrikhuber/eos/blob/master/include/eos/core/Mesh.hpp
void _write_obj_with_colors_texture(string filename, string mtl_name,
float* vertices, int* triangles, float* colors, float* uv_coords,
int nver, int ntri, int ntexver)
{
int i;
ofstream obj_file(filename.c_str());
// first line of the obj file: the mtl name
obj_file << "mtllib " << mtl_name << endl;
// write vertices
for (i = 0; i < nver; ++i)
{
obj_file << "v " << vertices[3*i] << " " << vertices[3*i + 1] << " " << vertices[3*i + 2] << colors[3*i] << " " << colors[3*i + 1] << " " << colors[3*i + 2] << endl;
}
// write uv coordinates
for (i = 0; i < ntexver; ++i)
{
//obj_file << "vt " << uv_coords[2*i] << " " << (1 - uv_coords[2*i + 1]) << endl;
obj_file << "vt " << uv_coords[2*i] << " " << uv_coords[2*i + 1] << endl;
}
obj_file << "usemtl FaceTexture" << endl;
// write triangles
for (i = 0; i < ntri; ++i)
{
// obj_file << "f " << triangles[3*i] << "/" << triangles[3*i] << " " << triangles[3*i + 1] << "/" << triangles[3*i + 1] << " " << triangles[3*i + 2] << "/" << triangles[3*i + 2] << endl;
obj_file << "f " << triangles[3*i + 2] << "/" << triangles[3*i + 2] << " " << triangles[3*i + 1] << "/" << triangles[3*i + 1] << " " << triangles[3*i] << "/" << triangles[3*i] << endl;
}
}
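For reference, get_point_weight above translates directly to NumPy; a minimal sketch for verification (illustration only, not part of the build):
import numpy as np

def barycentric_weight(p, p0, p1, p2):
    # mirrors get_point_weight: returns weights for p0, p1, p2
    v0, v1, v2 = p2 - p0, p1 - p0, p - p0
    dot00, dot01, dot02 = v0 @ v0, v0 @ v1, v0 @ v2
    dot11, dot12 = v1 @ v1, v1 @ v2
    denom = dot00 * dot11 - dot01 * dot01
    inv = 0.0 if denom == 0 else 1.0 / denom
    u = (dot11 * dot02 - dot01 * dot12) * inv
    v = (dot00 * dot12 - dot01 * dot02) * inv
    return np.array([1 - u - v, v, u])

print(barycentric_weight(np.array([1., 1.]), np.array([0., 0.]),
                         np.array([4., 0.]), np.array([0., 4.])))
# -> [0.5  0.25 0.25]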

View File

@@ -0,0 +1,83 @@
#ifndef MESH_CORE_HPP_
#define MESH_CORE_HPP_
#include <stdio.h>
#include <cmath>
#include <algorithm>
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
class point
{
public:
float x;
float y;
float dot(point p)
{
return this->x * p.x + this->y * p.y;
}
point operator-(const point& p)
{
point np;
np.x = this->x - p.x;
np.y = this->y - p.y;
return np;
}
point operator+(const point& p)
{
point np;
np.x = this->x + p.x;
np.y = this->y + p.y;
return np;
}
point operator*(float s)
{
point np;
np.x = s * this->x;
np.y = s * this->y;
return np;
}
};
bool isPointInTri(point p, point p0, point p1, point p2);
void get_point_weight(float* weight, point p, point p0, point p1, point p2);
void _get_normal_core(
float* normal, float* tri_normal, int* triangles,
int ntri);
void _rasterize_triangles_core(
float* vertices, int* triangles,
float* depth_buffer, int* triangle_buffer, float* barycentric_weight,
int nver, int ntri,
int h, int w);
void _render_colors_core(
float* image, float* vertices, int* triangles,
float* colors,
float* depth_buffer,
int nver, int ntri,
int h, int w, int c);
void _render_texture_core(
float* image, float* vertices, int* triangles,
float* texture, float* tex_coords, int* tex_triangles,
float* depth_buffer,
int nver, int tex_nver, int ntri,
int h, int w, int c,
int tex_h, int tex_w, int tex_c,
int mapping_type);
void _write_obj_with_colors_texture(string filename, string mtl_name,
float* vertices, int* triangles, float* colors, float* uv_coords,
int nver, int ntri, int ntexver);
#endif

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -0,0 +1,109 @@
import numpy as np
cimport numpy as np
from libcpp.string cimport string
# use the Numpy-C-API from Cython
np.import_array()
# cdefine the signature of our c function
cdef extern from "mesh_core.h":
void _rasterize_triangles_core(
float* vertices, int* triangles,
float* depth_buffer, int* triangle_buffer, float* barycentric_weight,
int nver, int ntri,
int h, int w)
void _render_colors_core(
float* image, float* vertices, int* triangles,
float* colors,
float* depth_buffer,
int nver, int ntri,
int h, int w, int c)
void _render_texture_core(
float* image, float* vertices, int* triangles,
float* texture, float* tex_coords, int* tex_triangles,
float* depth_buffer,
int nver, int tex_nver, int ntri,
int h, int w, int c,
int tex_h, int tex_w, int tex_c,
int mapping_type)
void _get_normal_core(
float* normal, float* tri_normal, int* triangles,
int ntri)
void _write_obj_with_colors_texture(string filename, string mtl_name,
float* vertices, int* triangles, float* colors, float* uv_coords,
int nver, int ntri, int ntexver)
def get_normal_core(np.ndarray[float, ndim=2, mode = "c"] normal not None,
np.ndarray[float, ndim=2, mode = "c"] tri_normal not None,
np.ndarray[int, ndim=2, mode="c"] triangles not None,
int ntri
):
_get_normal_core(
<float*> np.PyArray_DATA(normal), <float*> np.PyArray_DATA(tri_normal), <int*> np.PyArray_DATA(triangles),
ntri)
def rasterize_triangles_core(
np.ndarray[float, ndim=2, mode = "c"] vertices not None,
np.ndarray[int, ndim=2, mode="c"] triangles not None,
np.ndarray[float, ndim=2, mode = "c"] depth_buffer not None,
np.ndarray[int, ndim=2, mode = "c"] triangle_buffer not None,
np.ndarray[float, ndim=2, mode = "c"] barycentric_weight not None,
int nver, int ntri,
int h, int w
):
_rasterize_triangles_core(
<float*> np.PyArray_DATA(vertices), <int*> np.PyArray_DATA(triangles),
<float*> np.PyArray_DATA(depth_buffer), <int*> np.PyArray_DATA(triangle_buffer), <float*> np.PyArray_DATA(barycentric_weight),
nver, ntri,
h, w)
def render_colors_core(np.ndarray[float, ndim=3, mode = "c"] image not None,
np.ndarray[float, ndim=2, mode = "c"] vertices not None,
np.ndarray[int, ndim=2, mode="c"] triangles not None,
np.ndarray[float, ndim=2, mode = "c"] colors not None,
np.ndarray[float, ndim=2, mode = "c"] depth_buffer not None,
int nver, int ntri,
int h, int w, int c
):
_render_colors_core(
<float*> np.PyArray_DATA(image), <float*> np.PyArray_DATA(vertices), <int*> np.PyArray_DATA(triangles),
<float*> np.PyArray_DATA(colors),
<float*> np.PyArray_DATA(depth_buffer),
nver, ntri,
h, w, c)
def render_texture_core(np.ndarray[float, ndim=3, mode = "c"] image not None,
np.ndarray[float, ndim=2, mode = "c"] vertices not None,
np.ndarray[int, ndim=2, mode="c"] triangles not None,
np.ndarray[float, ndim=3, mode = "c"] texture not None,
np.ndarray[float, ndim=2, mode = "c"] tex_coords not None,
np.ndarray[int, ndim=2, mode="c"] tex_triangles not None,
np.ndarray[float, ndim=2, mode = "c"] depth_buffer not None,
int nver, int tex_nver, int ntri,
int h, int w, int c,
int tex_h, int tex_w, int tex_c,
int mapping_type
):
_render_texture_core(
<float*> np.PyArray_DATA(image), <float*> np.PyArray_DATA(vertices), <int*> np.PyArray_DATA(triangles),
<float*> np.PyArray_DATA(texture), <float*> np.PyArray_DATA(tex_coords), <int*> np.PyArray_DATA(tex_triangles),
<float*> np.PyArray_DATA(depth_buffer),
nver, tex_nver, ntri,
h, w, c,
tex_h, tex_w, tex_c,
mapping_type)
def write_obj_with_colors_texture_core(string filename, string mtl_name,
np.ndarray[float, ndim=2, mode = "c"] vertices not None,
np.ndarray[int, ndim=2, mode="c"] triangles not None,
np.ndarray[float, ndim=2, mode = "c"] colors not None,
np.ndarray[float, ndim=2, mode = "c"] uv_coords not None,
int nver, int ntri, int ntexver
):
_write_obj_with_colors_texture(filename, mtl_name,
<float*> np.PyArray_DATA(vertices), <int*> np.PyArray_DATA(triangles), <float*> np.PyArray_DATA(colors), <float*> np.PyArray_DATA(uv_coords),
nver, ntri, ntexver)

View File

@@ -0,0 +1,20 @@
'''
python setup.py build_ext -i
to compile
'''
# setup.py
from distutils.core import setup, Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext
import numpy
setup(
name = 'mesh_core_cython',
cmdclass={'build_ext': build_ext},
ext_modules=[Extension("mesh_core_cython",
sources=["mesh_core_cython.pyx", "mesh_core.cpp"],
language='c++',
include_dirs=[numpy.get_include()])],
)
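After compiling in place with "python setup.py build_ext -i", the extension imports as a regular module; a quick smoke test (editor's sketch):
import numpy as np
import mesh_core_cython  # built in place by the command above

normal = np.zeros((3, 3), dtype=np.float32)
tri_normal = np.ones((1, 3), dtype=np.float32)
triangles = np.array([[0, 1, 2]], dtype=np.int32)
mesh_core_cython.get_normal_core(normal, tri_normal, triangles, 1)
print(normal)  # each of the three vertices accumulates the face normal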

View File

@@ -0,0 +1,142 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os
from skimage import io
from time import time
from .cython import mesh_core_cython
## TODO
## TODO: c++ version
def read_obj(obj_name):
''' read mesh
'''
return 0
# ------------------------- write
def write_asc(path, vertices):
'''
Args:
vertices: shape = (nver, 3)
'''
if path.split('.')[-1] == 'asc':
np.savetxt(path, vertices)
else:
np.savetxt(path + '.asc', vertices)
def write_obj_with_colors(obj_name, vertices, triangles, colors):
''' Save 3D face model with texture represented by colors.
Args:
obj_name: str
vertices: shape = (nver, 3)
triangles: shape = (ntri, 3)
colors: shape = (nver, 3)
'''
triangles = triangles.copy()
triangles += 1 # meshlab start with 1
if obj_name.split('.')[-1] != 'obj':
obj_name = obj_name + '.obj'
# write obj
with open(obj_name, 'w') as f:
# write vertices & colors
for i in range(vertices.shape[0]):
# s = 'v {} {} {} \n'.format(vertices[0,i], vertices[1,i], vertices[2,i])
s = 'v {} {} {} {} {} {}\n'.format(vertices[i, 0], vertices[i, 1], vertices[i, 2], colors[i, 0], colors[i, 1], colors[i, 2])
f.write(s)
# write f: ver ind/ uv ind
[k, ntri] = triangles.shape
for i in range(triangles.shape[0]):
# s = 'f {} {} {}\n'.format(triangles[i, 0], triangles[i, 1], triangles[i, 2])
s = 'f {} {} {}\n'.format(triangles[i, 2], triangles[i, 1], triangles[i, 0])
f.write(s)
## TODO: c++ version
def write_obj_with_texture(obj_name, vertices, triangles, texture, uv_coords):
''' Save 3D face model with texture represented by texture map.
Ref: https://github.com/patrikhuber/eos/blob/bd00155ebae4b1a13b08bf5a991694d682abbada/include/eos/core/Mesh.hpp
Args:
obj_name: str
vertices: shape = (nver, 3)
triangles: shape = (ntri, 3)
texture: shape = (256,256,3)
uv_coords: shape = (nver, 2) max value<=1
'''
if obj_name.split('.')[-1] != 'obj':
obj_name = obj_name + '.obj'
mtl_name = obj_name.replace('.obj', '.mtl')
texture_name = obj_name.replace('.obj', '_texture.png')
triangles = triangles.copy()
triangles += 1 # mesh lab start with 1
# write obj
with open(obj_name, 'w') as f:
# first line: write mtlib(material library)
s = "mtllib {}\n".format(os.path.abspath(mtl_name))
f.write(s)
# write vertices
for i in range(vertices.shape[0]):
s = 'v {} {} {}\n'.format(vertices[i, 0], vertices[i, 1], vertices[i, 2])
f.write(s)
# write uv coords
for i in range(uv_coords.shape[0]):
s = 'vt {} {}\n'.format(uv_coords[i,0], 1 - uv_coords[i,1])
f.write(s)
f.write("usemtl FaceTexture\n")
# write f: ver ind/ uv ind
for i in range(triangles.shape[0]):
s = 'f {}/{} {}/{} {}/{}\n'.format(triangles[i,2], triangles[i,2], triangles[i,1], triangles[i,1], triangles[i,0], triangles[i,0])
f.write(s)
# write mtl
with open(mtl_name, 'w') as f:
f.write("newmtl FaceTexture\n")
s = 'map_Kd {}\n'.format(os.path.abspath(texture_name)) # map to image
f.write(s)
# write texture as png
io.imsave(texture_name, texture)
# c++ version
def write_obj_with_colors_texture(obj_name, vertices, triangles, colors, texture, uv_coords):
''' Save 3D face model with texture.
Ref: https://github.com/patrikhuber/eos/blob/bd00155ebae4b1a13b08bf5a991694d682abbada/include/eos/core/Mesh.hpp
Args:
obj_name: str
vertices: shape = (nver, 3)
triangles: shape = (ntri, 3)
colors: shape = (nver, 3)
texture: shape = (256,256,3)
uv_coords: shape = (nver, 2) max value<=1
'''
if obj_name.split('.')[-1] != 'obj':
obj_name = obj_name + '.obj'
mtl_name = obj_name.replace('.obj', '.mtl')
texture_name = obj_name.replace('.obj', '_texture.png')
triangles = triangles.copy()
triangles += 1 # mesh lab start with 1
# write obj
vertices, colors, uv_coords = vertices.astype(np.float32).copy(), colors.astype(np.float32).copy(), uv_coords.astype(np.float32).copy()
mesh_core_cython.write_obj_with_colors_texture_core(str.encode(obj_name), str.encode(os.path.abspath(mtl_name)), vertices, triangles, colors, uv_coords, vertices.shape[0], triangles.shape[0], uv_coords.shape[0])
# write mtl
with open(mtl_name, 'w') as f:
f.write("newmtl FaceTexture\n")
s = 'map_Kd {}\n'.format(os.path.abspath(texture_name)) # map to image
f.write(s)
# write texture as png
io.imsave(texture_name, texture)
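A minimal save sketch with the helpers above (hypothetical output path):
import numpy as np
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
tris = np.array([[0, 1, 2]])
cols = np.full((3, 3), 0.5)                             # mid-gray per vertex
write_obj_with_colors('demo_face', verts, tris, cols)   # writes demo_face.obj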

View File

@@ -0,0 +1,213 @@
'''
Functions about lighting mesh(changing colors/texture of mesh).
1. add light to colors/texture (shade each vertex)
2. fit light according to colors/texture & image.
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from .cython import mesh_core_cython
def get_normal(vertices, triangles):
''' calculate normal direction in each vertex
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
Returns:
normal: [nver, 3]
'''
pt0 = vertices[triangles[:, 0], :] # [ntri, 3]
pt1 = vertices[triangles[:, 1], :] # [ntri, 3]
pt2 = vertices[triangles[:, 2], :] # [ntri, 3]
tri_normal = np.cross(pt0 - pt1, pt0 - pt2) # [ntri, 3]. normal of each triangle
normal = np.zeros_like(vertices, dtype = np.float32).copy() # [nver, 3]
# for i in range(triangles.shape[0]):
# normal[triangles[i, 0], :] = normal[triangles[i, 0], :] + tri_normal[i, :]
# normal[triangles[i, 1], :] = normal[triangles[i, 1], :] + tri_normal[i, :]
# normal[triangles[i, 2], :] = normal[triangles[i, 2], :] + tri_normal[i, :]
mesh_core_cython.get_normal_core(normal, tri_normal.astype(np.float32).copy(), triangles.copy(), triangles.shape[0])
# normalize to unit length
mag = np.sum(normal**2, 1) # [nver]
zero_ind = (mag == 0)
mag[zero_ind] = 1;
normal[zero_ind, 0] = np.ones((np.sum(zero_ind)))
normal = normal/np.sqrt(mag[:,np.newaxis])
return normal
# TODO: test
def add_light_sh(vertices, triangles, colors, sh_coeff):
'''
In 3d face, usually assume:
1. The surface of face is Lambertian(reflect only the low frequencies of lighting)
2. Lighting can be an arbitrary combination of point sources
--> can be expressed in terms of spherical harmonics(omit the lighting coefficients)
I = albedo * (sh(n) x sh_coeff)
albedo: n x 1
sh_coeff: 9 x 1
Y(n) = (1, n_x, n_y, n_z, n_xn_y, n_xn_z, n_yn_z, n_x^2 - n_y^2, 3n_z^2 - 1)': n x 9
# Y(n) = (1, n_x, n_y, n_z)': n x 4
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
colors: [nver, 3] albedo
sh_coeff: [9, 1] spherical harmonics coefficients
Returns:
lit_colors: [nver, 3]
'''
assert vertices.shape[0] == colors.shape[0]
nver = vertices.shape[0]
normal = get_normal(vertices, triangles) # [nver, 3]
sh = np.array((np.ones(nver), normal[:,0], normal[:,1], normal[:,2], normal[:,0]*normal[:,1], normal[:,0]*normal[:,2], normal[:,1]*normal[:,2], normal[:,0]**2 - normal[:,1]**2, 3*(normal[:,2]**2) - 1)).T # [nver, 9]
ref = sh.dot(sh_coeff) # [nver, 1]
lit_colors = colors*ref
return lit_colors
def add_light(vertices, triangles, colors, light_positions = 0, light_intensities = 0):
''' Gouraud shading. add point lights.
In 3d face, usually assume:
1. The surface of face is Lambertian(reflect only the low frequencies of lighting)
2. Lighting can be an arbitrary combination of point sources
3. No specular (unless skin is oil, 23333)
Ref: https://cs184.eecs.berkeley.edu/lecture/pipeline
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
light_positions: [nlight, 3]
light_intensities: [nlight, 3]
Returns:
lit_colors: [nver, 3]
'''
nver = vertices.shape[0]
normals = get_normal(vertices, triangles) # [nver, 3]
# ambient
# La = ka*Ia
# diffuse
# Ld = kd*(I/r^2)max(0, nxl)
direction_to_lights = vertices[np.newaxis, :, :] - light_positions[:, np.newaxis, :] # [nlight, nver, 3]
direction_to_lights_n = np.sqrt(np.sum(direction_to_lights**2, axis = 2)) # [nlight, nver]
direction_to_lights = direction_to_lights/direction_to_lights_n[:, :, np.newaxis]
normals_dot_lights = normals[np.newaxis, :, :]*direction_to_lights # [nlight, nver, 3]
normals_dot_lights = np.sum(normals_dot_lights, axis = 2) # [nlight, nver]
diffuse_output = colors[np.newaxis, :, :]*normals_dot_lights[:, :, np.newaxis]*light_intensities[:, np.newaxis, :]
diffuse_output = np.sum(diffuse_output, axis = 0) # [nver, 3]
# specular
# h = (v + l)/(|v + l|) bisector
# Ls = ks*(I/r^2)max(0, nxh)^p
# increasing p narrows the reflection lobe
lit_colors = diffuse_output # only diffuse part here.
lit_colors = np.minimum(np.maximum(lit_colors, 0), 1)
return lit_colors
## TODO. estimate light(sh coeff)
## -------------------------------- estimate. can not use now.
def fit_light(image, vertices, colors, triangles, vis_ind, lamb = 10, max_iter = 3):
[h, w, c] = image.shape
# surface normal
norm = get_normal(vertices, triangles)
nver = vertices.shape[1]
# vertices --> corresponding image pixel
pt2d = vertices[:2, :]
pt2d[0,:] = np.minimum(np.maximum(pt2d[0,:], 0), w - 1)
pt2d[1,:] = np.minimum(np.maximum(pt2d[1,:], 0), h - 1)
pt2d = np.round(pt2d).astype(np.int32) # 2 x nver
image_pixel = image[pt2d[1,:], pt2d[0,:], :] # nver x 3
image_pixel = image_pixel.T # 3 x nver
# vertices --> corresponding mean texture pixel with illumination
# Spherical Harmonic Basis
harmonic_dim = 9
nx = norm[0,:];
ny = norm[1,:];
nz = norm[2,:];
harmonic = np.zeros((nver, harmonic_dim))
pi = np.pi
harmonic[:,0] = np.sqrt(1/(4*pi)) * np.ones((nver,));
harmonic[:,1] = np.sqrt(3/(4*pi)) * nx;
harmonic[:,2] = np.sqrt(3/(4*pi)) * ny;
harmonic[:,3] = np.sqrt(3/(4*pi)) * nz;
harmonic[:,4] = 1/2. * np.sqrt(3/(4*pi)) * (2*nz**2 - nx**2 - ny**2);
harmonic[:,5] = 3 * np.sqrt(5/(12*pi)) * (ny*nz);
harmonic[:,6] = 3 * np.sqrt(5/(12*pi)) * (nx*nz);
harmonic[:,7] = 3 * np.sqrt(5/(12*pi)) * (nx*ny);
harmonic[:,8] = 3/2. * np.sqrt(5/(12*pi)) * (nx*nx - ny*ny);
'''
I' = sum(albedo * lj * hj) j = 0:9 (albedo = tex)
set A = albedo*h (n x 9)
alpha = lj (9 x 1)
Y = I (n x 1)
Y' = A.dot(alpha)
opt function:
||Y - A*alpha|| + lambda*(alpha'*alpha)
result:
A'*(Y - A*alpha) + lambda*alpha = 0
==>
(A'*A*alpha - lambda)*alpha = A'*Y
left: 9 x 9
right: 9 x 1
'''
n_vis_ind = len(vis_ind)
n = n_vis_ind*c
Y = np.zeros((n, 1))
A = np.zeros((n, 9))
light = np.zeros((3, 1))
for k in range(c):
Y[k*n_vis_ind:(k+1)*n_vis_ind, :] = image_pixel[k, vis_ind][:, np.newaxis]
A[k*n_vis_ind:(k+1)*n_vis_ind, :] = texture[k, vis_ind][:, np.newaxis] * harmonic[vis_ind, :]
Ac = texture[k, vis_ind][:, np.newaxis]
Yc = image_pixel[k, vis_ind][:, np.newaxis]
light[k] = (Ac.T.dot(Yc))/(Ac.T.dot(Ac))
for i in range(max_iter):
Yc = Y.copy()
for k in range(c):
Yc[k*n_vis_ind:(k+1)*n_vis_ind, :] /= light[k]
# update alpha
equation_left = np.dot(A.T, A) + lamb*np.eye(harmonic_dim); # why + ?
equation_right = np.dot(A.T, Yc)
alpha = np.dot(np.linalg.inv(equation_left), equation_right)
# update light
for k in range(c):
Ac = A[k*n_vis_ind:(k+1)*n_vis_ind, :].dot(alpha)
Yc = Y[k*n_vis_ind:(k+1)*n_vis_ind, :]
light[k] = (Ac.T.dot(Yc))/(Ac.T.dot(Ac))
appearance = np.zeros_like(texture)
for k in range(c):
tmp = np.dot(harmonic*texture[k, :][:, np.newaxis], alpha*light[k])
appearance[k,:] = tmp.T
appearance = np.minimum(np.maximum(appearance, 0), 1)
return appearance
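A minimal shading sketch with add_light above (hypothetical arrays; assumes the compiled mesh_core_cython extension is available):
import numpy as np
vertices = np.array([[0., 0., 0.], [100., 0., 0.], [0., 100., 0.]], dtype=np.float32)
triangles = np.array([[0, 1, 2]], dtype=np.int32)
colors = np.full((3, 3), 0.8, dtype=np.float32)   # per-vertex albedo
light_positions = np.array([[50., 50., 300.]])    # one light above the mesh
light_intensities = np.array([[1., 1., 1.]])      # white light
lit_colors = add_light(vertices, triangles, colors, light_positions, light_intensities)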

View File

@@ -0,0 +1,135 @@
'''
functions about rendering mesh(from 3d obj to 2d image).
only use rasterization render here.
Note that:
1. Generally, render func includes camera, light, rasterize. Here no camera and light(I write these in other files)
2. Generally, the input vertices are normalized to [-1,1] and centered on [0, 0]. (in world space)
Here, the vertices are using image coords, which centers on [w/2, h/2] with the y-axis pointing in the opposite direction.
Means: render here only conducts interpolation.(I just want to make the input flexible)
Author: Yao Feng
Mail: yaofeng1995@gmail.com
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from time import time
from .cython import mesh_core_cython
def rasterize_triangles(vertices, triangles, h, w):
'''
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
h: height
w: width
Returns:
depth_buffer: [h, w] saves the depth; here, the bigger the z, the nearer the point.
triangle_buffer: [h, w] saves the tri id(-1 for no triangle).
barycentric_weight: [h, w, 3] saves corresponding barycentric weight.
# Each triangle has 3 vertices & Each vertex has 3 coordinates x, y, z.
# h, w is the size of rendering
'''
# initial
depth_buffer = np.zeros([h, w], dtype = np.float32) - 999999. # set the initial z to the farthest position
triangle_buffer = np.zeros([h, w], dtype = np.int32) - 1 # if tri id = -1, the pixel has no triangle correspondence
barycentric_weight = np.zeros([h, w, 3], dtype = np.float32)
vertices = vertices.astype(np.float32).copy()
triangles = triangles.astype(np.int32).copy()
mesh_core_cython.rasterize_triangles_core(
vertices, triangles,
depth_buffer, triangle_buffer, barycentric_weight,
vertices.shape[0], triangles.shape[0],
h, w)
return depth_buffer, triangle_buffer, barycentric_weight
def render_colors(vertices, triangles, colors, h, w, c = 3, BG = None):
''' render mesh with colors
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
colors: [nver, 3]
h: height
w: width
c: channel
BG: background image
Returns:
image: [h, w, c]. rendered image.
'''
# initial
if BG is None:
image = np.zeros((h, w, c), dtype = np.float32)
else:
assert BG.shape[0] == h and BG.shape[1] == w and BG.shape[2] == c
image = BG
depth_buffer = np.zeros([h, w], dtype = np.float32, order = 'C') - 999999.
# change orders. --> C-contiguous order(row major)
vertices = vertices.astype(np.float32).copy()
triangles = triangles.astype(np.int32).copy()
colors = colors.astype(np.float32).copy()
###
st = time()
mesh_core_cython.render_colors_core(
image, vertices, triangles,
colors,
depth_buffer,
vertices.shape[0], triangles.shape[0],
h, w, c)
return image
def render_texture(vertices, triangles, texture, tex_coords, tex_triangles, h, w, c = 3, mapping_type = 'nearest', BG = None):
''' render mesh with texture map
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
texture: [tex_h, tex_w, 3]
tex_coords: [ntexcoords, 3]
tex_triangles: [ntri, 3]
h: height of rendering
w: width of rendering
c: channel
mapping_type: 'bilinear' or 'nearest'
'''
# initial
if BG is None:
image = np.zeros((h, w, c), dtype = np.float32)
else:
assert BG.shape[0] == h and BG.shape[1] == w and BG.shape[2] == c
image = BG
depth_buffer = np.zeros([h, w], dtype = np.float32, order = 'C') - 999999.
tex_h, tex_w, tex_c = texture.shape
if mapping_type == 'nearest':
mt = int(0)
elif mapping_type == 'bilinear':
mt = int(1)
else:
mt = int(0)
# -> C order
vertices = vertices.astype(np.float32).copy()
triangles = triangles.astype(np.int32).copy()
texture = texture.astype(np.float32).copy()
tex_coords = tex_coords.astype(np.float32).copy()
tex_triangles = tex_triangles.astype(np.int32).copy()
mesh_core_cython.render_texture_core(
image, vertices, triangles,
texture, tex_coords, tex_triangles,
depth_buffer,
vertices.shape[0], tex_coords.shape[0], triangles.shape[0],
h, w, c,
tex_h, tex_w, tex_c,
mt)
return image
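A minimal rasterization sketch with render_colors above (hypothetical arrays; assumes the compiled extension is available):
import numpy as np
vertices = np.array([[30., 30., 0.], [220., 40., 0.], [130., 200., 0.]])
triangles = np.array([[0, 1, 2]])
colors = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])     # RGB corners
image = render_colors(vertices, triangles, colors, h=256, w=256)  # [256, 256, 3]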

View File

@@ -0,0 +1,383 @@
'''
Functions about transforming mesh(changing the position: modify vertices).
1. forward: transform(transform, camera, project).
2. backward: estimate transform matrix from correspondences.
Author: Yao Feng
Mail: yaofeng1995@gmail.com
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
from math import cos, sin
def angle2matrix(angles):
''' get rotation matrix from three rotation angles(degree). right-handed.
Args:
angles: [3,]. x, y, z angles
x: pitch. positive for looking down.
y: yaw. positive for looking left.
z: roll. positive for tilting head right.
Returns:
R: [3, 3]. rotation matrix.
'''
x, y, z = np.deg2rad(angles[0]), np.deg2rad(angles[1]), np.deg2rad(angles[2])
# x
Rx=np.array([[1, 0, 0],
[0, cos(x), -sin(x)],
[0, sin(x), cos(x)]])
# y
Ry=np.array([[ cos(y), 0, sin(y)],
[ 0, 1, 0],
[-sin(y), 0, cos(y)]])
# z
Rz=np.array([[cos(z), -sin(z), 0],
[sin(z), cos(z), 0],
[ 0, 0, 1]])
R=Rz.dot(Ry.dot(Rx))
return R.astype(np.float32)
def angle2matrix_3ddfa(angles):
''' get rotation matrix from three rotation angles(radian). The same as in 3DDFA.
Args:
angles: [3,]. x, y, z angles
x: pitch.
y: yaw.
z: roll.
Returns:
R: 3x3. rotation matrix.
'''
# x, y, z = np.deg2rad(angles[0]), np.deg2rad(angles[1]), np.deg2rad(angles[2])
x, y, z = angles[0], angles[1], angles[2]
# x
Rx=np.array([[1, 0, 0],
[0, cos(x), sin(x)],
[0, -sin(x), cos(x)]])
# y
Ry=np.array([[ cos(y), 0, -sin(y)],
[ 0, 1, 0],
[sin(y), 0, cos(y)]])
# z
Rz=np.array([[cos(z), sin(z), 0],
[-sin(z), cos(z), 0],
[ 0, 0, 1]])
R = Rx.dot(Ry).dot(Rz)
return R.astype(np.float32)
## ------------------------------------------ 1. transform(transform, project, camera).
## ---------- 3d-3d transform. Transform obj in world space
def rotate(vertices, angles):
''' rotate vertices.
X_new = R.dot(X). X: 3 x 1
Args:
vertices: [nver, 3].
rx, ry, rz: degree angles
rx: pitch. positive for looking down
ry: yaw. positive for looking left
rz: roll. positive for tilting head right
Returns:
rotated vertices: [nver, 3]
'''
R = angle2matrix(angles)
rotated_vertices = vertices.dot(R.T)
return rotated_vertices
def similarity_transform(vertices, s, R, t3d):
''' similarity transform. dof = 7.
3D: s*R.dot(X) + t
Homo: M = [[sR, t],[0^T, 1]]. M.dot(X)
Args:(float32)
vertices: [nver, 3].
s: [1,]. scale factor.
R: [3,3]. rotation matrix.
t3d: [3,]. 3d translation vector.
Returns:
transformed vertices: [nver, 3]
'''
t3d = np.squeeze(np.array(t3d, dtype = np.float32))
transformed_vertices = s * vertices.dot(R.T) + t3d[np.newaxis, :]
return transformed_vertices
## -------------- Camera. from world space to camera space
# Ref: https://cs184.eecs.berkeley.edu/lecture/transforms-2
def normalize(x):
epsilon = 1e-12
norm = np.sqrt(np.sum(x**2, axis = 0))
norm = np.maximum(norm, epsilon)
return x/norm
def lookat_camera(vertices, eye, at = None, up = None):
""" 'look at' transformation: from world space to camera space
standard camera space:
camera located at the origin.
looking down negative z-axis.
vertical vector is y-axis.
Xcam = R(X - C)
Homo: [[R, -RC], [0, 1]]
Args:
vertices: [nver, 3]
eye: [3,] the XYZ world space position of the camera.
at: [3,] a position along the center of the camera's gaze.
up: [3,] up direction
Returns:
transformed_vertices: [nver, 3]
"""
if at is None:
at = np.array([0, 0, 0], np.float32)
if up is None:
up = np.array([0, 1, 0], np.float32)
eye = np.array(eye).astype(np.float32)
at = np.array(at).astype(np.float32)
z_axis = -normalize(at - eye) # look forward
x_axis = normalize(np.cross(up, z_axis)) # look right
y_axis = np.cross(z_axis, x_axis) # look up
R = np.stack((x_axis, y_axis, z_axis)) # 3 x 3
transformed_vertices = vertices - eye # translation
transformed_vertices = transformed_vertices.dot(R.T) # rotation
return transformed_vertices
## --------- 3d-2d project. from camera space to image plane
# generally, image plane only keeps x,y channels, here reserve z channel for calculating z-buffer.
def orthographic_project(vertices):
''' scaled orthographic projection(just delete z)
assumes: variations in depth over the object is small relative to the mean distance from camera to object
x -> x*f/z, y -> y*f/z, z -> f.
for point i,j. zi~=zj. so just delete z
** often used in face
Homo: P = [[1,0,0,0], [0,1,0,0], [0,0,1,0]]
Args:
vertices: [nver, 3]
Returns:
projected_vertices: [nver, 3]. z is kept for later z-buffering.
'''
return vertices.copy()
def perspective_project(vertices, fovy, aspect_ratio = 1., near = 0.1, far = 1000.):
''' perspective projection.
Args:
vertices: [nver, 3]
fovy: vertical angular field of view. degree.
aspect_ratio : width / height of field of view
near : depth of near clipping plane
far : depth of far clipping plane
Returns:
projected_vertices: [nver, 3]
'''
fovy = np.deg2rad(fovy)
top = near*np.tan(fovy)
bottom = -top
right = top*aspect_ratio
left = -right
#-- homo
P = np.array([[near/right, 0, 0, 0],
[0, near/top, 0, 0],
[0, 0, -(far+near)/(far-near), -2*far*near/(far-near)],
[0, 0, -1, 0]])
vertices_homo = np.hstack((vertices, np.ones((vertices.shape[0], 1)))) # [nver, 4]
projected_vertices = vertices_homo.dot(P.T)
projected_vertices = projected_vertices/projected_vertices[:,3:]
projected_vertices = projected_vertices[:,:3]
projected_vertices[:,2] = -projected_vertices[:,2]
#-- non homo. only fovy
# projected_vertices = vertices.copy()
# projected_vertices[:,0] = -(near/right)*vertices[:,0]/vertices[:,2]
# projected_vertices[:,1] = -(near/top)*vertices[:,1]/vertices[:,2]
return projected_vertices
def to_image(vertices, h, w, is_perspective = False):
''' change vertices to image coord system
3d system: XYZ, center(0, 0, 0)
2d image: x(u), y(v). center(w/2, h/2), flip y-axis.
Args:
vertices: [nver, 3]
h: height of the rendering
w : width of the rendering
Returns:
projected_vertices: [nver, 3]
'''
image_vertices = vertices.copy()
if is_perspective:
# if perspective, the projected vertices are normalized to [-1, 1]. so change it to image size first.
image_vertices[:,0] = image_vertices[:,0]*w/2
image_vertices[:,1] = image_vertices[:,1]*h/2
# move to center of image
image_vertices[:,0] = image_vertices[:,0] + w/2
image_vertices[:,1] = image_vertices[:,1] + h/2
# flip vertices along y-axis.
image_vertices[:,1] = h - image_vertices[:,1] - 1
return image_vertices
#### -------------------------------------------2. estimate transform matrix from correspondences.
def estimate_affine_matrix_3d23d(X, Y):
''' Using least-squares solution
Args:
X: [n, 3]. 3d points(fixed)
Y: [n, 3]. corresponding 3d points(moving). Y = PX
Returns:
P_Affine: (3, 4). Affine camera matrix (the third row is [0, 0, 0, 1]).
'''
X_homo = np.hstack((X, np.ones([X.shape[0],1]))) #n x 4
P = np.linalg.lstsq(X_homo, Y, rcond = None)[0].T # Affine matrix. 3 x 4
return P
def estimate_affine_matrix_3d22d(X, x):
''' Using Golden Standard Algorithm for estimating an affine camera
matrix P from world to image correspondences.
See Alg.7.2. in MVGCV
Code Ref: https://github.com/patrikhuber/eos/blob/master/include/eos/fitting/affine_camera_estimation.hpp
x_homo = X_homo.dot(P_Affine)
Args:
X: [n, 3]. corresponding 3d points(fixed)
x: [n, 2]. n>=4. 2d points(moving). x = PX
Returns:
P_Affine: [3, 4]. Affine camera matrix
'''
X = X.T; x = x.T
assert(x.shape[1] == X.shape[1])
n = x.shape[1]
assert(n >= 4)
#--- 1. normalization
# 2d points
mean = np.mean(x, 1) # (2,)
x = x - np.tile(mean[:, np.newaxis], [1, n])
average_norm = np.mean(np.sqrt(np.sum(x**2, 0)))
scale = np.sqrt(2) / average_norm
x = scale * x
T = np.zeros((3,3), dtype = np.float32)
T[0, 0] = T[1, 1] = scale
T[:2, 2] = -mean*scale
T[2, 2] = 1
# 3d points
mean = np.mean(X, 1) # (3,)
X = X - np.tile(mean[:, np.newaxis], [1, n])
average_norm = np.mean(np.sqrt(np.sum(X**2, 0)))
scale = np.sqrt(3) / average_norm
X = scale * X
U = np.zeros((4,4), dtype = np.float32)
U[0, 0] = U[1, 1] = U[2, 2] = scale
U[:3, 3] = -mean*scale
U[3, 3] = 1
# --- 2. equations
A = np.zeros((n*2, 8), dtype = np.float32);
X_homo = np.vstack((X, np.ones((1, n)))).T
A[:n, :4] = X_homo
A[n:, 4:] = X_homo
b = np.reshape(x, [-1, 1])
# --- 3. solution
p_8 = np.linalg.pinv(A).dot(b)
P = np.zeros((3, 4), dtype = np.float32)
P[0, :] = p_8[:4, 0]
P[1, :] = p_8[4:, 0]
P[-1, -1] = 1
# --- 4. denormalization
P_Affine = np.linalg.inv(T).dot(P.dot(U))
return P_Affine
def P2sRt(P):
''' decompositing camera matrix P
Args:
P: (3, 4). Affine Camera Matrix.
Returns:
s: scale factor.
R: (3, 3). rotation matrix.
t: (3,). translation.
'''
t = P[:, 3]
R1 = P[0:1, :3]
R2 = P[1:2, :3]
s = (np.linalg.norm(R1) + np.linalg.norm(R2))/2.0
r1 = R1/np.linalg.norm(R1)
r2 = R2/np.linalg.norm(R2)
r3 = np.cross(r1, r2)
R = np.concatenate((r1, r2, r3), 0)
return s, R, t
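# --- Synthetic sanity check (a sketch; not part of the original file): build
# a known similarity transform, project to 2d, then recover it with
# estimate_affine_matrix_3d22d + P2sRt.
def _demo_pose_estimation():
    theta = np.deg2rad(15.)
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0.],
                       [np.sin(theta),  np.cos(theta), 0.],
                       [0., 0., 1.]])  # rotation about z
    s_true, t_true = 2.0, np.array([3., -1.])
    X = np.random.rand(10, 3)  # [n, 3] 3d points
    x = s_true*X.dot(R_true.T)[:, :2] + t_true  # [n, 2] 2d correspondences
    P = estimate_affine_matrix_3d22d(X, x)
    s, R, t = P2sRt(P)
    print(s, t[:2])  # ~2.0 and ~[3., -1.]; R approximates R_true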
#Ref: https://www.learnopencv.com/rotation-matrix-to-euler-angles/
def isRotationMatrix(R):
''' checks whether a matrix is a valid rotation matrix (i.e. orthogonal: R^T R = I)
'''
Rt = np.transpose(R)
shouldBeIdentity = np.dot(Rt, R)
I = np.identity(3, dtype = R.dtype)
n = np.linalg.norm(I - shouldBeIdentity)
return n < 1e-6
def matrix2angle(R):
''' get three Euler angles from Rotation Matrix
Args:
R: (3,3). rotation matrix
Returns:
x: pitch
y: yaw
z: roll
'''
assert(isRotationMatrix(R))
sy = math.sqrt(R[0,0] * R[0,0] + R[1,0] * R[1,0])
singular = sy < 1e-6
if not singular :
x = math.atan2(R[2,1] , R[2,2])
y = math.atan2(-R[2,0], sy)
z = math.atan2(R[1,0], R[0,0])
else :
x = math.atan2(-R[1,2], R[1,1])
y = math.atan2(-R[2,0], sy)
z = 0
# rx, ry, rz = np.rad2deg(x), np.rad2deg(y), np.rad2deg(z)
rx, ry, rz = x*180/np.pi, y*180/np.pi, z*180/np.pi
return rx, ry, rz
# def matrix2angle(R):
# ''' compute three Euler angles from a Rotation Matrix. Ref: http://www.gregslabaugh.net/publications/euler.pdf
# Args:
# R: (3,3). rotation matrix
# Returns:
# x: yaw
# y: pitch
# z: roll
# '''
# # assert(isRotationMatrix(R))
# if R[2,0] != 1 and R[2,0] != -1:
# x = math.asin(R[2,0])
# y = math.atan2(R[2,1]/cos(x), R[2,2]/cos(x))
# z = math.atan2(R[1,0]/cos(x), R[0,0]/cos(x))
# else:# Gimbal lock
# z = 0 #can be anything
# if R[2,0] == -1:
# x = np.pi/2
# y = z + math.atan2(R[0,1], R[0,2])
# else:
# x = -np.pi/2
# y = -z + math.atan2(-R[0,1], -R[0,2])
# return x, y, z

View File

@ -0,0 +1,24 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from skimage import measure
from mpl_toolkits.mplot3d import Axes3D
def plot_mesh(vertices, triangles, subplot = [1,1,1], title = 'mesh', el = 90, az = -90, lwdt=.1, dist = 6, color = "grey"):
'''
plot the mesh
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
'''
ax = plt.subplot(subplot[0], subplot[1], subplot[2], projection = '3d')
ax.plot_trisurf(vertices[:, 0], vertices[:, 1], vertices[:, 2], triangles = triangles, lw = lwdt, color = color, alpha = 1)
ax.axis("off")
ax.view_init(elev = el, azim = az)
ax.dist = dist
plt.title(title)
### -------------- Todo: use vtk to visualize mesh? or visvis? or VisPy?

View File

@ -0,0 +1,10 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from . import io
from . import vis
from . import transform
from . import light
from . import render

View File

@ -0,0 +1,170 @@
''' io: read&write mesh
1. read obj as array(TODO)
2. write arrays to obj
Preparation knowledge:
representations of 3d face: mesh, point cloud...
storage format: obj, ply, bin, asc, mat...
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os
from skimage import io
## TODO
## TODO: c++ version
def read_obj(obj_name):
''' read mesh
'''
return 0
# ------------------------- write
def write_asc(path, vertices):
'''
Args:
vertices: shape = (nver, 3)
'''
if path.split('.')[-1] == 'asc':
np.savetxt(path, vertices)
else:
np.savetxt(path + '.asc', vertices)
def write_obj_with_colors(obj_name, vertices, triangles, colors):
''' Save 3D face model with texture represented by colors.
Args:
obj_name: str
vertices: shape = (nver, 3)
triangles: shape = (ntri, 3)
colors: shape = (nver, 3)
'''
triangles = triangles.copy()
triangles += 1 # meshlab start with 1
if obj_name.split('.')[-1] != 'obj':
obj_name = obj_name + '.obj'
# write obj
with open(obj_name, 'w') as f:
# write vertices & colors
for i in range(vertices.shape[0]):
# s = 'v {} {} {} \n'.format(vertices[0,i], vertices[1,i], vertices[2,i])
s = 'v {} {} {} {} {} {}\n'.format(vertices[i, 0], vertices[i, 1], vertices[i, 2], colors[i, 0], colors[i, 1], colors[i, 2])
f.write(s)
# write f: ver ind/ uv ind
for i in range(triangles.shape[0]):
# s = 'f {} {} {}\n'.format(triangles[i, 0], triangles[i, 1], triangles[i, 2])
s = 'f {} {} {}\n'.format(triangles[i, 2], triangles[i, 1], triangles[i, 0])
f.write(s)
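# --- Minimal usage sketch (the output path is illustrative): write a single
# colored triangle and inspect the result in MeshLab.
def _demo_write_obj():
    vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
    triangles = np.array([[0, 1, 2]], dtype = np.int32)
    colors = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])  # rgb per vertex
    write_obj_with_colors('triangle.obj', vertices, triangles, colors)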
## TODO: c++ version
def write_obj_with_texture(obj_name, vertices, triangles, texture, uv_coords):
''' Save 3D face model with texture represented by texture map.
Ref: https://github.com/patrikhuber/eos/blob/bd00155ebae4b1a13b08bf5a991694d682abbada/include/eos/core/Mesh.hpp
Args:
obj_name: str
vertices: shape = (nver, 3)
triangles: shape = (ntri, 3)
texture: shape = (256,256,3)
uv_coords: shape = (nver, 2) or (nver, 3); only the first two columns are used. values in [0, 1]
'''
if obj_name.split('.')[-1] != 'obj':
obj_name = obj_name + '.obj'
mtl_name = obj_name.replace('.obj', '.mtl')
texture_name = obj_name.replace('.obj', '_texture.png')
triangles = triangles.copy()
triangles += 1 # mesh lab start with 1
# write obj
with open(obj_name, 'w') as f:
# first line: write mtlib(material library)
s = "mtllib {}\n".format(os.path.abspath(mtl_name))
f.write(s)
# write vertices
for i in range(vertices.shape[0]):
s = 'v {} {} {}\n'.format(vertices[i, 0], vertices[i, 1], vertices[i, 2])
f.write(s)
# write uv coords
for i in range(uv_coords.shape[0]):
# s = 'vt {} {}\n'.format(uv_coords[i,0], 1 - uv_coords[i,1])
s = 'vt {} {}\n'.format(uv_coords[i,0], uv_coords[i,1])
f.write(s)
f.write("usemtl FaceTexture\n")
# write f: ver ind/ uv ind
for i in range(triangles.shape[0]):
s = 'f {}/{} {}/{} {}/{}\n'.format(triangles[i,2], triangles[i,2], triangles[i,1], triangles[i,1], triangles[i,0], triangles[i,0])
f.write(s)
# write mtl
with open(mtl_name, 'w') as f:
f.write("newmtl FaceTexture\n")
s = 'map_Kd {}\n'.format(os.path.abspath(texture_name)) # map to image
f.write(s)
# write texture as png
io.imsave(texture_name, texture)
def write_obj_with_colors_texture(obj_name, vertices, triangles, colors, texture, uv_coords):
''' Save 3D face model with texture.
Ref: https://github.com/patrikhuber/eos/blob/bd00155ebae4b1a13b08bf5a991694d682abbada/include/eos/core/Mesh.hpp
Args:
obj_name: str
vertices: shape = (nver, 3)
triangles: shape = (ntri, 3)
colors: shape = (nver, 3)
texture: shape = (256,256,3)
uv_coords: shape = (nver, 2) or (nver, 3); only the first two columns are used. values in [0, 1]
'''
if obj_name.split('.')[-1] != 'obj':
obj_name = obj_name + '.obj'
mtl_name = obj_name.replace('.obj', '.mtl')
texture_name = obj_name.replace('.obj', '_texture.png')
triangles = triangles.copy()
triangles += 1 # mesh lab start with 1
# write obj
with open(obj_name, 'w') as f:
# first line: write mtlib(material library)
s = "mtllib {}\n".format(os.path.abspath(mtl_name))
f.write(s)
# write vertices
for i in range(vertices.shape[0]):
s = 'v {} {} {} {} {} {}\n'.format(vertices[i, 0], vertices[i, 1], vertices[i, 2], colors[i, 0], colors[i, 1], colors[i, 2])
f.write(s)
# write uv coords
for i in range(uv_coords.shape[0]):
# s = 'vt {} {}\n'.format(uv_coords[i,0], 1 - uv_coords[i,1])
s = 'vt {} {}\n'.format(uv_coords[i,0], uv_coords[i,1])
f.write(s)
f.write("usemtl FaceTexture\n")
# write f: ver ind/ uv ind
for i in range(triangles.shape[0]):
# s = 'f {}/{} {}/{} {}/{}\n'.format(triangles[i,0], triangles[i,0], triangles[i,1], triangles[i,1], triangles[i,2], triangles[i,2])
s = 'f {}/{} {}/{} {}/{}\n'.format(triangles[i,2], triangles[i,2], triangles[i,1], triangles[i,1], triangles[i,0], triangles[i,0])
f.write(s)
# write mtl
with open(mtl_name, 'w') as f:
f.write("newmtl FaceTexture\n")
s = 'map_Kd {}\n'.format(os.path.abspath(texture_name)) # map to image
f.write(s)
# write texture as png
io.imsave(texture_name, texture)

View File

@ -0,0 +1,215 @@
'''
Functions about lighting mesh(changing colors/texture of mesh).
1. add light to colors/texture (shade each vertex)
2. fit light according to colors/texture & image.
Preparation knowledge:
lighting: https://cs184.eecs.berkeley.edu/lecture/pipeline
spherical harmonics in human face: '3D Face Reconstruction from a Single Image Using a Single Reference Face Shape'
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def get_normal(vertices, triangles):
''' calculate normal direction in each vertex
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
Returns:
normal: [nver, 3]
'''
pt0 = vertices[triangles[:, 0], :] # [ntri, 3]
pt1 = vertices[triangles[:, 1], :] # [ntri, 3]
pt2 = vertices[triangles[:, 2], :] # [ntri, 3]
tri_normal = np.cross(pt0 - pt1, pt0 - pt2) # [ntri, 3]. normal of each triangle
normal = np.zeros_like(vertices) # [nver, 3]
for i in range(triangles.shape[0]):
normal[triangles[i, 0], :] = normal[triangles[i, 0], :] + tri_normal[i, :]
normal[triangles[i, 1], :] = normal[triangles[i, 1], :] + tri_normal[i, :]
normal[triangles[i, 2], :] = normal[triangles[i, 2], :] + tri_normal[i, :]
# normalize to unit length
mag = np.sum(normal**2, 1) # [nver]
zero_ind = (mag == 0)
mag[zero_ind] = 1;
normal[zero_ind, 0] = np.ones((np.sum(zero_ind)))
normal = normal/np.sqrt(mag[:,np.newaxis])
return normal
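# --- Equivalent vectorized accumulation (a sketch; np.add.at replaces the
# per-triangle Python loop in get_normal above and should produce the same
# output, considerably faster for large meshes).
def get_normal_fast(vertices, triangles):
    pt0 = vertices[triangles[:, 0], :]  # [ntri, 3]
    pt1 = vertices[triangles[:, 1], :]
    pt2 = vertices[triangles[:, 2], :]
    tri_normal = np.cross(pt0 - pt1, pt0 - pt2)  # [ntri, 3]
    normal = np.zeros_like(vertices)  # [nver, 3]
    for k in range(3):  # scatter-add each triangle normal onto its 3 vertices
        np.add.at(normal, triangles[:, k], tri_normal)
    mag = np.sum(normal**2, 1)  # [nver]
    zero_ind = (mag == 0)
    mag[zero_ind] = 1
    normal[zero_ind, 0] = 1.
    return normal/np.sqrt(mag[:, np.newaxis])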
# TODO: test
def add_light_sh(vertices, triangles, colors, sh_coeff):
'''
In 3d face, usually assume:
1. The surface of face is Lambertian(reflect only the low frequencies of lighting)
2. Lighting can be an arbitrary combination of point sources
--> can be expressed in terms of spherical harmonics(omit the lighting coefficients)
I = albedo * (sh(n) x sh_coeff)
albedo: n x 1
sh_coeff: 9 x 1
Y(n) = (1, n_x, n_y, n_z, n_xn_y, n_xn_z, n_yn_z, n_x^2 - n_y^2, 3n_z^2 - 1)': n x 9
# Y(n) = (1, n_x, n_y, n_z)': n x 4
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
colors: [nver, 3] albedo
sh_coeff: [9, 1] spherical harmonics coefficients
Returns:
lit_colors: [nver, 3]
'''
assert vertices.shape[0] == colors.shape[0]
nver = vertices.shape[0]
normal = get_normal(vertices, triangles) # [nver, 3]
n = normal # short alias for the basis expression below
sh = np.array((np.ones(nver), n[:,0], n[:,1], n[:,2], n[:,0]*n[:,1], n[:,0]*n[:,2], n[:,1]*n[:,2], n[:,0]**2 - n[:,1]**2, 3*(n[:,2]**2) - 1)).T # [nver, 9]
ref = sh.dot(sh_coeff) #[nver, 1]
lit_colors = colors*ref
return lit_colors
def add_light(vertices, triangles, colors, light_positions = 0, light_intensities = 0):
''' Gouraud shading. add point lights.
In 3d face, usually assume:
1. The surface of face is Lambertian(reflect only the low frequencies of lighting)
2. Lighting can be an arbitrary combination of point sources
3. No specular term (unless the skin is oily)
Ref: https://cs184.eecs.berkeley.edu/lecture/pipeline
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
light_positions: [nlight, 3]
light_intensities: [nlight, 3]
Returns:
lit_colors: [nver, 3]
'''
nver = vertices.shape[0]
normals = get_normal(vertices, triangles) # [nver, 3]
# ambient
# La = ka*Ia
# diffuse
# Ld = kd*(I/r^2)max(0, nxl)
direction_to_lights = light_positions[:, np.newaxis, :] - vertices[np.newaxis, :, :] # [nlight, nver, 3], vertex -> light
direction_to_lights_n = np.sqrt(np.sum(direction_to_lights**2, axis = 2)) # [nlight, nver]
direction_to_lights = direction_to_lights/direction_to_lights_n[:, :, np.newaxis]
normals_dot_lights = normals[np.newaxis, :, :]*direction_to_lights # [nlight, nver, 3]
normals_dot_lights = np.maximum(np.sum(normals_dot_lights, axis = 2), 0) # [nlight, nver], max(0, n.l)
diffuse_output = colors[np.newaxis, :, :]*normals_dot_lights[:, :, np.newaxis]*light_intensities[:, np.newaxis, :]
diffuse_output = np.sum(diffuse_output, axis = 0) # [nver, 3]
# specular
# h = (v + l)/(|v + l|) bisector
# Ls = ks*(I/r^2)max(0, nxh)^p
# increasing p narrows the reflection lobe
lit_colors = diffuse_output # only diffuse part here.
lit_colors = np.minimum(np.maximum(lit_colors, 0), 1)
return lit_colors
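# --- Illustrative sketch (toy data; not part of the original file): shade a
# single upward-facing triangle with one point light placed above it.
def _demo_point_light():
    vertices = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
    triangles = np.array([[0, 1, 2]], dtype = np.int32)
    colors = np.ones((3, 3))*0.8  # grey albedo
    light_positions = np.array([[0.3, 0.3, 5.]])  # above the triangle
    light_intensities = np.ones((1, 3))
    lit = add_light(vertices, triangles, colors, light_positions, light_intensities)
    print(lit)  # brighter where n.l is larger, clipped to [0, 1]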
## TODO. estimate light(sh coeff)
## -------------------------------- estimation. not usable yet: expects transposed [3, n]/[c, n] layouts and is untested.
def fit_light(image, vertices, colors, triangles, vis_ind, lamb = 10, max_iter = 3):
[h, w, c] = image.shape
# surface normal
norm = get_normal(vertices, triangles)
nver = vertices.shape[1]
# vertices --> corresponding image pixel
pt2d = vertices[:2, :]
pt2d[0,:] = np.minimum(np.maximum(pt2d[0,:], 0), w - 1)
pt2d[1,:] = np.minimum(np.maximum(pt2d[1,:], 0), h - 1)
pt2d = np.round(pt2d).astype(np.int32) # 2 x nver
image_pixel = image[pt2d[1,:], pt2d[0,:], :] # nver x 3
image_pixel = image_pixel.T # 3 x nver
# vertices --> corresponding mean texture pixel with illumination
# Spherical Harmonic Basis
harmonic_dim = 9
nx = norm[0,:];
ny = norm[1,:];
nz = norm[2,:];
harmonic = np.zeros((nver, harmonic_dim))
pi = np.pi
harmonic[:,0] = np.sqrt(1/(4*pi)) * np.ones((nver,));
harmonic[:,1] = np.sqrt(3/(4*pi)) * nx;
harmonic[:,2] = np.sqrt(3/(4*pi)) * ny;
harmonic[:,3] = np.sqrt(3/(4*pi)) * nz;
harmonic[:,4] = 1/2. * np.sqrt(3/(4*pi)) * (2*nz**2 - nx**2 - ny**2);
harmonic[:,5] = 3 * np.sqrt(5/(12*pi)) * (ny*nz);
harmonic[:,6] = 3 * np.sqrt(5/(12*pi)) * (nx*nz);
harmonic[:,7] = 3 * np.sqrt(5/(12*pi)) * (nx*ny);
harmonic[:,8] = 3/2. * np.sqrt(5/(12*pi)) * (nx*nx - ny*ny);
'''
I' = sum(albedo * lj * hj) j = 0:9 (albedo = tex)
set A = albedo*h (n x 9)
alpha = lj (9 x 1)
Y = I (n x 1)
Y' = A.dot(alpha)
opt function:
||Y - A*alpha||^2 + lambda*(alpha'*alpha)
result:
-A'*(Y - A*alpha) + lambda*alpha = 0
==>
(A'*A + lambda*I)*alpha = A'*Y
left: 9 x 9
right: 9 x 1
'''
n_vis_ind = len(vis_ind)
n = n_vis_ind*c
Y = np.zeros((n, 1))
A = np.zeros((n, 9))
light = np.zeros((3, 1))
for k in range(c):
Y[k*n_vis_ind:(k+1)*n_vis_ind, :] = image_pixel[k, vis_ind][:, np.newaxis]
A[k*n_vis_ind:(k+1)*n_vis_ind, :] = colors[k, vis_ind][:, np.newaxis] * harmonic[vis_ind, :]
Ac = colors[k, vis_ind][:, np.newaxis]
Yc = image_pixel[k, vis_ind][:, np.newaxis]
light[k] = (Ac.T.dot(Yc))/(Ac.T.dot(Ac))
for i in range(max_iter):
Yc = Y.copy()
for k in range(c):
Yc[k*n_vis_ind:(k+1)*n_vis_ind, :] /= light[k]
# update alpha
equation_left = np.dot(A.T, A) + lamb*np.eye(harmonic_dim) # ridge (Tikhonov) term keeps the normal equations well-conditioned
equation_right = np.dot(A.T, Yc)
alpha = np.dot(np.linalg.inv(equation_left), equation_right)
# update light
for k in range(c):
Ac = A[k*n_vis_ind:(k+1)*n_vis_ind, :].dot(alpha)
Yc = Y[k*n_vis_ind:(k+1)*n_vis_ind, :]
light[k] = (Ac.T.dot(Yc))/(Ac.T.dot(Ac))
appearance = np.zeros_like(colors)
for k in range(c):
tmp = np.dot(harmonic*colors[k, :][:, np.newaxis], alpha*light[k])
appearance[k,:] = tmp.T
appearance = np.minimum(np.maximum(appearance, 0), 1)
return appearance

View File

@ -0,0 +1,287 @@
'''
functions about rendering a mesh (from 3d obj to 2d image).
only rasterization rendering is used here.
Note that:
1. Generally, a render function includes camera, light and rasterization. There is no camera or light here (I write those in other files).
2. Generally, the input vertices are normalized to [-1,1] and centered on [0, 0] (in world space).
Here, the vertices use image coords, which center on [w/2, h/2] with the y-axis pointing in the opposite direction.
Means: the render here only conducts interpolation. (I just want to make the input flexible)
Preparation knowledge:
z-buffer: https://cs184.eecs.berkeley.edu/lecture/pipeline
Author: Yao Feng
Mail: yaofeng1995@gmail.com
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from time import time
def isPointInTri(point, tri_points):
''' Judge whether the point is in the triangle
Method:
http://blackpawn.com/texts/pointinpoly/
Args:
point: (2,). [u, v] or [x, y]
tri_points: (3 vertices, 2 coords). three vertices(2d points) of a triangle.
Returns:
bool: true for in triangle
'''
tp = tri_points
# vectors
v0 = tp[2,:] - tp[0,:]
v1 = tp[1,:] - tp[0,:]
v2 = point - tp[0,:]
# dot products
dot00 = np.dot(v0.T, v0)
dot01 = np.dot(v0.T, v1)
dot02 = np.dot(v0.T, v2)
dot11 = np.dot(v1.T, v1)
dot12 = np.dot(v1.T, v2)
# barycentric coordinates
if dot00*dot11 - dot01*dot01 == 0:
inverDeno = 0
else:
inverDeno = 1/(dot00*dot11 - dot01*dot01)
u = (dot11*dot02 - dot01*dot12)*inverDeno
v = (dot00*dot12 - dot01*dot02)*inverDeno
# check if point in triangle
return (u >= 0) & (v >= 0) & (u + v < 1)
def get_point_weight(point, tri_points):
''' Get the weights of the position
Methods: https://gamedev.stackexchange.com/questions/23743/whats-the-most-efficient-way-to-find-barycentric-coordinates
-m1.compute the area of the triangles formed by embedding the point P inside the triangle
-m2.Christer Ericson's book "Real-Time Collision Detection". faster.(used)
Args:
point: (2,). [u, v] or [x, y]
tri_points: (3 vertices, 2 coords). three vertices(2d points) of a triangle.
Returns:
w0: weight of v0
w1: weight of v1
w2: weight of v2
'''
tp = tri_points
# vectors
v0 = tp[2,:] - tp[0,:]
v1 = tp[1,:] - tp[0,:]
v2 = point - tp[0,:]
# dot products
dot00 = np.dot(v0.T, v0)
dot01 = np.dot(v0.T, v1)
dot02 = np.dot(v0.T, v2)
dot11 = np.dot(v1.T, v1)
dot12 = np.dot(v1.T, v2)
# barycentric coordinates
if dot00*dot11 - dot01*dot01 == 0:
inverDeno = 0
else:
inverDeno = 1/(dot00*dot11 - dot01*dot01)
u = (dot11*dot02 - dot01*dot12)*inverDeno
v = (dot00*dot12 - dot01*dot02)*inverDeno
w0 = 1 - u - v
w1 = v
w2 = u
return w0, w1, w2
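# --- Quick sanity check (a sketch; toy numbers): the barycentric weights
# reconstruct the query point, w0*tp[0] + w1*tp[1] + w2*tp[2] == point.
def _demo_barycentric():
    tp = np.array([[0., 0.], [4., 0.], [0., 4.]])
    point = np.array([1., 1.])
    w0, w1, w2 = get_point_weight(point, tp)
    print(w0, w1, w2)  # 0.5, 0.25, 0.25
    print(w0*tp[0] + w1*tp[1] + w2*tp[2])  # [1. 1.]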
def rasterize_triangles(vertices, triangles, h, w):
'''
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
h: height
w: width
Returns:
depth_buffer: [h, w] saves the depth; the bigger the z, the closer the point is to the camera.
triangle_buffer: [h, w] saves the tri id(-1 for no triangle).
barycentric_weight: [h, w, 3] saves corresponding barycentric weight.
# Each triangle has 3 vertices & Each vertex has 3 coordinates x, y, z.
# h, w is the size of rendering
'''
# initial
depth_buffer = np.zeros([h, w]) - 999999. #+ np.min(vertices[2,:]) - 999999. # set the initial z to the farest position
triangle_buffer = np.zeros([h, w], dtype = np.int32) - 1 # if tri id = -1, the pixel has no triangle correspondance
barycentric_weight = np.zeros([h, w, 3], dtype = np.float32) #
for i in range(triangles.shape[0]):
tri = triangles[i, :] # 3 vertex indices
# the inner bounding box
umin = max(int(np.ceil(np.min(vertices[tri, 0]))), 0)
umax = min(int(np.floor(np.max(vertices[tri, 0]))), w-1)
vmin = max(int(np.ceil(np.min(vertices[tri, 1]))), 0)
vmax = min(int(np.floor(np.max(vertices[tri, 1]))), h-1)
if umax<umin or vmax<vmin:
continue
for u in range(umin, umax+1):
for v in range(vmin, vmax+1):
if not isPointInTri([u,v], vertices[tri, :2]):
continue
w0, w1, w2 = get_point_weight([u, v], vertices[tri, :2]) # barycentric weight
point_depth = w0*vertices[tri[0], 2] + w1*vertices[tri[1], 2] + w2*vertices[tri[2], 2]
if point_depth > depth_buffer[v, u]:
depth_buffer[v, u] = point_depth
triangle_buffer[v, u] = i
barycentric_weight[v, u, :] = np.array([w0, w1, w2])
return depth_buffer, triangle_buffer, barycentric_weight
def render_colors_ras(vertices, triangles, colors, h, w, c = 3):
''' render mesh with colors(rasterize triangle first)
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
colors: [nver, 3]
h: height
w: width
c: channel
Returns:
image: [h, w, c]. rendering.
'''
assert vertices.shape[0] == colors.shape[0]
depth_buffer, triangle_buffer, barycentric_weight = rasterize_triangles(vertices, triangles, h, w)
triangle_buffer_flat = np.reshape(triangle_buffer, [-1]) # [h*w]
barycentric_weight_flat = np.reshape(barycentric_weight, [-1, 3]) #[h*w, 3]: 3 barycentric weights per pixel
weight = barycentric_weight_flat[:, :, np.newaxis] # [h*w, 3(ver in tri), 1]
colors_flat = colors[triangles[triangle_buffer_flat, :], :] # [h*w(tri id in pixel), 3(ver in tri), c(color in ver)]
colors_flat = weight*colors_flat # [h*w, 3, 3]
colors_flat = np.sum(colors_flat, 1) #[h*w, 3]. add tri.
image = np.reshape(colors_flat, [h, w, c])
# mask = (triangle_buffer[:,:] > -1).astype(np.float32)
# image = image*mask[:,:,np.newaxis]
return image
def render_colors(vertices, triangles, colors, h, w, c = 3):
''' render mesh with colors
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
colors: [nver, 3]
h: height
w: width
Returns:
image: [h, w, c].
'''
assert vertices.shape[0] == colors.shape[0]
# initial
image = np.zeros((h, w, c))
depth_buffer = np.zeros([h, w]) - 999999.
for i in range(triangles.shape[0]):
tri = triangles[i, :] # 3 vertex indices
# the inner bounding box
umin = max(int(np.ceil(np.min(vertices[tri, 0]))), 0)
umax = min(int(np.floor(np.max(vertices[tri, 0]))), w-1)
vmin = max(int(np.ceil(np.min(vertices[tri, 1]))), 0)
vmax = min(int(np.floor(np.max(vertices[tri, 1]))), h-1)
if umax<umin or vmax<vmin:
continue
for u in range(umin, umax+1):
for v in range(vmin, vmax+1):
if not isPointInTri([u,v], vertices[tri, :2]):
continue
w0, w1, w2 = get_point_weight([u, v], vertices[tri, :2])
point_depth = w0*vertices[tri[0], 2] + w1*vertices[tri[1], 2] + w2*vertices[tri[2], 2]
if point_depth > depth_buffer[v, u]:
depth_buffer[v, u] = point_depth
image[v, u, :] = w0*colors[tri[0], :] + w1*colors[tri[1], :] + w2*colors[tri[2], :]
return image
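# --- Illustrative sketch (toy data): rasterize one triangle with per-vertex
# colors into a small image and check the output shape.
def _demo_render_triangle():
    vertices = np.array([[10., 10., 0.], [50., 12., 0.], [30., 55., 0.]])
    triangles = np.array([[0, 1, 2]], dtype = np.int32)
    colors = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
    image = render_colors(vertices, triangles, colors, h = 64, w = 64)
    print(image.shape, image.max())  # (64, 64, 3), ~1.0 near a vertex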
def render_texture(vertices, triangles, texture, tex_coords, tex_triangles, h, w, c = 3, mapping_type = 'nearest'):
''' render mesh with texture map
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
texture: [tex_h, tex_w, 3]
tex_coords: [ntexcoords, 3]
tex_triangles: [ntri, 3]
h: height of rendering
w: width of rendering
c: channel
mapping_type: 'bilinear' or 'nearest'
'''
assert triangles.shape[0] == tex_triangles.shape[0]
tex_h, tex_w, _ = texture.shape
# initial
image = np.zeros((h, w, c))
depth_buffer = np.zeros([h, w]) - 999999.
for i in range(triangles.shape[0]):
tri = triangles[i, :] # 3 vertex indices
tex_tri = tex_triangles[i, :] # 3 tex indices
# the inner bounding box
umin = max(int(np.ceil(np.min(vertices[tri, 0]))), 0)
umax = min(int(np.floor(np.max(vertices[tri, 0]))), w-1)
vmin = max(int(np.ceil(np.min(vertices[tri, 1]))), 0)
vmax = min(int(np.floor(np.max(vertices[tri, 1]))), h-1)
if umax<umin or vmax<vmin:
continue
for u in range(umin, umax+1):
for v in range(vmin, vmax+1):
if not isPointInTri([u,v], vertices[tri, :2]):
continue
w0, w1, w2 = get_point_weight([u, v], vertices[tri, :2])
point_depth = w0*vertices[tri[0], 2] + w1*vertices[tri[1], 2] + w2*vertices[tri[2], 2]
if point_depth > depth_buffer[v, u]:
# update depth
depth_buffer[v, u] = point_depth
# tex coord
tex_xy = w0*tex_coords[tex_tri[0], :] + w1*tex_coords[tex_tri[1], :] + w2*tex_coords[tex_tri[2], :]
tex_xy[0] = max(min(tex_xy[0], float(tex_w - 1)), 0.0);
tex_xy[1] = max(min(tex_xy[1], float(tex_h - 1)), 0.0);
# nearest
if mapping_type == 'nearest':
tex_xy = np.round(tex_xy).astype(np.int32)
tex_value = texture[tex_xy[1], tex_xy[0], :]
# bilinear
elif mapping_type == 'bilinear':
# next 4 pixels
ul = texture[int(np.floor(tex_xy[1])), int(np.floor(tex_xy[0])), :]
ur = texture[int(np.floor(tex_xy[1])), int(np.ceil(tex_xy[0])), :]
dl = texture[int(np.ceil(tex_xy[1])), int(np.floor(tex_xy[0])), :]
dr = texture[int(np.ceil(tex_xy[1])), int(np.ceil(tex_xy[0])), :]
yd = tex_xy[1] - np.floor(tex_xy[1])
xd = tex_xy[0] - np.floor(tex_xy[0])
tex_value = ul*(1-xd)*(1-yd) + ur*xd*(1-yd) + dl*(1-xd)*yd + dr*xd*yd
image[v, u, :] = tex_value
return image

View File

@ -0,0 +1,385 @@
'''
Functions about transforming mesh(changing the position: modify vertices).
1. forward: transform(transform, camera, project).
2. backward: estimate transform matrix from correspondences.
Preparation knowledge:
transform&camera model:
https://cs184.eecs.berkeley.edu/lecture/transforms-2
Part I: camera geometry and single view geometry in MVGCV
'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import math
from math import cos, sin
def angle2matrix(angles):
''' get rotation matrix from three rotation angles(degree). right-handed.
Args:
angles: [3,]. x, y, z angles
x: pitch. positive for looking down.
y: yaw. positive for looking left.
z: roll. positive for tilting head right.
Returns:
R: [3, 3]. rotation matrix.
'''
x, y, z = np.deg2rad(angles[0]), np.deg2rad(angles[1]), np.deg2rad(angles[2])
# x
Rx=np.array([[1, 0, 0],
[0, cos(x), -sin(x)],
[0, sin(x), cos(x)]])
# y
Ry=np.array([[ cos(y), 0, sin(y)],
[ 0, 1, 0],
[-sin(y), 0, cos(y)]])
# z
Rz=np.array([[cos(z), -sin(z), 0],
[sin(z), cos(z), 0],
[ 0, 0, 1]])
R=Rz.dot(Ry.dot(Rx))
return R.astype(np.float32)
def angle2matrix_3ddfa(angles):
''' get rotation matrix from three rotation angles(radian). The same as in 3DDFA.
Args:
angles: [3,]. x, y, z angles
x: pitch.
y: yaw.
z: roll.
Returns:
R: 3x3. rotation matrix.
'''
# x, y, z = np.deg2rad(angles[0]), np.deg2rad(angles[1]), np.deg2rad(angles[2])
x, y, z = angles[0], angles[1], angles[2]
# x
Rx=np.array([[1, 0, 0],
[0, cos(x), sin(x)],
[0, -sin(x), cos(x)]])
# y
Ry=np.array([[ cos(y), 0, -sin(y)],
[ 0, 1, 0],
[sin(y), 0, cos(y)]])
# z
Rz=np.array([[cos(z), sin(z), 0],
[-sin(z), cos(z), 0],
[ 0, 0, 1]])
R = Rx.dot(Ry).dot(Rz)
return R.astype(np.float32)
## ------------------------------------------ 1. transform(transform, project, camera).
## ---------- 3d-3d transform. Transform obj in world space
def rotate(vertices, angles):
''' rotate vertices.
X_new = R.dot(X). X: 3 x 1
Args:
vertices: [nver, 3].
rx, ry, rz: degree angles
rx: pitch. positive for looking down
ry: yaw. positive for looking left
rz: roll. positive for tilting head right
Returns:
rotated vertices: [nver, 3]
'''
R = angle2matrix(angles)
rotated_vertices = vertices.dot(R.T)
return rotated_vertices
def similarity_transform(vertices, s, R, t3d):
''' similarity transform. dof = 7.
3D: s*R.dot(X) + t
Homo: M = [[sR, t],[0^T, 1]]. M.dot(X)
Args:(float32)
vertices: [nver, 3].
s: [1,]. scale factor.
R: [3,3]. rotation matrix.
t3d: [3,]. 3d translation vector.
Returns:
transformed vertices: [nver, 3]
'''
t3d = np.squeeze(np.array(t3d, dtype = np.float32))
transformed_vertices = s * vertices.dot(R.T) + t3d[np.newaxis, :]
return transformed_vertices
## -------------- Camera. from world space to camera space
# Ref: https://cs184.eecs.berkeley.edu/lecture/transforms-2
def normalize(x):
epsilon = 1e-12
norm = np.sqrt(np.sum(x**2, axis = 0))
norm = np.maximum(norm, epsilon)
return x/norm
def lookat_camera(vertices, eye, at = None, up = None):
""" 'look at' transformation: from world space to camera space
standard camera space:
camera located at the origin.
looking down negative z-axis.
vertical vector is y-axis.
Xcam = R(X - C)
Homo: [[R, -RC], [0, 1]]
Args:
vertices: [nver, 3]
eye: [3,] the XYZ world space position of the camera.
at: [3,] a position along the center of the camera's gaze.
up: [3,] up direction
Returns:
transformed_vertices: [nver, 3]
"""
if at is None:
at = np.array([0, 0, 0], np.float32)
if up is None:
up = np.array([0, 1, 0], np.float32)
eye = np.array(eye).astype(np.float32)
at = np.array(at).astype(np.float32)
z_axis = -normalize(at - eye) # look forward
x_axis = normalize(np.cross(up, z_axis)) # look right
y_axis = np.cross(z_axis, x_axis) # look up
R = np.stack((x_axis, y_axis, z_axis)) # 3 x 3
transformed_vertices = vertices - eye # translation
transformed_vertices = transformed_vertices.dot(R.T) # rotation
return transformed_vertices
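# --- Illustrative sketch (toy data; not part of the original file): move a
# mesh into camera space with a camera at +z looking at the origin, then
# apply a similarity transform in world space.
def _demo_camera():
    vertices = np.random.rand(8, 3).astype(np.float32) - 0.5
    cam = lookat_camera(vertices, eye = [0, 0, 5])  # default: at origin, y up
    print(cam[:, 2].mean())  # camera looks down -z, so the mesh sits near z = -5
    moved = similarity_transform(vertices, 2.0, angle2matrix([0, 30, 0]), [0, 0, 1])
    print(moved.shape)  # [8, 3]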
## --------- 3d-2d project. from camera space to image plane
# generally, image plane only keeps x,y channels, here reserve z channel for calculating z-buffer.
def orthographic_project(vertices):
''' scaled orthographic projection (just drop z)
assumes: variations in depth over the object are small relative to the mean distance from camera to object
x -> x*f/z, y -> y*f/z, z -> f.
for points i, j: zi ~= zj, so z can simply be dropped
** often used for faces
Homo: P = [[1,0,0,0], [0,1,0,0], [0,0,1,0]]
Args:
vertices: [nver, 3]
Returns:
projected_vertices: [nver, 3]. z is kept so it can still be used for z-buffering.
'''
return vertices.copy()
def perspective_project(vertices, fovy, aspect_ratio = 1., near = 0.1, far = 1000.):
''' perspective projection.
Args:
vertices: [nver, 3]
fovy: vertical angular field of view. degree.
aspect_ratio : width / height of field of view
near : depth of near clipping plane
far : depth of far clipping plane
Returns:
projected_vertices: [nver, 3]
'''
fovy = np.deg2rad(fovy)
top = near*np.tan(fovy/2.) # fovy spans the full vertical angle, so use the half-angle
bottom = -top
right = top*aspect_ratio
left = -right
#-- homo
P = np.array([[near/right, 0, 0, 0],
[0, near/top, 0, 0],
[0, 0, -(far+near)/(far-near), -2*far*near/(far-near)],
[0, 0, -1, 0]])
vertices_homo = np.hstack((vertices, np.ones((vertices.shape[0], 1)))) # [nver, 4]
projected_vertices = vertices_homo.dot(P.T)
projected_vertices = projected_vertices/projected_vertices[:,3:]
projected_vertices = projected_vertices[:,:3]
projected_vertices[:,2] = -projected_vertices[:,2]
#-- non homo. only fovy
# projected_vertices = vertices.copy()
# projected_vertices[:,0] = -(near/right)*vertices[:,0]/vertices[:,2]
# projected_vertices[:,1] = -(near/top)*vertices[:,1]/vertices[:,2]
return projected_vertices
def to_image(vertices, h, w, is_perspective = False):
''' change vertices to image coord system
3d system: XYZ, center(0, 0, 0)
2d image: x(u), y(v). center(w/2, h/2), flip y-axis.
Args:
vertices: [nver, 3]
h: height of the rendering
w : width of the rendering
Returns:
projected_vertices: [nver, 3]
'''
image_vertices = vertices.copy()
if is_perspective:
# if perspective, the projected vertices are normalized to [-1, 1]. so change it to image size first.
image_vertices[:,0] = image_vertices[:,0]*w/2
image_vertices[:,1] = image_vertices[:,1]*h/2
# move to center of image
image_vertices[:,0] = image_vertices[:,0] + w/2
image_vertices[:,1] = image_vertices[:,1] + h/2
# flip vertices along y-axis.
image_vertices[:,1] = h - image_vertices[:,1] - 1
return image_vertices
#### -------------------------------------------2. estimate transform matrix from correspondences.
def estimate_affine_matrix_3d23d(X, Y):
''' Using least-squares solution
Args:
X: [n, 3]. 3d points(fixed)
Y: [n, 3]. corresponding 3d points(moving). Y = PX
Returns:
P_Affine: (3, 4). Affine camera matrix fitted by least squares, so that Y = X_homo.dot(P_Affine.T).
'''
X_homo = np.hstack((X, np.ones([X.shape[0], 1]))) #n x 4
P = np.linalg.lstsq(X_homo, Y, rcond=None)[0].T # Affine matrix. 3 x 4
return P
def estimate_affine_matrix_3d22d(X, x):
''' Using Golden Standard Algorithm for estimating an affine camera
matrix P from world to image correspondences.
See Alg.7.2. in MVGCV
Code Ref: https://github.com/patrikhuber/eos/blob/master/include/eos/fitting/affine_camera_estimation.hpp
x_homo = X_homo.dot(P_Affine)
Args:
X: [n, 3]. corresponding 3d points(fixed)
x: [n, 2]. n>=4. 2d points(moving). x = PX
Returns:
P_Affine: [3, 4]. Affine camera matrix
'''
X = X.T; x = x.T
assert(x.shape[1] == X.shape[1])
n = x.shape[1]
assert(n >= 4)
#--- 1. normalization
# 2d points
mean = np.mean(x, 1) # (2,)
x = x - np.tile(mean[:, np.newaxis], [1, n])
average_norm = np.mean(np.sqrt(np.sum(x**2, 0)))
scale = np.sqrt(2) / average_norm
x = scale * x
T = np.zeros((3,3), dtype = np.float32)
T[0, 0] = T[1, 1] = scale
T[:2, 2] = -mean*scale
T[2, 2] = 1
# 3d points
mean = np.mean(X, 1) # (3,)
X = X - np.tile(mean[:, np.newaxis], [1, n])
average_norm = np.mean(np.sqrt(np.sum(X**2, 0)))
scale = np.sqrt(3) / average_norm
X = scale * X
U = np.zeros((4,4), dtype = np.float32)
U[0, 0] = U[1, 1] = U[2, 2] = scale
U[:3, 3] = -mean*scale
U[3, 3] = 1
# --- 2. equations
A = np.zeros((n*2, 8), dtype = np.float32);
X_homo = np.vstack((X, np.ones((1, n)))).T
A[:n, :4] = X_homo
A[n:, 4:] = X_homo
b = np.reshape(x, [-1, 1])
# --- 3. solution
p_8 = np.linalg.pinv(A).dot(b)
P = np.zeros((3, 4), dtype = np.float32)
P[0, :] = p_8[:4, 0]
P[1, :] = p_8[4:, 0]
P[-1, -1] = 1
# --- 4. denormalization
P_Affine = np.linalg.inv(T).dot(P.dot(U))
return P_Affine
def P2sRt(P):
''' decompositing camera matrix P
Args:
P: (3, 4). Affine Camera Matrix.
Returns:
s: scale factor.
R: (3, 3). rotation matrix.
t: (3,). translation.
'''
t = P[:, 3]
R1 = P[0:1, :3]
R2 = P[1:2, :3]
s = (np.linalg.norm(R1) + np.linalg.norm(R2))/2.0
r1 = R1/np.linalg.norm(R1)
r2 = R2/np.linalg.norm(R2)
r3 = np.cross(r1, r2)
R = np.concatenate((r1, r2, r3), 0)
return s, R, t
#Ref: https://www.learnopencv.com/rotation-matrix-to-euler-angles/
def isRotationMatrix(R):
''' checks whether a matrix is a valid rotation matrix (i.e. orthogonal: R^T R = I)
'''
Rt = np.transpose(R)
shouldBeIdentity = np.dot(Rt, R)
I = np.identity(3, dtype = R.dtype)
n = np.linalg.norm(I - shouldBeIdentity)
return n < 1e-6
def matrix2angle(R):
''' get three Euler angles from Rotation Matrix
Args:
R: (3,3). rotation matrix
Returns:
x: pitch
y: yaw
z: roll
'''
assert(isRotationMatrix(R))
sy = math.sqrt(R[0,0] * R[0,0] + R[1,0] * R[1,0])
singular = sy < 1e-6
if not singular :
x = math.atan2(R[2,1] , R[2,2])
y = math.atan2(-R[2,0], sy)
z = math.atan2(R[1,0], R[0,0])
else :
x = math.atan2(-R[1,2], R[1,1])
y = math.atan2(-R[2,0], sy)
z = 0
# rx, ry, rz = np.rad2deg(x), np.rad2deg(y), np.rad2deg(z)
rx, ry, rz = x*180/np.pi, y*180/np.pi, z*180/np.pi
return rx, ry, rz
# def matrix2angle(R):
# ''' compute three Euler angles from a Rotation Matrix. Ref: http://www.gregslabaugh.net/publications/euler.pdf
# Args:
# R: (3,3). rotation matrix
# Returns:
# x: yaw
# y: pitch
# z: roll
# '''
# # assert(isRotationMatrix(R))
# if R[2,0] != 1 and R[2,0] != -1:
# x = math.asin(R[2,0])
# y = math.atan2(R[2,1]/cos(x), R[2,2]/cos(x))
# z = math.atan2(R[1,0]/cos(x), R[0,0]/cos(x))
# else:# Gimbal lock
# z = 0 #can be anything
# if R[2,0] == -1:
# x = np.pi/2
# y = z + math.atan2(R[0,1], R[0,2])
# else:
# x = -np.pi/2
# y = -z + math.atan2(-R[0,1], -R[0,2])
# return x, y, z

View File

@ -0,0 +1,24 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
from skimage import measure
from mpl_toolkits.mplot3d import Axes3D
def plot_mesh(vertices, triangles, subplot = [1,1,1], title = 'mesh', el = 90, az = -90, lwdt=.1, dist = 6, color = "grey"):
'''
plot the mesh
Args:
vertices: [nver, 3]
triangles: [ntri, 3]
'''
ax = plt.subplot(subplot[0], subplot[1], subplot[2], projection = '3d')
ax.plot_trisurf(vertices[:, 0], vertices[:, 1], vertices[:, 2], triangles = triangles, lw = lwdt, color = color, alpha = 1)
ax.axis("off")
ax.view_init(elev = el, azim = az)
ax.dist = dist
plt.title(title)
### -------------- Todo: use vtk to visualize mesh? or visvis? or VisPy?

View File

@ -0,0 +1,7 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from .. import mesh
from .morphabel_model import MorphabelModel
from . import load

View File

@ -0,0 +1,272 @@
'''
Estimating parameters about vertices: shape para, exp para, pose para(s, R, t)
'''
import numpy as np
from .. import mesh
''' TODO: a clear document.
Given: image_points, 3D Model, Camera Matrix(s, R, t2d)
Estimate: shape parameters, expression parameters
Inference:
projected_vertices = s*P*R(mu + shape + exp) + t2d --> image_points
s*P*R*shape + s*P*R(mu + exp) + t2d --> image_points
# Define:
X = vertices
x_hat = projected_vertices
x = image_points
A = s*P*R
b = s*P*R(mu + exp) + t2d
==>
x_hat = A*shape + b (2 x n)
A*shape (2 x n)
shape = reshape(shapePC * sp) (3 x n)
shapePC*sp : (3n x 1)
* flatten:
x_hat_flatten = A*shape + b_flatten (2n x 1)
A*shape (2n x 1)
--> A*shapePC (2n x 199) sp: 199 x 1
# Define:
pc_2d = A* reshape(shapePC)
pc_2d_flatten = flatten(pc_2d) (2n x 199)
=====>
x_hat_flatten = pc_2d_flatten * sp + b_flatten ---> x_flatten (2n x 1)
Goals:
(ignore flatten, pc_2d-->pc)
min E = || x_hat - x ||^2 + lambda*sum(sp/sigma)^2
= || pc * sp + b - x ||^2 + lambda*sum(sp/sigma)^2
Solve:
d(E)/d(sp) = 0
2 * pc' * (pc * sp + b - x) + 2 * lambda * sp / (sigma' * sigma) = 0
Get:
(pc' * pc + lambda / (sigma'* sigma)) * sp = pc' * (x - b)
'''
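# --- A minimal numeric sketch of the closed form above (toy sizes; `pc`
# stands in for the projected principal components and `sigma` for the
# eigenvalues used as a prior; names are assumptions).
def _demo_regularized_solve():
    rng = np.random.RandomState(0)
    pc = rng.rand(20, 5)  # plays the role of the (2n x n_sp) matrix
    sigma = rng.rand(5, 1) + 0.5
    sp_true = rng.randn(5, 1)
    x_minus_b = pc.dot(sp_true)  # noiseless "observations"
    lamb = 10.
    equation_left = pc.T.dot(pc) + lamb*np.diagflat(1/sigma**2)
    equation_right = pc.T.dot(x_minus_b)
    sp = np.linalg.solve(equation_left, equation_right)
    print(np.abs(sp - sp_true).max())  # nonzero: the prior shrinks sp toward 0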
def estimate_shape(x, shapeMU, shapePC, shapeEV, expression, s, R, t2d, lamb = 3000):
'''
Args:
x: (2, n). image points (to be fitted)
shapeMU: (3n, 1)
shapePC: (3n, n_sp)
shapeEV: (n_sp, 1)
expression: (3, n)
s: scale
R: (3, 3). rotation matrix
t2d: (2,). 2d translation
lamb: regularization coefficient
Returns:
shape_para: (n_sp, 1) shape parameters(coefficients)
'''
x = x.copy()
assert(shapeMU.shape[0] == shapePC.shape[0])
assert(shapeMU.shape[0] == x.shape[1]*3)
dof = shapePC.shape[1]
n = x.shape[1]
sigma = shapeEV
t2d = np.array(t2d)
P = np.array([[1, 0, 0], [0, 1, 0]], dtype = np.float32)
A = s*P.dot(R)
# --- calc pc
pc_3d = np.resize(shapePC.T, [dof, n, 3]) # 199 x n x 3
pc_3d = np.reshape(pc_3d, [dof*n, 3])
pc_2d = pc_3d.dot(A.T) # [dof*n, 2]
pc = np.reshape(pc_2d, [dof, -1]).T # 2n x 199
# --- calc b
# shapeMU
mu_3d = np.resize(shapeMU, [n, 3]).T # 3 x n
# expression
exp_3d = expression
#
b = A.dot(mu_3d + exp_3d) + np.tile(t2d[:, np.newaxis], [1, n]) # 2 x n
b = np.reshape(b.T, [-1, 1]) # 2n x 1
# --- solve
equation_left = np.dot(pc.T, pc) + lamb * np.diagflat(1/sigma**2)
x = np.reshape(x.T, [-1, 1])
equation_right = np.dot(pc.T, x - b)
shape_para = np.dot(np.linalg.inv(equation_left), equation_right)
return shape_para
def estimate_expression(x, shapeMU, expPC, expEV, shape, s, R, t2d, lamb = 2000):
'''
Args:
x: (2, n). image points (to be fitted)
shapeMU: (3n, 1)
expPC: (3n, n_ep)
expEV: (n_ep, 1)
shape: (3, n)
s: scale
R: (3, 3). rotation matrix
t2d: (2,). 2d translation
lamb: regularization coefficient
Returns:
exp_para: (n_ep, 1) expression parameters(coefficients)
'''
x = x.copy()
assert(shapeMU.shape[0] == expPC.shape[0])
assert(shapeMU.shape[0] == x.shape[1]*3)
dof = expPC.shape[1]
n = x.shape[1]
sigma = expEV
t2d = np.array(t2d)
P = np.array([[1, 0, 0], [0, 1, 0]], dtype = np.float32)
A = s*P.dot(R)
# --- calc pc
pc_3d = np.resize(expPC.T, [dof, n, 3])
pc_3d = np.reshape(pc_3d, [dof*n, 3])
pc_2d = pc_3d.dot(A.T)
pc = np.reshape(pc_2d, [dof, -1]).T # 2n x 29
# --- calc b
# shapeMU
mu_3d = np.resize(shapeMU, [n, 3]).T # 3 x n
# expression
shape_3d = shape
#
b = A.dot(mu_3d + shape_3d) + np.tile(t2d[:, np.newaxis], [1, n]) # 2 x n
b = np.reshape(b.T, [-1, 1]) # 2n x 1
# --- solve
equation_left = np.dot(pc.T, pc) + lamb * np.diagflat(1/sigma**2)
x = np.reshape(x.T, [-1, 1])
equation_right = np.dot(pc.T, x - b)
exp_para = np.dot(np.linalg.inv(equation_left), equation_right)
return exp_para
# ---------------- fit
def fit_points(x, X_ind, model, n_sp, n_ep, max_iter = 4):
'''
Args:
x: (n, 2) image points
X_ind: (n,) corresponding Model vertex indices
model: 3DMM
max_iter: iteration
Returns:
sp: (n_sp, 1). shape parameters
ep: (n_ep, 1). exp parameters
s, R, t
'''
x = x.copy().T
#-- init
sp = np.zeros((n_sp, 1), dtype = np.float32)
ep = np.zeros((n_ep, 1), dtype = np.float32)
#-------------------- estimate
X_ind_all = np.tile(X_ind[np.newaxis, :], [3, 1])*3
X_ind_all[1, :] += 1
X_ind_all[2, :] += 2
valid_ind = X_ind_all.flatten('F')
shapeMU = model['shapeMU'][valid_ind, :]
shapePC = model['shapePC'][valid_ind, :n_sp]
expPC = model['expPC'][valid_ind, :n_ep]
for i in range(max_iter):
X = shapeMU + shapePC.dot(sp) + expPC.dot(ep)
X = np.reshape(X, [int(len(X)/3), 3]).T
#----- estimate pose
P = mesh.transform.estimate_affine_matrix_3d22d(X.T, x.T)
s, R, t = mesh.transform.P2sRt(P)
rx, ry, rz = mesh.transform.matrix2angle(R)
#print('Iter:{}; estimated pose: s {}, rx {}, ry {}, rz {}, t1 {}, t2 {}'.format(i, s, rx, ry, rz, t[0], t[1]))
#----- estimate shape
# expression
shape = shapePC.dot(sp)
shape = np.reshape(shape, [int(len(shape)/3), 3]).T
ep = estimate_expression(x, shapeMU, expPC, model['expEV'][:n_ep,:], shape, s, R, t[:2], lamb = 20)
# shape
expression = expPC.dot(ep)
expression = np.reshape(expression, [int(len(expression)/3), 3]).T
if i == 0 :
sp = estimate_shape(x, shapeMU, shapePC, model['shapeEV'][:n_sp,:], expression, s, R, t[:2], lamb = 40)
return sp, ep, s, R, t
# ---------------- fitting process
def fit_points_for_show(x, X_ind, model, n_sp, n_ep, max_iter = 4):
'''
Args:
x: (n, 2) image points
X_ind: (n,) corresponding Model vertex indices
model: 3DMM
max_iter: iteration
Returns:
sp: (n_sp, 1). shape parameters
ep: (n_ep, 1). exp parameters
s, R, t
'''
x = x.copy().T
#-- init
sp = np.zeros((n_sp, 1), dtype = np.float32)
ep = np.zeros((n_ep, 1), dtype = np.float32)
#-------------------- estimate
X_ind_all = np.tile(X_ind[np.newaxis, :], [3, 1])*3
X_ind_all[1, :] += 1
X_ind_all[2, :] += 2
valid_ind = X_ind_all.flatten('F')
shapeMU = model['shapeMU'][valid_ind, :]
shapePC = model['shapePC'][valid_ind, :n_sp]
expPC = model['expPC'][valid_ind, :n_ep]
s = 4e-04
R = mesh.transform.angle2matrix([0, 0, 0])
t = [0, 0, 0]
lsp = []; lep = []; ls = []; lR = []; lt = []
for i in range(max_iter):
X = shapeMU + shapePC.dot(sp) + expPC.dot(ep)
X = np.reshape(X, [int(len(X)/3), 3]).T
lsp.append(sp); lep.append(ep); ls.append(s); lR.append(R); lt.append(t)
#----- estimate pose
P = mesh.transform.estimate_affine_matrix_3d22d(X.T, x.T)
s, R, t = mesh.transform.P2sRt(P)
lsp.append(sp); lep.append(ep); ls.append(s); lR.append(R); lt.append(t)
#----- estimate shape
# expression
shape = shapePC.dot(sp)
shape = np.reshape(shape, [int(len(shape)/3), 3]).T
ep = estimate_expression(x, shapeMU, expPC, model['expEV'][:n_ep,:], shape, s, R, t[:2], lamb = 20)
lsp.append(sp); lep.append(ep); ls.append(s); lR.append(R); lt.append(t)
# shape
expression = expPC.dot(ep)
expression = np.reshape(expression, [int(len(expression)/3), 3]).T
sp = estimate_shape(x, shapeMU, shapePC, model['shapeEV'][:n_sp,:], expression, s, R, t[:2], lamb = 40)
# print('ls', ls)
# print('lR', lR)
return np.array(lsp), np.array(lep), np.array(ls), np.array(lR), np.array(lt)

View File

@ -0,0 +1,110 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import scipy.io as sio
### --------------------------------- load BFM data
def load_BFM(model_path):
''' load BFM 3DMM model
Args:
model_path: path to BFM model.
Returns:
model: (nver = 53215, ntri = 105840). nver: number of vertices. ntri: number of triangles.
'shapeMU': [3*nver, 1]
'shapePC': [3*nver, 199]
'shapeEV': [199, 1]
'expMU': [3*nver, 1]
'expPC': [3*nver, 29]
'expEV': [29, 1]
'texMU': [3*nver, 1]
'texPC': [3*nver, 199]
'texEV': [199, 1]
'tri': [ntri, 3] (start from 1, should sub 1 in python and c++)
'tri_mouth': [114, 3] (start from 1, as a supplement to mouth triangles)
'kpt_ind': [68,] (start from 1)
PS:
You can change codes according to your own saved data.
Just make sure the model has corresponding attributes.
'''
C = sio.loadmat(model_path)
model = C['model']
model = model[0,0]
# change dtype from double(np.float64) to np.float32,
# since processing big matrices (especially matrix dot products) is slow in python.
model['shapeMU'] = (model['shapeMU'] + model['expMU']).astype(np.float32)
model['shapePC'] = model['shapePC'].astype(np.float32)
model['shapeEV'] = model['shapeEV'].astype(np.float32)
model['expEV'] = model['expEV'].astype(np.float32)
model['expPC'] = model['expPC'].astype(np.float32)
# MATLAB indices start at 1; change to 0-based for python.
model['tri'] = model['tri'].T.copy(order = 'C').astype(np.int32) - 1
model['tri_mouth'] = model['tri_mouth'].T.copy(order = 'C').astype(np.int32) - 1
# kpt ind
model['kpt_ind'] = (np.squeeze(model['kpt_ind']) - 1).astype(np.int32)
return model
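# --- Usage sketch (assumption: 'BFM.mat' is a locally prepared Basel Face
# Model file; see the face3d data preparation notes).
def _demo_load_bfm(model_path = 'BFM.mat'):
    model = load_BFM(model_path)
    print(model['shapePC'].shape)  # (3*53215, 199) for the standard BFM
    print(model['tri'].dtype, model['tri'].min())  # int32, 0 (already 0-based)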
def load_BFM_info(path = 'BFM_info.mat'):
''' load 3DMM model extra information
Args:
path: path to BFM info.
Returns:
model_info:
'symlist': 2 x 26720
'symlist_tri': 2 x 52937
'segbin': 4 x n (0: nose, 1: eye, 2: mouth, 3: cheek)
'segbin_tri': 4 x ntri
'face_contour': 1 x 28
'face_contour_line': 1 x 512
'face_contour_front': 1 x 28
'face_contour_front_line': 1 x 512
'nose_hole': 1 x 142
'nose_hole_right': 1 x 71
'nose_hole_left': 1 x 71
'parallel': 17 x 1 cell
'parallel_face_contour': 28 x 1 cell
'uv_coords': n x 2
'''
C = sio.loadmat(path)
model_info = C['model_info']
model_info = model_info[0,0]
return model_info
def load_uv_coords(path = 'BFM_UV.mat'):
''' load uv coords of BFM
Args:
path: path to data.
Returns:
uv_coords: [nver, 2]. range: 0-1
'''
C = sio.loadmat(path)
uv_coords = C['UV'].copy(order = 'C')
return uv_coords
def load_pncc_code(path = 'pncc_code.mat'):
''' load pncc code of BFM
PNCC code: Defined in 'Face Alignment Across Large Poses: A 3D Solution Xiangyu'
download at http://www.cbsr.ia.ac.cn/users/xiangyuzhu/projects/3DDFA/main.htm.
Args:
path: path to data.
Returns:
pncc_code: [nver, 3]
'''
C = sio.loadmat(path)
pncc_code = C['vertex_code'].T
return pncc_code
##
def get_organ_ind(model_info):
''' get nose, eye, mouth index
'''
valid_bin = model_info['segbin'].astype(bool)
organ_ind = np.nonzero(valid_bin[0,:])[0]
for i in range(1, valid_bin.shape[0] - 1):
organ_ind = np.union1d(organ_ind, np.nonzero(valid_bin[i,:])[0])
return organ_ind.astype(np.int32)

View File

@ -0,0 +1,143 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import scipy.io as sio
from .. import mesh
from . import fit
from . import load
class MorphabelModel(object):
"""docstring for MorphabelModel
model: nver: number of vertices. ntri: number of triangles. *: must have. ~: can generate ones array for place holder.
'shapeMU': [3*nver, 1]. *
'shapePC': [3*nver, n_shape_para]. *
'shapeEV': [n_shape_para, 1]. ~
'expMU': [3*nver, 1]. ~
'expPC': [3*nver, n_exp_para]. ~
'expEV': [n_exp_para, 1]. ~
'texMU': [3*nver, 1]. ~
'texPC': [3*nver, n_tex_para]. ~
'texEV': [n_tex_para, 1]. ~
'tri': [ntri, 3] (start from 1, should sub 1 in python and c++). *
'tri_mouth': [114, 3] (start from 1, as a supplement to mouth triangles). ~
'kpt_ind': [68,] (start from 1). ~
"""
def __init__(self, model_path, model_type = 'BFM'):
super( MorphabelModel, self).__init__()
if model_type=='BFM':
self.model = load.load_BFM(model_path)
else:
print('sorry, only the BFM model is supported for now')
exit()
# fixed attributes
self.nver = self.model['shapePC'].shape[0]//3
self.ntri = self.model['tri'].shape[0]
self.n_shape_para = self.model['shapePC'].shape[1]
self.n_exp_para = self.model['expPC'].shape[1]
self.n_tex_para = self.model['texMU'].shape[1]
self.kpt_ind = self.model['kpt_ind']
self.triangles = self.model['tri']
self.full_triangles = np.vstack((self.model['tri'], self.model['tri_mouth']))
# ------------------------------------- shape: represented with mesh(vertices & triangles(fixed))
def get_shape_para(self, type = 'random'):
if type == 'zero':
sp = np.zeros((self.n_shape_para, 1))
elif type == 'random':
sp = np.random.rand(self.n_shape_para, 1)*1e04
return sp
def get_exp_para(self, type = 'random'):
if type == 'zero':
ep = np.zeros((self.n_exp_para, 1))
elif type == 'random':
ep = -1.5 + 3*np.random.random([self.n_exp_para, 1])
ep[6:, 0] = 0
return ep
def generate_vertices(self, shape_para, exp_para):
'''
Args:
shape_para: (n_shape_para, 1)
exp_para: (n_exp_para, 1)
Returns:
vertices: (nver, 3)
'''
vertices = self.model['shapeMU'] + self.model['shapePC'].dot(shape_para) + self.model['expPC'].dot(exp_para)
vertices = np.reshape(vertices, [int(3), int(len(vertices)/3)], 'F').T
return vertices
# -------------------------------------- texture: here represented with rgb value(colors) in vertices.
def get_tex_para(self, type = 'random'):
if type == 'zero':
tp = np.zeros((self.n_tex_para, 1))
elif type == 'random':
tp = np.random.rand(self.n_tex_para, 1)
return tp
def generate_colors(self, tex_para):
'''
Args:
tex_para: (n_tex_para, 1)
Returns:
colors: (nver, 3)
'''
colors = self.model['texMU'] + self.model['texPC'].dot(tex_para*self.model['texEV'])
colors = np.reshape(colors, [int(3), int(len(colors)/3)], 'F').T/255.
return colors
# ------------------------------------------- transformation
# ------------- transform
def rotate(self, vertices, angles):
''' rotate face
Args:
vertices: [nver, 3]
angles: [3] x, y, z rotation angle(degree)
x: pitch. positive for looking down
y: yaw. positive for looking left
z: roll. positive for tilting head right
Returns:
vertices: rotated vertices
'''
return mesh.transform.rotate(vertices, angles)
def transform(self, vertices, s, angles, t3d):
R = mesh.transform.angle2matrix(angles)
return mesh.transform.similarity_transform(vertices, s, R, t3d)
def transform_3ddfa(self, vertices, s, angles, t3d): # only used for processing 300W_LP data
R = mesh.transform.angle2matrix_3ddfa(angles)
return mesh.transform.similarity_transform(vertices, s, R, t3d)
# --------------------------------------------------- fitting
def fit(self, x, X_ind, max_iter = 4, isShow = False):
''' fit 3dmm & pose parameters
Args:
x: (n, 2) image points
X_ind: (n,) corresponding Model vertex indices
max_iter: iteration
isShow: whether to reserve middle results for show
Returns:
fitted_sp: (n_sp, 1). shape parameters
fitted_ep: (n_ep, 1). exp parameters
s, angles, t
'''
if isShow:
fitted_sp, fitted_ep, s, R, t = fit.fit_points_for_show(x, X_ind, self.model, n_sp = self.n_shape_para, n_ep = self.n_exp_para, max_iter = max_iter)
angles = np.zeros((R.shape[0], 3))
for i in range(R.shape[0]):
angles[i] = mesh.transform.matrix2angle(R[i])
else:
fitted_sp, fitted_ep, s, R, t = fit.fit_points(x, X_ind, self.model, n_sp = self.n_shape_para, n_ep = self.n_exp_para, max_iter = max_iter)
angles = mesh.transform.matrix2angle(R)
return fitted_sp, fitted_ep, s, angles, t
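# --- End-to-end sketch (assumptions: 'BFM.mat' exists locally and carries the
# 68 keypoint indices): generate a face, take its keypoints as fake image
# points, then fit shape/expression/pose back from them.
def _demo_fit(model_path = 'BFM.mat'):
    bfm = MorphabelModel(model_path)
    sp, ep = bfm.get_shape_para('random'), bfm.get_exp_para('random')
    vertices = bfm.generate_vertices(sp, ep)
    x = vertices[bfm.kpt_ind, :2]  # (68, 2) toy "image" points
    fitted_sp, fitted_ep, s, angles, t = bfm.fit(x, bfm.kpt_ind, max_iter = 3)
    print(s, angles)  # recovered scale and euler angles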

View File

@ -0,0 +1,18 @@
from __future__ import absolute_import
#from . import bbox
#from . import viz
#from . import random
#from . import metrics
#from . import parallel
from .storage import download, ensure_available, download_onnx
from .filesystem import get_model_dir
from .filesystem import makedirs, try_import_dali
from .constant import *
#from .bbox import bbox_iou
#from .block import recursive_visit, set_lr_mult, freeze_bn
#from .lr_scheduler import LRSequential, LRScheduler
#from .plot_history import TrainingHistory
#from .export_helper import export_block
#from .sync_loader_helper import split_data, split_and_load

View File

@ -0,0 +1,3 @@
DEFAULT_MP_NAME = 'buffalo_l'

View File

@ -0,0 +1,95 @@
"""
This code file mainly comes from https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/download.py
"""
import os
import hashlib
import requests
from tqdm import tqdm
def check_sha1(filename, sha1_hash):
"""Check whether the sha1 hash of the file content matches the expected hash.
Parameters
----------
filename : str
Path to the file.
sha1_hash : str
Expected sha1 hash in hexadecimal digits.
Returns
-------
bool
Whether the file content matches the expected hash.
"""
sha1 = hashlib.sha1()
with open(filename, 'rb') as f:
while True:
data = f.read(1048576)
if not data:
break
sha1.update(data)
sha1_file = sha1.hexdigest()
l = min(len(sha1_file), len(sha1_hash))
return sha1_file[0:l] == sha1_hash[0:l]
def download_file(url, path=None, overwrite=False, sha1_hash=None):
"""Download an given URL
Parameters
----------
url : str
URL to download
path : str, optional
Destination path to store downloaded file. By default stores to the
current directory with same name as in url.
overwrite : bool, optional
Whether to overwrite destination file if already exists.
sha1_hash : str, optional
Expected sha1 hash in hexadecimal digits. Will ignore existing file when hash is specified
but doesn't match.
Returns
-------
str
The file path of the downloaded file.
"""
if path is None:
fname = url.split('/')[-1]
else:
path = os.path.expanduser(path)
if os.path.isdir(path):
fname = os.path.join(path, url.split('/')[-1])
else:
fname = path
if overwrite or not os.path.exists(fname) or (
sha1_hash and not check_sha1(fname, sha1_hash)):
dirname = os.path.dirname(os.path.abspath(os.path.expanduser(fname)))
if not os.path.exists(dirname):
os.makedirs(dirname)
print('Downloading %s from %s...' % (fname, url))
r = requests.get(url, stream=True)
if r.status_code != 200:
raise RuntimeError("Failed downloading url %s" % url)
total_length = r.headers.get('content-length')
with open(fname, 'wb') as f:
if total_length is None: # no content length header
for chunk in r.iter_content(chunk_size=1024):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
else:
total_length = int(total_length)
for chunk in tqdm(r.iter_content(chunk_size=1024),
total=int(total_length / 1024. + 0.5),
unit='KB',
unit_scale=False,
dynamic_ncols=True):
f.write(chunk)
if sha1_hash and not check_sha1(fname, sha1_hash):
raise UserWarning('File {} is downloaded but the content hash does not match. ' \
'The repo may be outdated or download may be incomplete. ' \
'If the "repo_url" is overridden, consider switching to ' \
'the default repo.'.format(fname))
return fname

View File

@ -0,0 +1,103 @@
import cv2
import numpy as np
from skimage import transform as trans
arcface_dst = np.array(
[[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
[41.5493, 92.3655], [70.7299, 92.2041]],
dtype=np.float32)
def estimate_norm(lmk, image_size=112, mode='arcface'):
# assert lmk.shape == (5, 2)
# assert image_size%112==0 or image_size%128==0
if image_size%112==0:
ratio = float(image_size)/112.0
diff_x = 0
else:
ratio = float(image_size)/128.0
diff_x = 8.0*ratio
dst = arcface_dst * ratio
dst[:,0] += diff_x
tform = trans.SimilarityTransform()
tform.estimate(lmk, dst)
M = tform.params[0:2, :]
return M
def norm_crop(img, landmark, image_size=112, mode='arcface'):
M = estimate_norm(landmark, image_size, mode)
warped = cv2.warpAffine(img, M, (image_size, image_size), borderValue=0.0)
return warped
def norm_crop2(img, landmark, image_size=112, mode='arcface'):
M = estimate_norm(landmark, image_size, mode)
warped = cv2.warpAffine(img, M, (image_size, image_size), borderValue=0.0)
return warped, M
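# --- Usage sketch (illustrative; `img` and `lmk` are assumed to come from a
# face detector that returns the five landmarks in arcface_dst order):
#   aligned = norm_crop(img, lmk, image_size=112)  # 112x112 aligned face crop
#   aligned, M = norm_crop2(img, lmk)              # same crop plus the affine M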
def square_crop(im, S):
if im.shape[0] > im.shape[1]:
height = S
width = int(float(im.shape[1]) / im.shape[0] * S)
scale = float(S) / im.shape[0]
else:
width = S
height = int(float(im.shape[0]) / im.shape[1] * S)
scale = float(S) / im.shape[1]
resized_im = cv2.resize(im, (width, height))
det_im = np.zeros((S, S, 3), dtype=np.uint8)
det_im[:resized_im.shape[0], :resized_im.shape[1], :] = resized_im
return det_im, scale
def transform(data, center, output_size, scale, rotation):
scale_ratio = scale
rot = float(rotation) * np.pi / 180.0
#translation = (output_size/2-center[0]*scale_ratio, output_size/2-center[1]*scale_ratio)
t1 = trans.SimilarityTransform(scale=scale_ratio)
cx = center[0] * scale_ratio
cy = center[1] * scale_ratio
t2 = trans.SimilarityTransform(translation=(-1 * cx, -1 * cy))
t3 = trans.SimilarityTransform(rotation=rot)
t4 = trans.SimilarityTransform(translation=(output_size / 2,
output_size / 2))
t = t1 + t2 + t3 + t4
M = t.params[0:2]
cropped = cv2.warpAffine(data,
M, (output_size, output_size),
borderValue=0.0)
return cropped, M
def trans_points2d(pts, M):
new_pts = np.zeros(shape=pts.shape, dtype=np.float32)
for i in range(pts.shape[0]):
pt = pts[i]
new_pt = np.array([pt[0], pt[1], 1.], dtype=np.float32)
new_pt = np.dot(M, new_pt)
#print('new_pt', new_pt.shape, new_pt)
new_pts[i] = new_pt[0:2]
return new_pts
def trans_points3d(pts, M):
scale = np.sqrt(M[0][0] * M[0][0] + M[0][1] * M[0][1])
#print(scale)
new_pts = np.zeros(shape=pts.shape, dtype=np.float32)
for i in range(pts.shape[0]):
pt = pts[i]
new_pt = np.array([pt[0], pt[1], 1.], dtype=np.float32)
new_pt = np.dot(M, new_pt)
#print('new_pt', new_pt.shape, new_pt)
new_pts[i][0:2] = new_pt[0:2]
new_pts[i][2] = pts[i][2] * scale
return new_pts
def trans_points(pts, M):
if pts.shape[1] == 2:
return trans_points2d(pts, M)
else:
return trans_points3d(pts, M)

View File

@ -0,0 +1,157 @@
"""
This code file mainly comes from https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/filesystem.py
"""
import os
import os.path as osp
import errno
def get_model_dir(name, root='~/.insightface'):
root = os.path.expanduser(root)
model_dir = osp.join(root, 'models', name)
return model_dir
def makedirs(path):
"""Create directory recursively if not exists.
Similar to `mkdir -p`, you can skip checking existence before this function.
Parameters
----------
path : str
Path of the desired dir
"""
try:
os.makedirs(path)
except OSError as exc:
if exc.errno != errno.EEXIST:
raise
def try_import(package, message=None):
"""Try import specified package, with custom message support.
Parameters
----------
package : str
The name of the targeting package.
message : str, default is None
If not None, this function will raise customized error message when import error is found.
Returns
-------
module if found, raise ImportError otherwise
"""
try:
return __import__(package)
except ImportError as e:
if not message:
raise e
raise ImportError(message)
def try_import_cv2():
"""Try import cv2 at runtime.
Returns
-------
cv2 module if found. Raise ImportError otherwise
"""
msg = "cv2 is required, you can install by package manager, e.g. 'apt-get', \
or `pip install opencv-python --user` (note that this is unofficial PYPI package)."
return try_import('cv2', msg)
def try_import_mmcv():
"""Try import mmcv at runtime.
Returns
-------
mmcv module if found. Raise ImportError otherwise
"""
msg = "mmcv is required, you can install by first `pip install Cython --user` \
and then `pip install mmcv --user` (note that this is unofficial PYPI package)."
return try_import('mmcv', msg)
def try_import_rarfile():
"""Try import rarfile at runtime.
Returns
-------
rarfile module if found. Raise ImportError otherwise
"""
msg = "rarfile is required, you can install by first `sudo apt-get install unrar` \
and then `pip install rarfile --user` (note that this is unofficial PYPI package)."
return try_import('rarfile', msg)
def import_try_install(package, extern_url=None):
"""Try import the specified package.
If the package is not installed, try to install it with pip, then import it again.
Parameters
----------
package : str
The name of the package trying to import.
extern_url : str or None, optional
The external url if package is not hosted on PyPI.
For example, you can install a package using:
"pip install git+http://github.com/user/repo/tarball/master#egg=xxx".
In this case, pass that url as extern_url.
Returns
-------
<class 'Module'>
The imported python module.
"""
try:
return __import__(package)
except ImportError:
try:
from pip import main as pipmain
except ImportError:
from pip._internal import main as pipmain
# trying to install package
url = package if extern_url is None else extern_url
pipmain(['install', '--user',
url]) # will raise SystemExit Error if fails
# trying to load again
try:
return __import__(package)
except ImportError:
import sys
import site
user_site = site.getusersitepackages()
if user_site not in sys.path:
sys.path.append(user_site)
return __import__(package)
return __import__(package)
def try_import_dali():
"""Try import NVIDIA DALI at runtime.
"""
try:
dali = __import__('nvidia.dali', fromlist=['pipeline', 'ops', 'types'])
dali.Pipeline = dali.pipeline.Pipeline
except ImportError:
class dali:
class Pipeline:
def __init__(self):
raise NotImplementedError(
"DALI not found, please check if you installed it correctly."
)
return dali
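# A quick sketch of the helpers above. "tqdm" is an arbitrary example
# package name; import_try_install may shell out to pip on a cache miss.
def _example_filesystem_usage():
    cv2 = try_import_cv2()
    tqdm = import_try_install("tqdm")
    makedirs(get_model_dir("buffalo_l"))
    return cv2, tqdm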

View File

@ -0,0 +1,52 @@
import os
import os.path as osp
import zipfile
from .download import download_file
BASE_REPO_URL = 'https://github.com/deepinsight/insightface/releases/download/v0.7'
def download(sub_dir, name, force=False, root='~/.insightface'):
_root = os.path.expanduser(root)
dir_path = os.path.join(_root, sub_dir, name)
if osp.exists(dir_path) and not force:
return dir_path
print('download_path:', dir_path)
zip_file_path = os.path.join(_root, sub_dir, name + '.zip')
model_url = "%s/%s.zip"%(BASE_REPO_URL, name)
download_file(model_url,
path=zip_file_path,
overwrite=True)
if not os.path.exists(dir_path):
os.makedirs(dir_path)
with zipfile.ZipFile(zip_file_path) as zf:
zf.extractall(dir_path)
#os.remove(zip_file_path)
return dir_path
def ensure_available(sub_dir, name, root='~/.insightface'):
return download(sub_dir, name, force=False, root=root)
def download_onnx(sub_dir, model_file, force=False, root='~/.insightface', download_zip=False):
_root = os.path.expanduser(root)
model_root = osp.join(_root, sub_dir)
new_model_file = osp.join(model_root, model_file)
if osp.exists(new_model_file) and not force:
return new_model_file
if not osp.exists(model_root):
os.makedirs(model_root)
print('download_path:', new_model_file)
if not download_zip:
model_url = "%s/%s"%(BASE_REPO_URL, model_file)
download_file(model_url,
path=new_model_file,
overwrite=True)
else:
model_url = "%s/%s.zip"%(BASE_REPO_URL, model_file)
zip_file_path = new_model_file+".zip"
download_file(model_url,
path=zip_file_path,
overwrite=True)
with zipfile.ZipFile(zip_file_path) as zf:
zf.extractall(model_root)
return new_model_file
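# A minimal usage sketch (assumes network access). "buffalo_l" is the stock
# insightface model-pack name; it is fetched as a zip from BASE_REPO_URL and
# extracted under ~/.insightface/models on first use.
def _example_download_usage():
    model_dir = ensure_available("models", "buffalo_l")
    return model_dir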

View File

@ -0,0 +1,116 @@
import cv2
import math
import numpy as np
from skimage import transform as trans
def transform(data, center, output_size, scale, rotation):
scale_ratio = scale
rot = float(rotation) * np.pi / 180.0
#translation = (output_size/2-center[0]*scale_ratio, output_size/2-center[1]*scale_ratio)
t1 = trans.SimilarityTransform(scale=scale_ratio)
cx = center[0] * scale_ratio
cy = center[1] * scale_ratio
t2 = trans.SimilarityTransform(translation=(-1 * cx, -1 * cy))
t3 = trans.SimilarityTransform(rotation=rot)
t4 = trans.SimilarityTransform(translation=(output_size / 2,
output_size / 2))
t = t1 + t2 + t3 + t4
M = t.params[0:2]
cropped = cv2.warpAffine(data,
M, (output_size, output_size),
borderValue=0.0)
return cropped, M
def trans_points2d(pts, M):
new_pts = np.zeros(shape=pts.shape, dtype=np.float32)
for i in range(pts.shape[0]):
pt = pts[i]
new_pt = np.array([pt[0], pt[1], 1.], dtype=np.float32)
new_pt = np.dot(M, new_pt)
#print('new_pt', new_pt.shape, new_pt)
new_pts[i] = new_pt[0:2]
return new_pts
def trans_points3d(pts, M):
scale = np.sqrt(M[0][0] * M[0][0] + M[0][1] * M[0][1])
#print(scale)
new_pts = np.zeros(shape=pts.shape, dtype=np.float32)
for i in range(pts.shape[0]):
pt = pts[i]
new_pt = np.array([pt[0], pt[1], 1.], dtype=np.float32)
new_pt = np.dot(M, new_pt)
#print('new_pt', new_pt.shape, new_pt)
new_pts[i][0:2] = new_pt[0:2]
new_pts[i][2] = pts[i][2] * scale
return new_pts
def trans_points(pts, M):
if pts.shape[1] == 2:
return trans_points2d(pts, M)
else:
return trans_points3d(pts, M)
def estimate_affine_matrix_3d23d(X, Y):
''' Estimate an affine transform between two 3D point sets using a least-squares solution
Args:
X: [n, 3]. 3D points (fixed)
Y: [n, 3]. corresponding 3D points (moving). Y = PX
Returns:
P_Affine: (3, 4). Affine camera matrix (the third row is [0, 0, 0, 1]).
'''
X_homo = np.hstack((X, np.ones([X.shape[0],1]))) #n x 4
P = np.linalg.lstsq(X_homo, Y, rcond=None)[0].T # Affine matrix. 3 x 4
return P
def P2sRt(P):
''' decomposing camera matrix P
Args:
P: (3, 4). Affine Camera Matrix.
Returns:
s: scale factor.
R: (3, 3). rotation matrix.
t: (3,). translation.
'''
t = P[:, 3]
R1 = P[0:1, :3]
R2 = P[1:2, :3]
s = (np.linalg.norm(R1) + np.linalg.norm(R2))/2.0
r1 = R1/np.linalg.norm(R1)
r2 = R2/np.linalg.norm(R2)
r3 = np.cross(r1, r2)
R = np.concatenate((r1, r2, r3), 0)
return s, R, t
def matrix2angle(R):
''' get three Euler angles from Rotation Matrix
Args:
R: (3,3). rotation matrix
Returns:
x: pitch
y: yaw
z: roll
'''
sy = math.sqrt(R[0,0] * R[0,0] + R[1,0] * R[1,0])
singular = sy < 1e-6
if not singular:
x = math.atan2(R[2,1] , R[2,2])
y = math.atan2(-R[2,0], sy)
z = math.atan2(R[1,0], R[0,0])
else:
x = math.atan2(-R[1,2], R[1,1])
y = math.atan2(-R[2,0], sy)
z = 0
# rx, ry, rz = np.rad2deg(x), np.rad2deg(y), np.rad2deg(z)
rx, ry, rz = x*180/np.pi, y*180/np.pi, z*180/np.pi
return rx, ry, rz
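# A worked sketch of the pose-recovery chain above: X holds canonical 3D
# landmarks and Y their posed counterparts (both hypothetical [n, 3] arrays).
def _example_pose_recovery(X, Y):
    P = estimate_affine_matrix_3d23d(X, Y)  # (3, 4) affine camera matrix
    s, R, t = P2sRt(P)                      # scale, rotation, translation
    pitch, yaw, roll = matrix2angle(R)      # Euler angles in degrees
    return s, (pitch, yaw, roll), t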

View File

@ -62,7 +62,7 @@ class Unprompted:
@shortcodes.register(shortcode_name, None, preprocess)
def handler(keyword, pargs, kwargs, context):
self.prep_for_shortcode(keyword,pargs,kwargs,context)
self.prep_for_shortcode(keyword, pargs, kwargs, context)
return (self.shortcode_objects[f"{keyword}"].run_atomic(pargs, kwargs, context))
# Normal atomic
@ -70,7 +70,7 @@ class Unprompted:
@shortcodes.register(shortcode_name)
def handler(keyword, pargs, kwargs, context):
self.prep_for_shortcode(keyword,pargs,kwargs,context)
self.prep_for_shortcode(keyword, pargs, kwargs, context)
return (self.shortcode_objects[f"{keyword}"].run_atomic(pargs, kwargs, context))
else:
# Allow shortcode to run before inner content
@ -81,7 +81,7 @@ class Unprompted:
@shortcodes.register(shortcode_name, f"{self.Config.syntax.tag_close}{shortcode_name}", preprocess)
def handler(keyword, pargs, kwargs, context, content):
self.prep_for_shortcode(keyword,pargs,kwargs,context,content)
self.prep_for_shortcode(keyword, pargs, kwargs, context, content)
return (self.shortcode_objects[f"{keyword}"].run_block(pargs, kwargs, context, content))
# Normal block
@ -89,7 +89,7 @@ class Unprompted:
@shortcodes.register(shortcode_name, f"{self.Config.syntax.tag_close}{shortcode_name}")
def handler(keyword, pargs, kwargs, context, content):
self.prep_for_shortcode(keyword,pargs,kwargs,context,content)
self.prep_for_shortcode(keyword, pargs, kwargs, context, content)
return (self.shortcode_objects[f"{keyword}"].run_block(pargs, kwargs, context, content))
# Setup extra routines
@ -107,7 +107,7 @@ class Unprompted:
self.log.info(f"Finished loading in {time.time()-start_time} seconds.")
def __init__(self, base_dir="."):
self.VERSION = "10.6.0"
self.VERSION = "10.7.0"
self.shortcode_modules = {}
self.shortcode_objects = {}
@ -178,7 +178,7 @@ class Unprompted:
def start(self, string, debug=True):
if debug: self.log.debug("Loading global variables...")
for global_var, value in self.Config.globals.__dict__.items():
self.shortcode_user_vars[self.Config.syntax.global_prefix+global_var] = value
self.shortcode_user_vars[self.Config.syntax.global_prefix + global_var] = value
if debug: self.log.debug("Main routine started...")
self.routine = "main"
self.conditional_depth = -1
@ -204,13 +204,13 @@ class Unprompted:
return processed
def process_string(self, string, context=None, cleanup_extra_spaces=None):
if cleanup_extra_spaces==None: cleanup_extra_spaces = self.Config.syntax.cleanup_extra_spaces
if cleanup_extra_spaces == None: cleanup_extra_spaces = self.Config.syntax.cleanup_extra_spaces
self.conditional_depth += 1
if context: self.current_context = context
# First, sanitize contents
string = self.shortcode_parser.parse(self.sanitize_pre(string, self.Config.syntax.sanitize_before), context)
self.conditional_depth = max(0, self.conditional_depth -1)
self.conditional_depth = max(0, self.conditional_depth - 1)
return (self.sanitize_post(string, cleanup_extra_spaces))
def sanitize_pre(self, string, rules_obj, only_remove_last=False):
@ -268,7 +268,7 @@ class Unprompted:
self.kwargs = kwargs
self.context = context
self.content = content
def parse_arg(self, key, default=False, datatype=None, context=None, pargs=None, kwargs=None, arithmetic=True, delimiter=None):
"""Processes the argument, casting it to the correct datatype."""
# Load defaults from the Unprompted object
@ -285,8 +285,8 @@ class Unprompted:
if pargs and key in pargs:
return True
elif kwargs and key in kwargs:
if arithmetic: default = self.parse_advanced(str(kwargs[key]),context)
else: default = self.parse_alt_tags(str(kwargs[key]),context)
if arithmetic: default = self.parse_advanced(str(kwargs[key]), context)
else: default = self.parse_alt_tags(str(kwargs[key]), context)
if delimiter:
try:
# We will cast the value to a string so that we can split it, but
@ -300,7 +300,7 @@ class Unprompted:
try:
if type(default) == list:
for idx,val in enumerate(default):
for idx, val in enumerate(default):
default[idx] = datatype(val)
else:
default = datatype(default)
@ -310,11 +310,10 @@ class Unprompted:
return default
def parse_advanced(self, string, context=None):
"""First runs the string through parse_alt_tags, the result of which then goes through simpleeval"""
if string is None: return ""
if (len(string) < 1): return ""
string = self.parse_alt_tags(string, context)
if self.Config.advanced_expressions:
@ -360,11 +359,11 @@ class Unprompted:
string = string.replace(tmp_start, self.Config.syntax.tag_start_alt).replace(tmp_end, self.Config.syntax.tag_end_alt)
return (parser.parse(string, context))
def make_alt_tags(self, string):
"""Similar to parse_alt_tags, but in reverse; converts square brackets to nested alt tags."""
if string is None or len(string) < 1: return ""
# Find maximum nested depth
nested = 0
while True:
@ -384,7 +383,7 @@ class Unprompted:
end_new = tmp_end * (i + 1)
string = string.replace(start_old, start_new).replace(end_old, end_new)
# Convert primary square bracket tag to alt tag
string = string.replace(self.Config.syntax.tag_start, self.Config.syntax.tag_start_alt).replace(self.Config.syntax.tag_end, self.Config.syntax.tag_end_alt)
@ -453,10 +452,10 @@ class Unprompted:
this_val = self.shortcode_user_vars[att]
# Apply preset model names
if att_split[2] == "model":
if self.shortcode_user_vars["sd_base"]== "sd1": cn_dict = self.Config.stable_diffusion.controlnet.sd1_models
if self.shortcode_user_vars["sd_base"] == "sd1": cn_dict = self.Config.stable_diffusion.controlnet.sd1_models
elif self.shortcode_user_vars["sd_base"] == "sdxl": cn_dict = self.Config.stable_diffusion.controlnet.sdxl_models
if hasattr(cn_dict,this_val):
if hasattr(cn_dict, this_val):
this_val = getattr(cn_dict, this_val)
setattr(all_units[int(att_split[1])], "_".join(att_split[2:]), this_val)
cnet.update_cn_script_in_processing(this_p, all_units)
@ -548,15 +547,18 @@ class Unprompted:
if self.routine == "after":
if new_image:
self.after_processed.images[idx] = new_image
else: return self.after_processed.images[idx]
else:
return self.after_processed.images[idx]
elif "init_images" in self.shortcode_user_vars and self.shortcode_user_vars["init_images"]:
if new_image:
self.shortcode_user_vars["init_images"][idx] = new_image
else: return self.shortcode_user_vars["init_images"][idx]
else:
return self.shortcode_user_vars["init_images"][idx]
elif "default_image" in self.shortcode_user_vars:
if new_image:
self.shortcode_user_vars["default_image"] = new_image
else: return self.shortcode_user_vars["default_image"]
else:
return self.shortcode_user_vars["default_image"]
except Exception as e:
self.log.exception("Could not find the current image.")
return None
@ -570,16 +572,16 @@ class Unprompted:
if self.routine == "after":
self.shortcode_user_vars["init_images"][idx] = self.after_processed.images[idx]
# Update the SD vars if Unprompted.main_p exists
#if hasattr(self, "main_p"):
# self.update_stable_diffusion_vars(self.main_p)
return True
return None
def escape_tags(self, string, new_start = None, new_end = None):
if not new_start: new_start = self.Config.syntax.tag_escape+self.Config.syntax.tag_start_alt
if not new_end: new_end = self.Config.syntax.tag_escape+self.Config.syntax.tag_end_alt
def escape_tags(self, string, new_start=None, new_end=None):
if not new_start: new_start = self.Config.syntax.tag_escape + self.Config.syntax.tag_start_alt
if not new_end: new_end = self.Config.syntax.tag_escape + self.Config.syntax.tag_end_alt
# self.log.warning(f"string is {string}")
# self.log.warning(f"string after replacing is {string.replace(self.Config.syntax.tag_start,new_start).replace(self.Config.syntax.tag_end,new_end)}")
return string.replace(self.Config.syntax.tag_start,new_start).replace(self.Config.syntax.tag_end,new_end)
return string.replace(self.Config.syntax.tag_start, new_start).replace(self.Config.syntax.tag_end, new_end)

View File

@ -8,7 +8,7 @@ color-matcher # [zoom_enhance]
modelscope # [faceswap] face_fusion
tensorflow # [faceswap]
onnx # [faceswap]
onnxruntime # [faceswap]
onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ # [faceswap] GPU support (CUDA 12 package)
mxnet -f https://dist.mxnet.io/python/cpu # [faceswap]
albumentations # [faceswap] insightface
pyiqa # [image_info] image quality assessment

View File

@ -20,6 +20,7 @@ from enum import IntEnum, auto
import sys, os, html, random
base_dir = scripts.basedir()
unprompted_dir = str(Path(*Path(base_dir).parts[-2:])).replace("\\", "/")
sys.path.append(base_dir)
# Main object
@ -27,6 +28,7 @@ from lib_unprompted.shared import Unprompted, parse_config
Unprompted = Unprompted(base_dir)
Unprompted.log.debug(f"The `base_dir` is: {base_dir}")
ext_dir = os.path.split(os.path.normpath(base_dir))[1]
if ext_dir == "unprompted":
Unprompted.log.warning("The extension folder must be renamed from unprompted to _unprompted in order to ensure compatibility with other extensions. Please see this A1111 WebUI issue for more details: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/8011")
@ -40,6 +42,13 @@ Unprompted.is_enabled = True
Unprompted.original_prompt = None
Unprompted.original_negative_prompt = ""
if os.path.exists(f"./modules_forge"):
Unprompted.webui = "forge"
else:
Unprompted.webui = "auto1111"
Unprompted.log.debug(f"WebUI type: {Unprompted.webui}")
Unprompted.wizard_template_files = []
Unprompted.wizard_template_names = []
Unprompted.wizard_template_kwargs = []
@ -95,7 +104,7 @@ def wizard_prep_event_listeners(obj):
wizard_set_event_listener(child)
def wizard_generate_template(option, is_img2img, prepend="", append=""):
def wizard_generate_template(option, is_img2img, html_safe=True, prepend="", append=""):
filepath = os.path.relpath(Unprompted.wizard_template_files[option], f"{base_dir}/{Unprompted.Config.template_directory}")
# Remove file extension
filepath = os.path.splitext(filepath)[0]
@ -120,7 +129,9 @@ def wizard_generate_template(option, is_img2img, prepend="", append=""):
this_val = gr_obj.value
if (arg_name == "prompt"): continue
this_val = Unprompted.make_alt_tags(html.escape(str(helpers.autocast(this_val)).replace("\"", "\'"), quote=False))
this_val = str(helpers.autocast(this_val)).replace("\"", "\'")
if html_safe: this_val = html.escape(this_val, quote=False)
this_val = Unprompted.make_alt_tags(this_val)
if " " in this_val: this_val = f"\"{this_val}\"" # Enclose in quotes if necessary
result += f" {arg_name}={this_val}"
@ -139,7 +150,7 @@ def wizard_generate_template(option, is_img2img, prepend="", append=""):
return (prepend + result + append)
def wizard_generate_shortcode(option, is_img2img, prepend="", append=""):
def wizard_generate_shortcode(option, is_img2img, html_safe=True, prepend="", append=""):
if hasattr(Unprompted.shortcode_objects[option], "wizard_prepend"): result = Unprompted.shortcode_objects[option].wizard_prepend
else: result = Unprompted.Config.syntax.tag_start + option
filtered_shortcodes = Unprompted.wizard_groups[WizardModes.SHORTCODES][int(is_img2img)]
@ -182,7 +193,9 @@ def wizard_generate_shortcode(option, is_img2img, prepend="", append=""):
elif (block_name == "number" or block_name == "slider"): result += f" {arg_name}={helpers.autocast(gr_obj.value)}"
elif (block_name == "textbox"):
if len(this_val) > 0: result += f" {arg_name}=\"{this_val}\""
else: result += f" {arg_name}=\"{html.escape(this_val, quote=False)}\""
else:
if html_safe: this_val = html.escape(this_val, quote=False)
result += f" {arg_name}=\"{this_val}\""
except:
pass
@ -256,10 +269,12 @@ def wizard_generate_capture(include_inference, include_prompt, include_neg_promp
def get_local_file_dir(filename=None):
unp_dir = os.path.basename(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
# unp_dir = os.path.basename(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
if filename: filepath = "/" + str(Path(os.path.relpath(filename, f"{base_dir}")).parent)
else: filepath = ""
return (f"file/extensions/{unp_dir}{filepath}")
return (f"file/{unprompted_dir}{filepath}")
def get_markdown(file):
@ -319,6 +334,7 @@ class Scripts(scripts.Script):
promos.append(f'<a href="https://payhip.com/b/hdgNR" target="_blank"><img src="{get_local_file_dir()}/images/promo_box_fantasy.png" class="thumbnail"></a><h1>Create beautiful art for your <strong>Fantasy Card Game</strong></h1><p>Generate a wide variety of creatures and characters in the style of a fantasy card game. Perfect for heroes, animals, monsters, and even crazy hybrids.</p><a href="https://payhip.com/b/hdgNR" target=_blank><button class="gr-button gr-button-lg gr-button-secondary" title="View premium assets for Unprompted">Download Now ➜</button></a>')
promos.append(f'<a href="https://github.com/ThereforeGames/unprompted" target="_blank"><img src="{get_local_file_dir()}/images/promo_github_star.png" class="thumbnail"></a><h1>Give Unprompted a <strong>star</strong> for visibility</h1><p>Most WebUI users have never heard of Unprompted. You can help more people discover it by giving the repo a ⭐ on Github. Thank you for your support!</p><a href="https://github.com/ThereforeGames/unprompted" target=_blank><button class="gr-button gr-button-lg gr-button-secondary" title="View the Unprompted repo">Visit Github ➜</button></a>')
promos.append(f'<a href="https://github.com/sponsors/ThereforeGames" target="_blank"><img src="{get_local_file_dir()}/images/promo_github_sponsor.png" class="thumbnail"></a><h1>Become a Sponsor</h1><p>One of the best ways to support Unprompted is by becoming our Sponsor on Github - sponsors receive access to a private repo containing all of our premium add-ons. <em>(Still setting that up... should be ready soon!)</em></p><a href="https://github.com/sponsors/ThereforeGames" target=_blank><button class="gr-button gr-button-lg gr-button-secondary" title="View the Unprompted repo">Visit Github ➜</button></a>')
promos.append(f'<a href="https://github.com/ThereforeGames/sd-webui-breadcrumbs" target="_blank"><img src="{get_local_file_dir()}/images/promo_breadcrumbs.png" class="thumbnail"></a><h1>Try our new Breadcrumbs extension</h1><p>From the developer of Unprompted comes <strong>sd-webui-breadcrumbs</strong>, an extension designed to improve the WebUI\'s navigation flow. Tedious "menu diving" is a thing of the past!</p><a href="https://github.com/ThereforeGames/sd-webui-breadcrumbs" target=_blank><button class="gr-button gr-button-lg gr-button-secondary" title="View the sd-webui-breadcrumbs repo">Visit Github ➜</button></a>')
with gr.Accordion("🎉 Promo", open=is_open):
plug = gr.HTML(label="plug", elem_id="promo", value=random.choice(promos))
@ -348,7 +364,7 @@ class Scripts(scripts.Script):
if (block_name == "textbox"):
if "_placeholder" in kwargs: this_placeholder = kwargs["_placeholder"]
else: this_placeholder = str(content)
obj = gr.Textbox(label=this_label, max_lines=1, placeholder=this_placeholder, info=_info, show_label=_show_label)
obj = gr.Textbox(label=this_label, lines=int(kwargs["_lines"]) if "_lines" in kwargs else 1, max_lines=int(kwargs["_max_lines"]) if "_max_lines" in kwargs else 1, placeholder=this_placeholder, info=_info, show_label=_show_label)
elif (block_name == "checkbox"):
obj = gr.Checkbox(label=this_label, value=bool(int(content)), info=_info, show_label=_show_label)
elif (block_name == "number"):
@ -651,8 +667,8 @@ class Scripts(scripts.Script):
autoinclude_obj = autoinclude_obj.children[-1]
if (autoinclude_obj.value):
if mode == WizardModes.SHORTCODES: Unprompted.original_prompt = wizard_generate_shortcode(key, is_img2img, "", Unprompted.original_prompt)
elif mode == WizardModes.TEMPLATES: Unprompted.original_prompt = wizard_generate_template(idx, is_img2img, "", Unprompted.original_prompt)
if mode == WizardModes.SHORTCODES: Unprompted.original_prompt = wizard_generate_shortcode(key, is_img2img, False, "", Unprompted.original_prompt)
elif mode == WizardModes.TEMPLATES: Unprompted.original_prompt = wizard_generate_template(idx, is_img2img, False, "", Unprompted.original_prompt)
p.all_prompts[0] = Unprompted.original_prompt # test
p.unprompted_original_prompt = Unprompted.original_prompt

View File

@ -15,7 +15,7 @@ class Shortcode():
# if "batch_indexing" in kwargs: self.batch_indexing = bool(self.Unprompted.parse_advanced(kwargs["batch_indexing"]))
batch_real_index = self.Unprompted.shortcode_user_vars["batch_real_index"] if "batch_real_index" in self.Unprompted.shortcode_user_vars else 0
dupe_index_mode = self.Unprompted.parse_arg("dupe_index_mode","concat")
dupe_index_mode = self.Unprompted.parse_arg("dupe_index_mode", "concat")
# Create list inside of list to house [after] content for this batch number
while batch_real_index >= len(self.after_content):
@ -26,10 +26,10 @@ class Shortcode():
is_new_index = index >= len(self.after_content[batch_real_index])
if is_new_index or dupe_index_mode != "skip":
self.log.debug(f"Queueing up content (Batch #{batch_real_index}, After {index}): {content}")
if is_new_index or dupe_index_mode == "replace":
self.log.debug(f"Replacing content in After routine (index {index})")
helpers.list_set(self.after_content[batch_real_index],index,content,"")
helpers.list_set(self.after_content[batch_real_index], index, content, "")
elif not is_new_index:
if dupe_index_mode == "concat":
self.log.debug(f"Concatenating content to After routine (index {index})")
@ -61,17 +61,21 @@ class Shortcode():
self.log.debug(f"{success_string} Regional Prompter")
elif script_title == "controlnet":
# Update the controlnet script args with a list of 0 units
cn_path = self.Unprompted.extension_path(self.Unprompted.Config.stable_diffusion.controlnet.extension)
if cn_path:
cn_module = helpers.import_file(f"{self.Unprompted.Config.stable_diffusion.controlnet.extension}.internal_controlnet.external_code", f"{cn_path}/internal_controlnet/external_code.py")
cn_module.update_cn_script_in_processing(self.Unprompted.main_p, [])
self.log.debug(f"{success_string} ControlNet")
else:
self.log.error("Could not communicate with ControlNet.")
if self.Unprompted.webui == "auto1111":
cn_path = self.Unprompted.extension_path(self.Unprompted.Config.stable_diffusion.controlnet.extension)
if cn_path:
cn_lib = "internal_controlnet"
# cn_lib = "lib_controlnet"
cn_module = helpers.import_file(f"{self.Unprompted.Config.stable_diffusion.controlnet.extension}.{cn_lib}.external_code", f"{cn_path}/{cn_lib}/external_code.py")
cn_module.update_cn_script_in_processing(self.Unprompted.main_p, [])
self.log.debug(f"{success_string} ControlNet")
else:
self.log.error("Could not communicate with ControlNet.")
pass
except Exception as e:
self.log.exception(f"Exception while trying to bypass an extension: {script_title}")
pass
i += 1
if processed:
@ -95,11 +99,11 @@ class Shortcode():
self.log.info(f"Processing After content for batch {batch_idx}, block {idx}...")
self.log.debug(f"After content: {content}")
self.Unprompted.shortcode_user_vars["after_index"] = idx
self.Unprompted.process_string(content, "after")
self.after_content = []
return (self.Unprompted.after_processed)
return processed

View File

@ -0,0 +1,84 @@
class Shortcode():
def __init__(self, Unprompted):
self.Unprompted = Unprompted
self.description = "Adjusts the black point of the image to maximize contrast."
self.wizard_append = Unprompted.Config.syntax.tag_end + Unprompted.Config.syntax.tag_start + Unprompted.Config.syntax.tag_close + "after" + Unprompted.Config.syntax.tag_end
def run_atomic(self, pargs, kwargs, context):
from PIL import Image, ImageOps, ImageEnhance
import numpy as np
import lib_unprompted.helpers as helpers
image = self.Unprompted.parse_alt_tags(kwargs["file"], context) if "file" in kwargs else self.Unprompted.current_image()
show = self.Unprompted.parse_arg("show", False)
out = self.Unprompted.parse_arg("out", "")
if isinstance(image, str):
try:
image = Image.open(image)
except Exception:
self.log.error(f"Could not open image {image}")
return ""
# Reinterpretation of Photoshop's "Auto Tone"
# Thank you to Gerald Bakker for the following writeup on the algorithm:
# https://geraldbakker.nl/psnumbers/auto-options.html
shadows = np.array(helpers.str_to_rgb(self.Unprompted.parse_arg("shadows", "0,0,0")))
# midtones are only used in other algorithms:
midtones = helpers.str_to_rgb(self.Unprompted.parse_arg("midtones", "128,128,128"))
highlights = np.array(helpers.str_to_rgb(self.Unprompted.parse_arg("highlights", "255,255,255")))
shadow_clip = self.Unprompted.parse_arg("shadow_clip", 0.001)
highlight_clip = self.Unprompted.parse_arg("highlight_clip", 0.001)
# Convert the image to a numpy array
img_array = np.array(image, dtype=np.float32)
def calculate_adjustment_values(hist, total_pixels, clip_percent):
clip_threshold = total_pixels * clip_percent
cumulative_hist = hist.cumsum()
# Find the first and last indices where the cumulative histogram exceeds the clip thresholds
lower_bound_idx = np.where(cumulative_hist > clip_threshold)[0][0]
upper_bound_idx = np.where(cumulative_hist < (total_pixels - clip_threshold))[0][-1]
return lower_bound_idx, upper_bound_idx
# Process each channel (R, G, B) separately
for channel in range(3):
# Calculate the histogram of the current channel
hist, _ = np.histogram(img_array[:, :, channel].flatten(), bins=256, range=[0, 255])
# Total number of pixels
total_pixels = img_array.shape[0] * img_array.shape[1]
# Calculate the adjustment values based on clipping percentages
dark_value, light_value = calculate_adjustment_values(hist, total_pixels, shadow_clip)
_, upper_light_value = calculate_adjustment_values(hist, total_pixels, highlight_clip)
# Adjust light_value using upper_light_value for highlights
light_value = max(light_value, upper_light_value)
# Avoid division by zero
if light_value == dark_value:
continue
# Scale and clip the channel values
img_array[:, :, channel] = (img_array[:, :, channel] - dark_value) * (highlights[channel] - shadows[channel]) / (light_value - dark_value) + shadows[channel]
img_array[:, :, channel] = np.clip(img_array[:, :, channel], 0, 255)
# Make sure the data type is correct for PIL
img_array = np.clip(img_array, 0, 255).astype(np.uint8)
new_image = Image.fromarray(img_array)
if show:
self.Unprompted.after_processed.images.append(image)
if out:
new_image.save(out)
self.Unprompted.current_image(new_image)
return ""
def ui(self, gr):
gr.Textbox(label="Path to image (uses SD image by default) 🡢 str")

View File

@ -41,8 +41,6 @@ class Shortcode():
contents = self.Unprompted.shortcode_objects["function"].functions[name]
next_context = name
else:
# self.log.debug(f"{name} is assumed to be a filepath")
file = self.Unprompted.parse_filepath(helpers.str_with_ext(name, self.Unprompted.Config.txt_format), context=context, must_exist=False)
if not os.path.exists(file):
@ -77,4 +75,4 @@ class Shortcode():
def ui(self, gr):
gr.Textbox(label="Function name or filepath 🡢 str", max_lines=1)
gr.Textbox(label="Expected encoding 🡢 _encoding", max_lines=1, value="utf-8")
pass
pass

View File

@ -12,7 +12,9 @@ class Shortcode():
task = self.Unprompted.parse_advanced(kwargs["task"], context) if "task" in kwargs else "text-generation"
do_cache = self.Unprompted.shortcode_var_is_true("cache", pargs, kwargs)
instruction = self.Unprompted.parse_arg("instruction", "")
do_cache = not self.Unprompted.shortcode_var_is_true("unload", pargs, kwargs)
output_key = "generated_text"
if task == "summarization": output_key = "summary_text"
@ -25,7 +27,7 @@ class Shortcode():
model_dir = f"{self.Unprompted.base_dir}/{self.Unprompted.Config.subdirectories.models}/gpt"
model_name = self.Unprompted.parse_advanced(kwargs["model"], context) if "model" in kwargs else "Gustavosta/MagicPrompt-Stable-Diffusion"
model_name = self.Unprompted.parse_advanced(kwargs["model"], context) if "model" in kwargs else "LykosAI/GPT-Prompt-Expansion-Fooocus-v2"
if do_cache and model_name == self.cache_model_name and task == self.cache_task:
tokenizer = self.cache_tokenizer
@ -34,7 +36,7 @@ class Shortcode():
model = model_name
tokenizer = model
if "task" == "text-generation":
if task == "text-generation":
tokenizer = AutoTokenizer.from_pretrained(model, cache_dir=model_dir)
model = AutoModelForCausalLM.from_pretrained(model, cache_dir=model_dir)
@ -48,13 +50,17 @@ class Shortcode():
generator = pipeline(task, model=model, tokenizer=tokenizer, model_kwargs={"cache_dir": model_dir}, device=self.Unprompted.main_p.sd_model.device)
gpt_result = generator(content, min_length=min_length, max_length=max_length, num_return_sequences=num_return_sequences)[0][output_key]
gpt_result = generator(content, min_length=min_length, max_length=max_length, num_return_sequences=num_return_sequences, prefix=instruction)[0][output_key]
if instruction:
gpt_result = gpt_result.replace(instruction, "")
return gpt_result
def ui(self, gr):
gr.Dropdown(label="GPT model 🡢 model", info="The first time you use a model, it will be downloaded to your `unprompted/models/gpt` directory. Each model is approximately between 300MB-1.4GB. Credit to the model author names are included in the dropdown below.", value="Gustavosta/MagicPrompt-Stable-Diffusion", choices=["Gustavosta/MagicPrompt-Stable-Diffusion", "daspartho/prompt-extend", "succinctly/text2image-prompt-generator", "microsoft/Promptist", "AUTOMATIC/promptgen-lexart", "AUTOMATIC/promptgen-majinai-safe", "AUTOMATIC/promptgen-majinai-unsafe", "Gustavosta/MagicPrompt-Dalle", "kmewhort/stable-diffusion-prompt-bolster", "Ar4ikov/gpt2-650k-stable-diffusion-prompt-generator", "Ar4ikov/gpt2-medium-650k-stable-diffusion-prompt-generator", "crumb/bloom-560m-RLHF-SD2-prompter-aesthetic", "Meli/GPT2-Prompt", "DrishtiSharma/StableDiffusion-Prompt-Generator-GPT-Neo-125M", "facebook/bart-large-cnn", "gpt2"])
gr.Dropdown(label="GPT model 🡢 model", info="The first time you use a model, it will be downloaded to your `unprompted/models/gpt` directory. Each model is approximately between 300MB-1.4GB. Credit to the model author names are included in the dropdown below.", value="LykosAI/GPT-Prompt-Expansion-Fooocus-v2", choices=["LykosAI/GPT-Prompt-Expansion-Fooocus-v2", "Gustavosta/MagicPrompt-Stable-Diffusion", "daspartho/prompt-extend", "succinctly/text2image-prompt-generator", "microsoft/Promptist", "AUTOMATIC/promptgen-lexart", "AUTOMATIC/promptgen-majinai-safe", "AUTOMATIC/promptgen-majinai-unsafe", "Gustavosta/MagicPrompt-Dalle", "kmewhort/stable-diffusion-prompt-bolster", "Ar4ikov/gpt2-650k-stable-diffusion-prompt-generator", "Ar4ikov/gpt2-medium-650k-stable-diffusion-prompt-generator", "crumb/bloom-560m-RLHF-SD2-prompter-aesthetic", "Meli/GPT2-Prompt", "DrishtiSharma/StableDiffusion-Prompt-Generator-GPT-Neo-125M", "facebook/bart-large-cnn", "gpt2"])
gr.Text(label="Instruction 🡢 instruction", value="", info="Text to prepend to the content; may help steer the model's output.")
gr.Dropdown(label="Task 🡢 task", info="Not every model is compatible with every task.", value="text-generation", choices=["text-generation", "summarization"])
gr.Number(label="Minimum number of words returned 🡢 min_length", value=1, interactive=True)
gr.Number(label="Maximum number of words returned 🡢 max_length", value=50, interactive=True)
gr.Checkbox(label="Cache the model 🡢 cache")
gr.Checkbox(label="Unload the model from cache after use 🡢 unload")

View File

@ -3,7 +3,7 @@ class Shortcode():
self.Unprompted = Unprompted
self.description = "Swap the face in an image using one or more techniques. Note that the Facelift template is more user-friendly for this purpose."
self.fs_pipelines = ["face_fusion","ghost","insightface"]
self.fs_pipelines = ["face_fusion", "ghost", "insightface"]
self.fs_now = ""
self.fs_pipeline = {}
for pipeline in self.fs_pipelines:
@ -19,48 +19,61 @@ class Shortcode():
import lib_unprompted.helpers as helpers
from PIL import Image
visibility = self.Unprompted.parse_arg("visibility",1.0)
unload_parts = self.Unprompted.parse_arg("unload","")
minimum_similarity = self.Unprompted.parse_arg("minimum_similarity",-1000.0)
visibility = self.Unprompted.parse_arg("visibility", 1.0)
unload_parts = self.Unprompted.parse_arg("unload", "")
minimum_similarity = self.Unprompted.parse_arg("minimum_similarity", -1000.0)
prefer_gpu = self.Unprompted.parse_arg("prefer_gpu", True)
if len(pargs) < 1:
self.log.error("You must pass a path to a face image as the first parg.")
return ""
all_pipelines = helpers.ensure(self.Unprompted.parse_arg("pipeline","insightface"),list)
all_pipelines = helpers.ensure(self.Unprompted.parse_arg("pipeline", "insightface"), list)
# (kwargs["pipeline"] if "pipeline" in kwargs else "insightface").split(self.Unprompted.Config.syntax.delimiter)
providers = ["CPUExecutionProvider"]
providers = ["CUDAExecutionProvider" if prefer_gpu else "CPUExecutionProvider"]
model_dir = f"{self.Unprompted.base_dir}/{self.Unprompted.Config.subdirectories.models}"
_body = self.Unprompted.parse_alt_tags(kwargs["body"],context) if "body" in kwargs else False
_body = self.Unprompted.parse_alt_tags(kwargs["body"], context) if "body" in kwargs else False
if _body:
orig_img = Image.open(_body)
else: orig_img = self.Unprompted.current_image()
else:
orig_img = self.Unprompted.current_image()
face_string = self.Unprompted.parse_advanced(pargs[0])
faces = face_string.split(self.Unprompted.Config.syntax.delimiter)
def get_cached(part):
if part in self.fs_pipeline[self.fs_now] and part not in unload_parts and "all" not in unload_parts:
if part in self.fs_pipeline[self.fs_now] and part not in unload_parts and "all" not in unload_parts and "export_embedding" not in pargs:
self.log.info(f"Using cached {part}.")
return self.fs_pipeline[self.fs_now][part]
self.log.info(f"Processing {part}...")
return False
for swap_method in all_pipelines:
result = None
self.log.info(f"Starting faceswap: {swap_method}")
self.fs_now = swap_method
gender_bonus = self.Unprompted.parse_arg("gender_bonus", 50)
age_influence = self.Unprompted.parse_arg("age_influence", 1)
if swap_method == "insightface":
import lib_unprompted.insightface as insightface
if prefer_gpu:
import lib_unprompted.insightface_cuda as insightface
else:
import lib_unprompted.insightface as insightface
import numpy as np
import cv2
import torch
def get_faces(img_data: np.ndarray, face_index=0, det_size=(640, 640)):
face_analyser = get_cached("analyser")
if not face_analyser:
face_analyser = insightface.app.FaceAnalysis(name="buffalo_l", providers=providers)
self.fs_pipeline[swap_method]["analyser"] = face_analyser
def get_faces(img_data: np.ndarray, face_index=0, det_size=(640, 640)):
face_analyser.prepare(ctx_id=0, det_size=det_size)
face = face_analyser.get(img_data)
@ -74,7 +87,7 @@ class Shortcode():
return None
these_faces = (self.fs_face_path == face_string) and get_cached("face")
if not these_faces:
if not these_faces:
temp_dict = []
for facepath in faces:
# Avoid reloading faces that were already in self.fs_face_path
@ -87,7 +100,7 @@ class Shortcode():
from safetensors.torch import load_file
tensors = load_file(facepath)
embedding = tensors["embedding"].numpy()
face = insightface.app.common.Face(embedding=embedding)
face = insightface.app.common.Face(embedding=embedding, gender=tensors["gender"] if "gender" in tensors else 0, age=tensors["age"] if "age" in tensors else 18)
except:
self.log.error(f"Could not parse face from the safetensors file at {facepath}.")
continue
@ -100,17 +113,19 @@ class Shortcode():
temp_dict.append(face)
self.fs_pipeline[swap_method]["face"] = temp_dict
if "export_embedding" in pargs:
import os
from safetensors.torch import save_file
self.log.info("Blending faces together...")
avg_embedding = np.mean([obj.embedding for obj in temp_dict], axis=0)
face = insightface.app.common.Face(embedding=avg_embedding)
avg_gender = int(np.mean([obj.gender for obj in temp_dict], axis=0))
avg_age = int(np.mean([obj.age for obj in temp_dict], axis=0))
face = insightface.app.common.Face(embedding=avg_embedding, gender=avg_gender, age=avg_age)
self.fs_pipeline[swap_method]["face"] = [face]
embedding_str = self.Unprompted.parse_arg("embedding_path","blended_faces")
embedding_str = self.Unprompted.parse_arg("embedding_path", "blended_faces")
embedding_path = self.Unprompted.parse_filepath(helpers.str_with_ext(embedding_str, ".safetensors"), context=context, must_exist=False, root=self.Unprompted.base_dir + "/user/faces")
os.makedirs(os.path.dirname(embedding_path), exist_ok=True)
# If embedding file already exists, increment the filename until it doesn't
@ -121,9 +136,8 @@ class Shortcode():
dupe_counter += 1
self.log.info(f"Exporting to {embedding_path}...")
tensors = {"embedding": torch.tensor(face["embedding"])}
save_file(tensors, embedding_path)
tensors = {"embedding": torch.tensor(face["embedding"]), "gender": torch.tensor(face["gender"]), "age": torch.tensor(face["age"])}
save_file(tensors, embedding_path)
target_img = cv2.cvtColor(np.array(orig_img), cv2.COLOR_RGB2BGR)
@ -132,7 +146,7 @@ class Shortcode():
this_model = get_cached("model")
if not this_model:
if not helpers.download_file(f"{model_dir}/insightface/inswapper_128.onnx","https://github.com/facefusion/facefusion-assets/releases/download/models/inswapper_128.onnx"):
if not helpers.download_file(f"{model_dir}/insightface/inswapper_128.onnx", "https://github.com/facefusion/facefusion-assets/releases/download/models/inswapper_128.onnx"):
continue
model_path = f"{model_dir}/insightface/inswapper_128.onnx"
self.fs_pipeline[swap_method]["model"] = insightface.model_zoo.get_model(model_path, providers=providers)
@ -142,36 +156,49 @@ class Shortcode():
for source_idx, source_face in enumerate(self.fs_pipeline[swap_method]["face"]):
self.log.debug(f"Seeking swap target for new face #{source_idx}")
similarities = [None]*len(target_faces)
similarities = [None] * len(target_faces)
for idx, target_face in enumerate(target_faces):
# TODO: Utilize target_face.pose for similarity check?
# For each face, find the most similar face in the source image and swap it in.
if target_face.embedding is not None:
# Find the most similar face in the source image
similarity = np.dot(
source_face.embedding,
target_face.embedding,
source_face.embedding,
target_face.embedding,
)
if gender_bonus:
self.log.debug(f"Source gender is {source_face.gender}, target face #{idx} gender is {target_face.gender}")
if source_face.gender == target_face.gender:
similarity += gender_bonus
if age_influence:
self.log.debug(f"Source age is {source_face.age}, target face #{idx} age is {target_face.age}")
age_diff = abs(source_face.age - target_face.age)
similarity -= age_diff * age_influence
self.log.debug(f"Similarity of face #{idx}: {similarity}")
similarities[idx] = similarity
highest_similarity = max(similarities)
if highest_similarity >= minimum_similarity:
most_similar_idx = similarities.index(max(similarities))
result = self.fs_pipeline[swap_method]["model"].get(
result,
target_faces[most_similar_idx],
source_face,
)
result,
target_faces[most_similar_idx],
source_face,
)
# Remove this target face to avoid swapping it with the remaining images
target_faces.pop(most_similar_idx)
# Break out of the source_face loop in case there are no more target faces
if not target_faces: break
else:
self.log.info("No faces met the minimum similarity threshold.")
result = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
else:
self.log.error(f"No target face detected.")
@ -208,8 +235,8 @@ class Shortcode():
from lib_unprompted.ghost.models.config_sr import TestOptions
# Prep default args
kwargs["G_path"] = self.Unprompted.parse_arg("G_path",f"{model_dir}/ghost/G_unet_2blocks.pth")
kwargs["backbone"] = self.Unprompted.parse_arg("backbone","unet")
kwargs["G_path"] = self.Unprompted.parse_arg("G_path", f"{model_dir}/ghost/G_unet_2blocks.pth")
kwargs["backbone"] = self.Unprompted.parse_arg("backbone", "unet")
kwargs["num_blocks"] = self.Unprompted.parse_arg("num_blocks", 2)
kwargs["batch_size"] = self.Unprompted.parse_arg("batch_size", 40)
kwargs["crop_size"] = self.Unprompted.parse_arg("crop_size", 224)
@ -236,29 +263,29 @@ class Shortcode():
model = self.fs_pipeline[swap_method]["model"]["model"]
else:
# process downloads
helpers.download_file(f"{model_dir}/ghost/antelope/glintr100.onnx","https://github.com/sberbank-ai/sber-swap/releases/download/antelope/glintr100.onnx")
helpers.download_file(f"{model_dir}/ghost/antelope/scrfd_10g_bnkps.onnx","https://github.com/sberbank-ai/sber-swap/releases/download/antelope/scrfd_10g_bnkps.onnx")
helpers.download_file(f"{model_dir}/ghost/backbone.pth","https://github.com/sberbank-ai/sber-swap/releases/download/arcface/backbone.pth")
helpers.download_file(f"{model_dir}/ghost/G_unet_2blocks.pth","https://github.com/sberbank-ai/sber-swap/releases/download/sber-swap-v2.0/G_unet_2blocks.pth")
helpers.download_file(f"{model_dir}/ghost/antelope/glintr100.onnx", "https://github.com/sberbank-ai/sber-swap/releases/download/antelope/glintr100.onnx")
helpers.download_file(f"{model_dir}/ghost/antelope/scrfd_10g_bnkps.onnx", "https://github.com/sberbank-ai/sber-swap/releases/download/antelope/scrfd_10g_bnkps.onnx")
helpers.download_file(f"{model_dir}/ghost/backbone.pth", "https://github.com/sberbank-ai/sber-swap/releases/download/arcface/backbone.pth")
helpers.download_file(f"{model_dir}/ghost/G_unet_2blocks.pth", "https://github.com/sberbank-ai/sber-swap/releases/download/sber-swap-v2.0/G_unet_2blocks.pth")
# model for face cropping
app = Face_detect_crop(name="antelope", root=f"{model_dir}/ghost")
app.prepare(ctx_id= 0, det_thresh=0.6, det_size=(640,640))
app.prepare(ctx_id=0, det_thresh=0.6, det_size=(640, 640))
# main model for generation
G = AEI_Net(args.backbone, num_blocks=args.num_blocks, c_id=512)
G.eval()
G.load_state_dict(torch.load(args.G_path, map_location=torch.device('cpu')))
G.load_state_dict(torch.load(args.G_path, map_location=torch.device("cuda" if prefer_gpu else "cpu")))
G = G.cuda()
G = G.half()
# arcface model to get face embedding
netArc = iresnet100(fp16=False)
netArc.load_state_dict(torch.load(f'{model_dir}/ghost/backbone.pth'))
netArc=netArc.cuda()
netArc = netArc.cuda()
netArc.eval()
# model to get face landmarks
# model to get face landmarks
handler = Handler(f'{self.Unprompted.base_dir}/lib_unprompted/ghost/coordinate_reg/model/2d106det', 0, root=f"{model_dir}/ghost", ctx_id=0, det_size=640)
# model to make superres of face, set use_sr=True if you want to use super resolution or use_sr=False if you don't
@ -271,44 +298,44 @@ class Shortcode():
model.netG.train()
else:
model = None
self.fs_pipeline[swap_method]["model"] = {}
self.fs_pipeline[swap_method]["model"]["app"] = app
self.fs_pipeline[swap_method]["model"]["G"] = G
self.fs_pipeline[swap_method]["model"]["netArc"] = netArc
self.fs_pipeline[swap_method]["model"]["handler"] = handler
self.fs_pipeline[swap_method]["model"]["model"] = model
return app, G, netArc, handler, model
app, G, netArc, handler, model = init_models(args)
# get crops from source images
# print('List of source paths: ',args.source_paths)
source = []
try:
for source_img in args.source_paths:
for source_img in args.source_paths:
img = cv2.imread(source_img)
img = crop_face(img, app, args.crop_size)[0]
source.append(img[:, :, ::-1])
except TypeError:
self.log.error("Could not parse face from the image in given filepath.")
return ""
target_full = helpers.pil_to_cv2(orig_img)
full_frames = [target_full]
# get target faces that are used for swap
set_target = True
target = [crop_face(target_full, app, args.crop_size)[0]]
# start = time.time()
final_frames_list, crop_frames_list, full_frames, tfm_array_list = model_inference(full_frames, source, target, netArc, G, app, set_target, similarity_th=args.similarity_th, crop_size=args.crop_size, BS=args.batch_size)
result = get_final_image(final_frames_list, crop_frames_list, full_frames[0], tfm_array_list, handler)
result = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
result = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))
# TODO: SimSwap pipeline does not play well with WebUI torch load functions e.g.
# ModuleNotFoundError: No module named 'models.arcface_models'
elif swap_method == "simswap":
@ -319,9 +346,9 @@ class Shortcode():
# import sys
# def add_path(path):
# if path not in sys.path:
# sys.path.insert(0, path)
# sys.path.insert(0, path)
# path = osp.join(self.Unprompted.base_dir, "lib_unprompted/simswap/models")
# add_path(path)
# add_path(path)
from torchvision import transforms
from lib_unprompted.simswap.insightface_func.face_detect_crop_single import Face_detect_crop
@ -466,7 +493,7 @@ class Shortcode():
else:
net = None
result = reverse2wholeimage(b_align_crop_tenor_list, swap_result_list, b_mat_list, crop_size, img_b_whole, None, None, True, pasring_model =net,use_mask=opt.use_mask, norm = spNorm)
result = reverse2wholeimage(b_align_crop_tenor_list, swap_result_list, b_mat_list, crop_size, img_b_whole, None, None, True, pasring_model=net, use_mask=opt.use_mask, norm=spNorm)
# Append to output window
try:
@ -479,15 +506,18 @@ class Shortcode():
self.fs_pipeline[swap_method].pop(part, None)
if "face" in unload_parts: self.fs_face_path = None
else: self.fs_face_path = face_string
return ""
def ui(self, gr):
with gr.Row():
gr.Image(label="New face image(s) to swap to 🡢 str",type="filepath",interactive=True)
gr.Image(label="Body image to perform swap on (defaults to SD output) 🡢 body",type="filepath",interactive=True)
gr.Image(label="New face image(s) to swap to 🡢 str", type="filepath", interactive=True)
gr.Image(label="Body image to perform swap on (defaults to SD output) 🡢 body", type="filepath", interactive=True)
gr.Dropdown(label="Faceswap pipeline(s) 🡢 pipeline", choices=self.fs_pipelines, value="insightface", multiselect=True, interactive=True, info="You can enable multiple pipelines with the standard delimiter. Please note that each pipeline must download its models on first use.")
gr.Checkbox(label="Export all faces as a blended safetensors embedding 🡢 export_embedding",value=False)
gr.Textbox(label="Path to save the exported embedding 🡢 embedding_path",placeholder="unprompted/user/faces/blended_faces.safetensors",interactive=True)
gr.Slider(label="Gender bonus 🡢 gender_bonus", value=50, maximum=1000, minimum=0, interactive=True, step=1)
gr.Slider(label="Age influence multiplier 🡢 age_influence", value=1, maximum=100, minimum=0, interactive=True, step=1)
gr.Checkbox(label="Export all faces as a blended safetensors embedding 🡢 export_embedding", value=False)
gr.Textbox(label="Path to save the exported embedding 🡢 embedding_path", placeholder="unprompted/user/faces/blended_faces.safetensors", interactive=True)
gr.Slider(label="Visibility 🡢 visibility", value=1.0, maximum=1.0, minimum=0.0, interactive=True, step=0.01)
gr.Dropdown(label="Unload pipeline parts from cache 🡢 unload", choices=["all","face","model"],multiselect=True,interactive=True,info="You can release some or all of the pipeline parts from your cache after inference. Useful for low-memory devices.")
gr.Checkbox(label="Prefer GPU 🡢 prefer_gpu", value=True, interactive=True)
gr.Dropdown(label="Unload pipeline parts from cache 🡢 unload", choices=["all", "face", "model","analyser"], multiselect=True, interactive=True, info="You can release some or all of the pipeline parts from your cache after inference. Useful for low-memory devices.")

View File

@ -3,4 +3,4 @@ A decent starting point to upscale images using the Tile model for ControlNet.
Ideally, you should mask out the face and run the result through Facelift.
Best ESRGAN model I'm aware of: 4x_RealisticRescaler_100000_G
[/##]
[if batch_real_index=0][sets sampler="Restart" steps=20 denoising_strength=0.25 cfg_scale=15 cn_0_enabled=1 cn_0_model=ip-adapter-plus-face_sd15 cn_0_module=ip-adapter_clip_sd15 cn_0_weight=0.5 cn_0_pixel_perfect=0 negative_prompt="rfneg UnrealisticDream BadDream BeyondV3-neg" cn_1_enabled=1 cn_1_module=inpaint_only cn_1_model=inpaint cn_1_weight=1.0 cn_1_guidance_end=1.0 cn_1_control_mode=2], best quality (worst quality:-1)[/if]
[if batch_real_index=0][sets sampler="Restart" steps=20 denoising_strength=0.25 cfg_scale=15 negative_prompt="rfneg UnrealisticDream BadDream BeyondV3-neg" cn_0_enabled=1 cn_0_model=ip-adapter-plus-face_sd15 cn_0_module=ip-adapter_clip_sd15 cn_0_weight=0.5 cn_0_pixel_perfect=0 cn_1_enabled=1 cn_1_module=inpaint_only cn_1_model=inpaint cn_1_weight=1.0 cn_1_guidance_end=1.0 cn_1_control_mode=2], best quality (worst quality:-1)[/if]

Binary file not shown.


View File

@ -1,4 +1,6 @@
[template name="Facelift v0.1.1"]
![Preview]([base_dir]/facelift.png)
An all-in-one solution for performing faceswaps by combining different models and postprocessing techniques.
[/template]
[wizard row]

Binary file not shown.


View File

@ -0,0 +1,79 @@
[template name="Magic Spice v0.0.1"]
![Preview]([base_dir]/magic_spice.png)
This template elevates your prompts using techniques from Fooocus and elsewhere. It helps ensure high-quality images regardless of the simplicity of your prompt. **Some spices may yield NSFW terms due to GPT-2 prompt expansion.**
<details><summary>📚 Documentation</summary>
<details><summary>What is a "spice?"</summary>
A spice is a prompt template that applies a set of techniques to enhance the quality of the generated image. It can include anything from extra networks to negative prompts to fluff terms.
</details>
<details><summary>Model compatibility</summary>
Spices are model-agnostic, meaning they are compatible with both Stable Diffusion 1.5 and SDXL checkpoints. Some settings such as the aspect ratio are automatically adjusted based on the architecture you're using.
</details>
<details><summary>Quality vs adherence</summary>
Optimizing for quality means that the model will try to generate the best possible image, even if it doesn't strictly adhere to the prompt. This can be useful for prompts that are too simple or too complex. However, if the spice strays too far from your intentions, try disabling GPT-2 prompt expansion and the use of negative prompts.
</details>
</details>
[/template]
[set subject _new _label="Subject" _info="Enter a prompt to enhance." _max_lines=20 _lines=3]Statue of God[/set]
[set style_preset _new _info="May download extra dependencies on first use." _ui="dropdown" _choices="none|{filelist '%BASE_DIR%/templates/common/presets/magic_spice/*.*' _basename _hide_ext}" _label="Choose Your Spice"]allspice_v1[/set]
[set aspect_ratio _new _ui="radio" _choices="■ Square|↕️ Portrait|↔️ Landscape|Custom"]■ Square[/set]
[wizard accordion _label="⚙️ Advanced Settings"]
[set inference_preset _new _info="Locks CFG scale, sampler method, etc. to recommended values" _label="Inference Preset" _ui="dropdown" _choices="none|{filelist '%BASE_DIR%/templates/common/presets/txt2img/*.*' _basename _hide_ext}"]restart_v1[/set]
[set do_fluff _new _label="Use fluff terms" _ui="checkbox"]1[/set]
[set do_gpt _new _label="Use GPT-2 prompt expansion" _ui="checkbox"]1[/set]
[set do_networks _new _label="Use extra networks" _ui="checkbox"]1[/set]
[set do_negatives _new _label="Use negative prompt" _ui="checkbox"]1[/set]
[set do_autotone _new _label="Fix contrast issues" _ui="checkbox"]1[/set]
[/wizard]
[if "style_preset != 'none'"]
[call "common/presets/magic_spice/{get style_preset}"]
[/if]
[else]
[get subject]
[/else]
[if "inference_preset != 'none'"]
[call "common/presets/txt2img/{get inference_preset}"]
[/if]
[if sd_base="sdxl"]
[switch aspect_ratio]
[case "■ Square"]
[sets width=1024 height=1024]
[/case]
[case "↕️ Portrait"]
[sets width=768 height=1344]
[/case]
[case "↔️ Landscape"]
[sets width=1344 height=768]
[/case]
[/switch]
[/if][else]
[switch aspect_ratio]
[case "■ Square"]
[sets width=512 height=512]
[/case]
[case "↕️ Portrait"]
[sets width=512 height=768]
[/case]
[case "↔️ Landscape"]
[sets width=768 height=512]
[/case]
[/switch]
[/else]
[if do_autotone]
[after][autotone][/after]
[/if]

View File

@ -1,11 +0,0 @@
[if "inference_preset == 'none'"]
[logs "Applying optimal inference settings for the Vivarium preset..."]
[sets cfg_scale=7.5 sampler_name="Euler" steps=20 denoising_strength=1.0 mask_blur=2 mask_blur_x=2 mask_blur_y=2 inpaint_full_res=1 inpaint_full_res_padding=0 interrogate=0 mask_method=none]
[img2img_autosize][civitai lora "epiCRealismHelper" 0.5 _id=110334 _debug][civitai lora "SimplePositive_v1_AutoRunMech" _mvid=159384]
[set negative_prompt _append]
([civitai embedding "rfneg" _id="120412"] [civitai embedding "UnrealisticDream" 1.0 _mvid="77173"] [civitai embedding "BadDream" _mvid="77169"] [civitai embedding "BeyondNegativev2-neg" _mvid="119407"]::0.95)
[/set]
[/if]
[else]
[logs "For optimal results with Vivarium, set your inference_preset to none" _level="warning"]
[/else]

View File

@ -1 +1 @@
[restore_faces unload="{get unload}" method=gfpgan image="{get body}"][faceswap "{get faces}" unload="{get unload_all}" visibility=0.75][restore_faces unload="{get unload}" method=gpen][upscale models="TGHQFace8x_500k|4xFaceUpSharpLDAT|4x-UltraSharp|R-ESRGAN 4x+" scale=1 limit=1 visibility=0.8 keep_res]
[restore_faces unload="{get unload}" method=gfpgan image="{get body}"][faceswap "{get faces}" unload="{get unload_all}" visibility=0.75][restore_faces unload="{get unload}" method=gpen]

View File

@ -1 +1 @@
[restore_faces method=codeformer image="{get body}"][faceswap "{get face}" unload="{get unload_all}" pipeline="ghost" ][call common/presets/txt2img/restart_v1][zoom_enhance replacement="face, best quality, hdr"][faceswap "{get face}" unload="{get unload_all}"][restore_faces unload="{get unload}"]
[faceswap "{get faces}" unload="{get unload_all}" visibility=0.75][restore_faces unload="{get unload}" method=gpen][upscale models="TGHQFace8x_500k|4xFaceUpSharpLDAT|4x-UltraSharp|R-ESRGAN 4x+" scale=1 limit=1 visibility=0.8 keep_res]

View File

@ -1 +1 @@
[faceswap "{get faces}" unload="{get unload_all}" body="{get body}"][restore_faces unload="{get unload}"]
[faceswap "{get faces}" unload="{get unload_all}" body="{get body}"][restore_faces unload="{get unload}" method=gpen]

View File

@ -1 +0,0 @@
[sets cfg_scale=4 sampler_name="DPM++ SDE" steps=20 denoising_strength=0.67 mask_blur=0]

View File

@ -0,0 +1,23 @@
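[# Style preset: optionally expands the subject with GPT-2, then applies extra networks and negative prompts suited to the active base model.]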
[set fluff]best quality, high detail[/set]
[if do_gpt]
[gpt max_length=300]([get subject]:1.1), [if do_fluff][get fluff][/if] BREAK [/gpt]
[/if]
[else][get subject][if do_fluff], [get fluff][/if][/else]
[if sd_base="sdxl"]
[if do_networks]
[civitai _file="sdxl_offset_example_v10" _id=137511 _weight=0.5]
[/if]
[if do_negatives]
[set negative_prompt _append]text, watermark, low-quality, signature, moiré pattern, downsampling, aliasing, distorted, blurry, glossy, blur, jpeg artifacts, compression artifacts, poorly drawn, low-resolution, bad, distortion, twisted, excessive, exaggerated pose, exaggerated limbs, grainy, symmetrical, duplicate, error, pattern, beginner, pixelated, fake, hyper, glitch, overexposed, high-contrast, bad-contrast[/set]
[/if]
[/if]
[else]
[if do_negatives]
[if do_networks]
[set negative_prompt _append][worst quality:worst quality, deviantart, [civitai _file=badhandv4 _id=16993], [civitai _file=rfneg _id=120412], ([civitai _file=UnrealisticDream _id=72437 _mid=77173]:1.2), [civitai _file=BeyondV4-neg _id=108821], [civitai _file=difConsistency_negative_v2 _id=87375], [civitai _file=epiCPhotoGasm-colorfulPhoto-neg _id=132719], [civitai _file=PA7_UnRealistic-Neg_v2-neg _id=208852 _mid=235232], [civitai _file=BadDream _id=72437], [civitai _file=realisticvision-negative-embedding _id=36070]:3][/set]
[/if]
[else]
[set negative_prompt _append]worst quality[/set]
[/else]
[/if]
[/else]

View File

@ -0,0 +1,13 @@
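[# Style preset geared toward anime output: screencap fluff terms with score tags, plus anatomy-focused negatives.]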
[set fluff]anime screencap BREAK score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, official art[/set]
[if do_gpt]
[gpt max_length=150 model="FredZhang7/distilgpt2-stable-diffusion-v2"][get subject][if do_fluff], [get fluff][/if][/gpt]
[/if]
[else][get subject][if do_fluff], [get fluff][/if][/else]
[if do_networks]
[if sd_base="sdxl"]
[civitai _file="sdxl_offset_example_v10" _id=137511 _weight=0.5]
[/if]
[/if]
[if do_negatives]
[set negative_prompt _append]line art, watermark, logo, (worst quality:1.5), (low quality:1.5), (normal quality:1.5), lowres, bad anatomy, bad hands, multiple eyebrow, (cropped), extra limb, missing limbs, deformed hands, long neck, long body, (bad hands), signature, username, artist name, conjoined fingers, deformed fingers, error,(deformed|distorted|disfigured:1.21), poorly drawn, bad anatomy, wrong anatomy, mutation, mutated, (mutated hands AND fingers:1.21), bad hands, bad fingers, loss of a limb, extra limb, missing limb, floating limbs, amputation, deformed, black and white, disfigured, low contrast[/set]
[/if]

View File

@ -0,0 +1,25 @@
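[# Style preset geared toward photorealism: RAW-photo fluff terms, detail-oriented extra networks, and realism-focused negative embeddings.]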
[set fluff]BREAK high detail RAW photo, colorful, best quality, 4k resolution, professional photography, extremely detailed, film grain[/set]
[if do_gpt]
[gpt max_length=150 model="daspartho/prompt-extend"][get subject] [if do_fluff][get fluff][/if] BREAK [/gpt]
[/if][else][get subject][if do_fluff] [get fluff][/if][/else]
[if sd_base="sdxl"]
[if do_networks]
[civitai _file="RMSDXL_Photo" _id=250381 _weight=1.0][civitai _file="sdxl_offset_example_v10" _id=137511 _weight=0.5]
[/if]
[if do_negatives]
[set negative_prompt _append]text, watermark, low-quality, signature, moiré pattern, downsampling, aliasing, distorted, blurry, glossy, blur, jpeg artifacts, compression artifacts, poorly drawn, low-resolution, bad, distortion, twisted, excessive, exaggerated pose, exaggerated limbs, grainy, symmetrical, duplicate, error, pattern, beginner, pixelated, fake, hyper, glitch, overexposed, high-contrast, bad-contrast[/set]
[/if]
[/if]
[else]
[if do_networks]
[civitai lora "difConsistency_detail" 0.2 _id=87378]
[/if]
[if do_negatives]
[if do_networks]
[set negative_prompt _append][worst quality:worst quality, deviantart, [civitai _file=badhandv4 _id=16993], [civitai _file=rfneg _id=120412], ([civitai _file=UnrealisticDream _id=72437 _mid=77173]:1.2), [civitai _file=BeyondV4-neg _id=108821], [civitai _file=difConsistency_negative_v2 _id=87375], [civitai _file=epiCPhotoGasm-colorfulPhoto-neg _id=132719], [civitai _file=PA7_UnRealistic-Neg_v2-neg _id=208852 _mid=235232], [civitai _file=BadDream _id=72437], [civitai _file=realisticvision-negative-embedding _id=36070]:3][/set]
[/if]
[else]
[set negative_prompt _append]worst quality[/set]
[/else]
[/if]
[/else]

View File

@ -0,0 +1,2 @@
[# Note: This preset is only compatible with WebUI Forge and assumes that the Lightning LoRA has been merged into the active model.]
[sets cfg_scale=2 sampler_name="DPM++ 2M SDE SGMUniform" steps=6]

View File

@ -0,0 +1,2 @@
[# Note: This preset is only compatible with WebUI Forge and requires the SDXL Lightning 8-step LoRA.]
<lora:sdxl_lightning_8step_lora:1.0>[sets cfg_scale=2 sampler_name="DPM++ 2M SDE SGMUniform" steps=8]

View File

@ -0,0 +1 @@
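[# Inference preset built around the Restart sampler at a low step count.]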
[sets cfg_scale=7.5 sampler_name="Restart" steps=12]