mirror of https://github.com/vladmandic/automatic

update precommit hooks

Signed-off-by: Vladimir Mandic <mandic00@live.com>
branch: pull/4039/head · parent: d522982bdb · commit: 71a18dcf74
@@ -25,15 +25,24 @@ repos:
      - id: check-case-conflict
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-yaml
      - id: check-illegal-windows-names
      - id: check-merge-conflict
      - id: detect-private-key
      - id: check-builtin-literals
      - id: check-case-conflict
      - id: check-json
      - id: check-symlinks
      - id: check-yaml
      - id: check-json
      - id: check-toml
      - id: check-xml
      - id: end-of-file-fixer
      - id: mixed-line-ending
      - id: check-executables-have-shebangs
        exclude: |
          (?x)^(
            .*.bat|
            .*.ps1
          )$
      - id: trailing-whitespace
        exclude: |
          (?x)^(
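The `(?x)` verbose-mode exclude pattern above can be sanity-checked with Python's `re` module. A minimal sketch (the file paths are made up for illustration); note the unescaped dots match any character, so the pattern is broader than a literal `.bat`/`.ps1` suffix check:

```python
import re

# Verbose-mode pattern as written in the hook config above; whitespace
# inside the pattern is ignored under (?x), and the unescaped dots match
# any character, so this is broader than a literal suffix check.
pattern = re.compile(r"""(?x)^(
    .*.bat|
    .*.ps1
)$""")

print(bool(pattern.match("scripts/setup.bat")))  # True
print(bool(pattern.match("launch.py")))          # False
```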
@@ -29,8 +29,8 @@ Any code commit is validated before merge

`SD.Next` library can establish external connections *only* for the following purposes and *only* when explicitly configured by the user:

- Download extension and theme indexes from automatically updated indexes
- Download required packages and repositories from GitHub during installation/upgrade
- Download installed/enabled extensions
- Download models from CivitAI and/or Huggingface when instructed by the user
- Submit benchmark info upon user interaction
@@ -1,6 +1,6 @@

# (Generic) EfficientNets for PyTorch

A 'generic' implementation of EfficientNet, MixNet, MobileNetV3, etc. that covers most of the compute/parameter-efficient architectures derived from the MobileNet V1/V2 block sequence, including those found via automated neural architecture search.

All models are implemented by the GenEfficientNet or MobileNetV3 classes, with string-based architecture definitions to configure the block layouts (idea from [here](https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mnasnet_models.py)).
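As an illustration of such string-based definitions, a block spec like `ir_r2_k3_s2_e6_c24` packs a block type and its hyper-parameters into one token. The decoder below is a simplified sketch of the mnasnet-style notation, not the library's actual parser, and the field abbreviations are assumptions for illustration:

```python
# Simplified sketch of decoding a mnasnet-style block string; the key
# abbreviations (r=repeats, k=kernel, s=stride, e=expansion, c=channels)
# are assumptions for illustration, not the library's exact parser.
def decode_block(block_str):
    parts = block_str.split("_")
    fields = {"r": "repeats", "k": "kernel_size", "s": "stride",
              "e": "expand_ratio", "c": "out_channels"}
    cfg = {"block_type": parts[0]}  # e.g. "ir" = inverted residual
    for part in parts[1:]:
        cfg[fields[part[0]]] = int(part[1:])
    return cfg

print(decode_block("ir_r2_k3_s2_e6_c24"))
```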
@@ -20,7 +20,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin
* 4.5M param MobileNet-V2 110d @ 75%
* 6.1M param MobileNet-V2 140 @ 76.5%
* 5.8M param MobileNet-V2 120d @ 77.3%

### March 23, 2020
* Add EfficientNet-Lite models w/ weights ported from [Tensorflow TPU](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/lite)
* Add PyTorch trained MobileNet-V3 Large weights with 75.77% top-1
@@ -39,7 +39,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin
### Nov 22, 2019
* New top-1 high! Ported official TF EfficientNet AdvProp (https://arxiv.org/abs/1911.09665) weights and B8 model spec. Created a new set of `ap` models since they use a different preprocessing (Inception mean/std) from the original EfficientNet base/AA/RA weights.

### Nov 15, 2019
* Ported official TF MobileNet-V3 float32 large/small/minimalistic weights
* Modifications to MobileNet-V3 model and components to support some additional config needed for differences between TF MobileNet-V3 and mine
@@ -50,7 +50,7 @@ All models are implemented by GenEfficientNet or MobileNetV3 classes, with strin
* Add JIT-optimized mem-efficient Swish/Mish autograd.fn in addition to memory-efficient autograd.fn
* Activation factory to select the best version of an activation by name or override one globally
* Add pretrained checkpoint load helper that handles input conv and classifier changes

### Oct 27, 2019
* Add CondConv EfficientNet variants ported from https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet/condconv
* Add RandAug weights for TF EfficientNet B5 and B7 from https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
@@ -75,8 +75,8 @@ Implemented models include:
* MobileNet-V3 (https://arxiv.org/abs/1905.02244)
* FBNet-C (https://arxiv.org/abs/1812.03443)
* Single-Path NAS (https://arxiv.org/abs/1904.02877)

I originally implemented and trained some of these models with code [here](https://github.com/rwightman/pytorch-image-models); this repository contains just the GenEfficientNet models, validation, and associated ONNX/Caffe2 export code.

## Pretrained
@@ -117,7 +117,7 @@ More pretrained models to come...

The weights ported from Tensorflow checkpoints for the EfficientNet models closely match the accuracy in Tensorflow once a SAME convolution padding equivalent is added and the same crop factors, image scaling, etc. (see table) are used via cmd line args.

**IMPORTANT:**
* Tensorflow ported weights for EfficientNet AdvProp (AP), EfficientNet EdgeTPU, EfficientNet-CondConv, EfficientNet-Lite, and MobileNet-V3 models use Inception style (0.5, 0.5, 0.5) for mean and std.
* Enabling the Tensorflow preprocessing pipeline with `--tf-preprocessing` at validation time will improve scores by 0.1-0.5%, very close to the original TF impl.
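For reference, Inception-style normalization with mean = std = 0.5 simply maps pixel values from [0, 1] to [-1, 1]. A minimal sketch:

```python
# Inception-style normalization: with mean = std = 0.5 per channel,
# pixel values in [0, 1] are mapped linearly to [-1, 1].
def normalize(pixels, mean=0.5, std=0.5):
    return [(p - mean) / std for p in pixels]

print(normalize([0.0, 0.5, 1.0]))  # [-1.0, 0.0, 1.0]
```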
@@ -130,7 +130,7 @@ To run validation w/ TF preprocessing for tf_efficientnet_b5:
To run validation for a model with Inception preprocessing, i.e. EfficientNet-B8 AdvProp:

`python validate.py /path/to/imagenet/validation/ --model tf_efficientnet_b8_ap -b 48 --num-gpu 2 --img-size 672 --crop-pct 0.954 --mean 0.5 --std 0.5`

| Model | Prec@1 (Err) | Prec@5 (Err) | Param # | Image Scaling | Image Size | Crop |
|---|---|---|---|---|---|---|
| tf_efficientnet_l2_ns *tfp | 88.352 (11.648) | 98.652 (1.348) | 480 | bicubic | 800 | N/A |
| tf_efficientnet_l2_ns | TBD | TBD | 480 | bicubic | 800 | 0.961 |
@@ -308,7 +308,7 @@ Scripts are included to
As an example, to export the MobileNet-V3 pretrained model and then run an Imagenet validation:
```
python onnx_export.py --model mobilenetv3_large_100 ./mobilenetv3_100.onnx
python onnx_validate.py /imagenet/validation/ --onnx-input ./mobilenetv3_100.onnx
```

These scripts were tested to be working as of PyTorch 1.6 and ONNX 1.7 w/ ONNX runtime 1.4. Caffe2 compatible
@@ -2,24 +2,24 @@

This repository contains code to compute depth from a single image. It accompanies our [paper](https://arxiv.org/abs/1907.01341v3):

> Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
> René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

and our [preprint](https://arxiv.org/abs/2103.13413):

> Vision Transformers for Dense Prediction
> René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

MiDaS was trained on up to 12 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS, KITTI, NYU Depth V2) with multi-objective optimization.
The original model that was trained on 5 datasets (`MIX 5` in the paper) can be found [here](https://github.com/isl-org/MiDaS/releases/tag/v2).
The figure below shows an overview of the different MiDaS models; the bubble size scales with the number of parameters.



### Setup

1) Pick one or more models and download the corresponding weights to the `weights` folder:
@@ -31,9 +31,9 @@ MiDaS 3.1

MiDaS 3.0: Legacy transformer models [dpt_large_384](https://github.com/isl-org/MiDaS/releases/download/v3/dpt_large_384.pt) and [dpt_hybrid_384](https://github.com/isl-org/MiDaS/releases/download/v3/dpt_hybrid_384.pt)

MiDaS 2.1: Legacy convolutional models [midas_v21_384](https://github.com/isl-org/MiDaS/releases/download/v2_1/midas_v21_384.pt) and [midas_v21_small_256](https://github.com/isl-org/MiDaS/releases/download/v2_1/midas_v21_small_256.pt)

1) Set up dependencies:

```shell
conda env create -f environment.yaml
```
@@ -53,7 +53,7 @@ For the OpenVINO model, install

```shell
pip install openvino
```

### Usage

1) Place one or more input images in the folder `input`.
@@ -68,19 +68,19 @@ pip install openvino
[dpt_swin2_tiny_256](#model_type), [dpt_swin_large_384](#model_type), [dpt_next_vit_large_384](#model_type),
[dpt_levit_224](#model_type), [dpt_large_384](#model_type), [dpt_hybrid_384](#model_type),
[midas_v21_384](#model_type), [midas_v21_small_256](#model_type), [openvino_midas_v21_small_256](#model_type).

3) The resulting depth maps are written to the `output` folder.

#### optional

1) By default, the inference resizes the height of input images to the size of a model to fit into the encoder. This size is given by the numbers in the model names of the [accuracy table](#accuracy). Some models support not only a single inference height but a range of different heights. Feel free to explore different heights by appending the extra command line argument `--height`. Unsupported height values will throw an error. Note that using this argument may decrease the model accuracy.
2) By default, the inference keeps the aspect ratio of input images when feeding them into the encoder, if this is supported by a model (all models except Swin, Swin2, and LeViT). To resize to a square resolution, disregarding the aspect ratio while preserving the height, use the command line argument `--square`.
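The default height/aspect-ratio behaviour described above can be sketched as follows. Rounding the width to a multiple of 32 is an assumption made here for illustration, since encoders typically require such alignment; it is not necessarily the exact rule used by `run.py`:

```python
# Sketch of the default resizing: scale so the height matches the model's
# inference height, keep the aspect ratio, and (assumed here) round the
# width to a multiple of 32 for encoder alignment.
def resize_shape(width, height, target_height, multiple=32):
    scale = target_height / height
    new_width = round(width * scale / multiple) * multiple
    return new_width, target_height

print(resize_shape(640, 480, 384))  # (512, 384)
```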
#### via Camera

@@ -91,7 +91,7 @@ pip install openvino
```shell
python run.py --model_type <model_type> --side
```

The argument `--side` is optional and causes both the input RGB image and the output depth map to be shown side-by-side for comparison.

#### via Docker
@@ -122,7 +122,7 @@ The pretrained model is also available on [PyTorch Hub](https://pytorch.org/hub/

See [README](https://github.com/isl-org/MiDaS/tree/master/tf) in the `tf` subdirectory.

Currently only supports MiDaS v2.1.

#### via Mobile (iOS / Android)
@@ -133,16 +133,16 @@ See [README](https://github.com/isl-org/MiDaS/tree/master/mobile) in the `mobile

See [README](https://github.com/isl-org/MiDaS/tree/master/ros) in the `ros` subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

### Accuracy

We provide a **zero-shot error** $\epsilon_d$ which is evaluated for 6 different datasets (see [paper](https://arxiv.org/abs/1907.01341v3)). **Lower error values are better**.
$\color{green}{\textsf{Overall model quality is represented by the improvement}}$ ([Imp.](#improvement)) with respect to MiDaS 3.0 DPT<sub>L-384</sub>. The models are grouped by the height used for inference, whereas the square training resolution is given by the numbers in the model names. The table also shows the **number of parameters** (in millions) and the **frames per second** for inference at the training resolution (for GPU RTX 3090):

| MiDaS Model | DIW </br><sup>WHDR</sup> | Eth3d </br><sup>AbsRel</sup> | Sintel </br><sup>AbsRel</sup> | TUM </br><sup>δ1</sup> | KITTI </br><sup>δ1</sup> | NYUv2 </br><sup>δ1</sup> | $\color{green}{\textsf{Imp.}}$ </br><sup>%</sup> | Par.</br><sup>M</sup> | FPS</br><sup> </sup> |
@@ -171,16 +171,16 @@ the numbers in the model names. The table also shows the **number of parameters*
| [v3.1 LeViT<sub>224</sub>](https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_levit_224.pt)$\tiny{\square}$ | **0.1314** | **0.1206** | **0.3148** | **18.21** | **15.27*** | **8.64*** | $\color{green}{\textsf{-40}}$ | **51** | **73** |

* No zero-shot error, because models are also trained on KITTI and NYU Depth V2\
$\square$ Validation performed at **square resolution**, either because the transformer encoder backbone of a model does not support non-square resolutions (Swin, Swin2, LeViT) or for comparison with these models. All other validations keep the aspect ratio. A difference in resolution limits the comparability of the zero-shot error and the improvement, because these quantities are averages over the pixels of an image and do not take into account the advantage of more detail due to a higher resolution.\
Best values per column and same validation height in bold

#### Improvement

The improvement in the above table is defined as the relative zero-shot error with respect to MiDaS v3.0 DPT<sub>L-384</sub>, averaged over the datasets. So, if $\epsilon_d$ is the zero-shot error for dataset $d$, then the $\color{green}{\textsf{improvement}}$ is given by $100(1-(1/6)\sum_d\epsilon_d/\epsilon_{d,\rm{DPT_{L-384}}})$%.
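As a worked example of this formula, the numbers below are placeholders rather than real benchmark values; they just show the arithmetic: if every dataset error were halved relative to the reference, the improvement would be 50%:

```python
# Worked example of the improvement formula above; the error values are
# placeholders for illustration, not real benchmark numbers.
def improvement(eps, eps_ref):
    n = len(eps)
    return 100 * (1 - sum(e / r for e, r in zip(eps, eps_ref)) / n)

eps_ref = [0.12, 0.13, 0.33, 10.0, 29.0, 8.5]  # placeholder reference errors
eps = [e / 2 for e in eps_ref]                  # each error halved
print(improvement(eps, eps_ref))  # 50.0
```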
@@ -193,14 +193,14 @@ and v2.0 Large<sub>384</sub> respectively instead of v3.0 DPT<sub>L-384</sub>.
Zoom in for better visibility


### Speed on Camera Feed

Test configuration
- Windows 10
- 11th Gen Intel Core i7-1185G7 3.00GHz
- 16GB RAM
- Camera resolution 640x480
- openvino_midas_v21_small_256

Speed: 22 FPS
@@ -251,9 +251,9 @@ If you use a DPT-based model, please also cite:

### Acknowledgements

Our work builds on and uses code from [timm](https://github.com/rwightman/pytorch-image-models) and [Next-ViT](https://github.com/bytedance/Next-ViT). We'd like to thank the authors for making these libraries available.

### License

MIT License
||||||
Loading…
Reference in New Issue