# Referring Image Segmentation Using Text Supervision

Official PyTorch implementation of TRIS, from the following paper:

Referring Image Segmentation Using Text Supervision. ICCV 2023.
Fang Liu\*, Yuhao Liu\*, Yuqiu Kong, Ke Xu, Lihe Zhang, Baocai Yin, Gerhard Hancke, Rynson Lau
## Environment

We recommend running the code with PyTorch 1.13.1 or later. A full conda environment is provided in `environment.yml`.
## Dataset

### RefCOCO/+/g

```
├── data/
|   ├── train2014
|   ├── refer
|   |   ├── refcocog
|   |   |   ├── instances.json
|   |   |   ├── refs(google).p
|   |   |   ├── refs(umd).p
|   |   ├── refcoco
```

### ReferIt

```
├── data/
|   ├── referit
|   |   ├── annotations
|   |   |   ├── train.pickle
|   |   |   ├── test.pickle
|   |   ├── images
|   |   ├── masks
```
If you want to generate the ReferIt annotations yourself, refer to MG for more details.
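If you do generate the annotations yourself, it can be useful to sanity-check the resulting pickles before training. A minimal round-trip sketch (the field names in the toy record below are hypothetical, for illustration only; real records may differ):

```python
import os
import pickle
import tempfile

def load_referit_split(path):
    """Load a ReferIt annotation pickle, e.g. data/referit/annotations/train.pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round-trip demo with a toy record (hypothetical fields, for illustration only).
toy = [{"image_id": 1, "phrase": "man on the right"}]
tmp = os.path.join(tempfile.mkdtemp(), "train.pickle")
with open(tmp, "wb") as f:
    pickle.dump(toy, f)
print(load_referit_split(tmp) == toy)  # True
```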
## Evaluation

Note that we use mIoU to evaluate the accuracy of the generated masks.
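For reference, mIoU averages the per-sample mask IoU over the evaluation set. A minimal NumPy sketch of the metric (not the repository's exact evaluation code):

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary masks of shape (H, W)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def mean_iou(preds, gts):
    """mIoU: per-sample IoU averaged over the evaluation set."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))

# Toy example: the prediction covers half of a 2x4 ground-truth region.
pred = np.zeros((4, 4)); pred[:2, :2] = 1
gt = np.zeros((4, 4)); gt[:2, :] = 1
print(mask_iou(pred, gt))  # 0.5
```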
- Create the `./weights` directory:

  ```
  mkdir ./weights
  ```

- Download the model weights using the GitHub links below and put them in `./weights`.
|  | ReferIt | RefCOCO | RefCOCO+ | G-Ref (Google) | G-Ref (UMD) |
|---|---|---|---|---|---|
| Step-1 | weight | weight | weight | weight | weight |
| Step-2 | weight | weight | weight | weight | weight |
- Shell script for `G-Ref (UMD)` evaluation. Replace `refcocog` with `refcoco`, and `umd` with `unc`, for RefCOCO dataset evaluation.

  ```
  bash scripts/validate_stage1.sh
  ```
## Demo

The output of the demo is saved in `./figs/`.

```
python demo.py --img figs/demo.png --text 'man on the right'
```
## Training

- Train the Step-1 network on the `G-Ref (UMD)` dataset.

  ```
  bash scripts/train_stage1.sh
  ```
- Validate and generate response maps on the `G-Ref (UMD)` `train` set, using the proposed PRMS strategy (`--prms`). The response maps are saved in `./output/refcocog_umd/cam/`, as specified by the `--cam_save_dir` argument.

  ```
  ## path to save response maps and pseudo labels
  dir=./output
  python validate.py --batch_size 1 --size 320 --dataset refcocog --splitBy umd --test_split train --max_query_len 20 --output ./weights/refcocog_umd --resume --pretrain ckpt.pth --cam_save_dir $dir/refcocog_umd/cam/ --name_save_dir $dir/refcocog_umd --eval --prms
  ```
- Train IRNet and generate pseudo masks.

  ```
  cd IRNet
  dir=../output
  CUDA_VISIBLE_DEVICES=0,1,2,3 python run_sample_refer.py --cam_out_dir $dir/refcocog_umd/cam --ir_label_out_dir $dir/refcocog_umd/ir_label --ins_seg_out_dir $dir/refcocog_umd/ins_seg --train_list $dir/refcocog_umd/refcocog_train_names.json --cam_eval_thres 0.15 --work_space output_refer/refcocog_umd --num_workers 8 --irn_batch_size 96 --cam_to_ir_label_pass True --train_irn_pass True --make_ins_seg_pass True
  ```
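The `--cam_eval_thres 0.15` flag above binarizes each response map before IRNet refinement; conceptually, this is just a threshold applied to the min-max-normalized map. A sketch of that step (an assumption for illustration, not the exact IRNet code):

```python
import numpy as np

def cam_to_mask(cam, thres=0.15):
    """Min-max normalize a response map to [0, 1], then keep pixels >= thres."""
    cam = cam.astype(np.float32)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return (cam >= thres).astype(np.uint8)

# A 2x2 toy response map: only values above the threshold survive.
cam = np.array([[0.0, 1.0], [0.1, 0.9]])
print(cam_to_mask(cam).tolist())  # [[0, 1], [0, 1]]
```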
- Train the Step-2 network using the generated pseudo masks in `output/refcocog_umd/ins_seg`, as specified by the `--pseudo_path` argument.

  ```
  cd ../
  python train_stage2.py --batch_size 48 --size 320 --dataset refcocog --splitBy umd --test_split val --bert_tokenizer clip --backbone clip-RN50 --max_query_len 20 --epoch 15 --pseudo_path output/refcocog_umd/ins_seg --output ./weights/stage2/pseudo_refcocog_umd
  ```
## Acknowledgement

This repository is built upon LAVT, WWbL, CLIMS, and IRNet.
## Citation

If you find this repository helpful, please consider citing:

```
@inproceedings{liu2023referring,
  title={Referring Image Segmentation Using Text Supervision},
  author={Liu, Fang and Liu, Yuhao and Kong, Yuqiu and Xu, Ke and Zhang, Lihe and Yin, Baocai and Hancke, Gerhard and Lau, Rynson},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={22124--22134},
  year={2023}
}
```
## Contact

If you have any questions, please feel free to reach out at fawnliu2333@gmail.com.