# Referring Image Segmentation Using Text Supervision

Official PyTorch implementation of TRIS, from the following paper:

Referring Image Segmentation Using Text Supervision. ICCV 2023.
Fang Liu\*, Yuhao Liu\*, Yuqiu Kong, Ke Xu, Lihe Zhang, Baocai Yin, Gerhard Hancke, Rynson Lau
## Environment

We recommend running the code with PyTorch 1.13.1 or later. A full conda environment is provided in `environment.yml`.
## Dataset

### RefCOCO/+/g

```
├── data/
|   ├── train2014
|   ├── refer
|   |   ├── refcocog
|   |   |   ├── instances.json
|   |   |   ├── refs(google).p
|   |   |   ├── refs(umd).p
|   |   ├── refcoco
```

### ReferIt

```
├── data/
|   ├── referit
|   |   ├── annotations
|   |   |   ├── train.pickle
|   |   |   ├── test.pickle
|   |   ├── images
|   |   ├── masks
```
If you want to generate the ReferIt annotations yourself, refer to MG for more details.
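If you do generate the annotations yourself, it can be useful to sanity-check the resulting pickles before training. A minimal round-trip sketch (the field names in the toy record below are hypothetical, for illustration only; real records may differ):

```python
import os
import pickle
import tempfile

def load_referit_split(path):
    """Load a ReferIt annotation pickle, e.g. data/referit/annotations/train.pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Round-trip demo with a toy record (hypothetical fields, for illustration only).
toy = [{"image_id": 1, "phrase": "man on the right"}]
tmp = os.path.join(tempfile.mkdtemp(), "train.pickle")
with open(tmp, "wb") as f:
    pickle.dump(toy, f)
print(load_referit_split(tmp) == toy)  # True
```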
## Evaluation

Note that we use mIoU to evaluate the accuracy of the generated masks.
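For reference, mIoU averages the per-sample mask IoU over the evaluation set. A minimal NumPy sketch of the metric (not the repository's exact evaluation code):

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary masks of shape (H, W)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def mean_iou(preds, gts):
    """mIoU: per-sample IoU averaged over the evaluation set."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))

# Toy example: the prediction covers half of a 2x4 ground-truth region.
pred = np.zeros((4, 4)); pred[:2, :2] = 1
gt = np.zeros((4, 4)); gt[:2, :] = 1
print(mask_iou(pred, gt))  # 0.5
```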
- Create the `./weights` directory:

  ```
  mkdir ./weights
  ```

- Download the model weights using the GitHub links below and put them in `./weights`.
|  | ReferIt | RefCOCO | RefCOCO+ | G-Ref (Google) | G-Ref (UMD) |
|---|---|---|---|---|---|
| Step-1 | weight | weight | weight | weight | weight |
| Step-2 | weight | weight | weight | weight | weight |
- Shell script for `G-Ref (UMD)` evaluation. Replace `refcocog` with `refcoco`, and `umd` with `unc`, for RefCOCO dataset evaluation.

  ```
  bash scripts/validate_stage1.sh
  ```
## Demo

The output of the demo is saved in `./figs/`.

```
python demo.py --img figs/demo.png --text 'man on the right'
```
## Training

- Train the Step-1 network on the `G-Ref (UMD)` dataset.

  ```
  bash scripts/train_stage1.sh
  ```
- Validate and generate response maps on the `G-Ref (UMD)` `train` set, using the proposed PRMS strategy (`--prms`). The response maps are saved in `./output/refcocog_umd/cam/`, as specified by the `--cam_save_dir` argument.

  ```
  ## path to save response maps and pseudo labels
  dir=./output
  python validate.py --batch_size 1 --size 320 --dataset refcocog --splitBy umd --test_split train --max_query_len 20 --output ./weights/refcocog_umd --resume --pretrain ckpt.pth --cam_save_dir $dir/refcocog_umd/cam/ --name_save_dir $dir/refcocog_umd --eval --prms
  ```
- Train IRNet and generate pseudo masks.

  ```
  cd IRNet
  dir=../output
  CUDA_VISIBLE_DEVICES=0,1,2,3 python run_sample_refer.py --cam_out_dir $dir/refcocog_umd/cam --ir_label_out_dir $dir/refcocog_umd/ir_label --ins_seg_out_dir $dir/refcocog_umd/ins_seg --train_list $dir/refcocog_umd/refcocog_train_names.json --cam_eval_thres 0.15 --work_space output_refer/refcocog_umd --num_workers 8 --irn_batch_size 96 --cam_to_ir_label_pass True --train_irn_pass True --make_ins_seg_pass True
  ```
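The `--cam_eval_thres 0.15` flag above binarizes each response map before IRNet refinement; conceptually, this is just a threshold applied to the min-max-normalized map. A sketch of that step (an assumption for illustration, not the exact IRNet code):

```python
import numpy as np

def cam_to_mask(cam, thres=0.15):
    """Min-max normalize a response map to [0, 1], then keep pixels >= thres."""
    cam = cam.astype(np.float32)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return (cam >= thres).astype(np.uint8)

# A 2x2 toy response map: only values above the threshold survive.
cam = np.array([[0.0, 1.0], [0.1, 0.9]])
print(cam_to_mask(cam).tolist())  # [[0, 1], [0, 1]]
```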
- Train the Step-2 network using the generated pseudo masks in `output/refcocog_umd/ins_seg`, as specified by the `--pseudo_path` argument.

  ```
  cd ../
  python train_stage2.py --batch_size 48 --size 320 --dataset refcocog --splitBy umd --test_split val --bert_tokenizer clip --backbone clip-RN50 --max_query_len 20 --epoch 15 --pseudo_path output/refcocog_umd/ins_seg --output ./weights/stage2/pseudo_refcocog_umd
  ```
## Acknowledgement

This repository is built upon LAVT, WWbL, CLIMS, and IRNet.
## Citation

If you find this repository helpful, please consider citing:

```
@inproceedings{liu2023referring,
  title={Referring Image Segmentation Using Text Supervision},
  author={Liu, Fang and Liu, Yuhao and Kong, Yuqiu and Xu, Ke and Zhang, Lihe and Yin, Baocai and Hancke, Gerhard and Lau, Rynson},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={22124--22134},
  year={2023}
}
```
## Contact

If you have any questions, please feel free to reach out at fawnliu2333@gmail.com.