# Evaluation

## Important

We use `batch_size = 1` for evaluation due to the batch inference issue in LLaMA; use a larger batch size only if you do not observe a performance drop. (Left padding would cause wrong positional embeddings.)

## Dependencies

```bash
pip install pycocoevalcap tqdm
```

## Image Caption

### [Flickr30K](https://bryanplummer.com/Flickr30kEntities/)
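To illustrate the note above: with absolute positional embeddings assigned left to right, left padding shifts the real tokens to later positions, so identical text receives different position ids depending on the amount of padding. A minimal toy sketch (the `<pad>` token and sequences are made up for illustration):

```python
# Toy illustration of why left padding breaks absolute position ids:
# positions are assigned left to right, so a left-padded copy of the same
# sentence sees different position ids than the unpadded one.
PAD = "<pad>"
unpadded = ["A", "cat", "sat"]
left_padded = [PAD] * 2 + unpadded

def real_token_positions(seq):
    # Position ids of the non-pad tokens, as a naive model would assign them.
    return [(pos, tok) for pos, tok in enumerate(seq) if tok != PAD]

print(real_token_positions(unpadded))     # [(0, 'A'), (1, 'cat'), (2, 'sat')]
print(real_token_positions(left_padded))  # [(2, 'A'), (3, 'cat'), (4, 'sat')]
```

With `batch_size = 1` no padding is needed, which sidesteps the mismatch entirely.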
Data Preparation

```bash
mkdir -p data/flickr && cd data/flickr

# download images from https://bryanplummer.com/Flickr30kEntities/

# karpathy split annotations can be downloaded from https://cs.stanford.edu/people/karpathy/deepimagesent/

# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/flickr30k/flickr30k_karpathy_test.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/flickr30k/flickr30k_karpathy_train.json

cd ../..
```
Evaluate

```bash
ds="flickr"
checkpoint=/PATH/TO/CHECKPOINT
python -m torch.distributed.launch --use_env \
    --nproc_per_node ${NPROC_PER_NODE:-8} \
    --nnodes ${WORLD_SIZE:-1} \
    --node_rank ${RANK:-0} \
    --master_addr ${MASTER_ADDR:-127.0.0.1} \
    --master_port ${MASTER_PORT:-12345} \
    evaluate_caption.py \
    --checkpoint $checkpoint \
    --dataset $ds \
    --batch-size 8 \
    --num-workers 2
```
### [NoCaps](https://nocaps.org/)

Evaluate

```bash
ds="nocaps"
checkpoint=/PATH/TO/CHECKPOINT
python -m torch.distributed.launch --use_env \
    --nproc_per_node ${NPROC_PER_NODE:-8} \
    --nnodes ${WORLD_SIZE:-1} \
    --node_rank ${RANK:-0} \
    --master_addr ${MASTER_ADDR:-127.0.0.1} \
    --master_port ${MASTER_PORT:-12345} \
    evaluate_caption.py \
    --checkpoint $checkpoint \
    --dataset $ds \
    --batch-size 8 \
    --num-workers 2
```
## [COCO](https://cocodataset.org/)

> COCO images are used in VQAv2/OK-VQA/RefCOCO/RefCOCO+/RefCOCOg; make sure you have already downloaded the COCO images before evaluating on these benchmarks.
Data Preparation

```bash
mkdir -p data/coco && cd data/coco

# download coco2014 images
wget http://images.cocodataset.org/zips/train2014.zip && unzip train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip && unzip val2014.zip
wget http://images.cocodataset.org/zips/test2015.zip && unzip test2015.zip

cd ../..
```
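If you ever need to locate a COCO image on disk from its numeric id, note that COCO 2014 files follow the pattern `COCO_<split>_<12-digit zero-padded id>.jpg`. A tiny helper sketch (the function name is ours):

```python
def coco2014_filename(split: str, image_id: int) -> str:
    # COCO 2014 images are named COCO_<split>_<zero-padded 12-digit id>.jpg,
    # e.g. COCO_val2014_000000391895.jpg
    return f"COCO_{split}_{image_id:012d}.jpg"

print(coco2014_filename("val2014", 391895))  # COCO_val2014_000000391895.jpg
print(coco2014_filename("train2014", 9))     # COCO_train2014_000000000009.jpg
```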
## General VQA

### [VQAv2](https://visualqa.org/)
Data Preparation

```bash
mkdir -p data/vqav2 && cd data/vqav2

# make sure you have downloaded COCO images

# download questions and annotations
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip && unzip v2_Annotations_Train_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Train_mscoco.zip && unzip v2_Questions_Train_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Val_mscoco.zip && unzip v2_Annotations_Val_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Val_mscoco.zip && unzip v2_Questions_Val_mscoco.zip
wget https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Questions_Test_mscoco.zip && unzip v2_Questions_Test_mscoco.zip

# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vqav2/vqav2_train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vqav2/vqav2_val.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/vqav2/vqav2_testdev.jsonl

cd ../..
```
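The converted `*.jsonl` files store one JSON object per line. A minimal, self-contained reader sketch (the exact fields in each object depend on the dataset, so none are assumed here):

```python
import json
import os
import tempfile

def read_jsonl(path):
    # Each non-empty line of a .jsonl file is one standalone JSON object.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Usage sketch with a throwaway file standing in for a real converted file:
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write('{"question": "What color?", "answer": "blue"}\n')
    tmp.write('{"question": "How many?", "answer": "2"}\n')
    path = tmp.name
records = read_jsonl(path)
os.remove(path)
print(len(records))          # 2
print(records[0]["answer"])  # blue
```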
Evaluate

```bash
checkpoint=/PATH/TO/CHECKPOINT
for ds in "vqav2_val" "vqav2_testdev"; do
    python -m torch.distributed.launch --use_env \
        --nproc_per_node ${NPROC_PER_NODE:-8} \
        --nnodes ${WORLD_SIZE:-1} \
        --node_rank ${RANK:-0} \
        --master_addr ${MASTER_ADDR:-127.0.0.1} \
        --master_port ${MASTER_PORT:-12345} \
        evaluate_vqa.py \
        --checkpoint $checkpoint \
        --dataset $ds \
        --batch-size 8 \
        --num-workers 2
done
```
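For reference, the standard VQAv2 accuracy metric counts a prediction as fully correct when at least 3 of the 10 human annotators gave that answer. This is a minimal sketch of that formula (the official evaluator also applies answer-normalization preprocessing, omitted here):

```python
def vqa_accuracy(prediction: str, human_answers: list) -> float:
    # Standard VQAv2 metric: min(#matching annotators / 3, 1).
    matches = sum(ans == prediction for ans in human_answers)
    return min(matches / 3.0, 1.0)

print(vqa_accuracy("blue", ["blue"] * 4 + ["green"] * 6))   # 1.0
print(vqa_accuracy("green", ["blue"] * 8 + ["green"] * 2))  # 0.666...
```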
### [OKVQA](https://okvqa.allenai.org/)
Data Preparation

```bash
mkdir -p data/okvqa && cd data/okvqa

# download annotations and questions
wget https://okvqa.allenai.org/static/data/mscoco_train2014_annotations.json.zip && unzip mscoco_train2014_annotations.json.zip
wget https://okvqa.allenai.org/static/data/OpenEnded_mscoco_train2014_questions.json.zip && unzip OpenEnded_mscoco_train2014_questions.json.zip
wget https://okvqa.allenai.org/static/data/mscoco_val2014_annotations.json.zip && unzip mscoco_val2014_annotations.json.zip
wget https://okvqa.allenai.org/static/data/OpenEnded_mscoco_val2014_questions.json.zip && unzip OpenEnded_mscoco_val2014_questions.json.zip

# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/okvqa/okvqa_train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/okvqa/okvqa_val.jsonl

cd ../..
```
Evaluate

```bash
ds="okvqa_val"
checkpoint=/PATH/TO/CHECKPOINT
python -m torch.distributed.launch --use_env \
    --nproc_per_node ${NPROC_PER_NODE:-8} \
    --nnodes ${WORLD_SIZE:-1} \
    --node_rank ${RANK:-0} \
    --master_addr ${MASTER_ADDR:-127.0.0.1} \
    --master_port ${MASTER_PORT:-12345} \
    evaluate_vqa.py \
    --checkpoint $checkpoint \
    --dataset $ds \
    --batch-size 8 \
    --num-workers 2
```
### [TextVQA](https://textvqa.org/)
Data Preparation

```bash
mkdir -p data/textvqa && cd data/textvqa

# download images
wget https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip && unzip train_val_images.zip

# download annotations and questions
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_train.json
wget https://dl.fbaipublicfiles.com/textvqa/data/TextVQA_0.5.1_val.json

# download converted files
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_train_annotations.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_train_questions.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_train.jsonl
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_val_annotations.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_val_questions.json
wget https://ofasys-wlcb.oss-cn-wulanchabu.aliyuncs.com/Qwen-VL/evaluation/textvqa/textvqa_val.jsonl

cd ../..
```
Evaluate

```bash
ds="textvqa_val"
checkpoint=/PATH/TO/CHECKPOINT
python -m torch.distributed.launch --use_env \
    --nproc_per_node ${NPROC_PER_NODE:-8} \
    --nnodes ${WORLD_SIZE:-1} \
    --node_rank ${RANK:-0} \
    --master_addr ${MASTER_ADDR:-127.0.0.1} \
    --master_port ${MASTER_PORT:-12345} \
    evaluate_vqa.py \
    --checkpoint $checkpoint \
    --dataset $ds \
    --batch-size 8 \
    --num-workers 2
```
## Multi-modal Benchmark

### [MMBench](https://github.com/open-compass/MMBench)
Data Preparation

```bash
python mmbench_converter.py  # convert the raw annotations to a jsonl file
```
Evaluate

```bash
ds="mmbench_dev_20230712"
checkpoint=/PATH/TO/CHECKPOINT
python -m torch.distributed.launch --use_env \
    --nproc_per_node ${NPROC_PER_NODE:-8} \
    --nnodes ${WORLD_SIZE:-1} \
    --node_rank ${RANK:-0} \
    --master_addr ${MASTER_ADDR:-127.0.0.1} \
    --master_port ${MASTER_PORT:-12345} \
    evaluate_mmbench.py \
    --checkpoint $checkpoint \
    --dataset $ds \
    --batch-size 1 \
    --num-workers 2
```
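MMBench is a multiple-choice benchmark, so scoring reduces to matching the option letter in the model output against the ground truth. A simple heuristic extraction sketch (this regex-based helper is ours, not necessarily what `evaluate_mmbench.py` does):

```python
import re

def extract_choice(output, letters="ABCD"):
    # Heuristic: take the first standalone option letter in the output.
    match = re.search(rf"\b([{letters}])\b", output.upper())
    return match.group(1) if match else None

print(extract_choice("The answer is B."))         # B
print(extract_choice("I think (c) is correct."))  # C
print(extract_choice("No idea."))                 # None
```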