Error while running evaluation for open domain VQA
Hi,
I am evaluating the pre-trained ofa_large.pt checkpoint for open-domain VQA. The evaluation runs fine for some input samples but then fails with the following error:
2022-09-01 04:52:07 | INFO | tasks.ofa_task | source dictionary: 59457 types
2022-09-01 04:52:07 | INFO | tasks.ofa_task | target dictionary: 59457 types
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 begin to initialize row_count and line_idx-to-offset mapping
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 finished initializing row_count and line_idx-to-offset mapping
file ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 row count 4999 total row count 4999
/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torchvision/transforms/transforms.py:258: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
/private/home/rbh/OFA/data/mm_data/vqa_gen_dataset.py:64: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
decoder_prompts = np.array([s['decoder_prompt'].tolist() for s in samples])
2022-09-01 04:53:16 | INFO | fairseq.logging.progress_bar | : 11 / 625 sentences=8
2022-09-01 04:53:21 | INFO | fairseq.logging.progress_bar | : 21 / 625 sentences=8
Traceback (most recent call last):
  File "../../evaluate.py", line 156, in <module>
    cli_main()
  File "../../evaluate.py", line 151, in cli_main
    cfg, main, ema_eval=args.ema_eval, beam_search_vqa_eval=args.beam_search_vqa_eval, zero_shot=args.zero_shot
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 134, in main
    result, scores = eval_step(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 306, in eval_step
    return eval_vqa_gen(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 47, in eval_vqa_gen
    hypos = task.inference_step(generator, models, sample, prefix_tokens=sample['prefix_tokens'])
  File "/private/home/rbh/OFA/fairseq/fairseq/tasks/fairseq_task.py", line 518, in inference_step
    models, sample, prefix_tokens=prefix_tokens, constraints=constraints
  File "/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 207, in generate
    return self._generate(models, sample, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 379, in _generate
    step, lprobs, scores, tokens, prefix_tokens, beam_size
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 624, in _prefix_tokens
    assert (first_beam == target_prefix).all()
AssertionError
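For context, the assertion that fires is part of fairseq's prefix-forced decoding: eval_vqa_gen passes each sample's decoder prompt as prefix_tokens, and while the prompt is being forced, the tokens already emitted on beam 0 must reproduce that prefix exactly. Below is a minimal sketch of the invariant being checked (illustrative shapes and pad id, not the fairseq source):

import torch

# Illustrative values; OFA's actual pad id, beam size, and lengths differ.
pad = 1
beam_size = 5
step = 3  # decoding step at which the check runs

# tokens: (batch * beam, max_len) generation history; position 0 is BOS.
tokens = torch.full((2 * beam_size, 10), pad, dtype=torch.long)
# prefix_tokens: (batch, prefix_len) decoder prompts, right-padded.
prefix_tokens = torch.tensor([[5, 6, 7], [8, 9, pad]])

# Beam 0 of each sentence is expected to carry the forced prefix in
# positions 1..step.
tokens[0, 1:4] = torch.tensor([5, 6, 7])
tokens[beam_size, 1:4] = torch.tensor([8, 9, pad])

first_beam = tokens.view(2, beam_size, -1)[:, 0, 1 : step + 1]
target_prefix = prefix_tokens[:, :step]
# The AssertionError above means this equality failed for some sample,
# i.e. the forced tokens diverged from the supplied prefix; ragged
# prompts of different lengths (note the VisibleDeprecationWarning in
# the log) are one plausible way for that to happen.
assert (first_beam == target_prefix).all()

The sketch is constructed so the assert passes; the failure reported above means some batch violated the invariant.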
I am using the following evaluation command:
#!/usr/bin/env bash
# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=8082
user_dir=../../ofa_module
bpe_dir=../../utils/BPE
# val or test
split=$1
data=../../dataset/vqa_data/textvqa_val1.tsv
ans2label_file=../../dataset/vqa_data/trainval_ans2label.pkl
path=../../pretrained_checkpoints/ofa_large.pt
result_path=../../results/vqa_${split}_beam
selected_cols=0,5,2,3,4
val_inference_type=beamsearch
python3 -m torch.distributed.launch ../../evaluate.py \
${data} \
--path=${path} \
--bpe-dir=${bpe_dir} \
--prompt-type=src \
--selected-cols=${selected_cols} \
--user-dir=${user_dir} \
--task=vqa_gen \
--batch-size=8 \
--log-format=simple --log-interval=10 \
--seed=7 \
--gen-subset=${split} \
--results-path=${result_path} \
--fp16 \
--beam-search-vqa-eval \
--zero-shot \
--unconstrained-training \
--beam=5 \
--unnormalized \
--temperature=1.0 \
--val-inference-type=${val_inference_type} \
--num-workers=0 \
--model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\",\"ans2label_file\":\"${ans2label_file}\"}"
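As an aside, the VisibleDeprecationWarning in the log comes from vqa_gen_dataset.py building a NumPy array out of ragged decoder prompts. Per the warning text itself, passing dtype=object is the prescribed fix; a self-contained sketch (the dummy samples are stand-ins, not OFA's actual data):

import numpy as np
import torch

# Dummy ragged prompts standing in for samples[i]['decoder_prompt'];
# real samples come from the VQA-gen dataset collater.
samples = [
    {"decoder_prompt": torch.tensor([5, 6, 7])},
    {"decoder_prompt": torch.tensor([8, 9])},
]

# dtype=object keeps the ragged list-of-lists as an object array and
# silences the deprecation warning.
decoder_prompts = np.array(
    [s["decoder_prompt"].tolist() for s in samples], dtype=object
)

This only addresses the warning; whether it is connected to the AssertionError is a separate question.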
Top GitHub Comments
Thanks for the reply! Yeah, with the latest pull I am able to get 73.05 VQAv2 val accuracy with a beam size of 20, without adding the above line.
In fact, the code related to zero-shot inference on pretrained OFA checkpoints (like ofa_large_384.pt) lives in zero_shot_utils.py rather than in eval_utils.py, which is for finetuned OFA checkpoints. So I'm somewhat confused why your edit makes a difference in the zero-shot evaluation setting. PR #124 is still being fixed. Thanks for your comment and I will have a try.
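To make the split concrete, here is a hypothetical sketch of the dispatch the comment describes (function names and signatures are assumptions, not OFA's actual API; only the two module names come from the comment):

def run_step(task, generator, models, sample, zero_shot=False, **kwargs):
    # Pretrained checkpoints evaluated zero-shot (e.g. ofa_large_384.pt)
    # go through zero_shot_utils; eval_utils serves finetuned checkpoints.
    if zero_shot:
        from utils import zero_shot_utils
        return zero_shot_utils.zero_shot_step(task, generator, models, sample)
    from utils import eval_utils
    return eval_utils.eval_step(task, generator, models, sample, **kwargs)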