Error while running evaluation for open domain VQA
Hi,
I am evaluating the pre-trained ofa_large.pt checkpoint for open-domain VQA. The evaluation runs fine for some input samples but then fails with the following error:
2022-09-01 04:52:07 | INFO | tasks.ofa_task | source dictionary: 59457 types
2022-09-01 04:52:07 | INFO | tasks.ofa_task | target dictionary: 59457 types
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 begin to initialize row_count and line_idx-to-offset mapping
local datafile ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 finished initializing row_count and line_idx-to-offset mapping
file ../../dataset/vqa_data/textvqa_val1.tsv slice_id 0 row count 4999 total row count 4999
/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torchvision/transforms/transforms.py:258: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
"Argument interpolation should be of type InterpolationMode instead of int. "
/private/home/rbh/OFA/data/mm_data/vqa_gen_dataset.py:64: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
decoder_prompts = np.array([s['decoder_prompt'].tolist() for s in samples])
2022-09-01 04:53:16 | INFO | fairseq.logging.progress_bar | : 11 / 625 sentences=8
2022-09-01 04:53:21 | INFO | fairseq.logging.progress_bar | : 21 / 625 sentences=8
Traceback (most recent call last):
  File "../../evaluate.py", line 156, in <module>
    cli_main()
  File "../../evaluate.py", line 151, in cli_main
    cfg, main, ema_eval=args.ema_eval, beam_search_vqa_eval=args.beam_search_vqa_eval, zero_shot=args.zero_shot
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/private/home/rbh/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../evaluate.py", line 134, in main
    result, scores = eval_step(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 306, in eval_step
    return eval_vqa_gen(task, generator, models, sample, **kwargs)
  File "/private/home/rbh/OFA/utils/eval_utils.py", line 47, in eval_vqa_gen
    hypos = task.inference_step(generator, models, sample, prefix_tokens=sample['prefix_tokens'])
  File "/private/home/rbh/OFA/fairseq/fairseq/tasks/fairseq_task.py", line 518, in inference_step
    models, sample, prefix_tokens=prefix_tokens, constraints=constraints
  File "/private/home/rbh/anaconda3/envs/ofa/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 207, in generate
    return self._generate(models, sample, **kwargs)
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 379, in _generate
    step, lprobs, scores, tokens, prefix_tokens, beam_size
  File "/private/home/rbh/OFA/models/sequence_generator.py", line 624, in _prefix_tokens
    assert (first_beam == target_prefix).all()
AssertionError
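For context, the assertion that fires is part of fairseq's prefix-forced decoding: eval_vqa_gen passes each sample's decoder prompt as prefix_tokens, and while the prompt is being forced, the tokens already emitted on beam 0 must reproduce that prefix exactly. Below is a minimal sketch of the invariant being checked (illustrative shapes and pad id, not the fairseq source):

import torch

# Illustrative values; OFA's actual pad id, beam size, and lengths differ.
pad = 1
beam_size = 5
step = 3  # decoding step at which the check runs

# tokens: (batch * beam, max_len) generation history; position 0 is BOS.
tokens = torch.full((2 * beam_size, 10), pad, dtype=torch.long)
# prefix_tokens: (batch, prefix_len) decoder prompts, right-padded.
prefix_tokens = torch.tensor([[5, 6, 7], [8, 9, pad]])

# Beam 0 of each sentence is expected to carry the forced prefix in
# positions 1..step.
tokens[0, 1:4] = torch.tensor([5, 6, 7])
tokens[beam_size, 1:4] = torch.tensor([8, 9, pad])

first_beam = tokens.view(2, beam_size, -1)[:, 0, 1 : step + 1]
target_prefix = prefix_tokens[:, :step]
# The AssertionError above means this equality failed for some sample,
# i.e. the forced tokens diverged from the supplied prefix; ragged
# prompts of different lengths (note the VisibleDeprecationWarning in
# the log) are one plausible way for that to happen.
assert (first_beam == target_prefix).all()

The sketch is constructed so the assert passes; the failure reported above means some batch violated the invariant.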
I am using the following evaluation command:
#!/usr/bin/env bash
# The port for communication. Note that if you want to run multiple tasks on the same machine,
# you need to specify different port numbers.
export MASTER_PORT=8082
user_dir=../../ofa_module
bpe_dir=../../utils/BPE
# val or test
split=$1
data=../../dataset/vqa_data/textvqa_val1.tsv
ans2label_file=../../dataset/vqa_data/trainval_ans2label.pkl
path=../../pretrained_checkpoints/ofa_large.pt
result_path=../../results/vqa_${split}_beam
selected_cols=0,5,2,3,4
val_inference_type=beamsearch
python3 -m torch.distributed.launch ../../evaluate.py \
${data} \
--path=${path} \
--bpe-dir=${bpe_dir} \
--prompt-type=src \
--selected-cols=${selected_cols} \
--user-dir=${user_dir} \
--task=vqa_gen \
--batch-size=8 \
--log-format=simple --log-interval=10 \
--seed=7 \
--gen-subset=${split} \
--results-path=${result_path} \
--fp16 \
--beam-search-vqa-eval \
--zero-shot \
--unconstrained-training \
--beam=5 \
--unnormalized \
--temperature=1.0 \
--val-inference-type=${val_inference_type} \
--num-workers=0 \
--model-overrides="{\"data\":\"${data}\",\"bpe_dir\":\"${bpe_dir}\",\"selected_cols\":\"${selected_cols}\",\"ans2label_file\":\"${ans2label_file}\"}"
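As an aside, the VisibleDeprecationWarning in the log comes from vqa_gen_dataset.py building a NumPy array out of ragged decoder prompts. Per the warning text itself, passing dtype=object is the prescribed fix; a self-contained sketch (the dummy samples are stand-ins, not OFA's actual data):

import numpy as np
import torch

# Dummy ragged prompts standing in for samples[i]['decoder_prompt'];
# real samples come from the VQA-gen dataset collater.
samples = [
    {"decoder_prompt": torch.tensor([5, 6, 7])},
    {"decoder_prompt": torch.tensor([8, 9])},
]

# dtype=object keeps the ragged list-of-lists as an object array and
# silences the deprecation warning.
decoder_prompts = np.array(
    [s["decoder_prompt"].tolist() for s in samples], dtype=object
)

This only addresses the warning; whether it is connected to the AssertionError is a separate question.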
Top GitHub Comments
Thanks for the reply! Yeah, with the latest pull I am able to get 73.05 VQAv2 val accuracy with a beam size of 20, without adding the above line.
In fact, the code related to zero-shot inference on pretrained OFA checkpoints (like ofa_large_384.pt) lives in zero_shot_utils.py rather than in eval_utils.py, which is for finetuned OFA checkpoints. So I'm somewhat confused why your edit makes a difference in the zero-shot evaluation setting. PR #124 is still being fixed. Thanks for your comment and I will have a try.
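To make the split concrete, here is a hypothetical sketch of the dispatch the comment describes (function names and signatures are assumptions, not OFA's actual API; only the two module names come from the comment):

def run_step(task, generator, models, sample, zero_shot=False, **kwargs):
    # Pretrained checkpoints evaluated zero-shot (e.g. ofa_large_384.pt)
    # go through zero_shot_utils; eval_utils serves finetuned checkpoints.
    if zero_shot:
        from utils import zero_shot_utils
        return zero_shot_utils.zero_shot_step(task, generator, models, sample)
    from utils import eval_utils
    return eval_utils.eval_step(task, generator, models, sample, **kwargs)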