Unsuccessful Image Caption step 2 fine-tuning
Hi, when running train_caption_stage2.sh, it triggers an AssertionError like:
Traceback (most recent call last):
File "../../train.py", line 527, in <module>
cli_main()
File "../../train.py", line 520, in cli_main
distributed_utils.call_main(cfg, main)
File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 374, in call_main
distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
File "/workspace/OFA/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
main(cfg, **kwargs)
File "../../train.py", line 189, in main
valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "../../train.py", line 300, in train
log_output = trainer.train_step(samples)
File "/opt/conda/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/workspace/OFA/trainer.py", line 773, in train_step
loss, sample_size_i, logging_output = self.task.train_step(
File "/workspace/OFA/tasks/ofa_task.py", line 319, in train_step
loss, sample_size, logging_output = criterion(model, sample, update_num=update_num)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 881, in _call_impl
result = self.forward(*input, **kwargs)
File "/workspace/OFA/criterions/scst_loss.py", line 88, in forward
loss, score, ntokens, nsentences = self.compute_loss(model, sample, reduce=reduce)
File "/workspace/OFA/criterions/scst_loss.py", line 239, in compute_loss
gen_target, gen_res, gt_res = self.get_generator_out(model, sample)
File "/workspace/OFA/criterions/scst_loss.py", line 149, in get_generator_out
gen_out = self.task.scst_generator.generate([model], sample)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/workspace/OFA/models/sequence_generator.py", line 207, in generate
return self._generate(models, sample, **kwargs)
File "/workspace/OFA/models/sequence_generator.py", line 480, in _generate
assert step < max_len, f"{step} < {max_len}"
AssertionError: 16 < 16
How can I fix this problem? Thanks!
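To see why the message reads "16 < 16", here is a minimal sketch (not the actual OFA code) of the kind of guard that raises it: a decoding loop asserts that each generation step stays strictly below the computed maximum length, so the error fires as soon as the loop tries one step too many.

```python
# Minimal sketch of a length guard in a decoding loop.
# If max_len is computed as 16 but the loop reaches step 16,
# the assertion fails with exactly "16 < 16".
def generate_tokens(max_len: int, steps_needed: int):
    tokens = []
    for step in range(steps_needed):
        assert step < max_len, f"{step} < {max_len}"
        tokens.append(step)
    return tokens

try:
    generate_tokens(max_len=16, steps_needed=17)
except AssertionError as e:
    print(e)  # 16 < 16
```

In the real failure, the effective max_len came out far too small because (as the comments below reveal) the stage-1 checkpoint was never actually loaded.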
Issue Analytics
- Created: 2 years ago
- Comments: 8
Top GitHub Comments
I’m still confused, because 24 GB of memory is not enough to use batch_size=4 for stage 2 fine-tuning. But in your file, you do use batch_size=4. Can you check whether this file is actually the one used for stage 2 fine-tuning?

You are brilliant! The problem is that the stored checkpoint from stage 1 has a mismatched name (specifically, no checkpoint is loaded in; and yes, batch_size=4 is too large for me, so I switched it from 4 to 1). That’s the reason for the error: it never even reached the data-loading part! After I changed the name to match, the stage 2 fine-tuning is running. Thank you so much!
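Since the root cause was a checkpoint filename that silently failed to load, a simple pre-flight check can turn that silent failure into a loud one. This is a hypothetical helper (the path and flag name are placeholders, not taken from the OFA scripts): it just verifies that the stage-1 checkpoint the stage-2 script points at actually exists on disk before training starts.

```python
# Hypothetical pre-flight check: fail loudly if the stage-1 checkpoint
# the stage-2 script expects to restore from does not exist, instead of
# silently training from scratch with a tiny effective max_len.
import os

def check_restore_file(path: str) -> None:
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Stage-1 checkpoint not found: {path!r} -- "
            "check the checkpoint name referenced by train_caption_stage2.sh"
        )
    print(f"OK: will restore from {path}")

# Example with a placeholder path:
# check_restore_file("checkpoints/caption_stage1_best.pt")
```

Running a check like this before launching the shell script makes a renamed or misspelled checkpoint show up immediately, rather than surfacing later as an unrelated-looking assertion inside the sequence generator.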