Unable to reproduce val results

Hi @rohitgirdhar, I’m trying to test the irCSN-152 (IG65M) model for EK-55. I used the model https://dl.fbaipublicfiles.com/avt/checkpoints/expts/10_ek55_avt_ig65m.txt/0/checkpoint.pth and the config expts/10_ek55_avt_ig65m.txt, and added these lines to the config:

test_only=true
train.init_from_model=[[${cwd}/DATA/models/10_ek55_avt_ig65m.pth]]

However, I’m getting

[2021-10-05 12:37:04,999][root][INFO] - Reading from resfiles
[2021-10-05 12:37:11,072][func.train][INFO] - []
[2021-10-05 12:37:11,073][root][INFO] - iter_time: 0.294328
[2021-10-05 12:37:11,073][root][INFO] - data_time: 0.135377
[2021-10-05 12:37:11,074][root][INFO] - loss: 6.164686
[2021-10-05 12:37:11,074][root][INFO] - acc1/action: 7.351763
[2021-10-05 12:37:11,074][root][INFO] - acc5/action: 19.931891
[2021-10-05 12:37:11,074][root][INFO] - cls_action: 6.134162
[2021-10-05 12:37:11,074][root][INFO] - feat: 0.030524

which is far from the expected top-1/top-5 performance of 14.4 and 31.7. Do you know what might be wrong here?
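
To make the gap concrete, here is a minimal sketch that pulls the metric lines out of the log above and subtracts them from the expected numbers. It assumes the `acc1/action` and `acc5/action` entries are the ones to compare against the 14.4 and 31.7 top-1/top-5 figures; `parse_metrics` and `gaps` are hypothetical helper names, not part of the AVT code.

```python
import re

# Matches lines such as
# "[2021-10-05 12:37:11,074][root][INFO] - acc1/action: 7.351763"
METRIC_RE = re.compile(r"\]\s-\s(?P<name>[\w/]+):\s(?P<value>-?\d+(?:\.\d+)?)\s*$")


def parse_metrics(log_text):
    """Return {metric_name: value} for every 'name: number' log line."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_RE.search(line)
        if match:
            metrics[match.group("name")] = float(match.group("value"))
    return metrics


def gaps(metrics, expected):
    """Return expected minus observed for each metric present in both."""
    return {k: expected[k] - metrics[k] for k in expected if k in metrics}


LOG = """\
[2021-10-05 12:37:11,072][func.train][INFO] - []
[2021-10-05 12:37:11,073][root][INFO] - loss: 6.164686
[2021-10-05 12:37:11,074][root][INFO] - acc1/action: 7.351763
[2021-10-05 12:37:11,074][root][INFO] - acc5/action: 19.931891
"""

# Assumption: these keys correspond to the top-1/top-5 numbers quoted above.
EXPECTED = {"acc1/action": 14.4, "acc5/action": 31.7}

if __name__ == "__main__":
    for name, gap in gaps(parse_metrics(LOG), EXPECTED).items():
        print(f"{name}: {gap:.2f} points below expected")
```

Run on the log above, this shows the run landing roughly 7 points short on top-1 and 12 points short on top-5.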

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
rohitgirdhar commented, Oct 6, 2021

Great! The configs should run with a 16GB GPU. From my initial experiments, I found that more heads/layers for EK55 did help in getting better performance. You can try with fewer, though the performance might be a bit lower. Closing this task, but feel free to open another task if you face any other issues.

1 reaction
rohitgirdhar commented, Oct 6, 2021

Hmm that is strange. It seems then the problem might be with the IG65M features. Can you try re-downloading the LMDB file? I have already tried it with a fresh download of the LMDB file and it seems to work. And could you also try with the Epic Kitchens-100 IG65M LMDB file and try that experiment?
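
One way to check whether a re-download actually changed anything is to hash the old and new copies and compare them. The sketch below assumes both copies are kept on disk side by side; the thread does not publish a reference checksum, so this only tells you whether two downloads differ, not which one is correct. `sha256_of` and `same_content` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large LMDB files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have identical size and SHA-256 digest."""
    a, b = Path(path_a), Path(path_b)
    if a.stat().st_size != b.stat().st_size:
        return False  # different sizes, no need to hash
    return sha256_of(a) == sha256_of(b)
```

If the LMDB is stored as a directory rather than a single file, the same comparison can be applied to the data file inside it. A mismatch between the old and fresh copies would suggest the original download was truncated or corrupted.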

Btw for the AR numbers, I actually don’t print them in the logs at the end, however they should be in the tensorboard files. So you can just run tensorboard on the output directory and see the AR5 numbers.
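
If launching the TensorBoard UI is inconvenient, the scalars can also be read from the event files directly. This is a rough sketch using TensorBoard's `EventAccumulator`; the exact tag name under which AR5 is logged is not given in this thread, so the `needle` filter and the example output path are assumptions to adjust after listing the available tags.

```python
def select_tags(tags, needle=""):
    """Return tags containing `needle` (case-insensitive); all tags if empty."""
    needle = needle.lower()
    return sorted(t for t in tags if needle in t.lower())


def last_scalars(logdir, needle=""):
    """Return {tag: (step, value)} for the last logged point of each scalar tag."""
    # Imported lazily so select_tags stays usable without tensorboard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    acc = EventAccumulator(logdir)
    acc.Reload()  # reads the event files found directly in logdir
    result = {}
    for tag in select_tags(acc.Tags()["scalars"], needle):
        events = acc.Scalars(tag)
        if events:
            result[tag] = (events[-1].step, events[-1].value)
    return result


if __name__ == "__main__":
    # Hypothetical path: point this at the run's actual output directory.
    for tag, (step, value) in last_scalars("path/to/output_dir", "ar5").items():
        print(f"{tag} @ step {step}: {value:.3f}")
```

Calling `last_scalars(logdir)` with no filter lists every scalar tag, which is a reasonable first step for finding the AR5 entry. If the event files live in a subdirectory of the output directory, pass that subdirectory instead.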
