Unable to reproduce val results

Hi @rohitgirdhar, I’m trying to test the irCSN-152 (IG65M) model for EK-55. I used the model https://dl.fbaipublicfiles.com/avt/checkpoints/expts/10_ek55_avt_ig65m.txt/0/checkpoint.pth and the config expts/10_ek55_avt_ig65m.txt, and added these lines to the config:

test_only=true
train.init_from_model=[[${cwd}/DATA/models/10_ek55_avt_ig65m.pth]]

However, I’m getting

[2021-10-05 12:37:04,999][root][INFO] - Reading from resfiles
[2021-10-05 12:37:11,072][func.train][INFO] - []
[2021-10-05 12:37:11,073][root][INFO] - iter_time: 0.294328
[2021-10-05 12:37:11,073][root][INFO] - data_time: 0.135377
[2021-10-05 12:37:11,074][root][INFO] - loss: 6.164686
[2021-10-05 12:37:11,074][root][INFO] - acc1/action: 7.351763
[2021-10-05 12:37:11,074][root][INFO] - acc5/action: 19.931891
[2021-10-05 12:37:11,074][root][INFO] - cls_action: 6.134162
[2021-10-05 12:37:11,074][root][INFO] - feat: 0.030524

which is far from the expected top-1/top-5 accuracy of 14.4 and 31.7. Do you know what might be wrong here?
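To make the gap concrete, here is a minimal sketch that pulls the metric values out of log lines in the format shown above and compares them with the expected numbers quoted in this issue. The helper names (`parse_metrics`, `gaps`) are made up for illustration and are not part of the AVT codebase.

```python
import re

# Matches lines such as:
# [2021-10-05 12:37:11,074][root][INFO] - acc1/action: 7.351763
METRIC_LINE = re.compile(
    r"\]\s*-\s*(?P<name>[\w/]+):\s*(?P<value>-?\d+(?:\.\d+)?)\s*$"
)


def parse_metrics(log_text):
    """Return {metric_name: float} for every 'name: value' log line."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.search(line)
        if match:
            metrics[match.group("name")] = float(match.group("value"))
    return metrics


def gaps(metrics, expected):
    """Return expected minus observed, rounded to 2 decimals, per metric."""
    return {
        name: round(expected[name] - metrics[name], 2)
        for name in expected
        if name in metrics
    }


# A few lines copied from the log in this issue.
LOG = """\
[2021-10-05 12:37:11,072][func.train][INFO] - []
[2021-10-05 12:37:11,074][root][INFO] - acc1/action: 7.351763
[2021-10-05 12:37:11,074][root][INFO] - acc5/action: 19.931891
"""

# Expected top-1/top-5 values as quoted in this issue.
EXPECTED = {"acc1/action": 14.4, "acc5/action": 31.7}

if __name__ == "__main__":
    for name, gap in gaps(parse_metrics(LOG), EXPECTED).items():
        print(f"{name}: {gap} points below the expected value")
```

A shortfall this large on both metrics suggests something systematic (wrong features or weights not loaded) rather than run-to-run noise.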

Top GitHub Comments

rohitgirdhar commented, Oct 6, 2021

Great! The configs should run on a 16GB GPU. In my initial experiments, I found that more heads/layers helped performance on EK55. You can try with fewer, though the performance might be a bit lower. Closing this task, but feel free to open another if you run into any other issues.

rohitgirdhar commented, Oct 6, 2021

Hmm, that is strange. In that case the problem might be with the IG65M features. Can you try re-downloading the LMDB file? I have already tried it with a fresh download of the LMDB file and it seems to work. Could you also try the same experiment with the Epic Kitchens-100 IG65M LMDB file?
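One way to check whether a re-download actually changed anything is to hash the old and new files and compare the digests. The sketch below uses only the Python standard library. The paths in the usage note are placeholders, and no reference checksum is given in this thread, so this only tells you whether two local copies differ.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large feature files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have identical SHA-256 digests."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

For example, `same_content("old/data.mdb", "new/data.mdb")` (hypothetical paths) returning `False` would indicate that the first download was truncated or corrupted. If the LMDB is stored as a directory, hash the `data.mdb` file inside it.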

Btw, I don’t actually print the AR numbers in the logs at the end, but they should be in the TensorBoard files. So you can just run TensorBoard on the output directory and see the AR5 numbers.
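If you would rather read the numbers without opening the TensorBoard UI, a rough sketch along these lines may work. It uses the `EventAccumulator` class that ships with the `tensorboard` package. The exact scalar tag names written by the training code are not given in this thread, so the `"ar5"` substring filter is an assumption, and `matching_tags` and `last_scalars` are illustrative names only.

```python
def matching_tags(tags, needle):
    """Case-insensitive substring filter over scalar tag names."""
    needle = needle.lower()
    return [tag for tag in tags if needle in tag.lower()]


def last_scalars(logdir, needle="ar5"):
    """Return {tag: last logged value} for scalar tags containing `needle`.

    `needle` is a guess at how the AR5 tags are named; adjust it after
    inspecting the full tag list for your run.
    """
    # Imported lazily so the helper above works without tensorboard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    accumulator = EventAccumulator(logdir)
    accumulator.Reload()  # reads the event files from disk
    scalar_tags = accumulator.Tags()["scalars"]
    return {
        tag: accumulator.Scalars(tag)[-1].value
        for tag in matching_tags(scalar_tags, needle)
    }
```

Calling `last_scalars("<output directory>")` should then return the final logged value for each matching tag. You may need to point it at the subdirectory that directly contains the events file. The interactive route is `tensorboard --logdir <output directory>`.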