
Inconsistent scores between Local and Remote minival for PointNav


I thought it was better to raise a separate issue for this. I'm observing large inconsistencies between local and remote submissions. I made 3 submissions of the same model on the minival track, and I also evaluated the same Docker image on a local server. Here are the results I got:

Local docker evaluation

          SPL     Success   Distance to goal
Trial 1   0.170   0.244     2.630
Trial 2   0.150   0.206     2.491
Trial 3   0.172   0.234     2.486

Remote docker evaluation

          SPL     Success   Distance to goal
Trial 1   0.260   0.366     0.895
Trial 2   0.258   0.333     0.846
Trial 3   0.336   0.433     0.870

I additionally evaluated locally with a ppo_trainer evaluation script I wrote based on habitat-baselines. I've tried to keep things as consistent as possible between my Docker submission and the ppo_trainer script.

Local non-docker evaluation

          SPL     Success   Distance to goal
Trial 1   0.327   0.455     1.504
Trial 2   0.350   0.492     1.503
Trial 3   0.333   0.455     1.622
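
For reference, the aggregation in these tables is nothing more than averaging per-episode metrics. A simplified sketch, not the exact evaluation script; the metric keys "spl", "success", and "distance_to_goal" are assumed:

from statistics import mean

def summarize(episode_metrics):
    # episode_metrics: one dict of per-episode metrics per finished episode,
    # e.g. {"spl": 0.41, "success": 1.0, "distance_to_goal": 0.2}
    # (key names assumed; check what the configured measures actually report)
    return {
        key: mean(m[key] for m in episode_metrics)
        for key in ("spl", "success", "distance_to_goal")
    }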

Could these inconsistencies be due to random-seed issues? I've set the seeds to 123 as follows:

import random
import numpy as np
import torch

# Seed every RNG source with the same value (config.PYT_RANDOM_SEED = 123).
random.seed(config.PYT_RANDOM_SEED)
np.random.seed(config.PYT_RANDOM_SEED)
torch.random.manual_seed(config.PYT_RANDOM_SEED)
if torch.cuda.is_available():
    # Force deterministic cuDNN kernels and disable benchmark autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
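
The snippet above seeds Python, NumPy, and the CPU side of PyTorch, but (an assumption worth checking, not something established in this issue) the CUDA generators and the simulator/environment keep their own RNG state. A minimal sketch of the extra calls one might add; the envs.seed() call is hypothetical and depends on what the env wrapper actually exposes:

import torch

def seed_cuda_and_envs(seed, envs=None):
    # Each CUDA device keeps its own generator; seed all of them.
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Gym-style wrappers commonly expose seed(); hypothetical here --
    # verify against the wrapper actually in use.
    if envs is not None and hasattr(envs, "seed"):
        envs.seed(seed)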

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
srama2512 commented on May 13, 2020

Hi @abhiskk

The random seed is already being set via the construct_envs() function. So I’m not sure why the randomness still persists. In any case, the variance could also be due to the specific model that I am using. I was mainly concerned about the difference between local and remote docker evaluations. The remote evaluation and local non-docker evaluation seem to be reasonably consistent for me right now. So I hope this should not be a problem. I will close this for now. Thanks!
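
To illustrate what "already being set via construct_envs()" refers to: the usual pattern is to hand each parallel environment its own seed when the per-worker configs are built. A rough sketch (field and function names are illustrative, not the exact habitat-baselines code):

import copy

def make_env_configs(base_config, num_envs, base_seed=123):
    # Give each parallel environment its own seed so the workers do not
    # all draw identical random action/episode orderings.
    configs = []
    for i in range(num_envs):
        cfg = copy.deepcopy(base_config)
        cfg.SEED = base_seed + i  # illustrative field name
        configs.append(cfg)
    return configs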

0 reactions
srama2512 commented on May 27, 2020

I thought that minival would have 213 episodes. In any case, I'm getting success rates close to 0.5 and SPL close to 0.3 on the full validation set, and this was consistent across different subsets of the validation data as well, so it does not closely match the test results. Is it possible that there are some configuration / episode-generation inconsistencies between val and test?
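
As a quick sanity check on the episode count, the split files can be inspected directly. A minimal sketch, assuming the usual gzipped-JSON habitat dataset layout with a top-level "episodes" list (path and field name should be verified against the dataset version in use):

import gzip
import json

def count_episodes(dataset_path):
    # Habitat PointNav splits are typically gzipped JSON with a top-level
    # "episodes" list -- verify against your dataset version.
    with gzip.open(dataset_path, "rt") as f:
        data = json.load(f)
    return len(data.get("episodes", []))

# Illustrative path -- adjust to wherever the minival split actually lives.
# print(count_episodes("data/datasets/pointnav/gibson/v1/val_mini/val_mini.json.gz"))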
