Inconsistent scores between Local and Remote minival for PointNav
I thought it was better to raise a separate issue for this. I'm observing large inconsistencies between local and remote evaluations of the same submission. I made 3 submissions of the same model on the minival track and also evaluated the same docker image on a local server. These were the results I got:
Local docker evaluation
|         | SPL   | Success | Distance to goal |
|---------|-------|---------|------------------|
| Trial 1 | 0.170 | 0.244   | 2.630            |
| Trial 2 | 0.150 | 0.206   | 2.491            |
| Trial 3 | 0.172 | 0.234   | 2.486            |
Remote docker evaluation
|         | SPL   | Success | Distance to goal |
|---------|-------|---------|------------------|
| Trial 1 | 0.260 | 0.366   | 0.895            |
| Trial 2 | 0.258 | 0.333   | 0.846            |
| Trial 3 | 0.336 | 0.433   | 0.870            |
I additionally evaluated locally with a `ppo_trainer` evaluation script I wrote based on habitat-baselines. I've tried to keep things as consistent as possible between my docker submission and the `ppo_trainer` script; the evaluation is launched roughly as sketched below.
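A minimal sketch of that launch path, assuming the standard habitat-baselines `PPOTrainer` / `eval()` entry point; the config path, checkpoint directory, and override key below are placeholders rather than my exact setup:

```python
from habitat_baselines.config.default import get_config
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer

# Placeholder config path and checkpoint directory; the real script points at the
# challenge PointNav config and the checkpoint being evaluated.
config = get_config(
    "habitat_baselines/config/pointnav/ppo_pointnav.yaml",
    ["EVAL_CKPT_PATH_DIR", "data/checkpoints"],
)

trainer = PPOTrainer(config)
trainer.eval()  # evaluates the checkpoint(s) in EVAL_CKPT_PATH_DIR on the configured split
```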
Local non-docker evaluation
|         | SPL   | Success | Distance to goal |
|---------|-------|---------|------------------|
| Trial 1 | 0.327 | 0.455   | 1.504            |
| Trial 2 | 0.350 | 0.492   | 1.503            |
| Trial 3 | 0.333 | 0.455   | 1.622            |
Could these inconsistencies be due to random-seed issues? I've set the seeds to 123 as follows:

```python
import random

import numpy as np
import torch

# config.PYT_RANDOM_SEED is set to 123
random.seed(config.PYT_RANDOM_SEED)                 # Python RNG
np.random.seed(config.PYT_RANDOM_SEED)              # NumPy RNG
torch.random.manual_seed(config.PYT_RANDOM_SEED)    # PyTorch RNG
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True       # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False          # disable non-deterministic autotuning
```
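For what it's worth, the calls above only seed Python, NumPy, and PyTorch; the episodes and simulator are seeded through the habitat config. A small sketch of what I mean, assuming the `TASK_CONFIG.SEED` key from the habitat-baselines defaults is the one consumed when the environments are constructed:

```python
# Assumption: habitat-baselines seeds the (vectorized) environments from TASK_CONFIG.SEED,
# so it should be pinned alongside the RNG seeds above.
config.defrost()
config.TASK_CONFIG.SEED = config.PYT_RANDOM_SEED
config.freeze()
```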
Top GitHub Comments
Hi @abhiskk,

The random seed is already being set via the `construct_envs()` function, so I'm not sure why the randomness still persists. In any case, the variance could also be due to the specific model I am using. I was mainly concerned about the difference between the local and remote docker evaluations; the remote evaluation and the local non-docker evaluation seem reasonably consistent for me right now, so I hope this should not be a problem. I will close this for now. Thanks!

I thought that minival would have 213 episodes. In any case, I'm getting success rates close to 0.5 with SPL close to 0.3 on the full validation set. This was consistent across the different subsets of the validation data I am using as well, so it does not closely match test. Is it possible that there are some configuration / episode-generation inconsistencies between val and test?
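One way to narrow down whether the gap comes from configuration or episode generation would be to dump per-episode metrics during a local run and compare them against the remote breakdown. A minimal sketch using the habitat-lab `Env` API; the config path and the random-action placeholder policy are assumptions, not the actual submission:

```python
import json

import habitat

# Placeholder config path; substitute the PointNav val/minival config actually used.
config = habitat.get_config("configs/tasks/pointnav.yaml")
env = habitat.Env(config=config)

per_episode = []
for _ in range(len(env.episodes)):
    observations = env.reset()
    while not env.episode_over:
        # Placeholder policy: random actions instead of the trained agent.
        observations = env.step(env.action_space.sample())
    metrics = env.get_metrics()  # e.g. {'distance_to_goal': ..., 'success': ..., 'spl': ...}
    per_episode.append({"episode_id": env.current_episode.episode_id, **metrics})
env.close()

# Per-episode breakdown that can be diffed against the remote evaluation's numbers.
with open("local_eval_metrics.json", "w") as f:
    json.dump(per_episode, f, indent=2)
```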