
Inconsistent scores between Local and Remote minival for PointNav


I thought it was better to raise a separate issue for this. I'm observing large inconsistencies between local and remote submissions. I made 3 submissions of the same model on the minival track, and I also evaluated the same Docker image on a local server. Here are the results I got:

Local docker evaluation

          SPL     Success   Distance to goal
Trial 1   0.170   0.244     2.630
Trial 2   0.150   0.206     2.491
Trial 3   0.172   0.234     2.486

Remote docker evaluation

          SPL     Success   Distance to goal
Trial 1   0.260   0.366     0.895
Trial 2   0.258   0.333     0.846
Trial 3   0.336   0.433     0.870

I additionally evaluated locally with a ppo_trainer evaluation script I wrote based on habitat-baselines. I've tried to keep things as consistent as possible between my Docker submission and the ppo_trainer script.

Local non-docker evaluation

          SPL     Success   Distance to goal
Trial 1   0.327   0.455     1.504
Trial 2   0.350   0.492     1.503
Trial 3   0.333   0.455     1.622
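
For reference, the aggregation in these tables is nothing more than averaging per-episode metrics. A simplified sketch, not the exact evaluation script; the metric keys "spl", "success", and "distance_to_goal" are assumed:

from statistics import mean

def summarize(episode_metrics):
    # episode_metrics: one dict of per-episode metrics per finished episode,
    # e.g. {"spl": 0.41, "success": 1.0, "distance_to_goal": 0.2}
    # (key names assumed; check what the configured measures actually report)
    return {
        key: mean(m[key] for m in episode_metrics)
        for key in ("spl", "success", "distance_to_goal")
    }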

Could these inconsistencies be due to random-seed issues? I've set the seeds to 123 as follows:

import random
import numpy as np
import torch

# Seed every RNG source with the same value (config.PYT_RANDOM_SEED = 123).
random.seed(config.PYT_RANDOM_SEED)
np.random.seed(config.PYT_RANDOM_SEED)
torch.random.manual_seed(config.PYT_RANDOM_SEED)
if torch.cuda.is_available():
    # Force deterministic cuDNN kernels and disable benchmark autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
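
The snippet above seeds Python, NumPy, and the CPU side of PyTorch, but (an assumption worth checking, not something established in this issue) the CUDA generators and the simulator/environment keep their own RNG state. A minimal sketch of the extra calls one might add; the envs.seed() call is hypothetical and depends on what the env wrapper actually exposes:

import torch

def seed_cuda_and_envs(seed, envs=None):
    # Each CUDA device keeps its own generator; seed all of them.
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Gym-style wrappers commonly expose seed(); hypothetical here --
    # verify against the wrapper actually in use.
    if envs is not None and hasattr(envs, "seed"):
        envs.seed(seed)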

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
srama2512 commented on May 13, 2020

Hi @abhiskk

The random seed is already being set via the construct_envs() function. So I’m not sure why the randomness still persists. In any case, the variance could also be due to the specific model that I am using. I was mainly concerned about the difference between local and remote docker evaluations. The remote evaluation and local non-docker evaluation seem to be reasonably consistent for me right now. So I hope this should not be a problem. I will close this for now. Thanks!
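
To illustrate what "already being set via construct_envs()" refers to: the usual pattern is to hand each parallel environment its own seed when the per-worker configs are built. A rough sketch (field and function names are illustrative, not the exact habitat-baselines code):

import copy

def make_env_configs(base_config, num_envs, base_seed=123):
    # Give each parallel environment its own seed so the workers do not
    # all draw identical random action/episode orderings.
    configs = []
    for i in range(num_envs):
        cfg = copy.deepcopy(base_config)
        cfg.SEED = base_seed + i  # illustrative field name
        configs.append(cfg)
    return configs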

0 reactions
srama2512 commented on May 27, 2020

I thought that minival would have 213 episodes. In any case, I'm getting success rates close to 0.5 and SPL close to 0.3 on the full validation set, and this was consistent across different subsets of the validation data as well, so it does not closely match the test results. Is it possible that there are some configuration / episode-generation inconsistencies between val and test?
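
As a quick sanity check on the episode count, the split files can be inspected directly. A minimal sketch, assuming the usual gzipped-JSON habitat dataset layout with a top-level "episodes" list (path and field name should be verified against the dataset version in use):

import gzip
import json

def count_episodes(dataset_path):
    # Habitat PointNav splits are typically gzipped JSON with a top-level
    # "episodes" list -- verify against your dataset version.
    with gzip.open(dataset_path, "rt") as f:
        data = json.load(f)
    return len(data.get("episodes", []))

# Illustrative path -- adjust to wherever the minival split actually lives.
# print(count_episodes("data/datasets/pointnav/gibson/v1/val_mini/val_mini.json.gz"))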
