
Irreproducible zoo trials


Hi, I am using the zoo to optimise the SAC hyperparameters for a custom env. The command I used was:

python3 train.py --algo sac --env FullFilterEnv-v0 --gym-packages gym_environment -n 50000 -optimize --eval-episodes 40 --n-trials 1000 --n-jobs 2 --sampler random --pruner median

I use --eval-episodes 40 so that each trial is scored on more episodes, which gives a more stable estimate of its performance.

About the env: each episode is at most 5 steps long. A regular step yields the negative of a Euclidean distance, e.g. -||x - x_target||, while a successful step yields +100 and ends the episode.
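The reward and termination rules just described can be sketched in plain Python. This is a minimal illustration, not the actual FullFilterEnv: the class name ToyTargetEnv, the success criterion (distance below a hypothetical tolerance TOL) and the dynamics (the action is simply added to the state) are all assumptions. It returns the classic Gym 4-tuple (obs, reward, done, info).

```python
import math

MAX_STEPS = 5           # episodes last at most 5 steps (from the issue)
SUCCESS_REWARD = 100.0  # reward of the successful step (from the issue)
TOL = 0.1               # hypothetical success tolerance (assumption)


class ToyTargetEnv:
    """Hypothetical stand-in for the custom env; not the real FullFilterEnv."""

    def __init__(self, x0, x_target):
        self.x0 = list(x0)
        self.x_target = list(x_target)
        self.reset()

    def reset(self):
        self.x = list(self.x0)
        self.t = 0
        return list(self.x)

    def step(self, action):
        # Hypothetical dynamics: the action is added to the state
        self.x = [xi + ai for xi, ai in zip(self.x, action)]
        self.t += 1
        dist = math.dist(self.x, self.x_target)  # ||x - x_target||
        if dist < TOL:
            # Successful step: +100 and the episode ends
            return list(self.x), SUCCESS_REWARD, True, {}
        # Regular step: negative distance; the episode is cut off after MAX_STEPS
        return list(self.x), -dist, self.t >= MAX_STEPS, {}
```

For example, with x0 = [0, 0] and x_target = [3, 4], a zero action from the initial state costs -5.0, while the action [3, 4] lands on the target and returns +100 with done=True.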

From the zoo, I get results like:

[I 2020-09-29 07:35:38,656] Trial 697 finished with value: -100.0 and parameters: {'gamma': 0.5, 'lr': 0.009853989305797941, 'learning_starts': 50, 'batch_size': 64, 'buffer_size': 100000, 'train_freq': 256, 'tau': 0.01, 'ent_coef': 'auto', 'net_arch': 'deep', 'target_entropy': -100}. Best is trial 650 with value: -100.0.

A value of -100.0 (the zoo minimizes the negated mean evaluation reward) means that in all 40 evaluation episodes after 50,000 timesteps, the agent succeeded on the first step and received +100 directly, which seems too good to be true. So I took the recommended hyperparameters, ran a full training on the same env, and computed the mean episode reward over 40 episodes. But after 50,000 timesteps, the mean episode reward was only around -900, far from succeeding in every episode.
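The inference above can be checked with some arithmetic. Under the reward rules described earlier, an episode that succeeds on step k returns 100 minus the distances of the k-1 earlier steps, so +100 is the highest possible return and requires success on the very first step. The sketch assumes, as in the rl-baselines-zoo optimization code, that the trial value is the negated mean evaluation reward; the helper names are illustrative.

```python
def episode_return(step_distances, succeeded):
    """Return of one episode under the rules described in the issue.

    step_distances: distances ||x - x_target|| of the unsuccessful steps,
    each contributing -distance; a final successful step adds +100.
    """
    total = -sum(step_distances)
    if succeeded:
        total += 100.0
    return total


def zoo_trial_value(episode_returns):
    """Negated mean return, i.e. the quantity the zoo minimizes (assumption)."""
    return -sum(episode_returns) / len(episode_returns)


# All 40 evaluation episodes succeed on their first step
perfect = [episode_return([], True)] * 40
# One episode needs an extra step (distance 1.0), which already lifts the value above -100
almost = [episode_return([], True)] * 39 + [episode_return([1.0], True)]
```

So a value of exactly -100.0 leaves no room for even one episode taking a second step.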

Note that two trials (650 and 697) reached -100. Similar “irreproducibility” happens with other trials as well. Is this a known issue with the zoo, or did I do something wrong?

BTW, I use the same random seed as in the zoo, i.e.,

import numpy as np

SEED = 0
np.random.seed(SEED)
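Seeding NumPy's global generator alone does not cover every source of randomness: Python's random module, the deep-learning framework and the environment each keep their own state. Stable-Baselines provides a helper for this (set_global_seeds; set_random_seed in Stable-Baselines3), and its algorithms accept a seed argument. The function below is only a minimal sketch for the Python-level generators; seed_everything is a hypothetical name, and NumPy is seeded only if it is installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python's `random` and, if available, NumPy's global generator.

    The RL framework (TensorFlow/PyTorch) and the env still need their own
    seeding, e.g. via the library helper or the algorithm's `seed` argument.
    """
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy not installed; nothing more to seed here
    np.random.seed(seed)


SEED = 0
seed_everything(SEED)
```

Re-seeding with the same value replays the same draws from every generator it touches.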

This is the callback code I used to compute the mean episode reward:

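import numpy as np
# Stable-Baselines (SB2); with Stable-Baselines3 import from
# stable_baselines3.common.results_plotter instead
from stable_baselines.results_plotter import load_results, ts2xy

# Method of a BaseCallback subclass that sets check_freq, log_dir,
# save_path and best_mean_reward in its __init__
def _on_step(self) -> bool:
    if self.n_calls % self.check_freq == 0:
        # Retrieve training reward
        x, y = ts2xy(load_results(self.log_dir), 'timesteps')
        if len(x) > 0:
            # Mean training reward over the last 40 episodes
            mean_reward = np.mean(y[-40:])
            if self.verbose > 0:
                print("Num timesteps: {}".format(self.num_timesteps))
                print("Best mean reward: {:.2f} - Last mean reward per episode: {:.2f}".format(self.best_mean_reward, mean_reward))

            # New best model, you could save the agent here
            if mean_reward > self.best_mean_reward:
                self.best_mean_reward = mean_reward
                # Example for saving best model
                if self.verbose > 0:
                    print("Saving new best model to {}".format(self.save_path))
                self.model.save(self.save_path)

    # Returning False would stop training, so always continue
    return True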
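The callback's core bookkeeping, averaging the last 40 episode rewards from the Monitor log and keeping the best mean so far, can be tested without a training run. The pure-Python function below mirrors that logic; update_best is a hypothetical name. One caveat it makes visible: these are training episodes, collected while SAC is still exploring, whereas the zoo scores each trial on separate evaluation episodes, so the two numbers are not measured in exactly the same way.

```python
import math


def update_best(episode_rewards, best_mean=-math.inf, window=40):
    """Mirror of the callback: mean of the last `window` episode rewards
    and the updated best mean.

    Returns (mean_reward, best_mean, improved); with no finished episodes yet
    it returns (None, best_mean, False), like the `len(x) > 0` guard.
    """
    if not episode_rewards:
        return None, best_mean, False
    recent = episode_rewards[-window:]
    mean_reward = sum(recent) / len(recent)
    if mean_reward > best_mean:
        # New best: this is where the callback calls self.model.save(...)
        return mean_reward, mean_reward, True
    return mean_reward, best_mean, False
```

Note that a NaN mean never counts as an improvement, because nan > best_mean is always False.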

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

araffin commented on Oct 12, 2020 (1 reaction)

> The previous hyperparameter combination was recommended by the zoo. Can trials that produce NaNs be discarded (or pruned) by the zoo instead of being recommended as the best trial?

You should raise an exception (an AssertionError) and the trial will be ignored. See https://github.com/araffin/rl-baselines-zoo/blob/master/utils/hyperparams_opt.py#L112
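The linked code appears to wrap model.learn in a try/except that turns an AssertionError into a pruned trial, so an assertion raised during training (for example inside the env's step when the action or state contains NaN) keeps that trial from being reported as the best. The sketch below mimics that control flow without Optuna: TrialPruned is a local stand-in for optuna.exceptions.TrialPruned, and train_and_evaluate is a hypothetical callable standing in for training plus evaluation.

```python
import math


class TrialPruned(Exception):
    """Local stand-in for optuna.exceptions.TrialPruned."""


def assert_finite(values, name="value"):
    # Fail loudly on NaN/inf; call this e.g. at the top of the env's step()
    for v in values:
        if not math.isfinite(v):
            raise AssertionError(f"non-finite {name}: {v}")


def objective(train_and_evaluate):
    """Hypothetical objective: score the trial, or prune it if an assertion fired."""
    try:
        return train_and_evaluate()
    except AssertionError:
        # Random hyperparameters can produce NaNs: discard the trial instead of scoring it
        raise TrialPruned()
```

assert_finite raises AssertionError explicitly rather than using an assert statement, because assert statements are skipped when Python runs with -O.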

> I saw that VecCheckNan can be applied to the env, but it seems to require step_async and step_wait in the env. Is there an example of what these functions look like?

Please read the documentation for that.
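Some context for the VecCheckNan question: step_async and step_wait belong to the vectorized-env (VecEnv) interface, not to the custom env. A regular Gym env with step and reset is normally wrapped in a VecEnv such as DummyVecEnv first, and VecCheckNan wraps that, roughly VecCheckNan(DummyVecEnv([lambda: env]), raise_exception=True); the exact usage is in the Stable-Baselines documentation. The dependency-free class below only illustrates the idea (check the action before the step, the observation and reward after it); it is not the library's implementation, and NanCheckWrapper is a hypothetical name.

```python
import math


class NanCheckWrapper:
    """Hypothetical, simplified illustration of the idea behind VecCheckNan:
    validate the action before stepping and the outputs after stepping."""

    def __init__(self, env):
        self.env = env  # any object with Gym-style reset() and step(action)

    @staticmethod
    def _check(name, values):
        bad = [v for v in values if not math.isfinite(v)]
        if bad:
            raise ValueError(f"found non-finite {name}: {bad}")

    def reset(self):
        obs = self.env.reset()
        self._check("observation", obs)
        return obs

    def step(self, action):
        self._check("action", action)  # catch a NaN action before it corrupts the state
        obs, reward, done, info = self.env.step(action)
        self._check("observation", obs)
        self._check("reward", [reward])
        return obs, reward, done, info
```

Checking the action first matters here: the NaN originates in the policy output, so catching it before env.step stops it from ever reaching the done logic.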

blurLake commented on Oct 12, 2020 (0 reactions)

Hi, thanks for the suggestions. I think I found the problem. I am using ent_coef='auto' in SAC. At some point, the action becomes NaN, which makes the env state NaN as well. Since the condition checks in my step function do not account for NaN, the done flag ends up True even with a NaN state.
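The failure mode described here follows from how NaN compares: every ordered comparison involving NaN (<, <=, >, >=) is False. The actual condition in the env's step function is not shown, so the sketch assumes a hypothetical success test written as "not at least TOL away from the target", which a NaN distance passes. Checking math.isfinite first closes the hole.

```python
import math

TOL = 0.1  # hypothetical success tolerance (assumption)


def is_success_buggy(dist):
    # "Not at least TOL away": NaN >= TOL is False, so a NaN distance counts as success
    return not (dist >= TOL)


def is_success_fixed(dist):
    # Reject NaN/inf before applying the tolerance test
    return math.isfinite(dist) and dist < TOL


nan_dist = float("nan")
print(is_success_buggy(nan_dist), is_success_fixed(nan_dist))  # True False
```

With the buggy check, a NaN state ends the episode with the +100 success reward, which is consistent with the too-good trial values reported above.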

I guess it is similar to this.

Questions: The previous hyperparameter combination was recommended by the zoo. Can trials that produce NaNs be discarded (or pruned) by the zoo instead of being recommended as the best trial?

I saw that VecCheckNan can be applied to the env, but it seems to require step_async and step_wait in the env. Is there an example of what these functions look like?
