Irreproducible zoo trials
Hi, I am using the zoo to optimise the hyperparameters for SAC with a customised env. The command I used was
python3 train.py --algo sac --env FullFilterEnv-v0 --gym-packages gym_environment -n 50000 -optimize --eval-episodes 40 --n-trials 1000 --n-jobs 2 --sampler random --pruner median
I use --eval-episodes 40 to get a more stable estimate of each agent's performance.
Some details about the env: each episode is at most 5 steps long. The reward for an ordinary step is the negative of a Euclidean norm, say -||x - x_target||, and a successful step gets reward +100. Once +100 is received, the episode ends.
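To make the setup concrete, here is a minimal sketch of that reward structure (the class name, dynamics and success threshold are made up for illustration):

import gym
import numpy as np

class SketchEnv(gym.Env):
    # Hypothetical illustration of the reward structure described above (all names are made up).
    def reset(self):
        self.step_count = 0
        self.x = np.zeros(3)
        self.x_target = np.ones(3)
        return self.x

    def step(self, action):
        self.x = self.x + action                  # placeholder dynamics
        self.step_count += 1
        dist = np.linalg.norm(self.x - self.x_target)
        if dist < 1e-2:                           # hypothetical success condition
            reward, done = 100.0, True            # a successful step gives +100 and ends the episode
        else:
            reward, done = -dist, self.step_count >= 5  # ordinary step: -||x - x_target||; at most 5 steps
        return self.x, reward, done, {}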
In the zoo, I get some results like
[I 2020-09-29 07:35:38,656] Trial 697 finished with value: -100.0 and parameters: {'gamma': 0.5, 'lr': 0.009853989305797941, 'learning_starts': 50, 'batch_size': 64, 'buffer_size': 100000, 'train_freq': 256, 'tau': 0.01, 'ent_coef': 'auto', 'net_arch': 'deep', 'target_entropy': -100}. Best is trial 650 with value: -100.0.
That means that over the last 40 evaluation episodes after 50,000 timesteps, every episode finished in just one step and directly got the +100 reward, which seems too good to be true. So I took the recommended parameters and did a real training run on the same env, using the last 40 episodes to compute the mean ep_reward. But after 50,000 timesteps the mean ep_reward was only around -900, which is far from a success in each episode.
Notice that there are two trials that give -100. The same kind of "irreproducibility" happens with other trials as well. Is this something known about the zoo, or did I do something wrong?
BTW, I use the same random seed as in the zoo, i.e.,
import numpy as np

SEED = 0
np.random.seed(SEED)
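For reference, a sketch of the extra seeding that I believe also matters, assuming Stable Baselines 2 and a standard gym.Env:

import gym
import gym_environment  # registers FullFilterEnv-v0
from stable_baselines.common import set_global_seeds

set_global_seeds(SEED)  # should also cover Python's random module and TensorFlow
env = gym.make("FullFilterEnv-v0")
env.seed(SEED)          # seed the customised env itself (assumes a standard gym.Env)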
Here is the code I use in the callback to calculate the mean ep_reward:
import numpy as np
from stable_baselines.results_plotter import load_results, ts2xy

def _on_step(self) -> bool:
    if self.n_calls % self.check_freq == 0:
        # Retrieve training reward
        x, y = ts2xy(load_results(self.log_dir), 'timesteps')
        if len(x) > 0:
            # Mean training reward over the last 40 episodes
            mean_reward = np.mean(y[-40:])
            if self.verbose > 0:
                print("Num timesteps: {}".format(self.num_timesteps))
                print("Best mean reward: {:.2f} - Last mean reward per episode: {:.2f}".format(self.best_mean_reward, mean_reward))
            # New best model, you could save the agent here
            if mean_reward > self.best_mean_reward:
                self.best_mean_reward = mean_reward
                # Example for saving best model
                if self.verbose > 0:
                    print("Saving new best model to {}".format(self.save_path))
                self.model.save(self.save_path)
    return True
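For context, this is roughly how the callback is wired up (a sketch: the class name and attributes follow the Stable Baselines callback example, and the hyperparameters from the recommended trial are omitted):

import os
import numpy as np
import gym
import gym_environment  # registers FullFilterEnv-v0
from stable_baselines import SAC
from stable_baselines.bench import Monitor
from stable_baselines.common.callbacks import BaseCallback

class SaveOnBestTrainingRewardCallback(BaseCallback):
    # Hypothetical wrapper holding the _on_step() shown above;
    # name and attributes follow the Stable Baselines callback example.
    def __init__(self, check_freq, log_dir, verbose=1):
        super(SaveOnBestTrainingRewardCallback, self).__init__(verbose)
        self.check_freq = check_freq
        self.log_dir = log_dir
        self.save_path = os.path.join(log_dir, 'best_model')
        self.best_mean_reward = -np.inf
    # _on_step() as shown above goes here

log_dir = "./logs/"
os.makedirs(log_dir, exist_ok=True)
# Monitor writes the episode rewards that load_results() reads in the callback
env = Monitor(gym.make("FullFilterEnv-v0"), log_dir)

model = SAC("MlpPolicy", env, verbose=1)  # recommended hyperparameters would go here
model.learn(total_timesteps=50000,
            callback=SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=log_dir))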
You should raise an exception (assertion error) and the trial will be ignored. See https://github.com/araffin/rl-baselines-zoo/blob/master/utils/hyperparams_opt.py#L112
Please read the documentation for that.
Hi, thanks for the suggestions. I think I found the problem. I am using ent_coef='auto' in SAC. At a certain point the action becomes NaN, which makes the state of the env NaN as well. Since NaN is not covered by the condition checks in the step function, the done flag ends up being True even with a NaN state.
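Following the suggestion above, a minimal NaN guard in step() would probably have caught this (a sketch; _apply_action and the state variables are hypothetical):

import numpy as np

def step(self, action):
    # Fail fast instead of letting NaN propagate: an AssertionError here stops
    # training and makes the optimisation trial fail, so it cannot be reported
    # as a spuriously "successful" run.
    assert np.all(np.isfinite(action)), "Non-finite action: {}".format(action)
    self.x = self._apply_action(self.x, action)   # hypothetical dynamics
    assert np.all(np.isfinite(self.x)), "Non-finite state: {}".format(self.x)
    # ... compute reward and done as before ...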
I guess it is similar to this.
Question: the previous hyperparameter combination was recommended by the zoo. Can trials with NaNs be eliminated in the zoo itself, so that they are never recommended as the best trial (or are pruned)?
I saw that we can apply VecCheckNan to the env, but it seems step_async and step_wait are needed. Is there an example of what these functions look like?
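For reference, my current guess is that step_async and step_wait come from the VecEnv wrapper rather than from the env itself, so something like this sketch (assuming Stable Baselines 2) might be enough:

import gym
import gym_environment  # registers FullFilterEnv-v0
from stable_baselines.common.vec_env import DummyVecEnv, VecCheckNan

# DummyVecEnv supplies step_async()/step_wait(), so the custom env itself does not
# need them; VecCheckNan then raises as soon as a NaN/inf shows up in actions,
# observations or rewards.
env = DummyVecEnv([lambda: gym.make("FullFilterEnv-v0")])
env = VecCheckNan(env, raise_exception=True)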