[Question] HER+SAC different results to SB2

Hi, I was previously training on a custom environment with SB2 and wanted to switch to SB3, mainly because having PyTorch would probably make deployment easier.

So I trained on SB3 with HER+SAC and the same hyperparameters, but got different results. Is this to be expected because of a different SAC implementation, or what else could be the reason?

SB2 code

import gym
from stable_baselines import HER, SAC
from stable_baselines.common.callbacks import EvalCallback
from stable_baselines.her import HERGoalEnvWrapper

# 'armflex-v4' is the author's custom environment; `path` and `TIMESTEPS` are defined elsewhere.
env = gym.make('armflex-v4')
eval_env = HERGoalEnvWrapper(env)
eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                             log_path=path, eval_freq=20000,
                             deterministic=True, render=False, n_eval_episodes=15)
model_class = SAC
goal_selection_strategy = 'future'
model = HER('MlpPolicy', env, model_class, n_sampled_goal=4,
            goal_selection_strategy=goal_selection_strategy, verbose=1,
            policy_kwargs=dict(layers=[512, 512]), buffer_size=1000000, batch_size=256,
            gamma=0.99, random_exploration=0.0, ent_coef='auto', gradient_steps=1)

model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)

[Figure: SB2 training results]

SB3 code

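    # Import paths assumed for the SB3 release used at the time (early 2021); they may differ in other releases.
    from stable_baselines3 import HER, SAC
    from stable_baselines3.common.callbacks import EvalCallback
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.vec_env.obs_dict_wrapper import ObsDictWrapper

    # `env_name`, `eval_env`, `path` and `TIMESTEPS` are defined elsewhere in the original script.
    env = make_vec_env(env_name, n_envs=1)
    env = ObsDictWrapper(env)
    eval_callback = EvalCallback(eval_env, best_model_save_path=path,
                                 log_path=path, eval_freq=20000,
                                 deterministic=True, render=False, n_eval_episodes=15)
    model_class = SAC
    goal_selection_strategy = 'future'
    model = HER('MlpPolicy', env, model_class, n_sampled_goal=4, online_sampling=False,
                goal_selection_strategy=goal_selection_strategy, verbose=1,
                policy_kwargs=dict(net_arch=[512, 512]), buffer_size=1000000, batch_size=256,
                gamma=0.99, ent_coef='auto', gradient_steps=1, max_episode_length=1000)

    model.learn(total_timesteps=TIMESTEPS, callback=eval_callback, log_interval=1)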
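
When porting a configuration between the two libraries, it can help to compare the explicitly passed hyperparameters side by side. The following is only a sketch: the two dictionaries are copied from the HER(...) calls in the question, and `diff_kwargs` is a hypothetical helper, not part of either library. Arguments that are not passed explicitly fall back to each library's own defaults, which this comparison does not cover.

```python
# Keyword arguments copied from the two HER(...) calls in the question.
sb2_kwargs = dict(
    n_sampled_goal=4, goal_selection_strategy="future", verbose=1,
    policy_kwargs=dict(layers=[512, 512]), buffer_size=1000000,
    batch_size=256, gamma=0.99, random_exploration=0.0,
    ent_coef="auto", gradient_steps=1,
)
sb3_kwargs = dict(
    n_sampled_goal=4, online_sampling=False, goal_selection_strategy="future",
    verbose=1, policy_kwargs=dict(net_arch=[512, 512]), buffer_size=1000000,
    batch_size=256, gamma=0.99, ent_coef="auto", gradient_steps=1,
    max_episode_length=1000,
)


def diff_kwargs(old, new):
    """Hypothetical helper: return (only_in_old, only_in_new, changed)."""
    only_old = {k: old[k] for k in old.keys() - new.keys()}
    only_new = {k: new[k] for k in new.keys() - old.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return only_old, only_new, changed


only_sb2, only_sb3, changed = diff_kwargs(sb2_kwargs, sb3_kwargs)
print("only in SB2 call:", only_sb2)
print("only in SB3 call:", only_sb3)
print("different values:", changed)
```

For these two calls, the explicit differences are `random_exploration` (SB2 only), `online_sampling` and `max_episode_length` (SB3 only), and the key used inside `policy_kwargs` (`layers` versus `net_arch`).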

[Figure: SB3 training results] This should also be the mean over 100 episodes.
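
To compare two reward curves fairly, both should be smoothed the same way. Below is a minimal sketch of a mean over the last 100 episodes; `rolling_mean` is a hypothetical helper written for illustration and is not how either library computes its logged values.

```python
from collections import deque


def rolling_mean(values, window=100):
    """Mean of the last `window` values at each step (fewer at the start)."""
    recent = deque(maxlen=window)
    total = 0.0
    means = []
    for value in values:
        if len(recent) == window:
            # The oldest value is about to be evicted by append().
            total -= recent[0]
        recent.append(value)
        total += value
        means.append(total / len(recent))
    return means


# Tiny usage example with a window of 2 instead of 100.
print(rolling_mean([1, 2, 3, 4], window=2))
```

Applying the same function to the per-episode rewards of both runs gives curves that can be compared directly.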

Issue Analytics

State:
Created 3 years ago
Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction

Ludilu commented, Mar 5, 2021

Thanks, the newer version indeed fixed the issue. I must have installed it just a few days before the new version was released. I was already able to achieve higher results, so I guess this question can be closed. Thanks again for your fast support.

0 reactions

araffin commented, Mar 1, 2021

> Unfortunately I have a problem with online_sampling=True. I always get the following error at exactly episode = (buffer_size / max_episode_length)

Do you have the latest version of Stable-Baselines3? See https://github.com/DLR-RM/stable-baselines3/issues/234
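
A quick way to answer the version question and to pin down the episode mentioned in the quote is sketched below. `installed_version` and `episodes_until_full` are hypothetical helpers; the distribution name `stable-baselines3` is the one used on PyPI, and the numbers are taken from the SB3 snippet in the question.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def episodes_until_full(buffer_size, max_episode_length):
    """Episode count quoted in the comment: buffer_size / max_episode_length."""
    return buffer_size // max_episode_length


print("stable-baselines3 version:", installed_version("stable-baselines3"))
# With buffer_size=1000000 and max_episode_length=1000 from the question:
print("episode in question:", episodes_until_full(1000000, 1000))
```

Including both values in a bug report makes it easier for maintainers to tell whether the installed release already contains a fix.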
