[Question] Trained models do not give the best rewards (Hugging Face models)
❓ Question
I am trying to load and test trained models. I downloaded a trained model from the Hugging Face Hub and evaluated its performance, which gave mean_reward=6.60 +/- 4.758150901348128. Do I need to set hyperparameters? If so, how can I do it?
Code:
```python
import gym

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack
from google.colab.patches import cv2_imshow

checkpoint = load_from_hub(
    repo_id="sb3/ppo-BreakoutNoFrameskip-v4",
    filename="ppo-BreakoutNoFrameskip-v4.zip",
)
model = PPO.load(checkpoint)

eval_env = make_atari_env("Breakout-v4", n_envs=4, seed=0)
eval_env = VecFrameStack(eval_env, n_stack=4)

mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True, warn=False
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
```
Hyperparameters:
OrderedDict([('batch_size', 256), ('clip_range', 'lin_0.1'), ('ent_coef', 0.01), ('env_wrapper', ['stable_baselines3.common.atari_wrappers.AtariWrapper']), ('frame_stack', 4), ('learning_rate', 'lin_2.5e-4'), ('n_envs', 8), ('n_epochs', 4), ('n_steps', 128), ('n_timesteps', 10000000.0), ('policy', 'CnnPolicy'), ('vf_coef', 0.5), ('normalize', False)])
Link - https://huggingface.co/sb3/ppo-BreakoutNoFrameskip-v4
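For reference, the entries above that matter at evaluation time are the environment-side ones (env_wrapper, frame_stack, normalize); the others only affect training. A minimal sketch of an evaluation setup that mirrors them, assuming the NoFrameskip environment from the checkpoint's repo id rather than Breakout-v4:
```python
# Sketch: rebuild the evaluation env to match the training-time preprocessing
# (env_wrapper=AtariWrapper, frame_stack=4 from the hyperparameters above).
# Assumes the NoFrameskip variant, matching the repo id of the checkpoint.
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

checkpoint = load_from_hub(
    repo_id="sb3/ppo-BreakoutNoFrameskip-v4",
    filename="ppo-BreakoutNoFrameskip-v4.zip",
)
model = PPO.load(checkpoint)

# make_atari_env applies AtariWrapper (frame skip, resize, grayscale) by default
eval_env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=4, seed=0)
eval_env = VecFrameStack(eval_env, n_stack=4)  # frame_stack=4, as in training

mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True, warn=False
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```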
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.
Top GitHub Comments
Hey there 👋. My bad, we changed it some months ago: we need to specify the zip file since we can `load_from_hub` any file. But indeed I forgot to update it in the SB3 documentation (it was updated in the Hub documentation and tutorials): https://huggingface.co/docs/hub/stable-baselines3#using-existing-models
I'll make a doc update PR today, thanks @indramal for pointing this out 🤗
@simoninithomas no problem
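For completeness, a minimal sketch of the pattern the comment above describes (explicitly passing the zip filename to `load_from_hub`, then loading the checkpoint with SB3), assuming the same repo as in the question:
```python
# Sketch of the load_from_hub usage described above: the zip file must be
# named explicitly because load_from_hub can fetch any file from the repo.
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

checkpoint = load_from_hub(
    repo_id="sb3/ppo-BreakoutNoFrameskip-v4",
    filename="ppo-BreakoutNoFrameskip-v4.zip",  # explicit zip filename
)
model = PPO.load(checkpoint)
```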