[Question] Trained models do not give the best rewards (Hugging Face models)
❓ Question
I am trying to load and test trained models. I downloaded a trained model from the Hugging Face Hub and evaluated its performance, which gave mean_reward=6.60 +/- 4.758150901348128. Do I need to set hyperparameters? If so, how can I do it?
Code:
```python
import gym

from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack
from google.colab.patches import cv2_imshow

checkpoint = load_from_hub(
    repo_id="sb3/ppo-BreakoutNoFrameskip-v4",
    filename="ppo-BreakoutNoFrameskip-v4.zip",
)
model = PPO.load(checkpoint)

eval_env = make_atari_env("Breakout-v4", n_envs=4, seed=0)
eval_env = VecFrameStack(eval_env, n_stack=4)

mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True, warn=False
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward}")
```
Hyperparameters:
OrderedDict([('batch_size', 256), ('clip_range', 'lin_0.1'), ('ent_coef', 0.01), ('env_wrapper', ['stable_baselines3.common.atari_wrappers.AtariWrapper']), ('frame_stack', 4), ('learning_rate', 'lin_2.5e-4'), ('n_envs', 8), ('n_epochs', 4), ('n_steps', 128), ('n_timesteps', 10000000.0), ('policy', 'CnnPolicy'), ('vf_coef', 0.5), ('normalize', False)])
Link - https://huggingface.co/sb3/ppo-BreakoutNoFrameskip-v4
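For reference, the entries above that matter at evaluation time are the environment-side ones (env_wrapper, frame_stack, normalize); the others only affect training. A minimal sketch of an evaluation setup that mirrors them, assuming the NoFrameskip environment from the checkpoint's repo id rather than Breakout-v4:
```python
# Sketch: rebuild the evaluation env to match the training-time preprocessing
# (env_wrapper=AtariWrapper, frame_stack=4 from the hyperparameters above).
# Assumes the NoFrameskip variant, matching the repo id of the checkpoint.
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

checkpoint = load_from_hub(
    repo_id="sb3/ppo-BreakoutNoFrameskip-v4",
    filename="ppo-BreakoutNoFrameskip-v4.zip",
)
model = PPO.load(checkpoint)

# make_atari_env applies AtariWrapper (frame skip, resize, grayscale) by default
eval_env = make_atari_env("BreakoutNoFrameskip-v4", n_envs=4, seed=0)
eval_env = VecFrameStack(eval_env, n_stack=4)  # frame_stack=4, as in training

mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=10, deterministic=True, warn=False
)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```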
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- If code there is, it is minimal and working
- If code there is, it is formatted using the markdown code blocks for both code and stack traces.
Top GitHub Comments
Hey there 👋. My bad, we changed it some months ago: we need to specify the zip file since we can `load_from_hub` any file. But indeed I forgot to update it in the SB3 documentation (it was updated in the Hub documentation and tutorials): https://huggingface.co/docs/hub/stable-baselines3#using-existing-models
I'll make a doc update PR today, thanks @indramal for pointing this out 🤗
@simoninithomas no problem
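For completeness, a minimal sketch of the pattern the comment above describes (explicitly passing the zip filename to `load_from_hub`, then loading the checkpoint with SB3), assuming the same repo as in the question:
```python
# Sketch of the load_from_hub usage described above: the zip file must be
# named explicitly because load_from_hub can fetch any file from the repo.
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

checkpoint = load_from_hub(
    repo_id="sb3/ppo-BreakoutNoFrameskip-v4",
    filename="ppo-BreakoutNoFrameskip-v4.zip",  # explicit zip filename
)
model = PPO.load(checkpoint)
```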