
PPO rollouts not terminating with `done == True`

See original GitHub issue

I am using a custom environment, and I’ve already checked the following:

from stable_baselines3.common.env_checker import check_env

env = CustomEnv(arg1, ...)
# It will check your custom environment and output additional warnings if needed
check_env(env)

But the PPO algorithm keeps calling `step(action)` even after `done == True` (i.e., once the state has left the bounds).

This is how I am interfacing with the algorithm:

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.vec_env import DummyVecEnv


class Agent:

    def __init__(self, environment, name, net_arch=[100, 100], n_env=1, n_steps=10000):
        # validate and vectorise the environment
        self.environment = environment
        check_env(self.environment)
        venv = DummyVecEnv([lambda: environment] * n_env)

        # load an existing model, or create a new one
        assert isinstance(name, str)
        self.name = name
        try:
            self.model = PPO.load(self.name, venv)
        except Exception:
            # loading failed (e.g. no saved model yet); create a new one
            self.model = PPO(
                'MlpPolicy',
                venv,
                use_sde=True,
                sde_sample_freq=5,
                gae_lambda=0.9,
                learning_rate=1e-2,
                verbose=1,
                policy_kwargs=dict(net_arch=net_arch),
                n_steps=n_steps
            )

    def train(self, time_steps):
        # learn and save
        self.model.learn(total_timesteps=time_steps)
        self.model.save(self.name)

    def evaluate(self):
        # simulate a single episode on the raw (non-vectorised) environment
        obs = self.environment.reset()
        while True:
            action, _ = self.model.predict(obs, deterministic=True)
            obs, rew, done, _ = self.environment.step(action)
            if done:
                break

        # plot
        self.environment.system.plot(fname='{}.pdf'.format(self.name))
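
A likely cause of the behaviour in the question: Stable Baselines3's vectorised wrappers (including `DummyVecEnv`) reset a sub-environment automatically as soon as it reports `done == True`, so `learn()` keeps calling `step()` across episode boundaries until `n_steps` transitions have been collected. A minimal sketch of this auto-reset behaviour, assuming gym's `CartPole-v1` is installed and the old four-value `step()` API used elsewhere in this thread:

import gym
from stable_baselines3.common.vec_env import DummyVecEnv

# A vectorised env never stops stepping: when a sub-env finishes an episode,
# DummyVecEnv resets it and returns the first observation of the new episode.
venv = DummyVecEnv([lambda: gym.make('CartPole-v1')])
obs = venv.reset()
for _ in range(1000):
    obs, rewards, dones, infos = venv.step([venv.action_space.sample()])
    if dones[0]:
        # obs already belongs to the new episode; the final observation of
        # the finished episode is kept in the info dict instead.
        terminal_obs = infos[0]['terminal_observation']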

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 10 (1 by maintainers)

Top GitHub Comments

1 reaction
cisprague commented, May 15, 2020

@Miffyli, thanks. But, after more thought, I actually need the agent to stay within the state-space bounds during training, because some of my underlying code requires it. How can I enforce this?

1 reaction
Miffyli commented, May 15, 2020

So, in this case, the reward itself should be enough to eventually enforce that the agent stays within the bounds, right?

If I understood this right (you want the agent to avoid specific situations), then yes, a correct reward and/or terminal states should be able to teach the agent to avoid those situations.
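
One way to act on that advice: wrap the environment so that leaving the bounds both terminates the episode and incurs a penalty. A minimal sketch, assuming a Box observation space and the old four-value `step()` API; the bound arrays and penalty size are hypothetical placeholders:

import gym
import numpy as np

class BoundsWrapper(gym.Wrapper):
    """End the episode with a penalty when the state leaves [low, high]."""

    def __init__(self, env, low, high, penalty=-100.0):
        super().__init__(env)
        self.low = np.asarray(low)
        self.high = np.asarray(high)
        self.penalty = penalty

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if np.any(obs < self.low) or np.any(obs > self.high):
            # Out of bounds: terminate immediately and discourage the behaviour.
            reward += self.penalty
            done = True
        return obs, reward, done, info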

Read more comments on GitHub.

Top Results From Across the Web

  • PPO — Stable Baselines3 1.7.0a8 documentation
    True if function returned with at least n_rollout_steps collected, False if callback terminated rollout prematurely. Return the parameters of the agent.

  • The 37 Implementation Details of Proximal Policy Optimization
    For example, the CartPole-v1 has a 500 time limit (see link) and will return done=True if the game lasts for more than 500...

  • Sample Collections and Trajectory Views — Ray 2.2.0
    In either case, no episode is allowed to exceed the given horizon number of timesteps (RLlib will artificially terminate an episode if this...

  • How to prevent my reward sum received during evaluation...
    The PPO default of 4000? Is the x-axis really showing iterations or training steps? 1k steps would not be much, so maybe just...

  • A Graphic Guide to Implementing PPO for Atari Games
    Learning how Proximal Policy Optimisation (PPO) works and writing a... no other value getting the chance of being chosen (as it is...
