
Why does env.render() create multiple render screens? | LSTM policy predict with one env [question]

See original GitHub issue

When I run the code example from the docs for CartPole multiprocessing, it renders one window with all the envs playing the game, and it also opens individual windows with the same envs playing the same games.

import gym
import numpy as np

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines.common import set_global_seeds
from stable_baselines import ACKTR

def make_env(env_id, rank, seed=0):
    """
    Utility function for multiprocessed env.

    :param env_id: (str) the environment ID
    :param rank: (int) index of the subprocess
    :param seed: (int) the initial seed for RNG
    """
    def _init():
        env = gym.make(env_id)
        env.seed(seed + rank)
        return env
    set_global_seeds(seed)
    return _init

env_id = "CartPole-v1"
num_cpu = 4  # Number of processes to use
# Create the vectorized environment
env = SubprocVecEnv([make_env(env_id, i) for i in range(num_cpu)])

model = ACKTR(MlpPolicy, env, verbose=1)
model.learn(total_timesteps=25000)

obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()
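
A minimal sketch of one way to avoid the extra windows, assuming the goal is a single render view: keep the SubprocVecEnv for training, then evaluate on a separate non-vectorized env (reusing env_id and the trained model from the snippet above):

eval_env = gym.make(env_id)  # a single, non-vectorized environment -> one window
obs = eval_env.reset()
for _ in range(1000):
    # MlpPolicy predictions also work on a single observation
    action, _states = model.predict(obs)
    obs, reward, done, info = eval_env.step(action)
    eval_env.render()
    if done:
        obs = eval_env.reset()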

System Info
Describe the characteristics of your environment:

  • Vanilla install, followed the docs using pip
  • GPUs: 2x GTX 1080 Ti
  • Python version 3.6.5
  • Tensorflow version 1.12.0
  • ffmpeg 4.0

Additional context: CartPole

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 24 (6 by maintainers)

Top GitHub Comments

1 reaction
hn2 commented, Jun 18, 2019

n_env = 12
env = PortfolioEnv(history=history, abbreviation=instruments, steps=settings['steps'],
                   window_length=settings['window_length'], include_ta=settings['include_ta'],
                   allow_short=settings['allow_short'], reward=settings['reward'],
                   debug=settings['debug'])
env = SubprocVecEnv([lambda: env for _ in range(1)])

mdl = 'ES_19900102_20180101_5000000_7000_1_return_False_7a686c53e4a34338942a8b4bbe65fa47'
model = PPO2.load(mdl)

# initialized here
obs = env.reset()
zero_completed_obs = np.zeros((n_env,) + env.observation_space.shape)
zero_completed_obs[0, :] = obs

state = None
state = model.initial_state   
done = np.zeros(state.shape[0])   

pos, state = model.predict(zero_completed_obs, state, done)

Traceback (most recent call last):
  File "C:\Users\hanna\Anaconda3\lib\site-packages\quantiacsToolbox\quantiacsToolbox.py", line 871, in runts
    position, settings = TSobject.myTradingSystem(*argList)
  File "ppo2_quantiacs_test.py", line 68, in myTradingSystem
    pos, state = model.predict(zero_completed_obs, state, done)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\stable_baselines\common\base_class.py", line 469, in predict
    vectorized_env = self._is_vectorized_observation(observation, self.observation_space)
  File "C:\Users\hanna\Anaconda3\lib\site-packages\stable_baselines\common\base_class.py", line 399, in _is_vectorized_observation
    .format(", ".join(map(str, observation_space.shape))))
ValueError: Error: Unexpected observation shape (12, 5) for Box environment, please use (10,) or (n_env, 10) for the observation shape.

1 reaction
hill-a commented, Jun 18, 2019

Still have a problem in

pos, state = model.predict(zero_completed_obs, state, done)

ValueError: Error: Unexpected observation shape (12, 5) for Box environment, please use (10,) or (n_env, 10) for the observation shape.

Model was trained with n_env = 12

Where does this 10 come from?

A few things:

Your issue will not be addressed if you do not follow the format described in the issue template (https://github.com/hill-a/stable-baselines/blob/master/.github/ISSUE_TEMPLATE/issue-template.md)
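
For context, the error above comes from predict() validating the observation against the observation space that was saved with the model, so the 10 is presumably the observation dimension recorded at training time, while the freshly built PortfolioEnv produces observations of length 5. A quick check (a sketch, assuming the env and mdl from the comment above, and that PortfolioEnv exposes a standard gym observation_space):

from stable_baselines import PPO2

model = PPO2.load(mdl)
print(model.observation_space.shape)  # space saved with the model, (10,) in the error above
print(env.observation_space.shape)    # space of the newly constructed env, (5,) in the error above
# If these two shapes differ, predict() rejects the observation regardless of how
# zero_completed_obs is padded; the evaluation env must match the training-time space.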

Read more comments on GitHub >

Top Results From Across the Web

Create custom gym environments from scratch — A stock ...
OpenAI's gym is an awesome package that allows you to create custom reinforcement learning agents. ... Render the environment to the screen
Read more >
Whenever I try to use env.render() for OpenAIgym I get ...
I have tried it but the problem is when I use env.reset(). It creates a pop up windows. and my kernel gets stuck...
Read more >
SAC — Stable Baselines3 1.7.0a5 documentation
In our implementation, we use an entropy coefficient (as in OpenAI Spinning or Facebook Horizon), which is the equivalent to the inverse of...
Read more >
Reinforcement Learning (DQN) Tutorial
This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v0 task from the OpenAI Gym....
Read more >
Rendering Environment Outputs & Q-Learning
Setting the Google Colab rendering mechanism for OpenAI Gym. In this article you'll learn how to render OpenAI environment outputs with env.step() ......
Read more >
