
cannot train LSTM policy by PPO2 when mujoco env is selected

See original GitHub issue

Hi, I think I discovered a bug when training an LSTM policy with PPO2 on a MuJoCo environment.

I run this command:

python -m baselines.run --alg=ppo2 --env=Reacher-v2 --num_timesteps=1e6 --network=lstm --nminibatches=2 --num_env=4

and I get this error:

Traceback (most recent call last):
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/isi/yoshida/baselines/baselines/run.py", line 235, in <module>
    main()
  File "/home/isi/yoshida/baselines/baselines/run.py", line 214, in main
    model, _ = train(args, extra_args)
  File "/home/isi/yoshida/baselines/baselines/run.py", line 69, in train
    **alg_kwargs
  File "/home/isi/yoshida/baselines/baselines/ppo2/ppo2.py", line 245, in learn
    obs, returns, masks, actions, values, neglogpacs, states, epinfos = runner.run() #pylint: disable=E0632
  File "/home/isi/yoshida/baselines/baselines/ppo2/ppo2.py", line 104, in run
    actions, values, self.states, neglogpacs = self.model.step(self.obs, S=self.states, M=self.dones)
  File "/home/isi/yoshida/baselines/baselines/common/policies.py", line 89, in step
    a, v, state, neglogp = self._evaluate([self.action, self.vf, self.state, self.neglogp], observation, **extra_feed)
  File "/home/isi/yoshida/baselines/baselines/common/policies.py", line 71, in _evaluate
    return sess.run(variables, feed_dict)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run
    run_metadata_ptr)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
    run_metadata)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'ppo2_model/vf/Placeholder_1' with dtype float and shape [1,256]
  [[Node: ppo2_model/vf/Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=[1,256], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

Caused by op 'ppo2_model/vf/Placeholder_1', defined at:
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/isi/yoshida/baselines/baselines/run.py", line 235, in <module>
    main()
  File "/home/isi/yoshida/baselines/baselines/run.py", line 214, in main
    model, _ = train(args, extra_args)
  File "/home/isi/yoshida/baselines/baselines/run.py", line 69, in train
    **alg_kwargs
  File "/home/isi/yoshida/baselines/baselines/ppo2/ppo2.py", line 230, in learn
    model = make_model()
  File "/home/isi/yoshida/baselines/baselines/ppo2/ppo2.py", line 229, in <lambda>
    max_grad_norm=max_grad_norm)
  File "/home/isi/yoshida/baselines/baselines/ppo2/ppo2.py", line 25, in __init__
    act_model = policy(nbatch_act, 1, sess)
  File "/home/isi/yoshida/baselines/baselines/common/policies.py", line 159, in policy_fn
    vf_latent, _ = _v_net(encoded_x)
  File "/home/isi/yoshida/baselines/baselines/common/models.py", line 105, in network_fn
    S = tf.placeholder(tf.float32, [nenv, 2*nlstm]) #states
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1735, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 4925, in placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/isi/yoshida/anaconda3/envs/baselines/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'ppo2_model/vf/Placeholder_1' with dtype float and shape [1,256]
  [[Node: ppo2_model/vf/Placeholder_1 = Placeholder[dtype=DT_FLOAT, shape=[1,256], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]

How can I train an LSTM policy with PPO2 on MuJoCo?

For your information, I can successfully train an LSTM policy with PPO2 on PongNoFrameskip-v4.
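The traceback boils down to a TensorFlow placeholder that exists in the graph but never receives a value at step time. A minimal TF 1.x sketch with hypothetical tensor names (pi_state and vf_state are illustrative, not the actual baselines internals) reproduces the same class of failure:

```python
# Minimal TF 1.x sketch (hypothetical names): a placeholder that is present in the
# graph but missing from feed_dict raises the same InvalidArgumentError as above.
import numpy as np
import tensorflow as tf

pi_state = tf.placeholder(tf.float32, [1, 256], name="pi_state")  # fed at step time
vf_state = tf.placeholder(tf.float32, [1, 256], name="vf_state")  # created but never fed
out = tf.concat([pi_state, vf_state], axis=1)

with tf.Session() as sess:
    # Feeding only pi_state triggers:
    # InvalidArgumentError: You must feed a value for placeholder tensor 'vf_state' ...
    sess.run(out, feed_dict={pi_state: np.zeros((1, 256), np.float32)})
```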

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
pzhokhov commented, Oct 26, 2018

Generally, not sharing parameters makes training more stable (less sensitive to hyperparameters such as the value-function coefficient in the training objective or the learning rate), because the two objectives do not compete with each other, whereas sharing parameters allows for faster learning (when it works). For image-based observations (and convolutional layers) we use parameter sharing, because otherwise both the value-function approximator and the policy would have to learn good visual features, and that may take too many samples. MuJoCo has simulator-state-based observations that do not require much feature learning, and not sharing parameters gives us training that works decently on all environments without much hyperparameter tuning.
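As a rough illustration of that trade-off (plain TF 1.x with illustrative layer sizes and environment dimensions, not the actual baselines network builders):

```python
# Sketch of the two wirings discussed above. Sizes are illustrative:
# Reacher-v2 has 11-dimensional observations and 2-dimensional actions.
import tensorflow as tf

obs = tf.placeholder(tf.float32, [None, 11], name="obs")
n_actions = 2

def shared_trunk(x):
    # One set of features feeds both heads: feature learning happens once,
    # but the policy and value losses pull on the same weights.
    h = tf.layers.dense(x, 64, activation=tf.nn.tanh)
    return tf.layers.dense(h, n_actions), tf.layers.dense(h, 1)  # policy out, value out

def separate_trunks(x):
    # Each head gets its own features: less interference between objectives,
    # at the cost of learning the features twice.
    h_pi = tf.layers.dense(x, 64, activation=tf.nn.tanh)
    h_vf = tf.layers.dense(x, 64, activation=tf.nn.tanh)
    return tf.layers.dense(h_pi, n_actions), tf.layers.dense(h_vf, 1)

pi_shared, vf_shared = shared_trunk(obs)
pi_sep, vf_sep = separate_trunks(obs)
```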

1 reaction
takerfume commented, Sep 12, 2018

Thank you! I understand now: the error means I didn't feed a value for the placeholder of the value network, which is created by 'copying' the policy network.

I ran this command and successfully trained the LSTM policy!

python -m baselines.run --alg=ppo2 --network=lstm --num_timesteps=1e6 --env=Reacher-v2 --num_env=4 --nminibatches=2 --value_network=shared
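For anyone driving baselines from Python rather than the CLI, a rough equivalent of that command might look like the sketch below, assuming the value_network keyword is forwarded to the policy builder the same way the --value_network flag is (the vectorized-env setup here is a simplification of what baselines.run does, e.g. it omits observation normalization):

```python
# Sketch: Python-API counterpart of the working CLI command above (unverified
# against a specific baselines commit).
import gym
from baselines.ppo2 import ppo2
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv

# Four parallel copies of the environment, matching --num_env=4.
env = DummyVecEnv([lambda: gym.make('Reacher-v2') for _ in range(4)])

model = ppo2.learn(
    network='lstm',
    env=env,
    total_timesteps=int(1e6),
    nminibatches=2,            # must divide the number of envs for recurrent policies
    value_network='shared',    # share the LSTM between policy and value function
)
```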

Read more comments on GitHub >

Top Results From Across the Web

PPO2 — Stable Baselines 2.10.3a0 documentation
PPO2. The Proximal Policy Optimization algorithm combines ideas from A2C (having multiple workers) and TRPO (it uses a trust region to improve...

The 37 Implementation Details of Proximal Policy Optimization
Despite the complicated situation, we have found ppo2 (ea25b9e) as an implementation worth studying. It obtains good performance in both Atari ...

Stable Baselines Documentation - Read the Docs
Here is a quick example of how to train and run PPO2 on a cartpole environment: import gym from stable_baselines.common.policies import ...

Making a self-play environment for OpenAI gym - Medium
To see how I approached the problem, check out part 1 here! ... For ppo2, if this is set to None, mlp is...

The ICLR Blog Track
CartPole-v1 from Gym was his chosen simulation environment, and before long ... MuJoCo tasks, LSTM, and Real-time Strategy (RTS) game tasks.
