[rllib] 'use_lstm' in PPO value function
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- Ray installed from (source or binary): source
- Ray version: 5.0
- Python version: 3.6.1
- Exact command to reproduce:
Describe the problem
I’m looking into the source code of PPO + LSTM, and I found that I cannot use both LSTM and GAE for the value function.
However, in the OpenAI paper “Learning Dexterous In-Hand Manipulation”, the authors successfully used both LSTM and GAE.
Question
- Why can’t I use both LSTM and GAE in the framework?
- If I remove the vf_config["use_lstm"] = False statement, can I use LSTM for the value function without any problem?
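For context on the question, here is a minimal sketch of how GAE advantages are computed from per-timestep value estimates. It is a generic illustration, not RLlib's implementation; the function name `compute_gae` and its parameters are hypothetical. The point it shows is that GAE only consumes the value predictions themselves, so in principle it does not matter whether they come from a feedforward or a recurrent (LSTM) value network.

```python
def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory segment.

    rewards:    list of rewards r_0 .. r_{T-1}
    values:     list of value estimates V(s_0) .. V(s_{T-1})
    last_value: bootstrap value V(s_T) (use 0.0 if the episode ended)
    Hypothetical helper for illustration; not part of RLlib.
    """
    # V(s_{t+1}) for every step, with the bootstrap value at the end.
    next_values = list(values[1:]) + [last_value]
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted TD residuals backwards in time.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_values[t] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    # Regression targets for the value function.
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets
```

With `lam=0` this reduces to the one-step TD residual, and with `lam=1` it becomes the discounted return minus the value baseline.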
Thank you for your help.
Source code / logs
Issue Analytics
- State:
- Created 5 years ago
- Comments:12 (6 by maintainers)
Top GitHub Comments
@ericl It works. Thank you so much for the help.
@whikwon the following works for me (it’s a bit less modular than having two LSTM()s, but I think there are some complications with that approach).
Note that the new LSTM state is now a list of four elements: