TD3 Policy and Target Policy Naming Conflict
Description: When I try to instantiate a TD3 model, I get an error in the __init__ function on line 136:
ValueError: Variable input/model/pi_fc0/w already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?
It’s possible I misunderstand the variable naming going on, but won’t the following lines from td3.py’s __init__() always cause a naming conflict because of reuse=False in the scoping?
with tf.variable_scope("input", reuse=False):
    # Both the policy and the target policy are constructed inside the same "input" scope
    self.policy_tf = self.policy(self.sess, self.observation_space, self.action_space, **self.policy_kwargs)
    self.target_policy_tf = self.policy(self.sess, self.observation_space, self.action_space, **self.policy_kwargs)
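Why these two lines collide with some policies but not others can be illustrated with a toy model. Based on a reading of the stable-baselines 2.x source (treat this as an assumption, not documented behaviour): the policies in stable_baselines.td3.policies only set up inputs in their constructor and build their weights later through make_actor/make_critics under separate "model" and "target" scopes, whereas the generic FeedForwardPolicy from common.policies creates its weights inside the constructor under a "model" scope, so the second constructor call tries to create input/model/pi_fc0/w again. ToyVariableStore and the policy functions below are hypothetical stand-ins in plain Python, not TensorFlow; they only mimic the TF1 naming rule that reuse=False refuses an existing name, reuse=True requires one, and AUTO_REUSE (here "auto") accepts either.

```python
from contextlib import contextmanager


class ToyVariableStore:
    """Hypothetical, simplified model of TF1 variable-name bookkeeping (not TensorFlow).

    reuse=False refuses an existing name, reuse=True requires one,
    reuse="auto" (standing in for tf.AUTO_REUSE) accepts either.
    """

    def __init__(self):
        self.names = set()   # full names of every variable created so far
        self._scopes = []    # stack of currently open variable scopes

    @contextmanager
    def variable_scope(self, name):
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()

    def get_variable(self, name, reuse=False):
        full_name = "/".join(self._scopes + [name])
        exists = full_name in self.names
        if exists and reuse is False:
            raise ValueError(f"Variable {full_name} already exists, disallowed.")
        if not exists and reuse is True:
            raise ValueError(f"Variable {full_name} does not exist.")
        self.names.add(full_name)
        return full_name


def common_style_policy(store, reuse=False):
    # Mimics common.policies.FeedForwardPolicy: weights are created in the constructor.
    with store.variable_scope("model"), store.variable_scope("pi_fc0"):
        store.get_variable("w", reuse=reuse)


def td3_style_policy(store):
    # Mimics td3.policies: the constructor creates no variables (inputs only).
    return None


def td3_style_make_actor(store):
    # Mimics make_actor(): weights are created here, under the caller's scope.
    with store.variable_scope("pi"), store.variable_scope("fc0"):
        store.get_variable("w")


def build_td3(store, make_policy, make_actor=None):
    # Same shape as the td3.py snippet: two constructor calls inside "input".
    with store.variable_scope("input"):
        make_policy(store)  # self.policy_tf
        make_policy(store)  # self.target_policy_tf
    if make_actor is not None:
        with store.variable_scope("model"):
            make_actor(store)
        with store.variable_scope("target"):
            make_actor(store)
    return sorted(store.names)


print(build_td3(ToyVariableStore(), td3_style_policy, td3_style_make_actor))
# ['model/pi/fc0/w', 'target/pi/fc0/w']
try:
    build_td3(ToyVariableStore(), common_style_policy)
except ValueError as err:
    print(err)  # Variable input/model/pi_fc0/w already exists, disallowed.
print(build_td3(ToyVariableStore(), lambda s: common_style_policy(s, reuse="auto")))
# ['input/model/pi_fc0/w']  (one shared variable for both policies)
```

In this toy model, reuse="auto" removes the error only because both policies resolve to the same variable name, so the target network would silently share weights with the online one. That is consistent with the maintainer's advice in the comments: switch to the TD3 policies rather than change reuse.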
Code Example: You can replicate the issue with the following code:
import gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.policies import FeedForwardPolicy
from stable_baselines import TD3


# Custom MLP policy built on the generic FeedForwardPolicy from common.policies
class MyMlpPolicy(FeedForwardPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env=1, n_steps=1, n_batch=None, reuse=False, **_kwargs):
        super(MyMlpPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
                                          feature_extraction="mlp", **_kwargs)


env = gym.make('CartPole-v0')
model = TD3(MyMlpPolicy, env)  # raises the ValueError quoted above
System Info
- Stable-baselines version: 2.7.0, installed via pip on 09/10/19
- GPU: N/A
- Python Version: 3.5
- Tensorflow version: 1.9.0-rc0
Issue Analytics: created 4 years ago, 5 comments
Top GitHub Comments
Hello,
As mentioned in the documentation, you should be using td3.policies and a continuous action environment like Pendulum-v0, not CartPole, because TD3 only supports continuous actions. The following code works:
Ah, my bad, I did not notice the issue was caused by using the wrong policies. My bad! Perhaps a check for that could be added, but it is already well documented with proper highlights ^^
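The check suggested in the last comment could look something like the sketch below, which fails early with a readable message instead of the variable-scope error. It is hypothetical: check_td3_inputs is not a stable-baselines function, and Box, Discrete, TD3Policy and CommonFeedForwardPolicy are local stand-ins for gym.spaces.Box, gym.spaces.Discrete, stable_baselines.td3.policies.TD3Policy and stable_baselines.common.policies.FeedForwardPolicy, so the sketch runs without either library installed.

```python
# Local stand-ins (hypothetical) so the sketch runs without gym or stable-baselines.
class Box:                      # stands in for gym.spaces.Box (continuous actions)
    pass


class Discrete:                 # stands in for gym.spaces.Discrete
    def __init__(self, n):
        self.n = n


class TD3Policy:                # stands in for stable_baselines.td3.policies.TD3Policy
    pass


class CommonFeedForwardPolicy:  # stands in for stable_baselines.common.policies.FeedForwardPolicy
    pass


def check_td3_inputs(policy_cls, action_space):
    """Hypothetical pre-flight check: return a list of problems (empty if the setup looks valid)."""
    problems = []
    if not (isinstance(policy_cls, type) and issubclass(policy_cls, TD3Policy)):
        name = getattr(policy_cls, "__name__", policy_cls)
        problems.append(f"{name} is not a TD3 policy; use one from td3.policies")
    if not isinstance(action_space, Box):
        problems.append(
            f"TD3 only supports continuous (Box) action spaces, got {type(action_space).__name__}"
        )
    return problems


class MyMlpPolicy(CommonFeedForwardPolicy):  # like the policy in the report
    pass


class MyTD3Policy(TD3Policy):  # a policy derived from the TD3 base class
    pass


print(check_td3_inputs(MyMlpPolicy, Discrete(2)))  # two problems reported
print(check_td3_inputs(MyTD3Policy, Box()))  # []
```

Returning a list rather than raising on the first problem lets both mistakes from the original report (wrong policy family and a discrete environment) surface in one message.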