
TD3 Policy and Target Policy Naming Conflict


Description

When I try to instantiate a TD3 model, I get an error in the __init__ function on line 136:

ValueError: Variable input/model/pi_fc0/w already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

It’s possible I misunderstand the variable naming going on, but won’t the following lines from td3.py’s __init__() always cause a naming conflict because of the reuse=False in the scope?

with tf.variable_scope("input", reuse=False):
    self.policy_tf = self.policy(self.sess, self.observation_space, self.action_space, **self.policy_kwargs)
    self.target_policy_tf = self.policy(self.sess, self.observation_space, self.action_space, **self.policy_kwargs)
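
For context, here is a minimal TensorFlow 1.x sketch (it is not the actual stable-baselines policy code) of why building two networks with identically named variables under a reuse=False scope raises this error:

# Minimal sketch, assuming TensorFlow 1.x; illustrative only.
import tensorflow as tf

def build_policy():
    # Each policy builds its network under the same "model" sub-scope,
    # so both end up requesting a variable named input/model/pi_fc0_w.
    with tf.variable_scope("model"):
        return tf.get_variable("pi_fc0_w", shape=[4, 64])

with tf.variable_scope("input", reuse=False):
    policy = build_policy()         # creates input/model/pi_fc0_w
    target_policy = build_policy()  # ValueError: variable already exists

The second get_variable call asks for a name that already exists in the variable store, and with reuse=False TensorFlow refuses to return the existing variable.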

Code Example

You can replicate the issue with the following code:

import gym

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.policies import FeedForwardPolicy
from stable_baselines import TD3


class MyMlpPolicy(FeedForwardPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env=1, n_steps=1, n_batch=None, reuse=False, **_kwargs):
        super(MyMlpPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, feature_extraction="mlp", **_kwargs)

env = gym.make('CartPole-v0')
model = TD3(MyMlpPolicy, env)

System Info

  • Stable-baselines version: 2.7.0, installed via pip on 09/10/19
  • GPU: N/A
  • Python Version: 3.5
  • Tensorflow version: 1.9.0-rc0

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5

Top GitHub Comments

araffin commented, Sep 10, 2019 (1 reaction)

Hello,

As mentioned in the documentation, you should be using td3.policies and a continuous-action environment like Pendulum-v0, not CartPole, because TD3 only supports continuous actions.

The following code works:

import gym

from stable_baselines.td3.policies import FeedForwardPolicy
from stable_baselines import TD3


class MyMlpPolicy(FeedForwardPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env=1, n_steps=1, n_batch=None, reuse=False, **_kwargs):
        super(MyMlpPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse, feature_extraction="mlp", **_kwargs)

env = gym.make('Pendulum-v0')
model = TD3(MyMlpPolicy, env)

Miffyli commented, Sep 11, 2019 (0 reactions)

Ah, my bad, I did not notice the issue was with using the wrong policies. Perhaps a check for that could be added, but it is already well documented with proper highlights ^^
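
Such a check is not part of stable-baselines 2.7.0; purely as a hypothetical sketch, it might look something like the following, assuming the gym spaces API and the td3.policies FeedForwardPolicy base class used in the working example above:

import gym
from stable_baselines.td3.policies import FeedForwardPolicy as TD3FeedForwardPolicy

def check_td3_inputs(env, policy_cls):
    # TD3 only supports continuous (Box) action spaces, so CartPole-v0 would be rejected here.
    if not isinstance(env.action_space, gym.spaces.Box):
        raise ValueError("TD3 requires a continuous (Box) action space, got %s" % env.action_space)
    # Policies should derive from stable_baselines.td3.policies; policies from
    # stable_baselines.common.policies build their network in __init__ and hit
    # the variable-name clash reported in this issue.
    if not issubclass(policy_cls, TD3FeedForwardPolicy):
        raise ValueError("Use a policy from stable_baselines.td3.policies with TD3")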

