TD3 Policy and Target Policy Naming Conflict

Description

When I try to instantiate a TD3 model, I get an error in the init function on line 136:

ValueError: Variable input/model/pi_fc0/w already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope?

It’s possible I misunderstand the variable naming going on, but won’t the following lines from td3.py init() always cause a naming conflict because of the reuse=False in the scoping?

with tf.variable_scope("input", reuse=False):
    self.policy_tf = self.policy(self.sess, self.observation_space, self.action_space, **self.policy_kwargs)
    self.target_policy_tf = self.policy(self.sess, self.observation_space, self.action_space, **self.policy_kwargs)

Code Example

You can replicate the issue with the following code:

import gym

from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.policies import FeedForwardPolicy
from stable_baselines import TD3


class MyMlpPolicy(FeedForwardPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env=1, n_steps=1, n_batch=None, reuse=False, **_kwargs):
        super(MyMlpPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
                                          feature_extraction="mlp", **_kwargs)


env = gym.make('CartPole-v0')
model = TD3(MyMlpPolicy, env)

System Info

Stable-baselines version: 2.7.0, installed via pip on 09/10/19
GPU: N/A
Python Version: 3.5
Tensorflow version: 1.9.0-rc0

Top GitHub Comments

araffin commented, Sep 10, 2019

Hello,

As mentioned in the documentation, you should use td3.policies and a continuous-action environment such as Pendulum-v0 rather than CartPole, because TD3 only supports continuous actions.

The following code works:

import gym

from stable_baselines.td3.policies import FeedForwardPolicy
from stable_baselines import TD3


class MyMlpPolicy(FeedForwardPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env=1, n_steps=1, n_batch=None, reuse=False, **_kwargs):
        super(MyMlpPolicy, self).__init__(sess, ob_space, ac_space, n_env, n_steps, n_batch, reuse,
                                          feature_extraction="mlp", **_kwargs)


env = gym.make('Pendulum-v0')
model = TD3(MyMlpPolicy, env)
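
For readers wondering why swapping the import makes the error go away, here is a toy model of the name collision. It is not TensorFlow and not stable-baselines code: ToyVariableStore, EagerPolicy, LazyPolicy and the variable name "pi/fc0/w" are hypothetical stand-ins. It rests on an assumption suggested by the error message (a variable named input/model/pi_fc0/w already exists): that the common.policies class creates its weights inside its constructor under a fixed sub-scope, while a td3.policies class defers variable creation until the caller builds the actor under a scope of its own choosing.

```python
class ToyVariableStore:
    """Toy stand-in for a TF1-style variable store (not TensorFlow itself)."""

    def __init__(self):
        self._names = set()

    def get_variable(self, name, reuse=False):
        # Like a scope opened with reuse=False: a second creation of the
        # same fully qualified name is rejected.
        if name in self._names and not reuse:
            raise ValueError(
                "Variable {} already exists, disallowed.".format(name))
        self._names.add(name)
        return name


class EagerPolicy:
    """Assumed behaviour: weights are created inside the constructor."""

    def __init__(self, store, scope):
        self.weight = store.get_variable(scope + "/model/pi_fc0/w")


class LazyPolicy:
    """Assumed behaviour: the constructor creates no variables at all."""

    def __init__(self, store):
        self.store = store

    def make_actor(self, scope):
        # The caller decides the scope, so two instances can coexist.
        return self.store.get_variable(scope + "/pi/fc0/w")


def try_two_eager_policies():
    """Build two eager policies in one scope; return the error text, if any."""
    store = ToyVariableStore()
    EagerPolicy(store, "input")
    try:
        EagerPolicy(store, "input")
    except ValueError as err:
        return str(err)
    return None


def try_two_lazy_policies():
    """Build two lazy policies, then create their variables in separate scopes."""
    store = ToyVariableStore()
    policy, target_policy = LazyPolicy(store), LazyPolicy(store)
    return [policy.make_actor("model"), target_policy.make_actor("target")]


print(try_two_eager_policies())
print(try_two_lazy_policies())
```

In this sketch the second EagerPolicy fails in the same way as the reported ValueError, while the two LazyPolicy instances end up with distinct variable names. Whether the real classes behave exactly like this should be checked against the stable-baselines source.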

Miffyli commented, Sep 11, 2019

Ah, my bad: I did not notice that the issue was caused by using the wrong policies. Perhaps a check for that would help, but it is already well documented with proper highlights ^^
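
A check along the lines suggested in this comment could look roughly like the sketch below. It is not part of stable-baselines: td3_input_problems is a hypothetical helper, and it relies on two assumptions, namely that valid TD3 policies inherit from a class defined in a module under stable_baselines.td3, and that continuous gym action spaces are instances of a class named Box. The stand-in classes exist only so that the sketch runs without gym or stable-baselines installed.

```python
def td3_input_problems(policy_cls, action_space):
    """Hypothetical pre-flight check; returns a list of problem descriptions."""
    problems = []

    # Assumption: a TD3-compatible policy has an ancestor defined under
    # the stable_baselines.td3 package.
    from_td3 = any(
        base.__module__.startswith("stable_baselines.td3")
        for base in policy_cls.__mro__
    )
    if not from_td3:
        problems.append("policy does not derive from a stable_baselines.td3 class")

    # Assumption: continuous action spaces are instances of a class named Box.
    if type(action_space).__name__ != "Box":
        problems.append("action space is not continuous (expected Box)")

    return problems


# Stand-in classes for illustration only.
class Box:
    pass


class Discrete:
    pass


class FakeTD3Base:
    __module__ = "stable_baselines.td3.policies"


class FakeCommonBase:
    __module__ = "stable_baselines.common.policies"


class GoodPolicy(FakeTD3Base):
    pass


class BadPolicy(FakeCommonBase):
    pass


print(td3_input_problems(GoodPolicy, Box()))       # no problems reported
print(td3_input_problems(BadPolicy, Discrete()))   # both problems reported
```

With the real libraries, the arguments would be the custom policy class and env.action_space; CartPole-v0 exposes a Discrete action space and Pendulum-v0 a Box one, which matches the advice given in the thread.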
