How to log the gradient and weights of the actor policy in TD3 using tensorboard?

Code snippet:

import tensorflow as tf

def Callback(_locals, _globals):
    self_ = _locals['self']
    tf.summary.histogram("Actor Network Weights Histogram", self_.policy_out.policy.???)

I cannot work out from the definition of

class FeedForwardPolicy(TD3Policy):

how to get the trainable variables of the network, something like tf.trainable_variables(), so that I can plot histograms in TensorBoard.

Top GitHub Comments

1 reaction

Miffyli commented, May 4, 2020

@yotamitai

Use get_parameters(), which is right next to get_parameter_list().
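As a rough illustration of this suggestion, the sketch below takes the mapping of variable name to array that model.get_parameters() returns and computes a few per-variable statistics for the actor only. The helper names (_flatten, actor_weight_stats) are hypothetical, and the default scope prefix "model/pi" is an assumption about how the TD3 actor variables are named; check the output of model.get_parameter_list() for the actual names in your version.

```python
import math
from collections import OrderedDict


def _flatten(values):
    """Flatten a NumPy array or nested list/tuple into a flat list of floats."""
    if hasattr(values, "tolist"):  # NumPy arrays expose tolist()
        values = values.tolist()
    if isinstance(values, (list, tuple)):
        flat = []
        for item in values:
            flat.extend(_flatten(item))
        return flat
    return [float(values)]


def actor_weight_stats(params, scope="model/pi"):
    """Return {variable name: stats} for every variable whose name starts with scope.

    params is a mapping of variable name -> array, such as the one returned
    by model.get_parameters(). The default scope is an assumption, not a
    documented constant.
    """
    stats = OrderedDict()
    for name, values in params.items():
        if not name.startswith(scope):
            continue  # skip critics, target networks, etc.
        flat = _flatten(values)
        if not flat:
            continue  # nothing to summarize for an empty variable
        stats[name] = {
            "mean": sum(flat) / len(flat),
            "min": min(flat),
            "max": max(flat),
            "l2_norm": math.sqrt(sum(v * v for v in flat)),
        }
    return stats
```

With a trained model this would be called as actor_weight_stats(model.get_parameters()). The statistics are plain Python floats, so they can be printed or passed on to whatever logging mechanism you use.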

1 reaction

araffin commented, Oct 28, 2019

Hello,

You should look at the code of TD3 itself, not the policy; that is where we use tf.trainable_variables(). (I also recommend looking at PPO2, where we log more things using TensorBoard.) You can get the names of the weights (scope + name) using model.get_parameter_list() (cf. the documentation).

Since https://github.com/hill-a/stable-baselines/issues/409, we also added some documentation on how to log additional variables.
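A minimal sketch of such a callback follows, assuming the function-style callback(locals_, globals_) interface, a TensorFlow 1.x installation, and that the training loop exposes its summary writer as locals_['writer'] (None when tensorboard_log is not set). The names make_logging_callback, scalar_fn and make_summary are hypothetical hooks introduced for this example, not part of stable-baselines.

```python
def _tf1_scalar_summary(tag, value):
    """Build a scalar summary with the TensorFlow 1.x API (imported lazily)."""
    import tensorflow as tf  # assumption: TensorFlow 1.x, as used by stable-baselines

    return tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])


def make_logging_callback(scalar_fn, make_summary=_tf1_scalar_summary):
    """Create a callback that logs the scalars returned by scalar_fn(model).

    scalar_fn: hypothetical user function, model -> {tag: number}.
    make_summary: builds the summary object; replaceable for testing.
    """

    def callback(locals_, globals_):
        model = locals_["self"]
        # Assumption: the training loop keeps its writer in a local named "writer".
        writer = locals_.get("writer")
        if writer is not None:
            for tag, value in scalar_fn(model).items():
                summary = make_summary(tag, float(value))
                writer.add_summary(summary, model.num_timesteps)
        return True  # returning False would stop training early

    return callback
```

The result would be passed as model.learn(total_timesteps=..., callback=make_logging_callback(my_scalar_fn)), where my_scalar_fn might, for example, compute a norm of the actor weights from model.get_parameters(). This logs scalars rather than histograms; treat it as a starting point under the assumptions above.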
