
Huge actor loss in `TD3`


Question

I am using a TD3 agent to solve the MountainCarContinuous-v0 environment. Is it normal for the actor's loss to become huge at some point? Isn't this dangerous for the weights of the model? The weird part is that the agent is actually solving the environment, since the accumulated reward is gradually increasing.

Additional context

This is my code (as suggested by @qgallouedec in https://github.com/DLR-RM/stable-baselines3/issues/936):

import gym
import numpy as np
from stable_baselines3 import TD3
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise

env = gym.make("MountainCarContinuous-v0")

# Ornstein-Uhlenbeck exploration noise over the 1-D action space (mean 0, sigma 0.5)
action_noise = OrnsteinUhlenbeckActionNoise(np.zeros(1), 0.5 * np.ones(1))
model = TD3("MlpPolicy", env, action_noise=action_noise, verbose=1)

model.learn(total_timesteps=1_000_000)

This is the logger’s output after some iterations.

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 361      |
|    ep_rew_mean     | 60.8     |
| time/              |          |
|    episodes        | 68       |
|    fps             | 111      |
|    time_elapsed    | 219      |
|    total_timesteps | 24547    |
| train/             |          |
|    actor_loss      | -14.4    |
|    critic_loss     | 1.08     |
|    learning_rate   | 0.001    |
|    n_updates       | 24471    |
---------------------------------
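
The rising ep_rew_mean already suggests the policy is improving, but those rollouts include the exploration noise. If it helps, a quick sanity check is to evaluate the current policy without noise using stable_baselines3's evaluate_policy helper (a sketch; the episode count and printed numbers are illustrative, not from the issue):

from stable_baselines3.common.evaluation import evaluate_policy

# Average undiscounted return over a few evaluation episodes,
# using deterministic actions (no exploration noise).
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.1f} +/- {std_reward:.1f}")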

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
araffin commented, Jun 20, 2022

> it depends on cumulative reward in your environment…

Yes, that’s how the actor loss is defined: https://github.com/DLR-RM/stable-baselines3/blob/a6f5049a99a4c21a6f0bcce458ca3306cef310e0/stable_baselines3/td3/td3.py#L182

It depends on the magnitude of the reward and on the discount factor.
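
For context, the linked line computes the deterministic-policy-gradient objective: the actor loss is the negative mean Q-value the critic assigns to the actor's own actions. A minimal PyTorch sketch of that idea (toy networks and dimensions are made up for illustration; this is not the SB3 code itself):

import torch
import torch.nn as nn

obs_dim, act_dim = 2, 1

# Toy deterministic actor and Q-critic, just to show the shape of the loss.
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(), nn.Linear(32, 1))

obs = torch.randn(256, obs_dim)  # stands in for a batch from the replay buffer

# Maximize Q(s, pi(s)) by minimizing its negative.
# Q estimates the discounted return, which is bounded by roughly r_max / (1 - gamma),
# so a well-trained critic on a high-reward task naturally produces a large |actor_loss|.
actor_loss = -critic(torch.cat([obs, actor(obs)], dim=1)).mean()

In MountainCarContinuous-v0, where a successful episode returns on the order of +100 (goal bonus minus a small action cost), an actor loss drifting toward large negative values is expected as the policy improves: it reflects the critic's value estimate, not diverging weights.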

0 reactions
AndreasKaratzas commented, Jun 20, 2022

Yeah, correct. Well, I think that covers my question. Thank you all 😃


Top Results From Across the Web

  • Problems with training actor-critic (huge negative loss) - Reddit
    I am implementing actor critic and trying to train it on some simple environment like CartPole but my loss goes towards -∞ and...
  • Visualizing the Loss Landscape of Actor Critic Methods ... - arXiv
    Figure 4: Actor loss functions of TD3 trained on Walker2d, Hopper, Ant, and HalfCheetah. Action smoothing on the upper row. Action smoothing in ...
  • actor critic policy loss going to zero (with no improvement)
    As for the value, a high loss at the beginning is expected because it is essentially guessing at what the optimal value is...
  • Artificial Intelligence Learns to Walk with Actor Critic Deep ...
    Twin Delayed Deep Deterministic Policy Gradients (TD3) is a state-of-the-art actor critic algorithm for mastering environments with...
  • Introduction to Reinforcement Learning (DDPG and TD3) for ...
    Eventually the actor will do better actions (maybe maybe maybe) and the loss will converge to zero or something.
