
[question] Custom output activation function for actor (A2C, SAC)

See original GitHub issue

I really like stable_baselines and would like to use it for a custom environment with continuous actions. To match the specific needs of the environment, I need to apply a custom activation function to the output of the policy/actor network. In particular, I want to apply separate softmax activation functions to different parts of the output (e.g., softmax over the first n actions, then softmax over the next n, etc.).
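For concreteness, this is roughly the kind of activation I have in mind. It is only a standalone TensorFlow 1.x sketch (stable_baselines v2 is built on TF1); the function name and the chunk size n are placeholders, and nothing here is hooked into a stable_baselines policy yet:

```python
import tensorflow as tf

def chunked_softmax(logits, n):
    """Apply an independent softmax to each consecutive block of n outputs."""
    num_chunks = logits.shape[-1].value // n        # static size of the last axis
    chunks = tf.split(logits, num_chunks, axis=-1)  # list of (batch, n) tensors
    return tf.concat([tf.nn.softmax(c, axis=-1) for c in chunks], axis=-1)
```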

I know how to define such an activation in general, but don’t know what the best and cleanest way is to implement such a policy in stable_baselines. I’d like to reuse the MlpPolicy and just change the activation of the output layer. I’m interested in using this with A2C and SAC.

In A2C, it seems like this is handled here or here. But I don’t want to mess something up by making changes there without being certain.

In SAC, I guess I would only have to adjust this part: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/sac/policies.py#L217. Or do I need to change the log_std below it as well?

This seems related to this issue. Unfortunately, it didn’t help me figure out my problem/question.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
araffin commented, Nov 14, 2019

just created separate softmax activation functions for a) actions 1 and 2 and b) actions 3 and 4 to make sure that each sums up to 1

Can’t you do the normalization inside the environment? Or use a gym wrapper (cf. the tutorial)?

You also have to be aware that by doing so, you change the probability distribution (this matters for SAC or A2C; in the case of DDPG/TD3, there is no probability distribution over actions, so it is not a problem).
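For illustration only (this sketch is not from the thread), the gym wrapper route could look roughly like this, assuming blocks of n consecutive actions should each sum to 1; the class name is hypothetical:

```python
import gym
import numpy as np

class ChunkedSoftmaxWrapper(gym.ActionWrapper):
    """Apply a softmax to each block of n actions before the environment sees them."""

    def __init__(self, env, n):
        super().__init__(env)
        self.n = n  # chunk size, e.g. 2 for actions (1, 2) and (3, 4)

    def action(self, action):
        action = np.asarray(action, dtype=np.float64)
        out = np.empty_like(action)
        for start in range(0, action.shape[-1], self.n):
            chunk = action[start:start + self.n]
            chunk = np.exp(chunk - chunk.max())  # numerically stable softmax
            out[start:start + self.n] = chunk / chunk.sum()
        return out
```

Wrapping the custom environment as ChunkedSoftmaxWrapper(my_env, n=2) (the environment itself is a placeholder) leaves the policy’s own output distribution untouched, which is exactly why it avoids the issue mentioned above.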

0 reactions
stefanbschneider commented, Nov 14, 2019

Ok, thanks! I’ll see how far I get with normalizing inside the environment.
