[Feature Request] different activation functions in the network_architecture through the policy_kwargs

🚀 Feature

Introduce the possibility of passing multiple activation functions to the policy network using the policy_kwargs.

Motivation

From what I understand, policy_kwargs lets you pass a single activation function, which is used by the net_arch part of the policy network. Often, though, the policy net (pi) and the value function net (vf) need different activation functions. It looks like the only way to have different activation functions in these two sub-networks is to implement a custom policy network, as shown in the advanced example in your documentation. This is also mentioned in issue #481.

Alternatives

Ideally it would be possible to have multiple activation functions as follows: one for the shared layers and one for each of the layers of the two sub-networks (policy net (pi) and value net (vf)), mimicking how the architecture is passed. The architecture is passed this way: [<shared layers>, dict(vf=[<non-shared value network layers>], pi=[<non-shared policy network layers>])] (source: here), so I think it would be possible to use the same structure, but using PyTorch’s activation functions instead of integers.

Example:

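from stable_baselines3 import A2C
from torch.nn import ReLU, Softmax, Tanh

# Proposed syntax (not currently supported): activation_fn mirrors net_arch.
# `env` is assumed to be an existing environment with Dict observations.
model = A2C(
    'MultiInputPolicy',
    env,
    policy_kwargs=dict(
        net_arch=[256, dict(pi=[128, 50], vf=[32, 1])],
        activation_fn=[Tanh, dict(pi=[ReLU, Softmax], vf=[ReLU, ReLU])],
    ),
)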
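
To make the proposed structure concrete, here is a minimal sketch of how a nested activation spec could be matched against net_arch, layer by layer. The helper name pair_layers_with_activations is hypothetical and is not part of Stable-Baselines3. Plain strings stand in for the torch.nn classes so the sketch runs without PyTorch installed.

```python
def pair_layers_with_activations(net_arch, activation_fn):
    """Zip a net_arch spec with an identically shaped activation spec.

    Both arguments follow the layout quoted in the issue:
    [<shared layers>, dict(vf=[...], pi=[...])].
    Returns a dict with "shared", "pi" and "vf" lists of
    (layer_size, activation) pairs. Hypothetical helper, not SB3 API.
    """
    if len(net_arch) != len(activation_fn):
        raise ValueError("net_arch and activation_fn must have the same length")

    paired = {"shared": [], "pi": [], "vf": []}
    for sizes, activations in zip(net_arch, activation_fn):
        if isinstance(sizes, dict):
            # Non-shared part: one list of layers per head (pi / vf).
            if not isinstance(activations, dict):
                raise ValueError("expected a dict of activations for the pi/vf heads")
            for head in ("pi", "vf"):
                head_sizes = sizes.get(head, [])
                head_activations = activations.get(head, [])
                if len(head_sizes) != len(head_activations):
                    raise ValueError(f"'{head}' needs one activation per layer")
                paired[head].extend(zip(head_sizes, head_activations))
        else:
            # Shared layer: a single size with a single activation.
            if isinstance(activations, dict):
                raise ValueError("expected a single activation for a shared layer")
            paired["shared"].append((sizes, activations))
    return paired


# Same shapes as the example in the issue, with strings as placeholders
# for torch.nn.Tanh, torch.nn.ReLU and torch.nn.Softmax.
spec = pair_layers_with_activations(
    net_arch=[256, dict(pi=[128, 50], vf=[32, 1])],
    activation_fn=["Tanh", dict(pi=["ReLU", "Softmax"], vf=["ReLU", "ReLU"])],
)
```

Validating that both specs have the same shape up front would let a mismatch (for example, three layers but two activations) fail early with a clear message rather than during network construction.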

Top GitHub Comments

araffin commented, Sep 19, 2022

"[...] is used in all the other layers of both the policy net and the value net. Is that correct?"

yes

AlexPasqua commented, Sep 13, 2022

@araffin OK, so if I understood correctly, the last layer of the policy net for discrete actions automatically has a softmax activation function, and the one I put in the policy_kwargs is used in all the other layers of both the policy net and the value net. Is that correct?

(I’ll try to work on a draft PR for that feature anyway!)
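
For readers unfamiliar with the softmax mentioned in this comment, the sketch below shows what it does to the raw outputs of a final layer for discrete actions. It is a library-free illustration, not code from Stable-Baselines3, and the example logits are made up.

```python
import math


def softmax(logits):
    """Turn raw final-layer outputs (logits) into action probabilities."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for an environment with three discrete actions.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)  # non-negative values that sum to 1
```

Because this normalization already happens at the output for discrete actions, the activation_fn passed in policy_kwargs only needs to cover the hidden layers.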