[Question] Non-shared features extractor in on-policy algorithm
Question
I’ve checked the docs (custom policy -> advanced example), but it is not clear to me how to create a custom policy without sharing the features extractor between the actor and the critic networks in on-policy algorithms.
If I pass a `features_extractor_class` in the `policy_kwargs`, this is shared by default, I think.
I can have a non-shared `mlp_extractor` by implementing my own `_build_mlp_extractor` method in my custom policy and creating a network with 2 distinct sub-networks (`self.policy_net` and `self.value_net`), but I didn't understand how to do the same with the features extractor.
The docs (custom policy -> custom features extractor) suggest this is supported, so since I'm using A2C, I think it should be possible to have a non-shared features extractor by implementing my own policy; I just didn't understand how to do it.
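For context, the non-shared `mlp_extractor` mentioned above can be sketched like this. These are plain-Python stand-ins for the torch sub-networks: the attribute names `policy_net` / `value_net` mirror SB3's `MlpExtractor`, but everything else here is illustrative, not the real implementation.

```python
# Stand-in sketch of an mlp_extractor with two distinct branches.
# In SB3 this object would be a torch.nn.Module created inside
# _build_mlp_extractor() on a custom ActorCriticPolicy; here plain
# functions stand in for the actor/critic MLPs.

def make_branch(scale):
    # Hypothetical "network": scales every feature by a constant.
    return lambda features: [scale * f for f in features]

class TwoBranchExtractor:
    """Mirrors MlpExtractor's attribute names: policy_net / value_net."""

    def __init__(self):
        self.policy_net = make_branch(2)   # actor branch (own parameters)
        self.value_net = make_branch(3)    # critic branch (own parameters)

    def forward(self, features):
        # Two distinct sub-networks, so the branches share no weights.
        return self.policy_net(features), self.value_net(features)

extractor = TwoBranchExtractor()
latent_pi, latent_vf = extractor.forward([1.0, 2.0])
print(latent_pi, latent_vf)  # [2.0, 4.0] [3.0, 6.0]
```

Because the two branches are separate objects, updating the actor branch never touches the critic branch; the open question in this issue is how to get the same separation one level earlier, at the features extractor.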
Thanks in advance for any clarification!
Checklist
- I have read the documentation (required)
- I have checked that there is no similar issue in the repo (required)
Issue Analytics
- Created: a year ago
- Comments: 8 (6 by maintainers)
Top GitHub Comments
@wlxer I think you could pass the dimensions as parameters to your policy network (not necessarily within `kwargs`, but explicitly). Then you "save" them in the net's attributes and only then call the superclass constructor. It is something I actually do in my code, but I didn't report it previously because it was just a personal need. You can do something a bit like this:
I managed to make it run without errors! 🎉
But since I haven’t found a guide/demo nor a similar issue here, I’ll briefly explain how I did it:
- I created a custom policy (subclassing `ActorCriticPolicy`).
- I overrode the methods that use the features extractor (`forward`, `extract_features`, `evaluate_actions` and `predict_values`).

Quick demo
Hope it can help someone!