[Feature Request] Refactor `predict` method in `BasePolicy` class
### Feature
At present, the `predict` method in the `BasePolicy` class contains quite a lot of logic that could be reused to provide similar functionality. In particular, the current logic of this method is as follows:
1. Pre-process the NumPy observation and convert it into a PyTorch Tensor.
2. Generate action(s) from the child policy class through the `_predict` method, with these actions in the form of a PyTorch Tensor.
3. Post-process the actions, including converting the PyTorch Tensor into a NumPy array.
My suggestion is that steps (1) and (3) be refactored into individual functions on the `BasePolicy` class, which are then called in the `predict` method.
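To make the proposal concrete, here is a minimal sketch of the split; the helper names `_preprocess_obs` and `_postprocess_actions` are made up for illustration and are not existing stable-baselines3 API:

```python
import numpy as np
import torch as th


class BasePolicy:
    """Sketch only: illustrates the proposed split, not the real SB3 class."""

    def _preprocess_obs(self, observation: np.ndarray) -> th.Tensor:
        # Step (1): convert the NumPy observation into a (batched) tensor.
        return th.as_tensor(observation, dtype=th.float32)

    def _postprocess_actions(self, actions: th.Tensor) -> np.ndarray:
        # Step (3): move back to NumPy (clipping/unscaling omitted here).
        return actions.detach().cpu().numpy()

    def _predict(self, obs_tensor: th.Tensor) -> th.Tensor:
        raise NotImplementedError  # step (2), provided by child policy classes

    def predict(self, observation: np.ndarray) -> np.ndarray:
        obs_tensor = self._preprocess_obs(observation)  # step (1)
        actions = self._predict(obs_tensor)             # step (2)
        return self._postprocess_actions(actions)       # step (3)
```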
### Motivation
I would like to introduce some policy classes for which I can calculate the action probabilities and not the actions themselves. (This is for some work on off-policy estimation that I am doing.)
Let's call this functionality `predict_probabilities`; at present, the initial logic of this functionality is identical to step (1) of the `predict` method. If the code is refactored as suggested, then both approaches can use the same pre-processing functionality, as sketched below.
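Building on the hypothetical helpers above (and reusing their imports), `predict_probabilities` could then share step (1) directly; `_predict_probabilities` is likewise a made-up child-class hook:

```python
class OffPolicyEstimationPolicy(BasePolicy):
    """Sketch only: a policy exposing action probabilities instead of actions."""

    def _predict_probabilities(self, obs_tensor: th.Tensor) -> th.Tensor:
        raise NotImplementedError  # child-class specific probability logic

    def predict_probabilities(self, observation: np.ndarray) -> np.ndarray:
        obs_tensor = self._preprocess_obs(observation)   # same step (1) as predict
        probs = self._predict_probabilities(obs_tensor)
        return probs.detach().cpu().numpy()              # same idea as step (3)
```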
Additionally, I think the refactor would generally make the code more readable and make it easier to extend parts of the functionality to other similar uses.
### Pitch
I am happy to do a PR for the proposed refactor, so I would like to know whether you would accept the proposal.
### Alternatives
None
### Additional context
None
### Checklist
- I have checked that there is no similar issue in the repo (required)
### Top GitHub Comments
Try putting your code ``` like this ```, that should look nice.
Kind of. Your code is answering the question "what is the log-probability of the action it chose". You need to inspect the `distribution` variable if you want to know the probability of picking any one of the actions. The exact code depends on your action space, but for a Discrete space this would be `distribution.distribution.probs` (the `distribution.distribution` object is a PyTorch distribution object).

Nope, maintainers are doing this in their free time and partially for their work. The best way to contribute back is by giving comments, spotting errors and, best of all, doing PRs to update things!
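To make the `distribution.distribution.probs` suggestion concrete, here is a short sketch; it assumes a recent stable-baselines3 version where the policy exposes `obs_to_tensor` and `get_distribution`, and an environment with a Discrete action space:

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")  # assumption: any Discrete-action env
vec_env = model.get_env()
obs = vec_env.reset()

# obs_to_tensor performs the pre-processing discussed in this issue;
# get_distribution returns the SB3 distribution wrapper.
obs_tensor, _ = model.policy.obs_to_tensor(obs)
dist = model.policy.get_distribution(obs_tensor)

# For a Discrete space the wrapped object is a torch Categorical,
# so .probs holds the probability of every action in the current state.
print(dist.distribution.probs)
```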
@ziegenbalg Actually you can use a trick for this with PPO. As @Miffyli said above, you can use the `evaluate_actions` method of the policy object. This example worked for me (I think; maybe @Miffyli sees some error). Note that I'm trying to get the probability of performing an action (my action space is Binary in this case), therefore I use the `np.ones_like` function.
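A sketch of what such usage might look like, assuming a PPO model on a binary/Discrete action space and using `np.ones_like` to build the "always act" action:

```python
import numpy as np
import torch as th
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1")  # assumption: any env with a binary action space
vec_env = model.get_env()
obs = vec_env.reset()

# Build batched tensors; np.ones_like on a sampled action produces the
# "always pick action 1" action whose probability we are after.
obs_tensor, _ = model.policy.obs_to_tensor(obs)
actions = th.as_tensor(
    np.ones_like([vec_env.action_space.sample()]), device=model.policy.device
)

# For ActorCriticPolicy, evaluate_actions returns (values, log_prob, entropy).
values, log_prob, entropy = model.policy.evaluate_actions(obs_tensor, actions)
print(log_prob.exp())  # probability of taking action 1 in the current state
```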