[rllib] Add automatic autoregressive action builder

We should support an automatic autoregressive action model builder, for now only for discrete actions. It takes as input a specification of an action dependence tree and supports parametricity in child spaces, meaning that a child space is chosen based on the action selected in the parent space. (We are not referring to variable-length action spaces conditioned on observations.)

In the specification, elements of a list are independently sampled actions. Tuples, which must be of even length, express parent-child dependencies: the even entries (counting from 0) are the parents and the odd entries are their children. Tuples of length > 2 express a parametric decomposition.
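The structural rules above can be checked mechanically. The following is a minimal sketch, not part of the proposed builder: `validate_spec` is a hypothetical helper, and it assumes that integers denote the number of discrete choices, that parents are plain integers, and that children may be integers, lists, or nested tuples.

```python
def validate_spec(spec):
    """Raise ValueError if `spec` breaks the list/tuple rules described above.

    Assumptions (not stated in the issue): an int is the size of a discrete
    space, parents are plain ints, children may be ints, lists or tuples.
    """
    if isinstance(spec, int) and not isinstance(spec, bool):
        if spec < 1:
            raise ValueError(f"discrete space needs at least 1 choice, got {spec}")
    elif isinstance(spec, list):
        # List elements are independently sampled actions.
        for item in spec:
            validate_spec(item)
    elif isinstance(spec, tuple):
        # Tuples pair parents (even positions) with children (odd positions).
        if len(spec) == 0 or len(spec) % 2 != 0:
            raise ValueError(f"tuple must have even, non-zero length: {spec!r}")
        for parent, child in zip(spec[0::2], spec[1::2]):
            if not isinstance(parent, int) or isinstance(parent, bool):
                raise ValueError(f"parent must be an int, got {parent!r}")
            validate_spec(parent)
            validate_spec(child)
    else:
        raise ValueError(f"unsupported spec element: {spec!r}")


validate_spec([(19, [5, 5], 1, [15]), 8])  # parametric example from this issue
validate_spec((2, 2))                      # simple a2|a1 case
validate_spec([(2, 2)])                    # same case wrapped in a list
```

Rejecting odd-length tuples early keeps the parent/child pairing unambiguous before any models are built.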

The builder will then collect the results and return the indices of the selected actions in the same format, with None in place of entries whose parametric branch was not selected.

In addition, standard functions such as entropy and kl will be built automatically. Because of the nested structure, we might want to support either a mean-over-samples approach or explicit computation to get a more accurate estimate of the respective quantities.
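To illustrate the two options for entropy, here is a small sketch for the simplest a2|a1 case. It is only an illustration under assumed toy probabilities: the function names and the numbers in `p_parent` and `p_child` are hypothetical and do not come from RLlib. The explicit version uses the chain rule H(a1, a2) = H(a1) + sum_i p(a1=i) H(a2 | a1=i), while the sampled version averages -log p(a1, a2) over draws from the model.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def exact_entropy(p_parent, p_child_given_parent):
    """Explicit computation via the chain rule over the parent's choices."""
    return entropy(p_parent) + sum(
        p * entropy(cond) for p, cond in zip(p_parent, p_child_given_parent)
    )


def sampled_entropy(p_parent, p_child_given_parent, n_samples, rng):
    """Mean over samples of -log p(a1, a2), drawn parent first, then child."""
    total = 0.0
    for _ in range(n_samples):
        a1 = rng.choices(range(len(p_parent)), weights=p_parent)[0]
        cond = p_child_given_parent[a1]
        a2 = rng.choices(range(len(cond)), weights=cond)[0]
        total -= math.log(p_parent[a1]) + math.log(cond[a2])
    return total / n_samples


# Toy distributions (hypothetical): 2 parent actions, 2 child actions each.
p_parent = [0.7, 0.3]
p_child = [[0.5, 0.5], [0.9, 0.1]]

exact = exact_entropy(p_parent, p_child)
estimate = sampled_entropy(p_parent, p_child, 20000, random.Random(0))
print(f"exact={exact:.4f} sampled={estimate:.4f}")
```

The explicit version enumerates every parent action, so its cost grows with the size of the tree, whereas the sampled version only evaluates the branches that were actually taken, at the price of estimator noise.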

A possible caveat is that the gym action space datatypes don’t seem to support such a return format.

Example specification:

[(19, [5,5], 1, [15]), 8]

Example results:

[(8, [3,3], None, None), 6]
[(None, None, 1, 12), 8]

In a simple a2|a1 scenario, both

(2,2)
[(2,2)]

should be fine.

The models will by default be conditioned on a context input, and will be built and stored in the following format, with the branching determined by the tree structure above. (Perhaps I should use an actual tree instead of this “DSL”.) Parametric child spaces introduce a special intermediate index, which is bracketed.

[(models["0"], models["0(0)0"], models["0(0)1"], models["0(1)0"]), models["1"]]
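As a rough sketch of how such keys could be derived from a specification, the hypothetical helper below reproduces the key strings shown above for the example specification. It handles only one level of nesting and assumes that the bracketed group index is always emitted for tuple entries; how deeper trees or non-parametric tuples should be keyed is not specified in this issue.

```python
def model_keys(spec):
    """Return flat model keys for a top-level list spec (one nesting level).

    Hypothetical helper: key layout is inferred from the single example in
    this issue, "<entry>" for parents and "<entry>(<group>)<child>" for
    children of the parametric group with index <group>.
    """
    keys = []
    for i, entry in enumerate(spec):
        keys.append(str(i))  # one model per top-level entry (parent or leaf)
        if isinstance(entry, tuple):
            # Children sit at the odd positions; each one is a parametric group.
            for group, child in enumerate(entry[1::2]):
                child_list = child if isinstance(child, list) else [child]
                for c in range(len(child_list)):
                    keys.append(f"{i}({group}){c}")
    return keys


print(model_keys([(19, [5, 5], 1, [15]), 8]))
```

Flat string keys make it easy to store the models in a plain dict, although an explicit tree structure, as the issue itself suggests, may be easier to extend to deeper nesting.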

Issue Analytics

Created 4 years ago
Comments: 17 (13 by maintainers)

Top GitHub Comments

jon-chuang commented, Aug 21, 2019

Sure. I’ve written most of the code and will submit a pull request soon. I’ll try to use a hypothetical StarCraft II env as an example, since it has a huge nested action space. In particular, I will show how to use the builder to implement select rectangle. I’ll also write some documentation.

https://github.com/deepmind/pysc2/blob/master/docs/environment.md#actions https://arxiv.org/abs/1708.04782 (section 4.2)

RocketRider commented, Mar 26, 2021

Did you get it running? I am trying to implement something very similar, but running into training issues.
