[rllib] Add automatic autoregressive action builder
We should support an automatic autoregressive action model builder, initially for discrete actions only, that takes a specification of an action dependence tree as input. It should support parametric child spaces, where the child space is chosen based on the action selected in the parent space. (This does not refer to variable-length action spaces conditioned on observations.)
In the specification, elements of a list are independently sampled actions. Tuples, which must be of even length, express parent-child dependencies: the even entries (counting from 0) are the parents, and the odd entries are their children. Tuples of length > 2 express a parametric decomposition.
The builder will then collect the results and return the indices of the selected actions in the same format, where None will replace entries where parametric actions were not selected.
In addition, standard functions such as entropy and KL will be built automatically. Because of the nested structure, we might want to support either a mean-over-samples approach or explicit computation to get a more accurate estimate of these quantities.
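To make the two options concrete, here is a minimal pure-Python sketch for a single a2|a1 pair. It is not RLlib code: the function names and the plain-list probability tables are assumptions made for illustration. The explicit version uses the chain rule H(a1, a2) = H(a1) + sum_i p(a1=i) H(a2 | a1=i); the sampled version replaces the sum over parents with a mean over sampled parents.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def exact_joint_entropy(p_parent, p_child_given_parent):
    """Explicit computation: enumerate every parent action."""
    conditional = sum(
        p * entropy(row) for p, row in zip(p_parent, p_child_given_parent)
    )
    return entropy(p_parent) + conditional


def sampled_joint_entropy(p_parent, p_child_given_parent, n_samples, rng):
    """Mean over samples: average the child entropy over sampled parents."""
    draws = rng.choices(range(len(p_parent)), weights=p_parent, k=n_samples)
    conditional = sum(entropy(p_child_given_parent[i]) for i in draws) / n_samples
    return entropy(p_parent) + conditional


# Hypothetical toy distributions: 2 parent actions, 2 child actions each.
p_a1 = [0.5, 0.5]
p_a2_given_a1 = [[1.0, 0.0], [0.5, 0.5]]
exact = exact_joint_entropy(p_a1, p_a2_given_a1)
approx = sampled_joint_entropy(p_a1, p_a2_given_a1, 20000, random.Random(0))
```

The explicit version needs one child distribution per parent action, so its cost grows with the size of the parent space, while the sampled version only evaluates the children of the parents that were actually drawn.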
A possible caveat is that the gym action space datatypes don’t seem to support such a return format.
Example specification:
[(19, [5,5], 1, [15]), 8]
Example results:
[(8, [3,3], None, None), 6]
[(None, None, 1, 12), 8]
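A rough sketch of how such a specification could be walked to produce results in this format. Uniform random draws stand in for the actual models, and `sample_spec` is a hypothetical name. Two assumptions that the issue does not pin down: the branch of a parametric tuple is picked by one draw over the concatenated parent spaces, and indices are 0-based and local to the selected parent space, so the numbers may not line up with the examples above. Child lists are also returned as lists.

```python
import random


def sample_spec(spec, rng=None):
    """Return sampled indices in the same nesting as `spec`."""
    if rng is None:
        rng = random.Random()
    if isinstance(spec, int):
        # A single discrete space of size `spec`.
        return rng.randrange(spec)
    if isinstance(spec, list):
        # List entries are sampled independently.
        return [sample_spec(entry, rng) for entry in spec]
    if len(spec) % 2 != 0:
        raise ValueError("dependency tuples must have even length")
    parents, children = spec[0::2], spec[1::2]
    # Assumption: one flat draw over all parent spaces selects the branch.
    flat = rng.randrange(sum(parents))
    result = [None] * len(spec)
    for branch, size in enumerate(parents):
        if flat < size:
            result[2 * branch] = flat
            result[2 * branch + 1] = sample_spec(children[branch], rng)
            break
        flat -= size
    return tuple(result)


# The entries of the branch that was not selected stay None.
example = sample_spec([(19, [5, 5], 1, [15]), 8], random.Random(0))
```

In a real builder the child draw would be conditioned on the sampled parent index; here the dependency only shows up in which child space gets sampled.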
In a simple a2|a1 scenario, both
(2,2)
[(2,2)]
should be fine.
The models will by default be conditioned on a context input, and built and stored in the following format, with branching determined by the above tree structure. (Perhaps I should use an actual tree instead of this “DSL”.) Parametric child spaces will introduce a special intermediate index, which is bracketed.
[(models["0"], models["0(0)0"], models["0(0)1"], models["0(1)0"]), models["1"]]
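For reference, a small sketch of how these keys could be derived from the specification. `model_keys` is a hypothetical helper, and the naming rule is inferred from the single example above: one parent model per top-level entry, then `<entry>(<branch>)<child>` for each child head. It only handles a top-level list whose tuple children are ints or flat lists, and it emits the bracketed index even for a plain 2-tuple, which may not match the intended naming.

```python
def model_keys(spec):
    """Map a top-level spec list to model keys in the same layout."""
    layout = []
    for i, node in enumerate(spec):
        if isinstance(node, int):
            # Independent action: a single model keyed by its position.
            layout.append(str(i))
            continue
        children = node[1::2]
        keys = [str(i)]  # assumed: one parent model shared by all branches
        for branch, child in enumerate(children):
            heads = child if isinstance(child, list) else [child]
            for c in range(len(heads)):
                keys.append(f"{i}({branch}){c}")
        layout.append(tuple(keys))
    return layout


keys = model_keys([(19, [5, 5], 1, [15]), 8])
```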
Issue Analytics
- Created 4 years ago
- Comments:17 (13 by maintainers)
Top GitHub Comments
Sure. I’ve written most of the code and will submit a pull request soon. I’ll try to use a hypothetical StarCraft II env as an example, since it has a huge nested action space. In particular, I will show how to use the builder to implement select rectangle. I’ll also write some documentation.
https://github.com/deepmind/pysc2/blob/master/docs/environment.md#actions https://arxiv.org/abs/1708.04782 (section 4.2)
Did you get it running? I am trying to implement something very similar, but running into training issues.