
[Question] Flatten Discrete box potentially problematic

See original GitHub issue

Question

Flattening a Discrete space to a Box space may be problematic. The flatten wrapper converts Discrete to Box as a one-hot encoding. Suppose the original space is Discrete(3); then:

0 maps to [1, 0, 0]
1 maps to [0, 1, 0]
2 maps to [0, 0, 1]
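
For concreteness, here is a minimal sketch of that mapping using gym.spaces.utils (exact reprs and dtypes may vary across gym versions):

```python
from gym.spaces import Discrete
from gym.spaces.utils import flatten, flatten_space

space = Discrete(3)
flat_space = flatten_space(space)  # a Box with shape (3,) and bounds [0, 1]
print(flat_space)

# Each discrete action becomes a one-hot vector:
for a in range(space.n):
    print(a, "->", flatten(space, a))
```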

When we sample the action space for random actions, we sample the Box, which can produce any of the eight combinations of 0s and 1s in a three-element array, namely:

[0, 0, 0],
[0, 0, 1], *
[0, 1, 0], *
[0, 1, 1],
[1, 0, 0], *
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]

Only the three combinations that I’ve starred are usable in the strict sense of the mapping. The unflatten function for a Discrete space uses np.nonzero(x)[0][0], and here’s a table of what the above arrays map to:

+ ------------------ + ---------------- + --------------------------------------------- +
| In Flattened Space | np.nonzero(x)[0] | np.nonzero(x)[0][0] (aka discrete equivalent) |
+ ------------------ + ---------------- + --------------------------------------------- +
| 0, 0, 0            | Error            | Error                                         |
| 0, 0, 1            | [2]              | 2                                             |
| 0, 1, 0            | [1]              | 1                                             |
| 0, 1, 1            | [1, 2]           | 1                                             |
| 1, 0, 0            | [0]              | 0                                             |
| 1, 0, 1            | [0, 2]           | 0                                             |
| 1, 1, 0            | [0, 1]           | 0                                             |
| 1, 1, 1            | [0, 1, 2]        | 0                                             |
+ ------------------ + ---------------- + --------------------------------------------- +
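
The skew is easy to reproduce empirically. The sketch below samples the flattened Box and runs each sample back through unflatten; it assumes gym's flatten utilities and that unflatten raises IndexError on the all-zero vector (per the np.nonzero rule above):

```python
from gym.spaces import Discrete
from gym.spaces.utils import flatten_space, unflatten

space = Discrete(3)
flat_space = flatten_space(space)  # Box(0, 1, (3,), int)

counts = {0: 0, 1: 0, 2: 0, "error": 0}
for _ in range(10_000):
    x = flat_space.sample()  # any of the 8 binary vectors, uniformly
    try:
        counts[unflatten(space, x)] += 1
    except IndexError:  # all-zero sample: np.nonzero(x)[0] is empty
        counts["error"] += 1

print(counts)  # roughly: half map to 0, a quarter to 1, an eighth to 2, an eighth error
```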

Implications

Obviously, [0, 0, 0] will fail because there is no nonzero element. More importantly, only one eighth of the random samples map to 2, one fourth map to 1, and one half map to 0. This skew has serious implications for exploration, especially if action 2 is the “correct action” throughout much of the simulation. I’m very curious why I have not seen this come up before. This kind of skew in the random sampling can have major effects on the way the algorithm explores and learns, and the problem is exacerbated for Discrete(n) when n is large. Am I missing something here?
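
The pattern generalizes (my extrapolation from the np.nonzero rule, not stated in the issue itself): decoding to action k requires entries 0..k-1 to be 0 and entry k to be 1, so the probabilities halve with each action index:

```python
# Distribution induced by uniform 0/1 sampling of the one-hot Box for Discrete(n):
# P(action k) = 2 ** -(k + 1), and P(all-zero error) = 2 ** -n.
n = 5
probs = [2 ** -(k + 1) for k in range(n)]
p_error = 2 ** -n
print(probs, "error:", p_error)  # [0.5, 0.25, 0.125, 0.0625, 0.03125] error: 0.03125
```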

Solution

This is unique to Discrete spaces. Instead of mapping to a one-hot encoding, we could map to a Box with a single element and the appropriate range: Discrete(n) maps to Box(0, n-1, (1,), int) instead of Box(0, 1, (n,), int).
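
A sketch of that proposal, with hypothetical helper names for illustration (these are not the functions gym ships): the mapping becomes the identity and is trivially reversible.

```python
import numpy as np
from gym.spaces import Box, Discrete

# Hypothetical helpers illustrating the proposal -- not gym's actual flatten/unflatten.
def flatten_discrete(space: Discrete, x: int) -> np.ndarray:
    return np.array([x], dtype=np.int64)

def unflatten_discrete(space: Discrete, x: np.ndarray) -> int:
    return int(x[0])

space = Discrete(3)
flat_space = Box(0, space.n - 1, (1,), dtype=np.int64)

# flatten -> unflatten always round-trips, and every sample of
# flat_space decodes to a valid action, so the skew disappears:
assert all(unflatten_discrete(space, flatten_discrete(space, a)) == a
           for a in range(space.n))
```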

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

1 reaction
rusu24edward commented, Nov 21, 2022

@pseudo-rnd-thoughts Yup, just wanted to provide an example script in case anyone else comes by this post.

Sure, I can add that

1 reaction
RedTachyon commented, Nov 9, 2022

I’m not yet completely sure what we want to do here. I’d definitely be opposed to adding an extra space just for the right flattening, unless we completely change our philosophy with regards to the spaces.

If this one change makes it so that we can always ensure the reversibility of flatten -> unflatten, then it’d be tempting.

At the same time, the whole idea of flatten and unflatten is not necessarily well thought-out, since it was introduced in some OpenAI code and then received various updates which might or might not have followed the original intentions.

My interpretation of this functionality is more or less “I don’t care what this fancy space is, I want a fixed-size float32 vector for my NN”. In which case any reversibility fails due to the dtypes. But we don’t actually do any automatic type conversion.

I also feel like flattening into one-hot is much better on the algorithmic side. If you want to “flatten” a discrete action, you definitely don’t want to keep the ordinal structure – i.e. action 1 isn’t between action 0 and action 2 in any meaningful sense, so a one-hot embedding is likely to work much better.


