Questions about the local state usage and PER updates

@chuyangliu Could you help me, please, with some questions?

  1. I read in your docs, about the algorithms, that you’re using a local state vector:

The second part is the local state vector, which tells the snake its surrounding situation. The vector contains 3 values (0 or 1), indicating whether the point in front/left/right of the snake head is dangerous (i.e., wall or body in the direction).

I’m achieving more or less the same performance as you with the global state, but the local state brought, as you showed, huge improvements. In your code, you check which of those (3 or 4) positions are safe, flatten, and then stack them horizontally (axis = 1) with the stack vector, right? Is that enough to make it usable by the model? Also, everything that is food or body or important, you assign as 1. But does the network understand by itself which is which (to pursue objectives like food, or avoid the body…)?

Your code

def _state(self):
        """Return a vector indicating current state."""

        # Visual state
        visual_state = np.zeros(self._SHAPE_VISUAL_STATE, dtype=np.int32)
        for i in range(1, - 1):
            for j in range(1, - 1):

                pos = Pos(i, j)
                if self._USE_RELATIVE:
                    if self.snake.direc == Direc.LEFT:
                        pos = Pos( - 1 - j, i)
                    elif self.snake.direc == Direc.UP:
                        pos = Pos(i, j)
                    elif self.snake.direc == Direc.RIGHT:
                        pos = Pos(j, - 1 - i)
                    elif self.snake.direc == Direc.DOWN:
                        pos = Pos( - 1 - i, - 1 - j)

                t =
                if t == PointType.EMPTY:
                    visual_state[i - 1][j - 1][0] = 1
                elif t == PointType.FOOD:
                    visual_state[i - 1][j - 1][1] = 1
                elif t == PointType.HEAD_L or t == PointType.HEAD_U or \
                     t == PointType.HEAD_R or t == PointType.HEAD_D:
                    visual_state[i - 1][j - 1][2] = 1
                elif t == PointType.BODY_LU  or t == PointType.BODY_UR or \
                     t == PointType.BODY_RD  or t == PointType.BODY_DL or \
                     t == PointType.BODY_HOR or t == PointType.BODY_VER:
                    visual_state[i - 1][j - 1][3] = 1
                else:
                    raise ValueError("Unsupported PointType: {}".format(t))

        if self._USE_VISUAL_ONLY:
            return visual_state.flatten()
        else:
            # Important state
            important_state = np.zeros(self._NUM_IMPORTANT_FEATURES, dtype=np.int32)
            head = self.snake.head()

            if self._USE_RELATIVE:
                for i, action in enumerate([SnakeAction.LEFT, SnakeAction.FORWARD, SnakeAction.RIGHT]):
                    direc = SnakeAction.to_direc(action, self.snake.direc)
                    if not
                        important_state[i] = 1
            else:
                for i, direc in enumerate([Direc.LEFT, Direc.UP, Direc.RIGHT, Direc.DOWN]):
                    if not
                        important_state[i] = 1

            return np.hstack((visual_state.flatten(), important_state))

My code

    def state(self):
        """Create a matrix of the current state of the game."""
        body = self.snake.return_body()
        canvas = zeros((var.BOARD_SIZE, var.BOARD_SIZE))

        for part in body:
            canvas[part[0] - 1, part[1] - 1] = 1.

        canvas[self.food_pos[0] - 1, self.food_pos[1] - 1] = .5

        return canvas

  2. For implementing PER, what weights do you use? The absolute difference between Q(s,a) and Q(s’,a’), or the MSE? Do you always update the PER priorities for the observations you batched?

_Originally posted by @Neves4 in
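
The local state vector from the first question can be sketched on a toy grid. This is a minimal illustration, not the repository's code: the cell codes `EMPTY`, `FOOD`, `BODY`, `WALL`, the `HEADINGS` table, and the functions `danger_flags`, `one_hot_visual`, and `full_state` are all hypothetical names introduced here. It assumes the board is surrounded by wall cells, so the neighbors of the head are always inside the array. The flags follow the left, forward, right order used in the quoted code, and the one-hot step shows one way to give each cell type its own channel, in contrast to encoding food and body as different scalar values in a single matrix.

```python
import numpy as np

# Hypothetical cell codes for this sketch (not the repository's PointType enum).
EMPTY, FOOD, BODY, WALL = 0, 1, 2, 3

# Headings as (row, col) offsets, listed clockwise: up, right, down, left.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def danger_flags(grid, head, heading):
    """Return 3 flags for the cells left, forward, and right of the head.

    A flag is 1 when the neighboring cell holds a wall or a body segment.
    """
    flags = np.zeros(3, dtype=np.int32)
    for k, turn in enumerate((-1, 0, 1)):  # left, forward, right
        dr, dc = HEADINGS[(heading + turn) % 4]
        r, c = head[0] + dr, head[1] + dc
        if grid[r, c] in (BODY, WALL):
            flags[k] = 1
    return flags


def one_hot_visual(grid, num_types=4):
    """Encode every cell as a one-hot vector, one channel per cell type."""
    return np.eye(num_types, dtype=np.int32)[grid]


def full_state(grid, head, heading):
    """Flatten the one-hot grid and append the 3 danger flags."""
    return np.hstack((one_hot_visual(grid).flatten(),
                      danger_flags(grid, head, heading)))


# Toy 5x5 board: wall border, head at (1, 1) facing up, food at (3, 3).
grid = np.zeros((5, 5), dtype=np.int64)
grid[0, :] = grid[-1, :] = grid[:, 0] = grid[:, -1] = WALL
grid[1, 1] = grid[2, 1] = BODY  # head plus one body segment below it
grid[3, 3] = FOOD
state = full_state(grid, head=(1, 1), heading=0)  # heading 0 = up
print(state.shape, state[-3:])
```

Because both inputs to `np.hstack` are one-dimensional here, the call simply concatenates them into one longer vector, which is the shape a dense input layer would expect.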

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

chuyangliu commented, Oct 5, 2018

You are welcome! PER considers an experience “good” if it produces a large error between the actual value and the predicted value of Q generated by the current model. Personally, I think that’s more reliable than just assuming that “the latest experiences may be better than the earliest experiences”. But again, experiments are necessary to confirm which is better. 😃
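
As a rough illustration of the idea in this comment, the sketch below implements the proportional variant of prioritized experience replay with a plain array. It is not taken from the repository: the class `ProportionalReplay`, the helper `td_error`, and the values of `alpha`, `eps`, and `gamma` are assumptions made for this example, and the importance-sampling correction and the sum tree that real implementations usually add are left out. The priority here is the absolute TD error, that is, the gap between the target r + gamma * max Q(s’, a’) and the prediction Q(s, a), and only the experiences that were just sampled get their priorities refreshed.

```python
import numpy as np


class ProportionalReplay:
    """Toy prioritized replay buffer backed by a plain array (no sum tree)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-2):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew the sampling
        self.eps = eps      # keeps zero-error experiences reachable
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.next_idx = 0

    def add(self, experience):
        # New experiences get the current max priority so they are
        # likely to be replayed soon after they are stored.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(experience)
        else:
            self.data[self.next_idx] = experience
        self.priorities[self.next_idx] = max_p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, rng):
        # Sampling probability is proportional to the stored priority.
        p = self.priorities[:len(self.data)]
        idx = rng.choice(len(self.data), size=batch_size, p=p / p.sum())
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # Only the experiences that were just sampled get new priorities.
        self.priorities[idx] = (np.abs(td_errors) + self.eps) ** self.alpha


def td_error(q_sa, reward, q_next_max, done, gamma=0.99):
    """Target (r + gamma * max_a' Q(s', a')) minus the prediction Q(s, a)."""
    target = reward + gamma * q_next_max * (1.0 - done)
    return target - q_sa


# Usage: store four dummy experiences, then raise the priority of the first.
buf = ProportionalReplay(capacity=4)
for name in ("e0", "e1", "e2", "e3"):
    buf.add(name)
buf.update(np.array([0]), np.array([10.0]))
idx, batch = buf.sample(batch_size=2, rng=np.random.default_rng(0))
```

In a training loop, `update` would be called with the TD errors computed for the batch returned by `sample`, right after the gradient step on that batch.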

And sorry, I haven’t heard of Hindsight Experience Replay, since I haven’t been focusing on reinforcement learning recently.

chuyangliu commented, Oct 6, 2018

  1. It’s the number of episodes.
  2. Yes, I didn’t stack the states for 4 frames like the paper suggests. Instead I just consider a training sample as only one experience/frame. And I think I didn’t take the current action into account anywhere else.
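
For comparison, the frame stacking mentioned in the second point could look like the sketch below. The `FrameStack` class is a hypothetical helper written for this example and is not part of the repository, which, per the comment above, feeds a single frame per training sample. Setting `k=1` reduces the helper to that single-frame case with one extra leading axis.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keep the last k frames and expose them as one stacked observation."""

    def __init__(self, k):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # Fill the history with the first frame of the episode.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self.observation()

    def push(self, frame):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(frame)
        return self.observation()

    def observation(self):
        # Shape is (k, rows, cols), oldest frame first.
        return np.stack(list(self.frames), axis=0)


stack = FrameStack(k=4)
obs = stack.reset(np.zeros((3, 3)))
obs = stack.push(np.ones((3, 3)))
print(obs.shape)
```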