Questions about the local state usage and PER updates
@chuyangliu Could you help me, please, with some questions?
- I read in your docs, about the algorithms, that you’re using a local state vector:
The second part is the local state vector, which tells the snake its surrounding situation. The vector contains 3 values (0 or 1), indicating whether the point in front/left/right of the snake head is dangerous (i.e., wall or body in the direction).
I’m achieving more or less the same performance as you with the global state, but the local state brought, as you showed, huge improvements. In your code, you check which of those (3 or 4) positions are safe, then stack the result horizontally with the flattened visual state, right? Is that enough to make it usable by the model? Also, everything that is food, body, or otherwise important is assigned a 1. But does the network figure out by itself which is which (so it can pursue objectives like reaching food, or avoiding the body)?
Your code
```python
def _state(self):
    """Return a vector indicating current state."""
    # Visual state
    visual_state = np.zeros(self._SHAPE_VISUAL_STATE, dtype=np.int32)
    for i in range(1, self.map.num_rows - 1):
        for j in range(1, self.map.num_cols - 1):
            pos = Pos(i, j)
            if self._USE_RELATIVE:
                if self.snake.direc == Direc.LEFT:
                    pos = Pos(self.map.num_rows - 1 - j, i)
                elif self.snake.direc == Direc.UP:
                    pos = Pos(i, j)
                elif self.snake.direc == Direc.RIGHT:
                    pos = Pos(j, self.map.num_cols - 1 - i)
                elif self.snake.direc == Direc.DOWN:
                    pos = Pos(self.map.num_rows - 1 - i, self.map.num_cols - 1 - j)
            t = self.map.point(pos).type
            if t == PointType.EMPTY:
                visual_state[i - 1][j - 1][0] = 1
            elif t == PointType.FOOD:
                visual_state[i - 1][j - 1][1] = 1
            elif t == PointType.HEAD_L or t == PointType.HEAD_U or \
                    t == PointType.HEAD_R or t == PointType.HEAD_D:
                visual_state[i - 1][j - 1][2] = 1
            elif t == PointType.BODY_LU or t == PointType.BODY_UR or \
                    t == PointType.BODY_RD or t == PointType.BODY_DL or \
                    t == PointType.BODY_HOR or t == PointType.BODY_VER:
                visual_state[i - 1][j - 1][3] = 1
            else:
                raise ValueError("Unsupported PointType: {}".format(t))

    if self._USE_VISUAL_ONLY:
        return visual_state.flatten()
    else:
        # Important state
        important_state = np.zeros(self._NUM_IMPORTANT_FEATURES, dtype=np.int32)
        head = self.snake.head()
        if self._USE_RELATIVE:
            for i, action in enumerate([SnakeAction.LEFT, SnakeAction.FORWARD, SnakeAction.RIGHT]):
                direc = SnakeAction.to_direc(action, self.snake.direc)
                if not self.map.is_safe(head.adj(direc)):
                    important_state[i] = 1
        else:
            for i, direc in enumerate([Direc.LEFT, Direc.UP, Direc.RIGHT, Direc.DOWN]):
                if not self.map.is_safe(head.adj(direc)):
                    important_state[i] = 1
        return np.hstack((visual_state.flatten(), important_state))
```
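If I understand the trick correctly, it can be reproduced with toy sizes (the shapes below are made up for illustration, not your actual `_SHAPE_VISUAL_STATE`): since empty, food, head, and body each get their own one-hot channel, the network receives them as separate input features and never has to disentangle them from a shared value.

```python
import numpy as np

# Hypothetical toy sizes for illustration only.
rows, cols, channels = 4, 4, 4   # one-hot channels: empty/food/head/body
visual_state = np.zeros((rows, cols, channels), dtype=np.int32)
visual_state[2, 1, 1] = 1        # food at (2, 1) lives in channel 1
visual_state[0, 0, 3] = 1        # a body segment at (0, 0) lives in channel 3

danger = np.array([1, 0, 0], dtype=np.int32)  # e.g. left of head unsafe

# The flattened one-hot grid and the 3-value danger vector become one 1-D input.
state = np.hstack((visual_state.flatten(), danger))
print(state.shape)  # (67,) = 4*4*4 + 3
```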
My code
```python
def state(self):
    """Create a matrix of the current state of the game."""
    body = self.snake.return_body()
    canvas = zeros((var.BOARD_SIZE, var.BOARD_SIZE))
    for part in body:
        canvas[part[0] - 1, part[1] - 1] = 1.
    canvas[self.food_pos[0] - 1, self.food_pos[1] - 1] = .5
    return canvas
```
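One way I could remove the ambiguity in my own encoding (a sketch, with `BOARD_SIZE` standing in for my `var.BOARD_SIZE`) would be to split the single plane into binary channels, like your visual state does, so the network never has to decode the 1.0-vs-0.5 magnitudes:

```python
import numpy as np

BOARD_SIZE = 6  # hypothetical stand-in for var.BOARD_SIZE

def split_channels(canvas):
    """Split a single-plane encoding (1.0 = body, 0.5 = food) into
    separate binary channels, one per object type."""
    body = (canvas == 1.0).astype(np.float32)
    food = (canvas == 0.5).astype(np.float32)
    return np.stack((body, food), axis=-1)

canvas = np.zeros((BOARD_SIZE, BOARD_SIZE))
canvas[2, 3] = 1.0   # a body part
canvas[4, 1] = 0.5   # the food
channels = split_channels(canvas)
print(channels.shape)  # (6, 6, 2)
```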
- For implementing PER, what priorities do you use: the absolute difference between Q(s,a) and Q(s’,a’), or the MSE? And do you always update the PER priorities of the observations you sampled in the batch?
_Originally posted by @Neves4 in https://github.com/chuyangliu/Snake/commit/5a18244dc0fa021663f044f34e3c1c918a72631c#commitcomment-30772870_
Issue Analytics
- Created 5 years ago
- Comments: 9 (4 by maintainers)
You are welcome! PER considers an experience “good” if it produces a large error between the actual value and the predicted value of Q under the current model. Personally I think that’s more reliable than just assuming “the latest experiences are better than the earliest ones.” But again, experiments are necessary to confirm which works better. 😃
And sorry, I can’t comment on Hindsight Experience Replay, since I haven’t been focusing on reinforcement learning recently.
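Concretely, the proportional variant from the original PER paper (Schaul et al.) builds priorities from the absolute TD error rather than the MSE, and only the transitions sampled in the batch get their priorities refreshed. A minimal sketch with made-up numbers, not the exact code in this repo:

```python
import numpy as np

def priority(q_sa, td_target, alpha=0.6, eps=1e-6):
    """Proportional PER priority: p = (|TD error| + eps) ** alpha.
    eps keeps zero-error transitions sampleable; alpha tempers the skew."""
    return (np.abs(td_target - q_sa) + eps) ** alpha

# After training on a sampled batch, update the priorities of exactly
# those sampled transitions with their fresh TD errors.
q_sa      = np.array([0.2, 1.5, -0.3])
td_target = np.array([1.0, 1.4,  0.7])
p = priority(q_sa, td_target)
print(p)  # largest priority goes to the largest |TD error|
```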