Questions about the local state usage and PER updates
@chuyangliu Could you help me, please, with some questions?
- I read in your docs, about the algorithms, that you’re using a local state vector:
The second part is the local state vector, which tells the snake its surrounding situation. The vector contains 3 values (0 or 1), indicating whether the point in front/left/right of the snake head is dangerous (i.e., wall or body in the direction).
I’m achieving more or less the same performance as you with the global state, but the local state brought, as you showed, huge improvements. In your code, you check which of those (3 or 4) positions are safe, then stack the result horizontally with the flattened visual state, right? Is that enough to make it usable by the model? Also, everything that is food, body, or otherwise important is assigned a 1. But does the network figure out by itself which is which (so it can pursue objectives like reaching food, or avoiding the body)?
Your code
```python
def _state(self):
    """Return a vector indicating current state."""
    # Visual state
    visual_state = np.zeros(self._SHAPE_VISUAL_STATE, dtype=np.int32)
    for i in range(1, self.map.num_rows - 1):
        for j in range(1, self.map.num_cols - 1):
            pos = Pos(i, j)
            if self._USE_RELATIVE:
                if self.snake.direc == Direc.LEFT:
                    pos = Pos(self.map.num_rows - 1 - j, i)
                elif self.snake.direc == Direc.UP:
                    pos = Pos(i, j)
                elif self.snake.direc == Direc.RIGHT:
                    pos = Pos(j, self.map.num_cols - 1 - i)
                elif self.snake.direc == Direc.DOWN:
                    pos = Pos(self.map.num_rows - 1 - i, self.map.num_cols - 1 - j)
            t = self.map.point(pos).type
            if t == PointType.EMPTY:
                visual_state[i - 1][j - 1][0] = 1
            elif t == PointType.FOOD:
                visual_state[i - 1][j - 1][1] = 1
            elif t == PointType.HEAD_L or t == PointType.HEAD_U or \
                    t == PointType.HEAD_R or t == PointType.HEAD_D:
                visual_state[i - 1][j - 1][2] = 1
            elif t == PointType.BODY_LU or t == PointType.BODY_UR or \
                    t == PointType.BODY_RD or t == PointType.BODY_DL or \
                    t == PointType.BODY_HOR or t == PointType.BODY_VER:
                visual_state[i - 1][j - 1][3] = 1
            else:
                raise ValueError("Unsupported PointType: {}".format(t))

    if self._USE_VISUAL_ONLY:
        return visual_state.flatten()
    else:
        # Important state
        important_state = np.zeros(self._NUM_IMPORTANT_FEATURES, dtype=np.int32)
        head = self.snake.head()
        if self._USE_RELATIVE:
            for i, action in enumerate([SnakeAction.LEFT, SnakeAction.FORWARD, SnakeAction.RIGHT]):
                direc = SnakeAction.to_direc(action, self.snake.direc)
                if not self.map.is_safe(head.adj(direc)):
                    important_state[i] = 1
        else:
            for i, direc in enumerate([Direc.LEFT, Direc.UP, Direc.RIGHT, Direc.DOWN]):
                if not self.map.is_safe(head.adj(direc)):
                    important_state[i] = 1
        return np.hstack((visual_state.flatten(), important_state))
```
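If I understand the trick correctly, it can be reproduced with toy sizes (the shapes below are made up for illustration, not your actual `_SHAPE_VISUAL_STATE`): since empty, food, head, and body each get their own one-hot channel, the network receives them as separate input features and never has to disentangle them from a shared value.

```python
import numpy as np

# Hypothetical toy sizes for illustration only.
rows, cols, channels = 4, 4, 4   # one-hot channels: empty/food/head/body
visual_state = np.zeros((rows, cols, channels), dtype=np.int32)
visual_state[2, 1, 1] = 1        # food at (2, 1) lives in channel 1
visual_state[0, 0, 3] = 1        # a body segment at (0, 0) lives in channel 3

danger = np.array([1, 0, 0], dtype=np.int32)  # e.g. left of head unsafe

# The flattened one-hot grid and the 3-value danger vector become one 1-D input.
state = np.hstack((visual_state.flatten(), danger))
print(state.shape)  # (67,) = 4*4*4 + 3
```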
My code
```python
def state(self):
    """Create a matrix of the current state of the game."""
    body = self.snake.return_body()
    canvas = zeros((var.BOARD_SIZE, var.BOARD_SIZE))
    for part in body:
        canvas[part[0] - 1, part[1] - 1] = 1.
    canvas[self.food_pos[0] - 1, self.food_pos[1] - 1] = .5
    return canvas
```
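One way I could remove the ambiguity in my own encoding (a sketch, with `BOARD_SIZE` standing in for my `var.BOARD_SIZE`) would be to split the single plane into binary channels, like your visual state does, so the network never has to decode the 1.0-vs-0.5 magnitudes:

```python
import numpy as np

BOARD_SIZE = 6  # hypothetical stand-in for var.BOARD_SIZE

def split_channels(canvas):
    """Split a single-plane encoding (1.0 = body, 0.5 = food) into
    separate binary channels, one per object type."""
    body = (canvas == 1.0).astype(np.float32)
    food = (canvas == 0.5).astype(np.float32)
    return np.stack((body, food), axis=-1)

canvas = np.zeros((BOARD_SIZE, BOARD_SIZE))
canvas[2, 3] = 1.0   # a body part
canvas[4, 1] = 0.5   # the food
channels = split_channels(canvas)
print(channels.shape)  # (6, 6, 2)
```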
- For implementing PER, what priorities do you use: the absolute difference between Q(s,a) and Q(s’,a’), or the MSE? And do you always update the PER priorities of the observations you sampled in the batch?
_Originally posted by @Neves4 in https://github.com/chuyangliu/Snake/commit/5a18244dc0fa021663f044f34e3c1c918a72631c#commitcomment-30772870_
Issue Analytics
- Created 5 years ago
- Comments: 9 (4 by maintainers)
You are welcome! PER considers an experience “good” if it produces a large error between the actual value and the predicted value of Q under the current model. Personally I think that’s more reliable than just assuming “the latest experiences are better than the earliest ones.” But again, experiments are necessary to confirm which works better. 😃
And sorry, I can’t comment on Hindsight Experience Replay, since I haven’t been focusing on reinforcement learning recently.
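Concretely, the proportional variant from the original PER paper (Schaul et al.) builds priorities from the absolute TD error rather than the MSE, and only the transitions sampled in the batch get their priorities refreshed. A minimal sketch with made-up numbers, not the exact code in this repo:

```python
import numpy as np

def priority(q_sa, td_target, alpha=0.6, eps=1e-6):
    """Proportional PER priority: p = (|TD error| + eps) ** alpha.
    eps keeps zero-error transitions sampleable; alpha tempers the skew."""
    return (np.abs(td_target - q_sa) + eps) ** alpha

# After training on a sampled batch, update the priorities of exactly
# those sampled transitions with their fresh TD errors.
q_sa      = np.array([0.2, 1.5, -0.3])
td_target = np.array([1.0, 1.4,  0.7])
p = priority(q_sa, td_target)
print(p)  # largest priority goes to the largest |TD error|
```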