Maybe there is one problem in implementing the class PrioritizedReplayBuffer

In https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/buffers.py (line 206), the total priority is computed with total = self._it_sum.sum(0, len(self._storage) - 1). In other words, the end parameter of self._it_sum.sum is set to len(self._storage) - 1.

    def _sample_proportional(self, batch_size):
        mass = []
        total = self._it_sum.sum(0, len(self._storage) - 1)
        # TODO(szymon): should we ensure no repeats?
        mass = np.random.random(size=batch_size) * total
        idx = self._it_sum.find_prefixsum_idx(mass)
        return idx

But in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/segment_tree.py (line 75), the function reduce, which self._it_sum.sum calls, also subtracts 1 from end (end -= 1).

    def reduce(self, start=0, end=None):
        """
        Returns result of applying `self.operation`
        to a contiguous subsequence of the array.
            self.operation(arr[start], operation(arr[start+1], operation(... arr[end])))
        :param start: (int) beginning of the subsequence
        :param end: (int) end of the subsequences
        :return: (Any) result of reducing self.operation over the specified range of array elements.
        """
        if end is None:
            end = self._capacity
        if end < 0:
            end += self._capacity
        end -= 1
        return self._reduce_helper(start, end, 1, 0, self._capacity - 1)

Is 1 being subtracted twice?

I checked this with the following code.

    import numpy as np
    from stable_baselines.common.buffers import PrioritizedReplayBuffer

    buffer = PrioritizedReplayBuffer(100, 0.6)
    x = np.array([1.])
    for _ in range(10):
        buffer.add(x, x, x, x, x)
    print(buffer._it_sum.sum(0, len(buffer._storage) - 1))  # result: 9.0
    print(buffer._it_sum.sum(0, len(buffer._storage)))  # result: 10.0

Changing len(buffer._storage) - 1 to len(buffer._storage) gives the correct result. I added 10 samples to the buffer, so I think the total priority should be 10. If I have misunderstood the code, please let me know.
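The double subtraction can also be reproduced without installing stable-baselines. The following is only a minimal sketch: TinySumTree is a hypothetical stand-in that copies the end handling of the quoted reduce function and nothing else, and it sums a plain list instead of walking a real segment tree.

```python
class TinySumTree:
    """Hypothetical stand-in for the sum tree; mimics only the `end` convention."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._values = [0.0] * capacity

    def __setitem__(self, idx, val):
        self._values[idx] = val

    def sum(self, start=0, end=None):
        # Same end handling as the quoted reduce(): `end` is exclusive.
        if end is None:
            end = self._capacity
        if end < 0:
            end += self._capacity
        end -= 1  # convert the exclusive bound to an inclusive index
        return sum(self._values[start:end + 1])


tree = TinySumTree(100)
storage = []
for i in range(10):
    storage.append(i)
    tree[i] = 1.0  # every stored sample gets priority 1

total_buggy = tree.sum(0, len(storage) - 1)  # 1 is subtracted twice
total_fixed = tree.sum(0, len(storage))      # 1 is subtracted once, inside sum()
print(total_buggy, total_fixed)  # 9.0 10.0
```

Because end is treated as exclusive inside sum, passing len(storage) - 1 leaves out the priority of the last stored sample.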

Top GitHub Comments

Miffyli commented, Jul 28, 2020

@Jogima-cyber

Dare I say most of it is working, except that the last added sample is not included in the random sampling process. Given the number of samples in the buffer, this seems like a minuscule error (which should still be fixed!), but I cannot say for sure whether the effect on learning is small.
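The effect described above can be illustrated with a small standalone sketch. It does not use the library's find_prefixsum_idx; as an assumption, a bisect over running prefix sums stands in for it, and all names here (priorities, prefix, sample_indices) are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

priorities = [1.0] * 10                # ten samples, each with priority 1
prefix = list(accumulate(priorities))  # running totals 1.0, 2.0, ..., 10.0


def sample_indices(total, n_draws, rng):
    """Draw masses uniformly in [0, total) and map each one to an index.

    The index is the number of prefix sums that are <= mass, which is a
    stand-in for a prefix-sum lookup in a sum tree.
    """
    return {bisect_right(prefix, rng.random() * total) for _ in range(n_draws)}


rng = random.Random(0)
buggy = sample_indices(sum(priorities[:-1]), 10_000, rng)  # total misses the last priority
fixed = sample_indices(sum(priorities), 10_000, rng)       # total covers every priority

print("buggy total reaches index", max(buggy))
print("fixed total reaches index", max(fixed))
```

With the smaller total, every mass is below 9, so index 9 (the last added sample) can never be drawn.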

@UPUPGOO

Any update on PR for this? I am asking to check if somebody is working on this and wants to make a PR out of it. If not, I can add it.

UPUPGOO commented, Jul 14, 2020

By the way, the code that calculates the weights in the function sample of the class PrioritizedReplayBuffer can be simplified.

Change

    p_min = self._it_min.min() / self._it_sum.sum()
    max_weight = (p_min * len(self._storage)) ** (-beta)
    p_sample = self._it_sum[idxes] / self._it_sum.sum()
    weights = (p_sample * len(self._storage)) ** (-beta) / max_weight

to

    weights2 = (self._it_sum[idxes] / self._it_min.min()) ** (-beta)

This follows from a short algebraic derivation.
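The derivation can be sketched as follows. The symbols are shorthand introduced here, not names from the code: $N$ stands for len(self._storage), $S$ for self._it_sum.sum(), $p_i$ for self._it_sum[idxes], and $p_{\min}$ for self._it_min.min(), assuming $N$, $S$ and $p_{\min}$ are all positive.

```latex
\[
w_i
= \frac{\left(N \cdot \dfrac{p_i}{S}\right)^{-\beta}}
       {\left(N \cdot \dfrac{p_{\min}}{S}\right)^{-\beta}}
= \left(\frac{N \, p_i / S}{N \, p_{\min} / S}\right)^{-\beta}
= \left(\frac{p_i}{p_{\min}}\right)^{-\beta}
\]
```

The factors $N$ and $S$ appear in both the numerator and the denominator, so they cancel, and neither the total priority nor the buffer length is needed for the normalized weight.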

I also ran an experiment to verify this with the following code.

    import numpy as np
    from stable_baselines.common.buffers import PrioritizedReplayBuffer

    buffer = PrioritizedReplayBuffer(100, 0.6)
    for i in range(10):
        x = np.array([i])
        buffer.add(x, x, x, x, x)
    # Update priorities to [0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5]
    buffer.update_priorities(np.arange(10), np.linspace(0.05, 0.5, 10))
    data = buffer.sample(10, beta=0.5)
    weights1 = data[-2]  # weights computed by sample()
    idxes = data[-1]  # indices of the sampled entries
    weights2 = (buffer._it_sum[idxes] / buffer._it_min.min()) ** (-0.5)
    print(weights1 - weights2)

The result is all zeros. The original code follows the formula in the paper more closely, but I think the simplified code is much faster.
