
A possible problem in the implementation of the class PrioritizedReplayBuffer


In the file https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/buffers.py (line 206), the total priority is computed as total = self._it_sum.sum(0, len(self._storage) - 1), i.e. the end parameter of self._it_sum.sum is set to len(self._storage) - 1.

    def _sample_proportional(self, batch_size):
        mass = []
        total = self._it_sum.sum(0, len(self._storage) - 1)
        # TODO(szymon): should we ensure no repeats?
        mass = np.random.random(size=batch_size) * total
        idx = self._it_sum.find_prefixsum_idx(mass)
        return idx

But in the file https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/segment_tree.py (line 75), the function reduce, which the call self._it_sum.sum above goes through, subtracts 1 from end again (end -= 1).

    def reduce(self, start=0, end=None):
        """
        Returns result of applying `self.operation`
        to a contiguous subsequence of the array.
            self.operation(arr[start], operation(arr[start+1], operation(... arr[end])))
        :param start: (int) beginning of the subsequence
        :param end: (int) end of the subsequences
        :return: (Any) result of reducing self.operation over the specified range of array elements.
        """
        if end is None:
            end = self._capacity
        if end < 0:
            end += self._capacity
        end -= 1
        return self._reduce_helper(start, end, 1, 0, self._capacity - 1)

Hasn't 1 been subtracted twice here, then?
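To make the interval semantics concrete, here is a minimal sketch, assuming SumSegmentTree is imported straight from the segment_tree.py quoted above: because of the internal end -= 1, sum(start, end) effectively covers the half-open range [start, end).

from stable_baselines.common.segment_tree import SumSegmentTree

tree = SumSegmentTree(8)   # capacity must be a power of two
for i in range(4):
    tree[i] = 1.0          # four entries, each with priority 1.0
print(tree.sum(0, 3))      # 3.0 -- the entry at index 3 is excluded
print(tree.sum(0, 4))      # 4.0 -- covers all four entries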

I verified this on the PrioritizedReplayBuffer itself with the following code.

import numpy as np

from stable_baselines.common.buffers import PrioritizedReplayBuffer

buffer = PrioritizedReplayBuffer(100, 0.6)
x = np.array([1.])
for _ in range(10):
    buffer.add(x, x, x, x, x)
print(buffer._it_sum.sum(0, len(buffer._storage) - 1))  # result: 9.0
print(buffer._it_sum.sum(0, len(buffer._storage)))      # result: 10.0

If I change len(buffer._storage) - 1 to len(buffer._storage), I get the correct result: since I added 10 transitions to the buffer, each with the default priority of 1.0, the total priority should be 10. If I have misunderstood the code, please let me know.
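If the diagnosis above is correct, one possible fix (just a sketch based on the quoted _sample_proportional, not an official patch) would be to pass len(self._storage) as the end argument, so that after the internal end -= 1 the reduction covers every stored transition; the unused mass = [] line is dropped as well:

    def _sample_proportional(self, batch_size):
        # len(self._storage) is an exclusive bound here, so after `end -= 1`
        # inside reduce() the sum covers indices 0 .. len(self._storage) - 1,
        # i.e. every transition currently stored in the buffer.
        total = self._it_sum.sum(0, len(self._storage))
        mass = np.random.random(size=batch_size) * total
        idx = self._it_sum.find_prefixsum_idx(mass)
        return idx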

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 9

Top GitHub Comments

1 reaction
Miffyli commented, Jul 28, 2020

@Jogima-cyber

Dare I say most of it is working, except that the most recently added sample is not included in the random sampling process. Given the number of samples in the buffer this seems like a minuscule error (which should still be fixed!), but I cannot say for sure whether the effect on learning is small.

@UPUPGOO

Any update on a PR for this? I am asking to check whether somebody is already working on this and wants to make a PR out of it. If not, I can add it.

1 reaction
UPUPGOO commented, Jul 14, 2020

By the way, the code for calculating the weights in the sample function of the class PrioritizedReplayBuffer can be simplified.

change

p_min = self._it_min.min() / self._it_sum.sum()
max_weight = (p_min * len(self._storage)) ** (-beta)
p_sample = self._it_sum[idxes] / self._it_sum.sum()
weights = (p_sample * len(self._storage)) ** (-beta) / max_weight

to

weights2 = (self._it_sum[idxes] / self._it_min.min()) ** (-beta)

This follows from a short algebraic derivation.
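For reference, here is the cancellation written out, using the names from the snippet above and N as shorthand for len(self._storage) (both self._it_sum.sum() and N drop out when dividing by max_weight):

    weights = (p_sample * N) ** (-beta) / max_weight
            = (p_sample * N) ** (-beta) / (p_min * N) ** (-beta)
            = (p_sample / p_min) ** (-beta)
            = ((it_sum[idxes] / it_sum.sum()) / (it_min.min() / it_sum.sum())) ** (-beta)
            = (it_sum[idxes] / it_min.min()) ** (-beta)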

I also did some experiments to verify this with the following code.

import numpy as np

from stable_baselines.common.buffers import PrioritizedReplayBuffer

buffer = PrioritizedReplayBuffer(100, 0.6)
x = np.array([1.])
for i in range(10):
    x = np.array([i])
    buffer.add(x, x, x, x, x)
#update priorities [0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5 ]
buffer.update_priorities(np.arange(10), np.linspace(0.05, 0.5, 10))
data = buffer.sample(10, beta=0.5)
weights1 = data[-2]
idxes = data[-1]
weights2 = (buffer._it_sum[idxes] / buffer._it_min.min()) ** (-0.5)
print(weights1 - weights2)

The result is all zeros. The original code follows the formula in the paper more closely, but I think the simplified version is much faster.
