Understanding batch_T
Hi,

The sampler documentation describes it as: batch_T (int) – number of time-steps per sample batch. I don't understand the effect of `batch_T` in samplers, and I see another `batch_T` in R2D1 too. What is the difference between them, how are they related, and how should these two values be set? The same question applies to the `batch_B` values of R2D1 and of its sampler.

I want to understand the effect of this parameter, `batch_T`, especially in recurrent algorithms such as R2D1 and PPO_LSTM. Does it affect the memory/history information that the LSTM can learn or memorize? Based on the code, the agent trains the LSTM on trajectories of length `batch_T`, so it could limit the time horizon over which the network can memorize information. Should it therefore be set to the average trajectory length of each environment?

Thank you
Top GitHub Comments
Hi, good questions!
To clarify some earlier questions: in the policy gradient algorithms, like PPO, there are only the sampler's `batch_T` and `batch_B`, and whatever the sampler returns in one iteration forms the minibatch for the algorithm. In replay-based algorithms like DQN, the sampler's `batch_T` and `batch_B` keep the same meaning, the amount of data collected per iteration, but these algorithms also have their own `batch_size` (or, in the case of R2D1, `batch_T` and `batch_B`) to determine how much data is replayed from the buffer for each training minibatch.
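To make the distinction concrete, here is a minimal configuration sketch using rlpyt-style imports and keyword names. The specific classes, game, and numbers below are illustrative assumptions rather than recommended settings, and defaults can differ between versions.

```python
# Sketch: the sampler's batch_T/batch_B control how much data is *collected*
# per iteration, while R2D1's batch_T/batch_B control how much data is
# *replayed* per training minibatch.  Import paths follow rlpyt's layout,
# but treat this as illustrative rather than canonical.
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.atari.atari_env import AtariEnv
from rlpyt.algos.dqn.r2d1 import R2D1
from rlpyt.agents.dqn.atari.atari_r2d1_agent import AtariR2d1Agent
from rlpyt.runners.minibatch_rl import MinibatchRl

sampler = SerialSampler(
    EnvCls=AtariEnv,
    env_kwargs=dict(game="pong"),
    batch_T=40,   # time-steps collected per environment per sampler iteration
    batch_B=16,   # number of parallel environment instances
)
algo = R2D1(
    batch_T=80,    # length of each replayed sequence = LSTM BPTT horizon
    batch_B=64,    # number of sequences per training minibatch
    warmup_T=40,   # extra leading steps used only to warm up the RNN state
)
agent = AtariR2d1Agent()
runner = MinibatchRl(algo=algo, agent=agent, sampler=sampler, n_steps=50e6)
runner.train()
```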
Regarding `done=True` for multiple time-steps: yes, that is because when an environment episode ends during sampling, the environment might not reset until the beginning of the following sampling batch, so that the start of an episode aligns with the interval for storing the RNN state. In the meantime, all the (dummy) data from the inactive environment still gets written to the replay buffer. Populating `done=True` for all those steps makes it obvious where the new episode actually begins in the buffer, which is the first new step where `done=False`. And if you look at the `valid_from_done()` function, which generates the mask for the RNN, it masks out all data after the first `done=True`, so it is fine to have more `done=True` steps after that. Kind of a long explanation, but does that make sense?
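Here is a minimal sketch of the masking idea behind `valid_from_done()`; it is an illustrative re-implementation, so the exact rlpyt source may differ slightly.

```python
# Everything after the first done=True in each sequence is marked invalid,
# so trailing dummy steps that also carry done=True are simply ignored by
# the training loss.
import torch

def valid_from_done(done):
    """done: tensor of shape [T, B]; returns a float mask of shape [T, B]
    that is 1 up to and including the first done=True step, 0 afterwards."""
    done = done.type(torch.float)
    valid = torch.ones_like(done)
    # Cumulative count of past dones, clamped so repeated done=True stays at 1.
    valid[1:] = 1 - torch.clamp(torch.cumsum(done[:-1], dim=0), max=1)
    return valid

done = torch.tensor([[0.], [0.], [1.], [1.], [1.]])  # episode ends at t=2
print(valid_from_done(done).squeeze(-1))             # tensor([1., 1., 1., 0., 0.])
```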
@bmazoure The discrepancy in the length of the observations returned is because they also include the target observations, which extend out to n steps past the agent observations, for the n-step returns: https://github.com/astooke/rlpyt/blob/668290d1ca94e9d193388a599d4f719bc3a23fba/rlpyt/replays/sequence/n_step.py#L88
Then inside the R2D1 algorithm, the one copy of the whole observation set is moved to the GPU once, and sliced views into that data are created for the agent inputs and target inputs. The R2D1 default n_step_return is 5, so that should add up. Sorry, that's a tricky one!
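For reference, a rough sketch of that slicing (warmup steps omitted); the shapes and names are assumptions for illustration, not the exact rlpyt code.

```python
# The replayed sequence carries n_step extra observations at the end; the
# agent and target inputs are offset views into the same tensor, so only one
# copy of the observations needs to live on the GPU.
import torch

batch_T, batch_B, n_step = 80, 64, 5
obs_shape = (4, 84, 84)
all_observation = torch.zeros(batch_T + n_step, batch_B, *obs_shape)

agent_obs = all_observation[:batch_T]     # steps t = 0 .. batch_T-1
target_obs = all_observation[n_step:]     # steps t = n_step .. batch_T+n_step-1
print(agent_obs.shape, target_obs.shape)  # both: [80, 64, 4, 84, 84]
```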
Hi! That is correct, the environment state carries forward to the next sampling batch. The environment only resets when an episode finishes, even if the sampler's `batch_T` is much shorter than that. So the sampler's `batch_T` should have little to no effect on training, whereas the algorithm's `batch_T` can have a large effect, because it is the length of the LSTM backprop-through-time during training. Hope that helps!
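As a rough illustration of why the algorithm's `batch_T` is the one that matters, here is a generic PyTorch sketch (not rlpyt code; shapes and names are assumptions) showing that backprop-through-time only spans the `batch_T` steps unrolled in a training minibatch.

```python
# Truncated BPTT: gradients flow only through the T = batch_T unrolled steps,
# which bounds how far back in time the LSTM can learn to attribute credit.
import torch
import torch.nn as nn

T, B, obs_dim, hidden = 80, 64, 16, 128   # T plays the role of the algorithm's batch_T
lstm = nn.LSTM(obs_dim, hidden)
head = nn.Linear(hidden, 1)

obs = torch.randn(T, B, obs_dim)          # one replayed sequence minibatch [T, B, ...]
target = torch.randn(T, B, 1)
init_state = None                         # or a stored / warmed-up RNN state

out, _ = lstm(obs, init_state)            # unrolled over exactly T steps
loss = ((head(out) - target) ** 2).mean()
loss.backward()                           # gradient spans only these T steps
```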