
Understanding batch_T

See original GitHub issue

Hi, the documentation says: “batch_T (int) – number of time-steps per sample batch.” I don’t understand the effect of batch_T in the samplers, and I see another batch_T in R2D1 too. What is the difference? What is the relation between them, and how should we set these two values? And likewise, what about the batch_B values for R2D1 and for its sampler?

I want to understand the effect of this parameter, batch_T, especially in recurrent algorithms such as R2D1 and PPO_LSTM. Does it affect the memory/history information that the LSTM can learn to use? Based on the code, the agent trains the LSTM on trajectories of length batch_T, so it could limit the time horizon over which the network can memorize information. Should it therefore be set to the average trajectory length of each environment?

Thank you

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

3 reactions
astooke commented, Apr 20, 2020

Hi, good questions!

To clarify some earlier questions: in the policy gradient algorithms, like PPO, there is only the sampler’s batch_T and batch_B, and whatever the sampler returns in one iteration forms the minibatch for the algorithm. In replay-based algorithms like DQN, the sampler’s batch_T and batch_B keep the same meaning, the amount of data collected per iteration, but these algorithms also have their own batch_size (or, in the case of R2D1, their own batch_T and batch_B) that determines how much data is replayed from the buffer for each training minibatch.
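For concreteness, here is a minimal sketch of how the two sets of parameters might be wired up for Atari, assuming rlpyt’s SerialSampler, AtariEnv, and R2D1 constructors (double-check the import paths and keyword defaults against your version of the repo):

```python
# The sampler's batch_T/batch_B control how much data is *collected* per
# iteration; R2D1's batch_T/batch_B control how much data is *replayed*
# per training minibatch.
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.envs.atari.atari_env import AtariEnv
from rlpyt.algos.dqn.r2d1 import R2D1

sampler = SerialSampler(
    EnvCls=AtariEnv,
    env_kwargs=dict(game="pong"),
    batch_T=40,   # time-steps collected per environment per iteration
    batch_B=32,   # number of parallel environment instances
)

algo = R2D1(
    batch_T=80,   # length of each replayed sequence (the LSTM BPTT horizon)
    batch_B=64,   # number of replayed sequences per training minibatch
    warmup_T=40,  # leading steps used only to warm up the RNN state
)
```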

Regarding done=True for multiple time-steps: yes, that is because when an environment episode ends during sampling, the environment might not reset until the beginning of the following sampling batch, so that the start of an episode aligns with the interval for storing the RNN state. In the meantime, all the (dummy) data from the inactive environment still gets written to the replay buffer. Populating done=True for all of those steps makes it obvious where the new episode actually begins in the buffer: at the first new step where done=False. And if you look at the valid_from_done() function, which generates the mask for the RNN, it masks out all data after the first done=True, so it’s OK to have more done=True after that. Kind of a long explanation, but does that make sense?
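Here is a sketch of that masking logic, paraphrasing valid_from_done() (check the function in the rlpyt repo for the actual code): the mask is 1 up through the first done=True in each sequence and 0 for every step after it.

```python
import torch

def valid_from_done(done):
    """Float mask over time: 1 up through the first done=True in each
    sequence, 0 afterwards (so trailing dummy done=True steps are masked)."""
    done = done.type(torch.float)  # shape [T, B]
    valid = torch.ones_like(done)
    # A step is invalid once any earlier step in the sequence was done.
    valid[1:] = 1 - torch.clamp(torch.cumsum(done[:-1], dim=0), max=1)
    return valid

# Episode ends at t=2; the trailing done=True padding gets masked out.
done = torch.tensor([[0.], [0.], [1.], [1.], [1.]])  # [T=5, B=1]
print(valid_from_done(done).squeeze().tolist())  # [1.0, 1.0, 1.0, 0.0, 0.0]
```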

@bmazoure The discrepancy in the length of the returned observations is because they also include the target observations, which extend n steps past the agent observations, for the n-step returns: https://github.com/astooke/rlpyt/blob/668290d1ca94e9d193388a599d4f719bc3a23fba/rlpyt/replays/sequence/n_step.py#L88
Then, inside the R2D1 algorithm, the one copy of the whole observation set is moved to the GPU once, and sliced views into that data are created for the agent inputs and the target inputs. The R2D1 default n_step_return is 5, so that should add up. Sorry, that’s a tricky one!
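As a sketch of the slicing that explains the length mismatch (variable names here are illustrative, not rlpyt’s): a replayed sequence of T agent steps comes with T + n observations, and the target inputs are just a view shifted forward by n.

```python
import torch

T, n_step, B = 80, 5, 64             # replay batch_T, n-step return, batch_B
obs = torch.randn(T + n_step, B, 4)  # buffer returns T + n observations

agent_obs  = obs[:T]       # o_t for t = 0 .. T-1
target_obs = obs[n_step:]  # o_{t+n}, used for the n-step TD targets
assert agent_obs.shape[0] == target_obs.shape[0] == T
```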

2 reactions
astooke commented, Apr 27, 2020

Hi! That is correct: the environment state carries forward to the next sampling batch. The environment only resets when an episode finishes, even if the sampler’s batch_T is much shorter than an episode. So the sampler’s batch_T should have little to no effect on training, whereas the algorithm’s batch_T can have a large effect, because it is the length of the LSTM backprop-through-time used for training. Hope that helps!
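To make that last point concrete, here is a generic PyTorch sketch (illustrative only, not rlpyt code): gradients flow back through exactly the batch_T steps of the unrolled LSTM, so the algorithm’s batch_T sets the learning horizon.

```python
import torch
import torch.nn as nn

batch_T, batch_B, obs_dim, hidden = 80, 64, 16, 128  # illustrative sizes
lstm = nn.LSTM(obs_dim, hidden)
head = nn.Linear(hidden, 1)

x = torch.randn(batch_T, batch_B, obs_dim)  # one replayed training minibatch
out, _ = lstm(x)                            # unrolled over batch_T time-steps
loss = head(out).pow(2).mean()              # stand-in for the RL loss
loss.backward()                             # BPTT spans all batch_T steps
```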
