
Support for data incremental scenario

See original GitHub issue

We need to support the data incremental scenario with no task labels, no task boundaries. The problem is that experiences always have boundaries between them. After a bit of discussion I think there are two possible solutions:

1 - single experience, where the entire scenario is a single Experience, carefully loaded by a data loader that does not reuse data.
2 - a separate experience for each mini-batch, where boundaries happen at the mini-batch level, which makes each mini-batch independent.

Personally, I prefer option (2). It seems natural to me to represent the passage of time using the stream of experiences. When you have a stream of experiences, it's clear that you either store them in a buffer or you will not encounter them anymore. Option (1) is possible, and maybe slightly more efficient, but it's more error-prone. If you don't use a proper data loader, it's not data incremental anymore. If you run multiple epochs, it's not data incremental. And so on.
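Option (2) can be sketched with a plain generator that carves a dataset into mini-batch-sized experiences, yielded lazily; the `Experience` class and the chunking logic here are hypothetical illustrations of the idea, not the actual Avalanche API:

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class Experience:
    # Hypothetical stand-in for an Avalanche experience: an id plus
    # the samples visible at this point in the stream.
    current_experience: int
    samples: List[int]


def data_incremental_stream(dataset: Sequence, batch_size: int) -> Iterator[Experience]:
    """Lazily yield one Experience per mini-batch (option 2).

    Each sample appears in exactly one experience, so the stream
    itself encodes "data you will not encounter again".
    """
    for i, start in enumerate(range(0, len(dataset), batch_size)):
        yield Experience(i, list(dataset[start:start + batch_size]))


# Usage: a 10-sample "dataset" becomes 3 experiences of up to 4 samples.
stream = list(data_incremental_stream(range(10), batch_size=4))
```

Because the generator is lazy, a very long stream of tiny experiences costs nothing up front, which matters if every mini-batch becomes its own experience.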

@lrzpellegrini what do you think? Is it possible to have such a large number of experiences?

@Mattdl what do you think of this solution?

See also previous discussion.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

AntonioCarta commented, Mar 5, 2021

The point is, do we need some other way to create mini-batch experiences out of initial experiences?

I don’t think we need this if we already support data incremental.

Since you agree that there are no limitations on creating a large number of experiences (especially in a lazy way, which may be useful), I think we only need to provide some examples of scenarios already implemented, plus documentation at all levels (examples, tutorials, API). The main problem I see right now is that most end users may not clearly see how Avalanche streams map to the common benchmarks in the literature.

The good thing about having experiences is that they clearly define “a bunch of things you won’t be able to see again”

Totally agree. This is another point that we should stress in the documentation.

lrzpellegrini commented, Mar 5, 2021

Having an experience for each mini-batch (2) is totally doable. It shouldn’t add too much overhead.

A custom scenario can already generate the required experiences if correctly implemented, even experiences containing a single pattern. Also, experiences are now created on the fly in a lazy way, and the underlying dataset is generated from lists of indices, which is why I expect little to no overhead.

The point is, do we need some other way to create mini-batch experiences out of initial experiences?

  • It should be doable to add a “split” method to the Experience class to allow the creation of sub-experiences of a certain size (or based on other criteria) on the fly (maybe in the very first step of the strategy). However, in that case we would need to rethink the current_experience field in the Experience class. Also, this would really entangle the scenario definition with the strategy.
  • Another solution would be implementing the “split” method at the stream level, which would allow the stream to output pre-split experiences, thus changing the semantics of the scenario itself (which, if I got the point of the discussion, is the right thing to do). I think this solution is better because 1) the definition of the scenario would be completely handled outside the strategy, possibly inside the scenario generator, as it should be; 2) the current_experience field of the Experience class would still make sense.
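A stream-level split along the lines of the second option could look like the sketch below. The names (`split_stream`, fixed-size sub-experiences) and the index-list representation are hypothetical assumptions for illustration, not the Avalanche implementation:

```python
from typing import Iterable, Iterator, List


class Experience:
    # Minimal hypothetical experience: an id plus an index list
    # into a shared underlying dataset (mirroring the lazy,
    # index-based construction described above).
    def __init__(self, current_experience: int, indices: List[int]):
        self.current_experience = current_experience
        self.indices = indices


def split_stream(stream: Iterable[Experience], size: int) -> Iterator[Experience]:
    """Lazily re-split a stream into fixed-size sub-experiences.

    The split happens at the stream level, so current_experience is
    renumbered over the new stream and the strategy never sees the
    original experience boundaries.
    """
    new_id = 0
    for exp in stream:
        for start in range(0, len(exp.indices), size):
            yield Experience(new_id, exp.indices[start:start + size])
            new_id += 1


# Usage: two original experiences (6 and 3 samples) become five
# sub-experiences of at most 2 samples each.
original = [Experience(0, list(range(6))), Experience(1, list(range(6, 9)))]
split = list(split_stream(original, size=2))
```

Since only index lists are copied, not data, the split stays cheap; and because renumbering lives in the stream, the `current_experience` field remains consistent without touching the strategy.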

I’m not sure that (1) would be a great idea: it makes things complicated, in my opinion. The good thing about having experiences is that they clearly define “a bunch of things you won’t be able to see again”… I don’t really like the idea that the order of things should be managed at the DataLoader level.

