Support for data incremental scenario
We need to support the data incremental scenario: no task labels, no task boundaries. The problem is that experiences always have boundaries between them. After a bit of discussion, I think there are two possible solutions:
1 - a single experience, where the entire scenario is a single Experience, carefully loaded by a data loader that does not reuse data.
2 - a separate experience for each mini-batch, where boundaries happen at the mini-batch level, making each mini-batch independent.
Personally, I prefer option (2). It seems natural to me to represent the passage of time using the stream of experiences: when you have a stream of experiences, it's clear that you either store them in a buffer or you will never encounter them again. Option (1) is possible, and maybe slightly more efficient, but it's more error-prone: if you don't use a proper data loader it's not data incremental anymore, if you run multiple epochs it's not data incremental, and so on.
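To make option (2) concrete, here is a minimal sketch (plain Python, not the Avalanche API; `data_incremental_stream` is a hypothetical helper) of how a single dataset could be exposed as a stream of mini-batch-sized experiences:

```python
import random

def data_incremental_stream(dataset, experience_size, shuffle=True, seed=0):
    """Yield one list of indices per mini-batch-sized experience.

    Each chunk plays the role of one Experience: once consumed, its data
    is never seen again unless the strategy stores it in a buffer.
    """
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), experience_size):
        yield indices[start:start + experience_size]

# Example: a 1000-sample dataset becomes a stream of 32-sample experiences.
for exp_id, exp_indices in enumerate(data_incremental_stream(range(1000), 32)):
    ...  # train on the samples selected by exp_indices, exactly once
```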
@lrzpellegrini what do you think? Is it possible to have such a large number of experiences?
@Mattdl what do you think of this solution?
See also previous discussion.
Top GitHub Comments
I don’t think we need this if we already support data incremental.
Since you agree that there are no limitations on creating a large number of experiences (especially in a lazy way, which may be useful), I think we only need to provide some examples of the scenarios already implemented, plus documentation at all levels (examples, tutorials, API). The main problem I see right now is that most end users may not clearly see how Avalanche streams map to the common benchmarks in the literature.
Totally agree. This is another point that we should stress in the documentation.
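For context, this is roughly how an Avalanche stream already maps onto a classic benchmark (a sketch assuming the `SplitMNIST` generator and the standard `train_stream` interface; attribute names may vary across versions):

```python
from avalanche.benchmarks.classic import SplitMNIST

# Split MNIST into 5 experiences, each holding a subset of the classes.
benchmark = SplitMNIST(n_experiences=5)

for experience in benchmark.train_stream:
    # Position in the stream and the classes this experience contains;
    # experience.dataset is the training data for this step.
    print(experience.current_experience, experience.classes_in_this_experience)
```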
Having an experience for each mini-batch (2) is totally doable. It shouldn’t add too much overhead.
A custom scenario can already generate the required experiences if correctly implemented, even experiences containing a single pattern. Also, experiences are now created on the fly in a lazy way, and the underlying dataset is generated from lists of indices, which is why I expect little to no overhead.
The point is, do we need some other way to create mini-batch experiences out of initial experiences?
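As an illustration of what lazy, index-based generation can look like (an illustrative sketch, not the actual Avalanche internals), each experience can be a cheap view over a list of indices, materialized only when accessed:

```python
from torch.utils.data import Subset

def lazy_experiences(dataset, index_lists):
    """Generator: build each experience's dataset only when it is requested.

    Subset stores only the indices, so creating thousands of experiences
    costs almost nothing until their data is actually accessed.
    """
    for indices in index_lists:
        yield Subset(dataset, indices)
```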
One possibility would be to extend the `Experience` class to allow the creation of sub-experiences of a certain size (or based on other criteria) on the fly (maybe in the very first step of the strategy). However, in that case we would need to rethink the `current_experience` field in the `Experience` class. Also, this would really entangle the scenario definition with the strategy, and I'm not sure that keeping the `current_experience` field of the `Experience` class would make sense.
I'm not sure that (1) would be a great idea: it makes things complicated in my opinion. The good thing about having experiences is that they clearly define "a bunch of things you won't be able to see again"… I don't really like the idea that the order of things should be managed at the DataLoader level.
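For completeness, a hypothetical sketch of the sub-experience idea discussed above (`split_into_sub_experiences` is an assumed helper, not part of Avalanche); a strategy could apply it to the incoming experience's dataset in its very first step:

```python
from torch.utils.data import Subset

def split_into_sub_experiences(experience_dataset, size):
    """Split one experience's dataset into fixed-size sub-experiences.

    Note: all sub-experiences would share the parent's position in the
    stream, which is why the current_experience field would need rethinking.
    """
    n = len(experience_dataset)
    return [Subset(experience_dataset, list(range(s, min(s + size, n))))
            for s in range(0, n, size)]
```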