Support for data incremental scenario
We need to support the data incremental scenario: no task labels, no task boundaries. The problem is that experiences always have boundaries between them. After a bit of discussion, I think there are two possible solutions:
1 - a single experience, where the entire scenario is a single Experience, carefully loaded by a data loader that does not reuse data.
2 - a separate experience for each mini-batch, where boundaries happen at the mini-batch level, making each mini-batch independent.
Personally, I prefer option (2). It seems natural to me to represent the passage of time using the stream of experiences: when you have a stream of experiences, it's clear that you either store them in a buffer or you will never encounter them again. Option (1) is possible, and maybe slightly more efficient, but it's more error-prone: if you don't use a proper data loader it's not data incremental anymore, if you run multiple epochs it's not data incremental, and so on.
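To make option (2) concrete, here is a minimal sketch (plain Python, not the Avalanche API; `data_incremental_stream` is a hypothetical helper) of how a single dataset could be exposed as a stream of mini-batch-sized experiences:

```python
import random

def data_incremental_stream(dataset, experience_size, shuffle=True, seed=0):
    """Yield one list of indices per mini-batch-sized experience.

    Each chunk plays the role of one Experience: once consumed, its data
    is never seen again unless the strategy stores it in a buffer.
    """
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), experience_size):
        yield indices[start:start + experience_size]

# Example: a 1000-sample dataset becomes a stream of 32-sample experiences.
for exp_id, exp_indices in enumerate(data_incremental_stream(range(1000), 32)):
    ...  # train on the samples selected by exp_indices, exactly once
```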
@lrzpellegrini what do you think? Is it possible to have such a large number of experiences?
@Mattdl what do you think of this solution?
See also previous discussion.
Top GitHub Comments
I don’t think we need this if we already support data incremental.
Since you agree that there are no limitations on creating a large number of experiences (especially in a lazy way, which may be useful), I think we only need to provide some examples of the scenarios already implemented, plus documentation at all levels (examples, tutorials, API). The main problem I see right now is that most end users may not clearly see how Avalanche streams map to the common benchmarks in the literature.
Totally agree. This is another point that we should stress in the documentation.
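For context, this is roughly how an Avalanche stream already maps onto a classic benchmark (a sketch assuming the `SplitMNIST` generator and the standard `train_stream` interface; attribute names may vary across versions):

```python
from avalanche.benchmarks.classic import SplitMNIST

# Split MNIST into 5 experiences, each holding a subset of the classes.
benchmark = SplitMNIST(n_experiences=5)

for experience in benchmark.train_stream:
    # Position in the stream and the classes this experience contains;
    # experience.dataset is the training data for this step.
    print(experience.current_experience, experience.classes_in_this_experience)
```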
Having an experience for each mini-batch (2) is totally doable. It shouldn’t add too much overhead.
A custom scenario can already generate the required experiences if correctly implemented, even experiences containing a single pattern. Also, experiences are now created on the fly in a lazy way, and the underlying dataset is generated from lists of indices, which is why I expect little to no overhead.
The point is, do we need some other way to create mini-batch experiences out of initial experiences?
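As an illustration of what lazy, index-based generation can look like (an illustrative sketch, not the actual Avalanche internals), each experience can be a cheap view over a list of indices, materialized only when accessed:

```python
from torch.utils.data import Subset

def lazy_experiences(dataset, index_lists):
    """Generator: build each experience's dataset only when it is requested.

    Subset stores only the indices, so creating thousands of experiences
    costs almost nothing until their data is actually accessed.
    """
    for indices in index_lists:
        yield Subset(dataset, indices)
```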
One possibility would be to extend the `Experience` class to allow the creation of sub-experiences of a certain size (or based on other criteria) on the fly (maybe in the very first step of the strategy). However, in that case we would need to rethink the `current_experience` field in the `Experience` class. Also, this would really entangle the scenario definition with the strategy, and I'm not sure that keeping the `current_experience` field of the `Experience` class would make sense.
I'm not sure that (1) would be a great idea: it makes things complicated in my opinion. The good thing about having experiences is that they clearly define "a bunch of things you won't be able to see again"… I don't really like the idea that the order of things should be managed at the DataLoader level.
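For completeness, a hypothetical sketch of the sub-experience idea discussed above (`split_into_sub_experiences` is an assumed helper, not part of Avalanche); a strategy could apply it to the incoming experience's dataset in its very first step:

```python
from torch.utils.data import Subset

def split_into_sub_experiences(experience_dataset, size):
    """Split one experience's dataset into fixed-size sub-experiences.

    Note: all sub-experiences would share the parent's position in the
    stream, which is why the current_experience field would need rethinking.
    """
    n = len(experience_dataset)
    return [Subset(experience_dataset, list(range(s, min(s + size, n))))
            for s in range(0, n, size)]
```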