Rethinking generators
In constructing examples for the new docs, I’m coming to the conclusion that our use of generators instead of lists is kind of a UI nightmare. It’s pretty much impossible for a naive user (or even me in some cases) to know what kind of iterable they’re getting back, which makes inspection of the resulting objects much more cumbersome than it should be. Basically, unless you’re planning to go all the way to a pandas DataFrame with pliers code (e.g., in a Graph), things get hairy pretty quickly.
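To make the inspection problem concrete, here is a minimal sketch in plain Python. It does not use pliers at all; `extract_all` is a hypothetical stand-in for any step that yields its results lazily.

```python
def extract_all(items):
    """Hypothetical stand-in for a lazy pipeline step: yields one result per item."""
    for item in items:
        yield item.upper()


results = extract_all(["a", "b", "c"])

# The repr identifies the object as a generator but says nothing about its contents.
print(results)

# Generators do not support len() (or indexing), so quick inspection fails.
try:
    len(results)
except TypeError as err:
    print("len() failed:", err)

# Materializing into a list works, but only once: the generator is then exhausted.
first_pass = list(results)
second_pass = list(results)
print(first_pass)
print(second_pass)
```

The second `list(...)` call comes back empty because the generator has already been consumed, which is exactly the kind of surprise a user inspecting results interactively tends to hit.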
It’s clearly not an optimal solution to just return lists everywhere, since almost anything meaningful one might want to do with, e.g., a large movie, could potentially result in crazy memory use (and this was what prompted me to use generators in the first place). But I think that’s at least an easily understandable problem from the user’s perspective–i.e., it wouldn’t take much for a user to write an outer loop around VideoFrameStims and save the results to file in batches.
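As a rough sketch of what such a user-written outer loop might look like, the following uses only the standard library. The names `fake_frames` and `process` are hypothetical placeholders for a stream of frames and an extractor call, not pliers functions, and JSON files in a temporary directory stand in for whatever file store a user would actually choose.

```python
import itertools
import json
import os
import tempfile


def batched(iterable, batch_size):
    """Yield successive lists of up to batch_size items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_frames(n):
    """Hypothetical stand-in for a lazy stream of video frames."""
    for i in range(n):
        yield {"frame": i}


def process(frame):
    """Hypothetical stand-in for running an extractor on one frame."""
    return {"frame": frame["frame"], "value": frame["frame"] * 2}


out_dir = tempfile.mkdtemp()
written = []
for batch_index, batch in enumerate(batched(fake_frames(10), batch_size=4)):
    # Only one batch of results is held in memory at a time.
    results = [process(frame) for frame in batch]
    path = os.path.join(out_dir, f"batch_{batch_index:04d}.json")
    with open(path, "w") as fh:
        json.dump(results, fh)
    written.append(path)

print(f"wrote {len(written)} batch files to {out_dir}")
```

Each batch is an ordinary list, so it can be inspected freely, while peak memory use is bounded by the batch size rather than by the length of the movie.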
Perhaps the right approach is to build batching–and possibly file-writing–functionality into the Graph. That way, when users initialize a Graph, they just have to specify a batch size, file store, etc. We could potentially use HDF5 to store intermediate results if needed.
In any case, I don’t think we should hold up the former change (dropping generators, at least by default) until we have a scheme along the latter lines figured out. We should probably make it a high priority to just use lists for now (and maybe have a config setting that re-enables generators if the user really knows what they’re doing).
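One way the "lists by default, generators on request" idea could be sketched is shown below. The `config` dictionary and its `use_generators` key are assumptions made up for illustration, not an existing pliers setting, and `transform` is a hypothetical stand-in for any public entry point.

```python
import inspect

# Hypothetical module-level setting; not an existing pliers option.
config = {"use_generators": False}


def _transform_lazy(items):
    """Internal lazy implementation: yields one result at a time."""
    for item in items:
        yield item * 2


def transform(items):
    """Return a list by default; return the raw generator only if opted in."""
    results = _transform_lazy(items)
    if config["use_generators"]:
        return results
    return list(results)


default_result = transform([1, 2, 3])
print(default_result)

# Power users flip the flag to get the underlying generator back.
config["use_generators"] = True
lazy_result = transform([1, 2, 3])
config["use_generators"] = False
print(inspect.isgenerator(lazy_result))
```

The internals stay lazy in this sketch; only the public boundary decides whether to materialize, so switching the default would not require rewriting the generator-based code underneath.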
From a naive user perspective, the main problem I see is that generators will be returned any time iteration is involved–and that can violate expectations in odd ways. Consider these two calls:
Out of the box in pliers, the first call returns another Stim instance, while the second returns [<generator>]. I think this is really unintuitive for naive users (and probably also for many non-naive users). But I don’t see any good way to avoid it so long as we want to use generators internally. So probably best to default to always returning lists, while still allowing power users to unmask the generator usage that’s already going on under the hood.
Yep!