Support for multiple data sources

The idea is to enhance Engine so that it can run on multiple data sources:

data1 = ...
data2 = ...
data3 = ...
...

def process_function(engine, batches):
    batch1 = batches['data1']
    batch2 = batches['data2']
    batch3 = batches['data3']
    ...

engine = Engine(process_function)
engine.run({"data1": data1, "data2": data2, "data3": data3}, max_epochs=10)

Issue Analytics

State:
Created 4 years ago
Comments: 12 (8 by maintainers)

Top GitHub Comments

4 reactions

amatsukawa commented, Jun 17, 2020

I randomly saw this issue while checking in on the 0.4 release. I also feel that this should be left up to the datasets. It’s not immediately clear what the semantics are if the iterators are not the same length.
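To make the length question concrete, here is a minimal sketch in plain Python of three possible semantics when two sources differ in length. The lists stand in for data loaders, and none of these options is something ignite itself prescribes; they are only the obvious candidates.

```python
from itertools import cycle, zip_longest

# Stand-ins for two data sources of different lengths (assumed for illustration)
data1 = [1, 2, 3, 4]   # 4 batches
data2 = ["a", "b"]     # 2 batches

# Option A: stop as soon as the shortest source is exhausted
shortest = [{"data1": a, "data2": b} for a, b in zip(data1, data2)]

# Option B: run until the longest source ends, padding exhausted sources with None
longest = [{"data1": a, "data2": b} for a, b in zip_longest(data1, data2)]

# Option C: restart the shorter source, so data1 defines the epoch length.
# Note: itertools.cycle replays cached items, so a shuffling loader
# would not be reshuffled on each restart.
cycled = [{"data1": a, "data2": b} for a, b in zip(data1, cycle(data2))]
```

Each option gives a different epoch length and different batch contents, which is why the choice arguably belongs with the dataset rather than the engine.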

Perhaps this is not the right venue for this, but I just want to offer my unsolicited $.02 on features such as this one.

IMHO the strength of ignite over other packages such as pytorch-lightning is that the engine and event system are extremely simple. They allow one to inject logic anywhere in the loop as one pleases, and it’s very obvious what will happen. The reason why I don’t like pytorch-lightning as much is exactly because it’s filled with features like this one. It has a huge API, and a lot of logic along the lines of “if you want to do this, then use this API and use a None for that argument but pass a dict for this other argument, etc.” What I love about ignite is that basically everything except the engine, even very common things like model checkpointing, is a “plugin” to be registered. If I want the model checkpointing to write 10 copies of the same file, I am fully empowered to write a handler to do so. I’m not sure how that would be done in other frameworks. (Obviously a contrived example, trying to make a point and to avoid the argument of “actually, in lightning you can do xyz”.)

I would personally implore you to ruthlessly keep the Engine from becoming complicated. For instance, the determinism changes added to the engine caused some roadblocks for me (#935, #941).

I absolutely understand the desire to cover more use cases for users, and you have done an amazing job so far. However, when confronted with a design decision exactly like this one, IMHO a solution that can be implemented without additions to the engine should be chosen (e.g. by providing the Dataset and DataLoader implementations under ignite.data or ignite.contrib). A similar argument can be made about the determinism stuff: subclasses of Engine and optional datasets should be preferred.
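As a rough sketch of what such a data-side solution could look like, the wrapper below zips several named iterables into dict batches. MultiSourceLoader is a hypothetical name, not an existing ignite class, and stopping at the shortest source is just one assumed choice of semantics.

```python
class MultiSourceLoader:
    """Hypothetical helper (not part of ignite): zips named iterables into dict batches."""

    def __init__(self, **sources):
        if not sources:
            raise ValueError("at least one data source is required")
        self.sources = sources

    def __len__(self):
        # Epoch length follows the shortest source (each source must define __len__)
        return min(len(s) for s in self.sources.values())

    def __iter__(self):
        names = list(self.sources)
        # Build a fresh iterator per source on every pass, so each epoch restarts cleanly
        for items in zip(*(iter(self.sources[n]) for n in names)):
            yield dict(zip(names, items))


def process_function(engine, batches):
    # Same shape as the process function proposed in the issue
    return batches["data1"], batches["data2"]


loader = MultiSourceLoader(data1=[1, 2, 3], data2=["a", "b", "c"])
batches = list(loader)
```

Since Engine.run takes an iterable as its data argument, an object like this should be usable as engine.run(loader, max_epochs=10) with the Engine left unchanged, though I have not tested that here.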

Thanks for your continued work on an amazing library 😃

2 reactions

amatsukawa commented, Jun 17, 2020

@vfdev-5 yes, by the looks of it, it does seem to solve our problems. That’s why I was checking up on the official 0.4 release when I came across this in one of the kanban boards 😃

I have yet to actually try it, but if I can find some time next week, I will install the rc and give it a go.