composing dataloaders

Hi,

I have a datafetcher where I use 2 dataloaders in sequence: the first to translate from 1 ID to another, the second to fetch data corresponding to the second ID.

loader1.load(id1).thenCompose(id2 -> loader2.load(id2))

This hangs because dispatchAll() is not called again after loader1 completes. I can work around that by adding that call inside the thenCompose() lambda but then it is called for every id2 which is ugly at the very least.

Is there a better way of doing this?
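
To make the hang described above concrete, here is a minimal sketch that uses only the JDK. MiniLoader, its dispatch() and backendResponds() methods, and ComposedLoaderDemo are hypothetical stand-ins invented for illustration; they are not the java-dataloader API. The sketch assumes a loader only sends queued keys when dispatch() is called, and that the second load() is queued after the single dispatch round has already happened.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for a batching data loader (NOT the java-dataloader API).
class MiniLoader<K, V> {
    private final Function<K, V> backend;
    private final Map<K, CompletableFuture<V>> queued = new LinkedHashMap<>();
    private final Map<K, CompletableFuture<V>> inFlight = new LinkedHashMap<>();

    MiniLoader(Function<K, V> backend) {
        this.backend = backend;
    }

    // Queues the key; the future stays incomplete until the key is dispatched
    // and the backend has responded.
    CompletableFuture<V> load(K key) {
        return queued.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Moves queued keys to "in flight" and returns how many were sent.
    int dispatch() {
        int sent = queued.size();
        inFlight.putAll(queued);
        queued.clear();
        return sent;
    }

    // Simulates the asynchronous backend answering the in-flight batch.
    void backendResponds() {
        Map<K, CompletableFuture<V>> batch = new LinkedHashMap<>(inFlight);
        inFlight.clear();
        batch.forEach((key, future) -> future.complete(backend.apply(key)));
    }
}

class ComposedLoaderDemo {
    // Returns the observed result after the first and after the second dispatch round.
    static String[] run() {
        MiniLoader<Integer, String> loader1 = new MiniLoader<>(id1 -> "id2-" + id1);
        MiniLoader<String, String> loader2 = new MiniLoader<>(id2 -> "data for " + id2);

        CompletableFuture<String> result =
                loader1.load(1).thenCompose(id2 -> loader2.load(id2));

        // Round 1: the one and only "dispatch all". loader2 has nothing queued yet.
        loader1.dispatch();
        loader2.dispatch();
        loader1.backendResponds(); // id2 arrives, thenCompose queues it in loader2
        String afterFirstRound = result.getNow("<pending>");

        // Round 2: without this extra dispatch the composed future never completes.
        loader2.dispatch();
        loader2.backendResponds();
        String afterSecondRound = result.getNow("<pending>");

        return new String[] {afterFirstRound, afterSecondRound};
    }
}
```

Calling ComposedLoaderDemo.run() shows the composed future still pending after the first round and completed only once loader2 is dispatched a second time, which is the extra call the thenCompose() workaround supplies.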

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 17 (4 by maintainers)

Top GitHub Comments

1 reaction
bbakerman commented, Nov 12, 2021

"This works to unstick nested data loaders, at the cost of naively triggering dispatches too early for long-running data loading tasks."

This is pretty much how the JS tick works for JavaScript data loaders. They can dispatch too early as well, but they never miss composed loaders because control is eventually passed back and the tick happens.

One thing I will say about the above is this: since DataLoaders are per request, your scheduler Queue will grow to the number of concurrent requests * the number of dataloaders per request.

It’s good that you have a removeRegistry, because otherwise this would get unwieldy quickly under enough load.
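
One way to keep that queue bounded is to tie each registry's lifetime to its request. The sketch below is a JDK-only illustration under stated assumptions: RegistryTracker, track, and activeCount are hypothetical names, the type parameter R stands in for a per-request DataLoaderRegistry, and the request is assumed to be represented by a CompletableFuture that completes when execution finishes.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Hypothetical helper; R stands in for a per-request DataLoaderRegistry.
class RegistryTracker<R> {
    private final Queue<R> active = new ConcurrentLinkedQueue<>();

    // Registers the registry for the duration of the request and always
    // deregisters it, whether the request succeeds or fails.
    <T> CompletableFuture<T> track(R registry, Function<R, CompletableFuture<T>> request) {
        active.add(registry);
        try {
            return request.apply(registry)
                    .whenComplete((value, error) -> active.remove(registry));
        } catch (RuntimeException e) {
            // The request failed before it produced a future.
            active.remove(registry);
            throw e;
        }
    }

    // Number of registries a periodic dispatcher would currently have to scan.
    int activeCount() {
        return active.size();
    }
}
```

With this shape the number of tracked registries follows the number of requests currently executing rather than the number ever started.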

1 reaction
MartinDevillers commented, Nov 12, 2021

I’ve run into the same limitation, and ScheduledDataLoaderRegistry didn’t work for me. I think ScheduledDataLoaderRegistry serves a different use case: making the overall dispatching strategy less eager by pushing dispatch attempts into the future. It still relies on dispatchAll being called first, which doesn’t happen in the scenario with nested loaders.

So my current approach (an ugly hack) is to have a separate scheduled task that periodically checks all in-flight data loaders and forcefully dispatches them if they haven’t been dispatched within a preset time window (e.g. 500ms). This works to unstick nested data loaders, at the cost of naively triggering dispatches too early for long-running data loading tasks. I am not sure what the implications of that are, but my API has been working fine so far, so I’m happy 😎

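import java.time.Duration;
import java.util.Collection;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import lombok.extern.slf4j.Slf4j;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class ScheduledDataLoaderDispatcher {

    // Registries of all in-flight requests; concurrent because requests register from many threads.
    private final Queue<DataLoaderRegistry> globalRegistries = new ConcurrentLinkedQueue<>();
    // How long a data loader may go without a dispatch before one is forced.
    private final Duration timeToDispatch;

    public ScheduledDataLoaderDispatcher(@Value("${app.dataLoader.timeToDispatch:500}") Integer timeToDispatch) {
        this.timeToDispatch = Duration.ofMillis(timeToDispatch);
    }

    // Call when a request starts.
    public void addRegistry(DataLoaderRegistry dataLoaderRegistry) {
        globalRegistries.add(dataLoaderRegistry);
    }

    // Call when a request ends, otherwise the queue keeps growing.
    public void removeRegistry(DataLoaderRegistry dataLoaderRegistry) {
        globalRegistries.remove(dataLoaderRegistry);
    }

    // Runs on every tick (Spring scheduling must be enabled) and dispatches overdue loaders.
    @Scheduled(fixedRateString = "${app.dataLoader.dispatchTickRate:100}")
    public void dispatchAll() {
        globalRegistries.stream()
                .map(DataLoaderRegistry::getDataLoaders)
                .flatMap(Collection::stream)
                .filter(this::isDispatchNeeded)
                .forEach(DataLoader::dispatch);
    }

    // True when more than timeToDispatch has elapsed since the loader's last dispatch.
    private boolean isDispatchNeeded(DataLoader<?, ?> dataLoader) {
        return timeToDispatch.compareTo(dataLoader.getTimeSinceDispatch()) < 0;
    }
}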