Faster Tabular Dataloader

Feature request

What is the expected behavior? Data should load faster than with the usual per-sample lookup: def __getitem__(self, index): x, y = self.x[index], self.y[index]; return x, y

What is the motivation or use case for adding/changing the behavior? Fix the CPU to GPU data-loading bottleneck.

How should this be implemented in your opinion? An implementation already exists: https://github.com/hcarlens/pytorch-tabular/blob/master/fast_tensor_data_loader.py

Are you willing to work on this yourself? No
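
As a rough illustration of the idea in this request, the sketch below builds each batch by slicing in-memory tensors directly instead of calling __getitem__ once per row and collating the results. It was written for this page and is not the code from the linked repository; the class name SlicingTensorLoader and its parameters are hypothetical, and it assumes the whole dataset already fits in memory as tensors.

```python
import torch


class SlicingTensorLoader:
    """Iterate over in-memory tensors in whole-batch slices.

    Hypothetical sketch: not the implementation from the linked repository.
    """

    def __init__(self, *tensors, batch_size=32, shuffle=False):
        if not tensors:
            raise ValueError("at least one tensor is required")
        if any(t.shape[0] != tensors[0].shape[0] for t in tensors):
            raise ValueError("all tensors must share the same first dimension")
        self.tensors = tensors
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.dataset_len = tensors[0].shape[0]

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return (self.dataset_len + self.batch_size - 1) // self.batch_size

    def __iter__(self):
        if self.shuffle:
            # One permutation per epoch, applied to every tensor so rows stay aligned.
            perm = torch.randperm(self.dataset_len)
            tensors = tuple(t[perm] for t in self.tensors)
        else:
            tensors = self.tensors
        for start in range(0, self.dataset_len, self.batch_size):
            # One slice per tensor replaces batch_size per-row lookups plus collation.
            yield tuple(t[start:start + self.batch_size] for t in tensors)


# Usage: 10 rows with batch_size=4 give batches of 4, 4 and 2 rows.
x = torch.arange(20, dtype=torch.float32).reshape(10, 2)
y = torch.arange(10)
loader = SlicingTensorLoader(x, y, batch_size=4, shuffle=False)
batches = list(loader)
```

The trade-off is that shuffling copies every tensor once per epoch, which is usually acceptable for tabular data that already fits in memory but would not suit datasets streamed from disk.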

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
ghost commented, Jan 1, 2021

I didn’t test it with TabNet, but with some small models it can indeed speed things up 10 to 20 times. A benchmark by someone else: https://towardsdatascience.com/better-data-loading-20x-pytorch-speed-up-for-tabular-data-e264b9e34352

0 reactions
tim5go commented, Jul 7, 2021

@Optimox Just curious, what is the size of the dataset being tested? Is it possible that the size is too small so that we can’t see the improvement?
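
One way to explore the dataset-size question is a small timing harness like the one below. It is only a sketch under stated assumptions: the function names are made up for this example, it times a single pass over random in-memory data with no model attached, and it assumes the standard DataLoader fetches TensorDataset rows one at a time and collates them. No timings are claimed here; results will depend on hardware, batch size and row count.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_standard_loader(x, y, batch_size):
    """Time one pass through torch's DataLoader over a TensorDataset."""
    loader = DataLoader(TensorDataset(x, y), batch_size=batch_size, shuffle=False)
    start = time.perf_counter()
    rows_seen = sum(xb.shape[0] for xb, _ in loader)
    return time.perf_counter() - start, rows_seen


def time_batch_slicing(x, y, batch_size):
    """Time one pass that slices whole batches straight from the tensors."""
    start = time.perf_counter()
    rows_seen = 0
    for i in range(0, x.shape[0], batch_size):
        xb, yb = x[i:i + batch_size], y[i:i + batch_size]
        rows_seen += xb.shape[0]
    return time.perf_counter() - start, rows_seen


def compare(n_rows, n_features=10, batch_size=1024):
    """Return (standard_seconds, slicing_seconds) for random data of a given size."""
    x = torch.randn(n_rows, n_features)
    y = torch.randint(0, 2, (n_rows,))
    t_std, seen_std = time_standard_loader(x, y, batch_size)
    t_fast, seen_fast = time_batch_slicing(x, y, batch_size)
    # Both passes must visit every row exactly once.
    assert seen_std == seen_fast == n_rows
    return t_std, t_fast


if __name__ == "__main__":
    # Try a small and a larger dataset to see how the gap changes with size.
    for n in (1_000, 100_000):
        t_std, t_fast = compare(n)
        print(f"rows={n}: DataLoader {t_std:.4f}s, batch slicing {t_fast:.4f}s")
```

Because data loading is only part of a training step, any gap measured this way is an upper bound on the end-to-end gain; with a large model the forward and backward passes may dominate.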

Top Results From Across the Web

Better Data Loading: 20x PyTorch Speed-Up for Tabular Data
Properly exploiting properties of tabular data allows significant speedups of PyTorch training. Here's an easy way to speed up training 20x.

Faster FastAI Tabular Deep Learning | Kaggle
In this notebook, we show how to speed-up training of tabular deep learning models by ~2x with FastAI library using a customized NVTabular...

Tabular data - fastai
Helper functions to get data in a DataLoaders in the tabular application and higher class TabularDataLoaders.

Faster GPU-based Feature Engineering and Tabular Deep ...
In Faster FastAI Tabular Deep Learning, we show how to speed-up FastAI tabular deep learning models by ~2x using NVTabular data loader.

Merlin Dataloader is 119x faster than my own PyTorch Dataset ...
Merlin Dataloader is 119x faster than my own PyTorch Dataset + Dataloader combo! This is revolutionary for tabular data Let's take a closer...