Faster Tabular Dataloader
Feature request
What is the expected behavior?
Load data faster than the standard per-index approach:

```python
def __getitem__(self, index):
    x, y = self.x[index], self.y[index]
    return x, y
```
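For context, that snippet typically lives in a standard map-style `Dataset`. A minimal sketch of the baseline being compared against (the class and variable names here are illustrative, not from the issue):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TabularDataset(Dataset):
    """Map-style dataset: the DataLoader fetches one row per __getitem__ call."""

    def __init__(self, x: torch.Tensor, y: torch.Tensor):
        self.x, self.y = x, y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # Called once per sample; rows are then collated into a batch,
        # which is the per-sample Python overhead this request targets.
        x, y = self.x[index], self.y[index]
        return x, y

# Typical usage: one __getitem__ call (plus one collate step) per row.
loader = DataLoader(TabularDataset(torch.randn(10_000, 32),
                                   torch.randint(0, 2, (10_000,))),
                    batch_size=1024, shuffle=True)
```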
What is the motivation or use case for adding/changing the behavior?
Fix the CPU-to-GPU data-loading bottleneck.
How should this be implemented in your opinion?
An implementation already exists: https://github.com/hcarlens/pytorch-tabular/blob/master/fast_tensor_data_loader.py
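The linked file avoids the per-row overhead by slicing whole batches directly out of in-memory tensors, so there is no per-sample `__getitem__` call and no collate step. A simplified sketch of that batch-slicing pattern (condensed from the linked implementation; see the file itself for the exact details):

```python
import torch

class FastTensorDataLoader:
    """Iterates over in-memory tensors in contiguous batch-sized slices."""

    def __init__(self, *tensors, batch_size=32, shuffle=False):
        assert all(t.shape[0] == tensors[0].shape[0] for t in tensors)
        self.tensors = tensors
        self.dataset_len = tensors[0].shape[0]
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        if self.shuffle:
            # One permutation per epoch, applied to the whole tensors at once.
            perm = torch.randperm(self.dataset_len)
            self.tensors = [t[perm] for t in self.tensors]
        self.i = 0
        return self

    def __next__(self):
        if self.i >= self.dataset_len:
            raise StopIteration
        # A batch is a single tensor slice: no per-row indexing, no collation.
        batch = tuple(t[self.i:self.i + self.batch_size] for t in self.tensors)
        self.i += self.batch_size
        return batch

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return (self.dataset_len + self.batch_size - 1) // self.batch_size
```

A drop-in usage would be `FastTensorDataLoader(x_train, y_train, batch_size=1024, shuffle=True)` in place of a `DataLoader`. The trade-off is that all data must already fit in memory as tensors, which is usually the case for tabular workloads.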
Are you willing to work on this yourself?
No
Issue Analytics
- Created 3 years ago
- Comments: 7
Top Results From Across the Web

Better Data Loading: 20x PyTorch Speed-Up for Tabular Data
Properly exploiting properties of tabular data allows significant speedups of PyTorch training. Here's an easy way to speed up training 20x.

Faster FastAI Tabular Deep Learning | Kaggle
In this notebook, we show how to speed up training of tabular deep learning models by ~2x with the FastAI library using a customized NVTabular...

Tabular data - fastai
Helper functions to get data into a DataLoaders in the tabular application, and the higher-level class TabularDataLoaders.

Faster GPU-based Feature Engineering and Tabular Deep ...
In Faster FastAI Tabular Deep Learning, we show how to speed up FastAI tabular deep learning models by ~2x using the NVTabular data loader.

Merlin Dataloader is 119x faster than my own PyTorch Dataset ...
Merlin Dataloader is 119x faster than my own PyTorch Dataset + DataLoader combo! This is revolutionary for tabular data. Let's take a closer...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I didn’t test with TabNet, but with some small models it can indeed speed training up by 10-20x. Some other people’s benchmarks: https://towardsdatascience.com/better-data-loading-20x-pytorch-speed-up-for-tabular-data-e264b9e34352
@Optimox Just curious, what is the size of the dataset being tested? Is it possible that it is too small for the improvement to show?
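One way to settle the dataset-size question above is to time a pure data-loading epoch at several sizes. A hypothetical micro-benchmark sketch (sizes and names are illustrative; `FastTensorDataLoader` is the class sketched earlier, assumed to be in scope):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

def time_epoch(loader):
    """Return wall-clock seconds for one full pass, with no model attached."""
    start = time.perf_counter()
    for xb, yb in loader:
        pass  # iterate only, to isolate the data-loading cost
    return time.perf_counter() - start

for n_rows in (10_000, 100_000, 1_000_000):
    x = torch.randn(n_rows, 32)
    y = torch.randint(0, 2, (n_rows,))
    slow = DataLoader(TensorDataset(x, y), batch_size=1024, shuffle=True)
    fast = FastTensorDataLoader(x, y, batch_size=1024, shuffle=True)
    print(f"{n_rows} rows: DataLoader {time_epoch(slow):.2f}s, "
          f"fast loader {time_epoch(fast):.2f}s")
```

If the two times are close even at large sizes, the bottleneck likely lies elsewhere (e.g. in the model itself), which would be consistent with the TabNet observation above.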