Faster Tabular Dataloader
Feature request
What is the expected behavior?
Load data faster than the standard per-index approach:

```python
def __getitem__(self, index):
    x, y = self.x[index], self.y[index]
    return x, y
```
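For context, that snippet typically lives in a standard map-style `Dataset`. A minimal sketch of the baseline being compared against (the class and variable names here are illustrative, not from the issue):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TabularDataset(Dataset):
    """Map-style dataset: the DataLoader fetches one row per __getitem__ call."""

    def __init__(self, x: torch.Tensor, y: torch.Tensor):
        self.x, self.y = x, y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # Called once per sample; rows are then collated into a batch,
        # which is the per-sample Python overhead this request targets.
        x, y = self.x[index], self.y[index]
        return x, y

# Typical usage: one __getitem__ call (plus one collate step) per row.
loader = DataLoader(TabularDataset(torch.randn(10_000, 32),
                                   torch.randint(0, 2, (10_000,))),
                    batch_size=1024, shuffle=True)
```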
What is the motivation or use case for adding/changing the behavior?
Fix the CPU-to-GPU data-loading bottleneck.
How should this be implemented in your opinion?
An implementation already exists: https://github.com/hcarlens/pytorch-tabular/blob/master/fast_tensor_data_loader.py
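The linked file avoids the per-row overhead by slicing whole batches directly out of in-memory tensors, so there is no per-sample `__getitem__` call and no collate step. A simplified sketch of that batch-slicing pattern (condensed from the linked implementation; see the file itself for the exact details):

```python
import torch

class FastTensorDataLoader:
    """Iterates over in-memory tensors in contiguous batch-sized slices."""

    def __init__(self, *tensors, batch_size=32, shuffle=False):
        assert all(t.shape[0] == tensors[0].shape[0] for t in tensors)
        self.tensors = tensors
        self.dataset_len = tensors[0].shape[0]
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        if self.shuffle:
            # One permutation per epoch, applied to the whole tensors at once.
            perm = torch.randperm(self.dataset_len)
            self.tensors = [t[perm] for t in self.tensors]
        self.i = 0
        return self

    def __next__(self):
        if self.i >= self.dataset_len:
            raise StopIteration
        # A batch is a single tensor slice: no per-row indexing, no collation.
        batch = tuple(t[self.i:self.i + self.batch_size] for t in self.tensors)
        self.i += self.batch_size
        return batch

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return (self.dataset_len + self.batch_size - 1) // self.batch_size
```

A drop-in usage would be `FastTensorDataLoader(x_train, y_train, batch_size=1024, shuffle=True)` in place of a `DataLoader`. The trade-off is that all data must already fit in memory as tensors, which is usually the case for tabular workloads.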
Are you willing to work on this yourself?
No
Issue Analytics
- Created 3 years ago
- Comments: 7
Top Results From Across the Web

Better Data Loading: 20x PyTorch Speed-Up for Tabular Data
Properly exploiting properties of tabular data allows significant speedups of PyTorch training. Here's an easy way to speed up training 20x.

Faster FastAI Tabular Deep Learning | Kaggle
In this notebook, we show how to speed up training of tabular deep learning models by ~2x with the FastAI library using a customized NVTabular...

Tabular data - fastai
Helper functions to get data into a DataLoaders in the tabular application, and the higher-level class TabularDataLoaders.

Faster GPU-based Feature Engineering and Tabular Deep ...
In Faster FastAI Tabular Deep Learning, we show how to speed up FastAI tabular deep learning models by ~2x using the NVTabular data loader.

Merlin Dataloader is 119x faster than my own PyTorch Dataset ...
Merlin Dataloader is 119x faster than my own PyTorch Dataset + DataLoader combo! This is revolutionary for tabular data. Let's take a closer...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I didn’t test with TabNet, but with some small models it can indeed speed training up by 10-20x. Some other people’s benchmarks: https://towardsdatascience.com/better-data-loading-20x-pytorch-speed-up-for-tabular-data-e264b9e34352
@Optimox Just curious, what is the size of the dataset being tested? Is it possible that it is too small for the improvement to show?
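One way to settle the dataset-size question above is to time a pure data-loading epoch at several sizes. A hypothetical micro-benchmark sketch (sizes and names are illustrative; `FastTensorDataLoader` is the class sketched earlier, assumed to be in scope):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

def time_epoch(loader):
    """Return wall-clock seconds for one full pass, with no model attached."""
    start = time.perf_counter()
    for xb, yb in loader:
        pass  # iterate only, to isolate the data-loading cost
    return time.perf_counter() - start

for n_rows in (10_000, 100_000, 1_000_000):
    x = torch.randn(n_rows, 32)
    y = torch.randint(0, 2, (n_rows,))
    slow = DataLoader(TensorDataset(x, y), batch_size=1024, shuffle=True)
    fast = FastTensorDataLoader(x, y, batch_size=1024, shuffle=True)
    print(f"{n_rows} rows: DataLoader {time_epoch(slow):.2f}s, "
          f"fast loader {time_epoch(fast):.2f}s")
```

If the two times are close even at large sizes, the bottleneck likely lies elsewhere (e.g. in the model itself), which would be consistent with the TabNet observation above.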