How to speed up the data loader?
See original GitHub issue

❓ Questions and Help
I followed the tutorials to train on a custom dataset (DeepFashion2), but I found it very slow to run `trainer = DefaultTrainer(cfg)`. It seems to spend most of the time loading the data, roughly 20-30 minutes. I changed `cfg.DATALOADER.NUM_WORKERS` from 1 to 4 to 12 (my CPU has 12 cores), but it didn't help; the results were even worse. So how can I speed up the data loader?
I wrote a script to test loading the data with multiple threads, and it only takes 4-5 minutes. Is this a PyTorch problem?
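The kind of timing comparison described above can be sketched with a small, self-contained benchmark. The dataset below is a synthetic stand-in for DeepFashion2, and the sizes, batch settings, and worker counts are illustrative assumptions, not values from the issue:

```python
# Minimal sketch: time one pass over a DataLoader for several num_workers
# values. FakeImageDataset is a synthetic stand-in for a real image dataset.
import time
import torch
from torch.utils.data import Dataset, DataLoader

class FakeImageDataset(Dataset):
    def __init__(self, n=64):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Simulate per-image decode/transform cost with a tensor allocation.
        return torch.randn(3, 64, 64)

def loader_seconds(num_workers):
    loader = DataLoader(FakeImageDataset(), batch_size=16,
                        num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (0, 2):
        print(f"num_workers={workers}: {loader_seconds(workers):.3f}s")
```

Note that more workers only help when `__getitem__` does real work (decoding, augmentation); when the per-item cost is tiny, worker startup and inter-process transfer overhead can make higher `num_workers` slower, which would be consistent with the observation above.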
Issue Analytics
- State: closed
- Created 4 years ago
- Reactions: 4
- Comments: 5 (3 by maintainers)
Top Results From Across the Web
Tricks to Speed Up Data Loading with PyTorch - gists · GitHub
Use Numpy memmap to load arrays and say goodbye to HDF5. I used to rely on HDF5 to read/write data, especially when loading...
Read more >

python 3.x - PyTorch: Speed up data loading - Stack Overflow
Basically you load data for the next iteration while your model trains. torch.utils.data.DataLoader does provide it, though there are some ...
Read more >

How to speed up the data loader - vision - PyTorch Forums
Convert each image into .bmp format instead of .jpg. Then use your original loader to load the .bmp format, which will decompress...
Read more >

How to speed up Pytorch training - Medium
How to speed up Pytorch training · import time import multiprocessing · use_cuda = torch.cuda. · def loading_time(num_workers, pin_memory): kwargs = {'num_workers' ...
Read more >

Better Data Loading: 20x PyTorch Speed-Up for Tabular Data
Properly exploiting properties of tabular data allows significant speedups of PyTorch training. Here's an easy way to speed up training 20x.
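The memmap tip from the first result above can be sketched as follows. The file name and array shape are illustrative assumptions; the idea is that each access reads a slice from a single on-disk array instead of decoding a compressed image file:

```python
# Hedged sketch: preconvert images into one .npy file, then memory-map it so
# each sample access reads only its slice from disk.
import os
import tempfile
import numpy as np

N, H, W, C = 8, 16, 16, 3
path = os.path.join(tempfile.gettempdir(), "images_demo.npy")

# One-time conversion: stack decoded images into a single on-disk array.
images = np.random.randint(0, 256, size=(N, H, W, C), dtype=np.uint8)
np.save(path, images)

# Training time: memory-map instead of loading the whole array into RAM.
mm = np.load(path, mmap_mode="r")
sample = np.array(mm[3])  # copies just one image out of the map
print(sample.shape)
```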
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The original code that loads the COCO dataset is fast enough for training the builtin models.
If you find the dataloader slow for your dataset, the reason could be your dataset, your custom dataloader, or your machine. Without any details about what you did or what you observed, we cannot give a valid response, so closing.
Maybe you are reading the image matrices in your custom dataset. In my case, I store only the image file name in the dataset dict instead of the image itself, which costs little memory but loads slowly. I think the reason it ran slowly is that there were too many small annotation files. I wrote a script to convert my dataset to COCO-format annotations (just one file), and now it runs much faster than before.