How to speed up the data loader?

❓ Questions and Help

I followed the tutorials to use a custom dataset (DeepFashion2), but running trainer = DefaultTrainer(cfg) is very slow. Most of the time seems to go into loading the data, maybe 20-30 minutes. I changed cfg.DATALOADER.NUM_WORKERS from 1 to 4 to 12 (my CPU has 12 cores), but it didn't help; the results were even worse. So how can I speed up the data loader? I wrote a script to test loading the data with multiple threads, and it takes only 4-5 minutes. Is it a PyTorch problem?
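One way to narrow this down is to time the loader on its own, separately from the trainer. The following is a minimal sketch using only the standard library; `time_first_batches` and `toy_loader` are hypothetical names introduced here for illustration, not part of detectron2 or PyTorch, and the toy generator only stands in for a real loader.

```python
import time
from itertools import islice


def time_first_batches(loader, num_batches=10):
    """Return (batches_read, seconds) for the first `num_batches` items of any iterable."""
    start = time.perf_counter()
    count = 0
    for _ in islice(loader, num_batches):
        count += 1
    return count, time.perf_counter() - start


def toy_loader(n, delay=0.001):
    # Stand-in for a real data loader: yields n fake batches with a small delay each.
    for i in range(n):
        time.sleep(delay)
        yield [i]


if __name__ == "__main__":
    count, seconds = time_first_batches(toy_loader(20), num_batches=5)
    print(f"read {count} batches in {seconds:.3f}s")
```

With detectron2, the iterable could be the object returned by build_detection_train_loader(cfg). Timing that call separately from the iteration would likely show whether the startup work (building the dataset) or the per-batch loading dominates.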

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 4
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

12 reactions
ppwwyyxx commented, Feb 11, 2020

The original code that loads the COCO dataset is fast enough for training the builtin models.

If you find the dataloader slow for your dataset, the reason could be your dataset, your custom dataloader, or your machine. Without any details about what you did or what you observed, we cannot give a valid response. So closing.
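To gather the kind of detail asked for here, one option is to profile the function that builds the custom dataset. This is a minimal sketch with the standard-library cProfile module; `profile_call` is a hypothetical helper, and `load_my_dataset` is a placeholder for whatever function the custom dataset actually uses.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report of the `top` entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def load_my_dataset():
    # Hypothetical placeholder: replace with the real dataset-building function.
    return [{"file_name": f"img_{i}.jpg"} for i in range(1000)]


if __name__ == "__main__":
    records, report = profile_call(load_my_dataset)
    print(f"{len(records)} records")
    print(report)
```

The report lists the calls with the highest cumulative time, which can help show whether file parsing, image decoding, or something else takes most of the time.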

1 reaction
invisprints commented, Nov 16, 2019

Maybe your custom dataset reads the image matrices up front. In my case, I store only the image file name in the custom dataset instead of the image itself. It still loads slowly, but it costs little memory. I think the reason it runs slowly is that there are too many small config files. I wrote a script to convert my dataset to a COCO-format config (just one file). Now it runs much faster than before.
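The conversion script itself is not shown in the thread. As a rough sketch of the idea, the following merges many small per-image annotation files into a single COCO-style JSON file that stores only file names, not pixel data. The per-image schema used here (`file_name`, `width`, `height`, `objects` with `bbox` and `category_id`) is an assumption made for illustration and is not the actual DeepFashion2 layout; `merge_to_coco` is a hypothetical name.

```python
import json
from pathlib import Path


def merge_to_coco(annotation_dir, output_file, category_names):
    """Merge per-image JSON files (assumed schema) into one COCO-style file."""
    images, annotations = [], []
    paths = sorted(Path(annotation_dir).glob("*.json"))
    for image_id, path in enumerate(paths, start=1):
        with open(path) as f:
            record = json.load(f)
        # Keep only the file name and size; the image is read later, at training time.
        images.append({
            "id": image_id,
            "file_name": record["file_name"],
            "width": record["width"],
            "height": record["height"],
        })
        for obj in record["objects"]:
            x, y, w, h = obj["bbox"]  # assumed to be [x, y, width, height]
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": obj["category_id"],
                "bbox": [x, y, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
    categories = [{"id": i, "name": name} for i, name in enumerate(category_names, start=1)]
    coco = {"images": images, "annotations": annotations, "categories": categories}
    # Write the output outside annotation_dir so a rerun does not pick it up as input.
    with open(output_file, "w") as f:
        json.dump(coco, f)
    return coco
```

In detectron2, a single COCO-format file like this can then be registered with register_coco_instances from detectron2.data.datasets, which takes the dataset name, a metadata dict, the JSON path, and the image root.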

Top Results From Across the Web

Tricks to Speed Up Data Loading with PyTorch - gists · GitHub
use Numpy Memmap to load array and say goodbye to HDF5. I used to relay on HDF5 to read/write data, especially when loading...

python 3.x - PyTorch: Speed up data loading - Stack Overflow
Basically you load data for the next iteration when your model trains. torch.utils.data.DataLoader does provide it, though there are some ...

How to speed up the data loader - vision - PyTorch Forums
Convert each image into .bmp format instead of .jpg . Then use your original loader to load the .bmp format which will decompress...

How to speed up Pytorch training - Medium
How to speed up Pytorch training · import time import multiprocessing · use_cuda = torch.cuda. · def loading_time(num_workers, pin_memory): kwargs = {'num_workers' ...

Better Data Loading: 20x PyTorch Speed-Up for Tabular Data
Properly exploiting properties of tabular data allows significant speedups of PyTorch training. Here's an easy way to speed up training 20x.
