How to speed up the data loader?
See original GitHub issue

❓ Questions and Help
I followed the tutorials to train on a custom dataset (DeepFashion2), but I found it very slow to run `trainer = DefaultTrainer(cfg)`. It seems to spend most of the time loading the data, roughly 20-30 minutes. I changed `cfg.DATALOADER.NUM_WORKERS` from 1 to 4 to 12 (my CPU has 12 cores), but it didn't help; the results were even worse. So how can I speed up the data loader?
I wrote a script to test loading the data with multiple threads, and it only takes 4-5 minutes. Is this a PyTorch problem?
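The kind of timing comparison described above can be sketched with a small, self-contained benchmark. The dataset below is a synthetic stand-in for DeepFashion2, and the sizes, batch settings, and worker counts are illustrative assumptions, not values from the issue:

```python
# Minimal sketch: time one pass over a DataLoader for several num_workers
# values. FakeImageDataset is a synthetic stand-in for a real image dataset.
import time
import torch
from torch.utils.data import Dataset, DataLoader

class FakeImageDataset(Dataset):
    def __init__(self, n=64):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        # Simulate per-image decode/transform cost with a tensor allocation.
        return torch.randn(3, 64, 64)

def loader_seconds(num_workers):
    loader = DataLoader(FakeImageDataset(), batch_size=16,
                        num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start

if __name__ == "__main__":
    for workers in (0, 2):
        print(f"num_workers={workers}: {loader_seconds(workers):.3f}s")
```

Note that more workers only help when `__getitem__` does real work (decoding, augmentation); when the per-item cost is tiny, worker startup and inter-process transfer overhead can make higher `num_workers` slower, which would be consistent with the observation above.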
Issue Analytics
- State: closed
- Created 4 years ago
- Reactions: 4
- Comments: 5 (3 by maintainers)
Top Results From Across the Web
Tricks to Speed Up Data Loading with PyTorch - gists · GitHub
Use Numpy memmap to load arrays and say goodbye to HDF5. I used to rely on HDF5 to read/write data, especially when loading...
Read more >

python 3.x - PyTorch: Speed up data loading - Stack Overflow
Basically you load data for the next iteration while your model trains. torch.utils.data.DataLoader does provide it, though there are some ...
Read more >

How to speed up the data loader - vision - PyTorch Forums
Convert each image into .bmp format instead of .jpg. Then use your original loader to load the .bmp format, which will decompress...
Read more >

How to speed up Pytorch training - Medium
How to speed up Pytorch training · import time import multiprocessing · use_cuda = torch.cuda. · def loading_time(num_workers, pin_memory): kwargs = {'num_workers' ...
Read more >

Better Data Loading: 20x PyTorch Speed-Up for Tabular Data
Properly exploiting properties of tabular data allows significant speedups of PyTorch training. Here's an easy way to speed up training 20x.
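The memmap tip from the first result above can be sketched as follows. The file name and array shape are illustrative assumptions; the idea is that each access reads a slice from a single on-disk array instead of decoding a compressed image file:

```python
# Hedged sketch: preconvert images into one .npy file, then memory-map it so
# each sample access reads only its slice from disk.
import os
import tempfile
import numpy as np

N, H, W, C = 8, 16, 16, 3
path = os.path.join(tempfile.gettempdir(), "images_demo.npy")

# One-time conversion: stack decoded images into a single on-disk array.
images = np.random.randint(0, 256, size=(N, H, W, C), dtype=np.uint8)
np.save(path, images)

# Training time: memory-map instead of loading the whole array into RAM.
mm = np.load(path, mmap_mode="r")
sample = np.array(mm[3])  # copies just one image out of the map
print(sample.shape)
```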
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The original code that loads the COCO dataset is fast enough for training the builtin models.
If you find the dataloader slow for your dataset, the reason could be your dataset, your custom dataloader, or your machine. Without any details about what you did or what you observed, we cannot give a valid response, so closing.
Maybe you are reading the image matrices in your custom dataset. In my case, I store only the image file name in the dataset dict instead of the image itself, which costs little memory but loads slowly. I think the reason it ran slowly is that there were too many small annotation files. I wrote a script to convert my dataset to COCO-format annotations (just one file), and now it runs much faster than before.