Error in prepared DataLoader with BatchSampler
System Info
accelerate: 0.12.0
OS: Linux 5.4.188+ (Colab)
Python: 3.7.13
numpy: 1.21.6
torch: 1.12.1+cu113
config: 1 CPU
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- My own task or dataset (give details below)
Reproduction
MRE : https://colab.research.google.com/drive/17krCJCF_nWtNFSiMBo3oz12l7eX1bBZ6
First of all, thanks for this library and the great docs and examples that come with it 😄!
I am using a custom torch Dataset that contains a Hugging Face Dataset (pyarrow) instance. Therefore, as indicated in the Datasets docs (https://huggingface.co/docs/datasets/v2.4.0/en/use_with_pytorch#use-a-batchsampler), I tried to use a BatchSampler to reduce the number of queries. However, I have not yet been able to make it work with accelerate.
I tried many different possibilities, one of which works on one CPU or one GPU, but gets stuck when using distributed training.
Thanks for your help!
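For readers without access to the Colab notebook, here is a minimal sketch of the BatchSampler pattern from the Datasets docs linked above. It is not the code from the MRE: `WrappedDataset` is a hypothetical name, a plain tensor stands in for the pyarrow-backed Hugging Face Dataset, and Accelerate is deliberately left out so the snippet only shows the plain PyTorch setup.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, RandomSampler


class WrappedDataset(Dataset):
    """Hypothetical stand-in for a torch Dataset wrapping a Hugging Face Dataset."""

    def __init__(self, size):
        # Assumption: a tensor replaces the real pyarrow-backed dataset.
        self.data = torch.arange(size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, indices):
        # `indices` is a whole list of indices, so one query fetches a full batch.
        return self.data[indices]


dataset = WrappedDataset(10)

# The BatchSampler is passed as `sampler`, so each "sample" is a list of indices.
sampler = BatchSampler(RandomSampler(dataset), batch_size=4, drop_last=False)

# batch_size=None disables the DataLoader's automatic batching.
loader = DataLoader(dataset, sampler=sampler, batch_size=None)

batches = [batch for batch in loader]
```

The key point is that the batch sampler is given through the `sampler` argument rather than `batch_sampler`, which is the usage discussed in the comments below.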
Issue Analytics
- State:
- Created a year ago
- Comments: 9 (4 by maintainers)
Top GitHub Comments
Everything seems to work great! Thank you so much @pacman100 and @sgugger 😄
Oh ok. I did not know about this use of a sampler as a batch sampler. It is indeed not supported by Accelerate. I will look into how to add support when I have a bit of time.