Error in prepared DataLoader with BatchSampler

System Info

accelerate: 0.12.0
OS: Linux 5.4.188+ (Colab)
Python: 3.7.13
numpy: 1.21.6
torch: 1.12.1+cu113
config: 1 CPU

Information

The official example scripts
My own modified scripts

Tasks

One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)

Reproduction

MRE: https://colab.research.google.com/drive/17krCJCF_nWtNFSiMBo3oz12l7eX1bBZ6

First of all, thanks for this library and the great docs and examples that come with it 😄!

I am using a custom torch Dataset that wraps a Hugging Face Dataset (pyarrow) instance. Therefore, as indicated in the Datasets docs (https://huggingface.co/docs/datasets/v2.4.0/en/use_with_pytorch#use-a-batchsampler), I tried to use a BatchSampler to reduce the number of queries. However, I have not yet been able to make it work with accelerate.
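For readers unfamiliar with the pattern, here is a minimal sketch of what "using a BatchSampler to reduce the number of queries" can look like in plain PyTorch. It is not the code from the MRE and does not involve Accelerate or Hugging Face Datasets: `BatchedLookupDataset` is a hypothetical stand-in whose in-memory tensor plays the role of the Arrow-backed dataset, and passing `batch_size=None` is one way (an assumption about the setup, not necessarily what the author used) to make the DataLoader hand each whole list of indices to `__getitem__` in a single call.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, Dataset, SequentialSampler


class BatchedLookupDataset(Dataset):
    """Hypothetical dataset where one lookup per batch is cheaper than one per item."""

    def __init__(self, n):
        # Stand-in for the wrapped Hugging Face Dataset (assumption for illustration).
        self.data = torch.arange(n, dtype=torch.float32)
        self.num_queries = 0  # counts how many times the backing store is hit

    def __len__(self):
        return len(self.data)

    def __getitem__(self, indices):
        # With the loader below, `indices` is a list of ints (one whole batch).
        self.num_queries += 1
        return self.data[torch.as_tensor(indices)]


dataset = BatchedLookupDataset(10)

# The BatchSampler groups individual indices into lists of up to 4.
batch_sampler = BatchSampler(SequentialSampler(dataset), batch_size=4, drop_last=False)

# batch_size=None disables automatic batching, so each list of indices
# produced by the sampler is passed to __getitem__ as-is.
loader = DataLoader(dataset, sampler=batch_sampler, batch_size=None)

batches = [batch for batch in loader]
```

In this sketch the dataset is queried once per batch rather than once per sample, which is the saving the Datasets docs describe. Whether a loader built this way survives `accelerator.prepare(...)` depends on the Accelerate version, as the maintainer comment further down indicates.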

I tried many different possibilities, one of which works on one CPU or one GPU, but gets stuck when using distributed training.

Thanks for your help!

Issue Analytics

State:
Created a year ago
Comments: 9 (4 by maintainers)

Top GitHub Comments

2 reactions

etiennebeaulac commented, Sep 16, 2022

Everything seems to work great! Thank you so much @pacman100 and @sgugger 😄

1 reaction

sgugger commented, Sep 8, 2022

Oh ok. I did not know about this use of a sampler as a batch sampler. It is indeed not supported by Accelerate. Will have a look on how to add support when I have a bit of time.
