Could `join` replace `gather_for_metrics` and perform it automatically?
Hi,
Great job with accelerate!
One persistent headache that I keep running into is repeated samples during distributed evaluation. Although the `gather_for_metrics` functionality exists, it doesn't always work depending on the model's output format; for example, Faster R-CNN in Torchvision outputs a list of dictionaries containing tensors.
I was wondering if it would be possible to remove the duplication behaviour entirely, using something like the `join` context manager, which enables training on uneven inputs.
If there is appetite for this, I would be happy to help you explore options.
Thanks
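For context, here is a minimal sketch of the evaluation pattern the issue describes. The dataset, batch size, and script are placeholders rather than the reporter's actual code, and the point is only to show the shape of the problem: a detection model returning a list of dictionaries of tensors that then has to pass through `gather_for_metrics`.

```python
# Hypothetical sketch of the problematic evaluation loop.
# Run with `accelerate launch` on 2+ processes; the dataset length is deliberately
# not divisible by the number of processes, which is when duplicated samples appear.
import torch
import torchvision
from torch.utils.data import DataLoader, Dataset
from accelerate import Accelerator


class RandomImages(Dataset):
    """Tiny synthetic dataset of random images, a stand-in for a real detection dataset."""

    def __init__(self, length=7):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return torch.rand(3, 224, 224)


accelerator = Accelerator()
# weights=None avoids downloading pretrained weights (torchvision >= 0.13 API).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)

# Detection models take a list of image tensors, so collate batches into plain lists.
loader = DataLoader(RandomImages(), batch_size=2, collate_fn=lambda batch: list(batch))
model, loader = accelerator.prepare(model, loader)
model.eval()

predictions = []
with torch.no_grad():
    for images in loader:
        # Faster R-CNN returns a list of dicts of tensors, one dict per image:
        # [{"boxes": ..., "labels": ..., "scores": ...}, ...]
        outputs = model(images)
        # gather_for_metrics is supposed to drop the samples duplicated to keep
        # batch sizes even across processes; this nested output structure is
        # where the behaviour described in the issue breaks down.
        outputs = accelerator.gather_for_metrics(outputs)
        predictions.extend(outputs)
```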
Top GitHub Comments
Accelerate was built when there was no `join` context manager, and the fact that each dataloader returns the same number of samples in all processes is pretty ingrained in the library. We could make this evolve in the future with a major release, but that would mean a lot of changes (and breaking existing functionality).
For now I'd rather investigate why your use case is not supported as is (all methods should support a list of dictionaries) rather than rewrite the whole library.
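As a point of reference, `gather` in Accelerate recurses through nested containers, so a list of dictionaries of tensors can in principle be gathered like this. This is a small sketch; the output structure is a stand-in for real detection outputs, and every process builds leaf tensors of the same shape.

```python
# Small sketch: gathering a list of dicts of tensors across processes.
# Run under `accelerate launch`; leaf tensor shapes must match across processes.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Stand-in for per-image detection outputs produced on this process.
outputs = [
    {"boxes": torch.rand(4, 4, device=accelerator.device),
     "scores": torch.rand(4, device=accelerator.device)},
    {"boxes": torch.rand(4, 4, device=accelerator.device),
     "scores": torch.rand(4, device=accelerator.device)},
]

# gather walks lists/tuples/dicts and concatenates each leaf tensor along its
# first dimension across processes, preserving the container structure.
gathered = accelerator.gather(outputs)
if accelerator.is_main_process:
    print(gathered[0]["boxes"].shape)  # (4 * num_processes, 4) with N processes
```

In practice detection outputs contain a variable number of boxes per image, so the leaf tensors have different shapes across processes; that is where `pad_across_processes` or per-sample handling comes into play, and it may be part of why the reporter's case fails.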
@Chris-hughes10 This is mostly used in the whole evaluation part of Accelerate (so `gather` and `gather_for_metrics`), as for training we don't really care (plus the dataloaders very often have `drop_last=True` during training, so there is no problem there).
I'm open to start exploring a different way to go, probably with a new flag in the accelerator. The first thing would be to have the batch sampler we have handle a non-fixed batch size (which would also be useful for training), then add this flag where we would not cycle through the dataset but return different lengths on different processes, and finally add a wrapper around `join`.
Does that sound reasonable? If so, I can start a project summarizing the steps and we can share the work.
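For reference, the `join` context manager discussed here is PyTorch's `torch.distributed.algorithms.Join`. Below is a minimal sketch of how it lets DDP run over uneven per-process inputs; the model, data, and batch counts are placeholders, and the script is meant to be launched with torchrun on CPU.

```python
# Minimal sketch of PyTorch's Join context manager with DDP and uneven inputs.
# Launch with `torchrun --nproc_per_node=2 join_sketch.py`; each rank iterates a
# different number of batches, and Join keeps the collective ops from deadlocking.
import torch
import torch.distributed as dist
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")  # use "nccl" on GPUs
rank = dist.get_rank()

model = DDP(torch.nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Deliberately uneven: rank 0 gets 5 batches, rank 1 gets 4, and so on.
inputs = [torch.rand(4, 8) for _ in range(5 - rank)]

# DDP is a Joinable, so ranks that run out of batches shadow the collective
# operations of the ranks still working instead of hanging.
with Join([model]):
    for batch in inputs:
        loss = model(batch).sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

dist.destroy_process_group()
```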