Need a data sampler for unevenly distributed labels


We need a way to handle a dataset whose label distribution is highly skewed. For example, with 1000 positives and 100 negatives, we want each batch to contain the same number of positives and negatives, which means drawing each negative example about 10 times as often as each positive example.
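To make the request concrete, here is a minimal sketch of such a sampler using only the Python standard library. It is not Chainer's API: the function name `balanced_batches` and its parameters are hypothetical, and it only produces lists of dataset indices that an iterator could then use to build batches.

```python
import random
from collections import defaultdict


def balanced_batches(labels, batch_size, num_batches, seed=None):
    """Yield lists of dataset indices with an equal number of samples per class.

    Hypothetical helper, not part of Chainer. Indices are drawn with
    replacement inside each class, so a minority class is oversampled
    as often as needed.
    """
    rng = random.Random(seed)

    # Group dataset indices by their label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    classes = sorted(by_class)
    if batch_size % len(classes) != 0:
        raise ValueError("batch_size must be divisible by the number of classes")
    per_class = batch_size // len(classes)

    for _ in range(num_batches):
        batch = []
        for label in classes:
            batch.extend(rng.choices(by_class[label], k=per_class))
        rng.shuffle(batch)  # avoid a fixed class order inside the batch
        yield batch


# The example from the issue: 1000 positives (label 1) and 100 negatives (label 0).
labels = [1] * 1000 + [0] * 100
batches = list(balanced_batches(labels, batch_size=32, num_batches=50, seed=0))
```

Every batch here holds 16 positive and 16 negative indices. Because sampling is with replacement, the same negative example can appear more than once in a batch.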

Someone said PyTorch has a way to do this: "Adding a sampler option for iterator as PyTorch can solve the problem I guess. http://pytorch.org/docs/data.html#torch.utils.data.DataLoader" Source: https://chainer.slack.com/archives/C0LC5A6C9/p1497343348496751 Maybe Chainer could provide a wrapper for these PyTorch data preprocessing utilities. DataLoader is not involved in gradient computation, so this should be easy and take much less time than implementing equivalent functions from scratch.
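For reference, the idea behind a weighted sampler can be sketched without PyTorch. The snippet below is an assumption-laden illustration using only the standard library: each sample gets a weight of one over its class size, and indices are drawn with replacement in proportion to those weights. This is similar in spirit to what `torch.utils.data.WeightedRandomSampler` does when given per-sample weights, but the helper names here are hypothetical and are neither PyTorch nor Chainer functions.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per sample: 1 / (size of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def weighted_sample_indices(weights, num_samples, seed=None):
    """Draw indices with replacement, with probability proportional to weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# Same skew as in the issue: 1000 positives (label 1), 100 negatives (label 0).
labels = [1] * 1000 + [0] * 100
weights = inverse_frequency_weights(labels)
indices = weighted_sample_indices(weights, num_samples=10000, seed=0)

# Fraction of drawn indices that point at a negative example.
negative_share = sum(1 for i in indices if labels[i] == 0) / len(indices)
```

With these weights each class carries the same total probability mass, so batches are balanced in expectation rather than exactly balanced per batch.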

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

2 reactions
hvy commented, May 17, 2018

Fyi, #3429 is now merged. It would be great if we could provide some sort of balanced sampler.

0 reactions
stale[bot] commented, Apr 24, 2019

This issue is closed as announced. Feel free to re-open it if needed.


