
Need a data sampler for unevenly distributed labels


We need a way to handle datasets whose label distribution is highly skewed. For example, with 1000 positives and 100 negatives, we want each batch to contain the same number of positives and negatives, which means sampling each negative example 10 times as often as each positive.
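To make the request concrete, here is a minimal sketch of a balanced batch sampler using only the Python standard library. It is not Chainer or PyTorch API: the function name `balanced_batches` and its parameters are hypothetical, and it assumes labels are available as a plain list and that drawing with replacement within each class is acceptable.

```python
import random
from collections import defaultdict


def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Yield lists of dataset indices with an equal number of examples per class.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)

    # Group dataset indices by their label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    classes = sorted(by_class)
    if batch_size % len(classes) != 0:
        raise ValueError("batch_size must be divisible by the number of classes")
    per_class = batch_size // len(classes)

    for _ in range(num_batches):
        batch = []
        for c in classes:
            # Draw with replacement, so examples of the smaller class
            # are reused more often than examples of the larger class.
            batch.extend(rng.choices(by_class[c], k=per_class))
        rng.shuffle(batch)
        yield batch


# The example from the issue: 1000 positives (1) and 100 negatives (0).
labels = [1] * 1000 + [0] * 100
batches = list(balanced_batches(labels, batch_size=32, num_batches=5))
```

Each yielded batch is a list of indices that could be used to look up examples in the dataset; every batch here holds 16 positives and 16 negatives.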

Someone on the Chainer Slack suggested that PyTorch already has a way to do this: "Adding a sampler option for iterator as PyTorch can solve the problem I guess." http://pytorch.org/docs/data.html#torch.utils.data.DataLoader Source: https://chainer.slack.com/archives/C0LC5A6C9/p1497343348496751 Maybe Chainer could provide a wrapper for these PyTorch data preprocessing utilities. DataLoader is not involved in gradient computation, so wrapping it should be easy and take much less time than implementing equivalent functions from scratch.
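For reference, the sampler idea mentioned above can be approximated with per-example weights: give each example a weight inversely proportional to its class frequency, then draw indices with replacement according to those weights. PyTorch ships a `torch.utils.data.WeightedRandomSampler` built on this idea; the sketch below does not call it and instead uses only the standard library, with hypothetical helper names, so treat it as an illustration rather than either library's implementation.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each example by 1 / (number of examples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def weighted_indices(weights, num_samples, seed=0):
    """Draw dataset indices with replacement, proportionally to `weights`."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)


# The example from the issue: 1000 positives (1) and 100 negatives (0).
labels = [1] * 1000 + [0] * 100
weights = inverse_frequency_weights(labels)
indices = weighted_indices(weights, num_samples=20000)

# Fraction of drawn indices that point at a negative example.
negative_fraction = sum(1 for i in indices if labels[i] == 0) / len(indices)
```

Unlike a sampler that fixes the count per class in every batch, this approach balances classes only in expectation, so individual batches can still be somewhat uneven.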

Issue Analytics

State:
Created 6 years ago
Reactions: 3
Comments: 15 (7 by maintainers)

Top GitHub Comments

2 reactions

hvy commented, May 17, 2018

FYI, #3429 is now merged. It would be great if we could provide some sort of balanced sampler.

0 reactions

stale[bot] commented, Apr 24, 2019

This issue is closed as announced. Feel free to re-open it if needed.
