How to provide class weights for an imbalanced dataset?
Hi,
I have a fairly imbalanced dataset. I am using this to learn a feature and am relying on dependency learning to get better results, i.e., as explained here, I have a coarse feature whose identification helps in identifying a fine feature.
Below is my model configuration for the above goal:
```yaml
input_features:
  - name: nhl
    type: sequence
    encoder: rnn
    cell_type: lstm
    num_layers: 4
    reduce_output: null

output_features:
  - name: mode
    type: category
    num_fc_layers: 2
  - name: volpiano
    type: sequence
    decoder: generator
    cell_type: lstm
    attention: bahdanau
    num_fc_layers: 1
    dependencies:
      - mode
    loss:
      type: sampled_softmax_cross_entropy
```
The problem is (or at least in my opinion) that the `mode` output, on which the `volpiano` output depends, is very imbalanced. Below is the distribution of this feature:

As can be seen, modes 8, 1, and 7 are much better represented than the other categories in the dataset.
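(For reference, a quick way to inspect such a distribution is a `value_counts()` call; a minimal sketch, assuming the training data lives in a CSV file with a `mode` column — the filename `chants.csv` is hypothetical:)

```python
import pandas as pd

# Load the training data; "chants.csv" is a hypothetical filename.
df = pd.read_csv("chants.csv")

# Count how many rows belong to each mode class, most frequent first.
print(df["mode"].value_counts())
```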
Is there any way to perform weighted learning to reduce this class imbalance? I found this issue discussing it as well: https://github.com/ludwig-ai/ludwig/issues/615, and I know that I can use `class_weights` to achieve this… but am not very sure how to do it.
How do I find the `class_weights` values? And with this many classes, how can I be sure which weight to associate with which? I mean, if I write `class_weights: [8, 7, 6, 6, 3, ....]`, how can I be sure which weight is associated with which label?
Thanks
Top GitHub Comments
Hi @farazk86, you may be right: dealing with the class imbalance of `mode` can potentially improve your results.
My suggestions in that case would be to do either of two things; one of them is to use the `class_weights` parameter. Regarding the association, you can provide the weights in the same order, from most to least frequent class, that Ludwig figures out when mapping strings to integers. To recover that order, check the `training_set_metadata.json` file, which contains `idx2str`. An alternative is to provide a dictionary instead, like `{"class_1": 2, "class_2": 0.3, ...}`, so that the mapping is explicit.
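For instance, the `mode` output's `loss` section could carry the weights keyed by label (a minimal sketch; the class labels and weight values below are placeholders, not taken from the issue):

```yaml
output_features:
  - name: mode
    type: category
    num_fc_layers: 2
    loss:
      # Hypothetical weights keyed by class label; larger weights
      # increase the loss contribution of rarer classes.
      class_weights:
        "8": 0.3
        "1": 0.4
        "7": 0.5
```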
Regarding figuring out what those values should actually be, there's no exact way. If you want to compensate for the long-tail distribution, you could assign a `min(frequencies)/class_frequency` weight to each class, but that may be a bit too strong for very frequent classes. I would say that ideally what you want to do, though, is give smaller weights to more frequent classes. Hopefully this helps!
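As a rough sketch of that heuristic (the per-class counts below are made up for illustration; with real data they could come from `value_counts()` as shown earlier):

```python
# Per-class frequencies; these numbers are hypothetical.
frequencies = {"8": 1200, "1": 950, "7": 800, "2": 150, "5": 90}

# min(frequencies)/class_frequency: the rarest class gets weight 1.0,
# and more frequent classes get proportionally smaller weights.
min_freq = min(frequencies.values())
class_weights = {label: min_freq / freq for label, freq in frequencies.items()}

print(class_weights)  # e.g. {"8": 0.075, "1": 0.0947..., ..., "5": 1.0}
```

The resulting dictionary can then be dropped into the `loss` section as `class_weights`, as in the YAML sketch above.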
Closing this issue since the original issue is resolved.