Which loss function works for multi-label classification tasks?
I need to train a multi-label classifier for a text topic classification task. Having searched around the internet, I followed the suggestion to use sigmoid + binary_crossentropy. But I can't get good results (i.e. subset accuracy) on the validation set even though the loss is very small. After reading the Keras source code, I found that the binary_crossentropy loss is implemented like this:
from keras import backend as K

def binary_crossentropy(y_true, y_pred):
    # Per-sample loss: element-wise binary crossentropy,
    # averaged uniformly over all label dimensions.
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)
My doubt is whether taking the mean makes sense for a multi-label classification task. Suppose the label set has dimension 30 and each training sample carries only two or three of the labels. Since most of the labels are zero in most of the samples, I suspect this loss encourages the classifier to predict a tiny probability in every output dimension.
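To compensate for the sparsity, one idea I've been toying with is up-weighting the positive entries. A rough sketch (my own code, not from Keras; the pos_weight value is arbitrary and would need tuning):

from keras import backend as K

def weighted_binary_crossentropy(pos_weight=10.0):
    # Up-weight loss terms where the true label is 1, so the few positive
    # labels are not drowned out by the many zeros.
    def loss(y_true, y_pred):
        bce = K.binary_crossentropy(y_true, y_pred)
        weights = y_true * pos_weight + (1.0 - y_true)
        return K.mean(weights * bce, axis=-1)
    return loss

This would be passed in as model.compile(optimizer='adam', loss=weighted_binary_crossentropy(pos_weight=10.0)), but I haven't verified that it actually fixes the problem.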
Following the idea in https://github.com/keras-team/keras/issues/2826, I also gave categorical_crossentropy a try, but had no luck with it either.
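For reference, these are the two setups I compared (a minimal sketch; the layer sizes and input dimension are placeholders):

from keras.models import Sequential
from keras.layers import Dense

# Multi-label setup: one independent sigmoid per label.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=300))  # 300 = placeholder feature size
model.add(Dense(30, activation='sigmoid'))              # 30 labels, each scored in [0, 1]
model.compile(optimizer='adam', loss='binary_crossentropy')

# The categorical_crossentropy variant assumes a softmax output, which forces
# the 30 scores to sum to 1 and so cannot represent "two or three labels on":
# model.add(Dense(30, activation='softmax'))
# model.compile(optimizer='adam', loss='categorical_crossentropy')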
Any tips on choosing a loss function for multi-label classification tasks are more than welcome. Thanks in advance.
Top answer
For multi-label classification, you can try tanh + hinge with {-1, 1} labels, e.g. (1, -1, -1, 1), or sigmoid + Hamming loss with {0, 1} labels, e.g. (1, 0, 0, 1). In my case, sigmoid + focal loss with {0, 1} labels worked well. You can check this paper: https://arxiv.org/abs/1708.02002 (a sketch of such a loss follows below).
I found an implementation of multi-label focal loss here:
https://github.com/Umi-you/FocalLoss
EDIT: It seems that implementation doesn't work.
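Since that repository doesn't seem to work, here is a rough sketch of a binary focal loss for multi-label targets, adapted from the formula in the paper linked above (not a vetted implementation; gamma and alpha are the paper's default values and should be tuned):

from keras import backend as K

def binary_focal_loss(gamma=2.0, alpha=0.25):
    def loss(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1.0 - eps)
        # p_t is the predicted probability of the true class for each label.
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        # The modulating factor (1 - p_t)^gamma down-weights easy examples,
        # including the many confidently predicted zeros.
        return K.mean(-alpha_t * K.pow(1.0 - p_t, gamma) * K.log(p_t), axis=-1)
    return loss

It can be plugged in with model.compile(optimizer='adam', loss=binary_focal_loss()).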