
Multilabel & non-mutually-exclusive classes

See original GitHub issue

Hi, I’m wondering if there is an elegant way to do the following:

I have a network outputting a (9 x 9 x 9) matrix (i.e. 9 rows by 9 columns by 9 channels; you can see it as an 81 x 9 Dense output layer). The thing is, I want to output non-mutually-exclusive classes on it. For each of the 9 by 9 positions, I need to output the most likely of 9 possible classes (channels), but the 81 “pixels” can share the same value.
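
(For illustration, here is roughly what one ground-truth sample looks like; the labels grid below is made up:)

import numpy as np

# made-up integer labels: one class in [0, 9) for each of the 9 x 9 cells
labels = np.random.randint(0, 9, size=(9, 9))

y = np.eye(9)[labels]  # one-hot along a new last axis -> shape (9, 9, 9)

assert y.shape == (9, 9, 9)
assert (y.sum(axis=-1) == 1).all()  # each cell is a valid one-hot vector
assert y.sum() == 81  # 81 cells, each contributing exactly 1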

Here is what I’ve figured out so far:

Nice architecture, but wrong classification task

from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape
from keras.optimizers import Adadelta

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(81 * 9, activation='softmax'))  # softmax over all 729 outputs at once
model.add(Reshape((9, 9, 9)))

model.compile(
    optimizer=Adadelta(),
    loss='categorical_crossentropy',  # Here, loss might be the whole problem ...
)

The thing is, my predictions sum to 1.0 over each whole output, whereas I want them to sum to 81 (1.0 per “pixel”). This is because the softmax treats the 9 x 9 x 9 = 729 outputs as mutually exclusive classes.

Example of the output I want to avoid:

In[]: model.predict(Xtrain).sum((1, 2, 3))  # summing on all dimensions except 0
Out[]: array([1., 1., 1., ..., 1., 1., 1.])  # shape is len(Xtrain) 
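
With the architecture I’m after, each of the 81 cells would instead sum to 1 on its own, so the total per sample would be 81. A hypothetical check of the desired behaviour:

In[]: model.predict(Xtrain).sum(3)  # summing over channels only
Out[]: # desired: all ones, shape (len(Xtrain), 9, 9), i.e. each sample totals 81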

I managed to work around it with:

Classification task is fine, but the network structure is painful …

from keras.models import Sequential, Model
from keras.layers import Dense, Flatten, Input
from keras.optimizers import Adadelta

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())

grid = Input(shape=input_shape)  # define an Input layer to plug into the model
features = model(grid)  # the features produced by the network

# define one Dense output head per "pixel", plugged onto the features
# Each head is independent
digit_placeholders = [
    Dense(9, activation='softmax')(features)  # 9 classes per cell
    for i in range(81)
]

solver = Model(grid, digit_placeholders)

solver.compile(
    optimizer=Adadelta(),
    loss='categorical_crossentropy',  # This time, the loss is fine
)

This way, each of my 9 x 9 “pixels” (i.e. 81 outputs) will look separately for the most likely of its 9 classes (channels). This is what I want; however, I don’t find it quite elegant … Is there a way to achieve the same classification task without defining 81 independent Dense output layers? Is there a loss function that would help me achieve this?

Thanks a lot for your help.

dithyrambe

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
gabrieldemarmiesse commented, Dec 30, 2017

Hi! You were right to worry about it. I read the source code, and while it seems that the keras categorical_crossentropy should work in your case with the TensorFlow backend, that’s pure luck (the sum happens to be performed on the last axis). It won’t work with other backends, and it isn’t documented to work with 4D tensors in TensorFlow, so your code might break in a future keras update.

But you’re in luck, because in keras it’s easy to write custom loss functions! In your case, you want to compute the categorical crossentropy on the last axis and then average the results over all the other axes. What you can do is:

import keras.backend as K

def categorical_crossentropy_3rd_axis(y_true, y_pred):
    clipped_y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    product = y_true * K.log(clipped_y_pred)
    # same axis as the softmax; negated, since cross-entropy is -sum(y * log(p))
    categorical_crossentropies = -K.sum(product, axis=3)
    return K.mean(categorical_crossentropies)

model.compile(loss=categorical_crossentropy_3rd_axis, …)

Here you go. This should help.
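
For completeness, here’s a sketch of the whole thing as a single Sequential model (untested; input_shape is taken from your snippet, and I’m assuming your keras version’s softmax activation handles tensors with more than 2 dimensions along the last axis):

from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape, Activation
from keras.optimizers import Adadelta

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(81 * 9))          # no activation here
model.add(Reshape((9, 9, 9)))
model.add(Activation('softmax'))  # softmax is applied along the last axis only

model.compile(
    optimizer=Adadelta(),
    loss=categorical_crossentropy_3rd_axis,
)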

0 reactions
gabrieldemarmiesse commented, Jan 5, 2018

Hi! So multiple things.

  • The tensor that comes out of a softmax activation function is already normalized along the axis on which you applied the softmax. Since categorical crossentropy expects normalized inputs, keras wants to make sure that whatever goes into it is normalized. In your case you used softmax, so you’re safe. By the way, if you do outputs /= K.sum(outputs, axis=3, keepdims=True), it’ll change nothing: the sum is a tensor of ones. But if you didn’t specify the axis argument in your sum… well, you broke categorical crossentropy, because now along the 3rd axis the sum is 1/81.

  • There are three main distinct components here: layers, activations, and loss. With what I gave you, you only applied an activation and a loss. There is no layer, so no chance for your network to improve itself, so it’s normal that it does better with dense layers. You can always add dense layers before the custom softmax function; they’re not incompatible. keras fuses layers and activations because it’s handy to write, but mathematically they’re totally distinct.

  • If you want the dense layers, you can have them; just don’t use any activation when creating them if you plan to apply softmax right after.

  • ProTip: 81 Dense layers are equivalent to a single LocallyConnected2D(nb_units, (1, 1)). I’ll let you read the documentation about it; there’s a sketch just below.
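
Here is a rough sketch of that last point (untested; the 16 features per cell is an arbitrary choice):

from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape, Activation, LocallyConnected2D
from keras.optimizers import Adadelta

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(81 * 16))                 # 16 features per cell, arbitrary
model.add(Reshape((9, 9, 16)))
model.add(LocallyConnected2D(9, (1, 1)))  # one independent Dense(9) per cell, no weight sharing
model.add(Activation('softmax'))          # per-cell softmax over the 9 classes

model.compile(
    optimizer=Adadelta(),
    loss=categorical_crossentropy_3rd_axis,
)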

I advise you to look at MOOCs for the difference between layer, activation, and loss. fast.ai is pretty good, and the Stanford class on computer vision with deep learning on YouTube is also nice.

