Multilabel & non-mutually-exclusive classes
Hi, I'm wondering if there is an elegant way to do the following.
I have a network outputting a (9 x 9 x 9) matrix (i.e. 9 rows by 9 cols by 9 channels; you can see it as an 81 x 9 Dense output layer).
The thing is, I want to output non-mutually-exclusive classes on it. For each of the 9 x 9 positions, I need to output the most likely of 9 possible classes (channels), but the 81 "pixels" can share the same value.
Here is what I've figured out so far:
Nice architecture, but wrong classification task
from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape
from keras.optimizers import Adadelta

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(81 * 9, activation='softmax'))  # softmax applied over all 729 outputs at once
model.add(Reshape((9, 9, 9)))
model.compile(
    optimizer=Adadelta(),
    loss='categorical_crossentropy',  # here, the loss might be the whole problem ...
)
The problem is that my predictions sum to 1.0 over the whole output, whereas I want them to sum to 81 (i.e. 1.0 per "pixel"). This is because the softmax and the loss treat the 9 x 9 x 9 outputs as mutually exclusive classes.
Example of the output I want to avoid:
In[]: model.predict(Xtrain).sum((1, 2, 3)) # summing on all dimensions except 0
Out[]: array([1., 1., 1., ..., 1., 1., 1.]) # shape is len(Xtrain)
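For contrast, a hypothetical sketch of the behaviour wanted here (each "pixel" normalized on its own, so the per-sample total is 81):
In[]: model.predict(Xtrain).sum((1, 2, 3)) # hypothetical, after fixing the normalization
Out[]: array([81., 81., 81., ..., 81., 81., 81.]) # 1.0 per pixel x 81 pixels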
I managed to work around this with:
Classification task is fine, but the network structure is painful …
from keras.models import Sequential, Model
from keras.layers import Dense, Flatten, Input
from keras.optimizers import Adadelta

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())

grid = Input(shape=input_shape)  # define an Input layer to plug into the model
features = model(grid)           # features produced by the network

# define one Dense output layer per "pixel"; each layer is independent
digit_placeholders = [
    Dense(9, activation='softmax')(features)
    for i in range(81)
]

solver = Model(grid, digit_placeholders)
solver.compile(
    optimizer=Adadelta(),
    loss='categorical_crossentropy',  # this time, the loss is fine
)
This way, each of my 9 x 9 "pixels" (i.e. 81 outputs) separately looks for the most likely of its 9 classes (channels). This is what I want; however, I don't find it very elegant …
Is there a way to achieve the same classification task without defining 81 independent Dense output layers? Is there a loss function that would help me achieve this?
Thanks a lot for your help.
dithyrambe
Hi! You were right to worry about it. I read the source code, and while it seems that Keras's categorical_crossentropy should work in your case with the TensorFlow backend, it's pure luck (the sum is performed on the last axis). It won't work with other backends, and it's not documented to work with 4D tensors in TensorFlow, so your code might break in a future Keras update. But you're in luck, because custom loss functions are easy to write in Keras! In your case, you want to compute the categorical crossentropy on the last axis, and then average the results over all the other axes. What you can do is:
import keras.backend as K

def categorical_crossentropy_3rd_axis(y_true, y_pred):
    clipped_y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    product = y_true * K.log(clipped_y_pred)
    categorical_crossentropies = -K.sum(product, axis=3)  # same axis as the softmax
    return K.mean(categorical_crossentropies)
model.compile(loss=categorical_crossentropy_3rd_axis, …)

Here you go. This should help.
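A minimal sanity check, assuming the function above is defined (the toy shapes and names below are illustrative, not from the thread): evaluate the loss on small random tensors and confirm it returns a single scalar.

import numpy as np
import keras.backend as K

# hypothetical toy batch: 2 samples, a 9 x 9 grid, 9 classes per cell
y_true = np.zeros((2, 9, 9, 9), dtype='float32')
y_true[..., 0] = 1.0                                 # every cell labelled as class 0
y_pred = np.random.rand(2, 9, 9, 9).astype('float32')
y_pred /= y_pred.sum(axis=3, keepdims=True)          # normalize along the class axis

loss = K.eval(categorical_crossentropy_3rd_axis(K.constant(y_true),
                                                K.constant(y_pred)))
print(loss)  # a single positive scalar, averaged over the batch and the 9 x 9 grid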
Hi! So multiple things.
The tensor that comes out of a softmax activation is already normalized along the axis on which you applied the softmax. Since categorical crossentropy expects normalized inputs, Keras wants to make sure that whatever goes into the categorical crossentropy is normalized. In your case you used softmax, so you're safe. By the way, if you do
outputs /= K.sum(outputs, axis=3, keepdims=True)
it will change nothing: the sum along that axis is already a tensor of ones. If you didn't specify the axis argument in your sum, well, you broke categorical crossentropy, because now the sum along the 3rd axis is 1/81.

There are three distinct components here: layers, activations, and loss. With what I gave you, you only applied an activation and a loss. There is no layer, so no chance for your network to improve itself, so it's normal that it works better with Dense layers. You can always add Dense layers in front of the custom softmax function; they are not incompatible. Keras fuses layers and activations because it's handy to write, but mathematically they are totally distinct.
If you want the Dense layers, you can use them; just don't set any activation when creating them if you plan to apply the softmax right after.
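A minimal sketch of what that could look like, assuming a hand-rolled per-channel softmax helper (the names softmax_last_axis and categorical_crossentropy_3rd_axis are illustrative, reusing the loss defined above, and input_shape is the same as in the question):

import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape, Lambda
from keras.optimizers import Adadelta

def softmax_last_axis(x):
    # softmax along the channel axis only, so each "pixel" is normalized on its own
    e = K.exp(x - K.max(x, axis=-1, keepdims=True))
    return e / K.sum(e, axis=-1, keepdims=True)

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(81 * 9))                      # no activation here
model.add(Reshape((9, 9, 9)))
model.add(Lambda(softmax_last_axis))          # softmax per pixel, not over all 729 outputs
model.compile(optimizer=Adadelta(), loss=categorical_crossentropy_3rd_axis)

With this, model.predict(Xtrain).sum((1, 2, 3)) should give 81 per sample, since each of the 81 cells is normalized independently.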
ProTip: 81 Dense layers are equivalent to one LocallyConnected2D(nb_units, (1, 1)). I'll let you read the documentation about it.
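A hedged sketch of that ProTip, reusing the helpers defined above (the 64 features per cell are an arbitrary choice, not something prescribed in the thread):

from keras.models import Sequential
from keras.layers import Dense, Flatten, Reshape, Lambda, LocallyConnected2D
from keras.optimizers import Adadelta

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=input_shape))
model.add(Dense(64, activation='relu'))
model.add(Flatten())
model.add(Dense(81 * 64))                     # 64 features per cell (arbitrary)
model.add(Reshape((9, 9, 64)))
model.add(LocallyConnected2D(9, (1, 1)))      # untied 1x1 maps: 81 independent 64 -> 9 projections
model.add(Lambda(softmax_last_axis))          # normalize each cell's 9 scores
model.compile(optimizer=Adadelta(), loss=categorical_crossentropy_3rd_axis)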
I advise you to look at MOOCs for the difference between layer, activation, and loss. fast.ai is pretty good. There is also the Stanford class about computer vision with deep learning on YouTube, which is nice.