Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

LogSumExp seems to be missing in SoftmaxCrossEntropyLoss (sparseLabel=false)

See original GitHub issue

Description

On latest MasterBranch (commit 4b516196) in SoftmaxCrossEntropyLoss (sparseLabel=false) line 85

            loss = pred.mul(lab).neg().sum(new int[] {classAxis}, true);

looks like the LogSumExp term (red) is missing. grafik

Proposed correction

            int[] axes = new int[] {classAxis};
            NDArray max = pred.max(axes, true);
            NDArray logSumExp = max.add((pred.sub(max)).exp().sum(axes, true).log());
            loss = logSumExp.sub(pred.mul(lab).sum(axes, true));

Issue Analytics

State:
Created 3 years ago
Comments:9 (9 by maintainers)

Top GitHub Comments

2reactions

enpasoscommented, Feb 6, 2021

Hi @roywei, thank you for looking into the issue. I suggest to reopen the issue to correct the “sparseLabel=false” part of the loss function.

Your unit test investigates the method “evaluate” in SoftmaxCrossEntropyLoss in a different context

fromLogit = false
sparseLabel = true

and does not test the problematic code part I was dealing with.

My context is

fromLogit = true
sparseLabel = false  // this is especially important, and maybe just a situation not concerned in the tests sofar

As you suggested let’s look at an example from the python apis as reference for a unit test:

import tensorflow as tf

logitsInput = [[-8.0, 10.0,  3.0],  [1.0, 2.0, 3.0]]
target = [[0.3, 0.3, 0.4], [0.1, 0.1, 0.8]]

output = tf.nn.softmax_cross_entropy_with_logits(labels=target, logits=logitsInput)
print(output)

gives out

tf.Tensor([8.2009115  0.70760596], shape=(2,), dtype=float32)

The unitTest with the current DJL code fails:

@Test
    public void softmaxCrossEntropyNonSparseLabelsFailTest() {
        boolean fromLogit = true;
        boolean sparseLabel = false;

        try (NDManager manager = NDManager.newBaseManager()) {

            NDArray pred = manager.create(new float[] {-8.0f, 10.0f, 3.0f, 1.0f, 2.0f, 3.0f}, new Shape(2,3));
            NDArray label = manager.create(new float[] {0.3f, 0.3f, 0.4f, 0.1f, 0.1f, 0.8f}, new Shape(2,3));
            NDArray expected = manager.create(new float[] {8.2009115f,  0.70760596f}).mean();

            NDArray output = Loss.softmaxCrossEntropyLoss("SoftmaxCrossEntropyLoss", 1, 1, sparseLabel, fromLogit).evaluate(new NDList(label), new NDList(pred));

            Assertions.assertAlmostEquals(output, expected);

        }
    }

The unit test for my corrected code passes:

 @Test
    public void softmaxCrossEntropyNonSparseLabelsPassTest() {
        boolean fromLogit = true;
        boolean sparseLabel = false;

        try (NDManager manager = NDManager.newBaseManager()) {

            NDArray pred = manager.create(new float[] {-8.0f, 10.0f, 3.0f, 1.0f, 2.0f, 3.0f}, new Shape(2,3));
            NDArray label = manager.create(new float[] {0.3f, 0.3f, 0.4f, 0.1f, 0.1f, 0.8f}, new Shape(2,3));
            NDArray expected = manager.create(new float[] {8.2009115f,  0.70760596f}).mean();

            NDArray output = new MySoftmaxCrossEntropyLoss("SoftmaxCrossEntropyLoss", 1, 1, sparseLabel, fromLogit).evaluate(new NDList(label), new NDList(pred));

            Assertions.assertAlmostEquals(output, expected);

        }
    }

I have adjusted the loss function in the following way

public class MySoftmaxCrossEntropyLoss extends Loss {

    private float weight;
    private int classAxis;
    private boolean sparseLabel;
    private boolean fromLogit;

    /**
     * Creates a new instance of {@code SoftmaxCrossEntropyLoss} with default parameters.
     */
    public MySoftmaxCrossEntropyLoss() {
        this("SoftmaxCrossEntropyLoss");
    }

    /**
     * Creates a new instance of {@code SoftmaxCrossEntropyLoss} with default parameters.
     *
     * @param name the name of the loss
     */
    public MySoftmaxCrossEntropyLoss(String name) {
        this(name, 1, -1, true, false);
    }

    /**
     * Creates a new instance of {@code SoftmaxCrossEntropyLoss} with the given parameters.
     *
     * @param name        the name of the loss
     * @param weight      the weight to apply on the loss value, default 1
     * @param classAxis   the axis that represents the class probabilities, default -1
     * @param sparseLabel whether labels are integer array or probabilities, default true
     * @param fromLogit   whether predictions are log probabilities or un-normalized numbers, default
     *                    false
     */
    public MySoftmaxCrossEntropyLoss(
            String name, float weight, int classAxis, boolean sparseLabel, boolean fromLogit) {
        super(name);
        this.weight = weight;
        this.classAxis = classAxis;
        this.sparseLabel = sparseLabel;
        this.fromLogit = fromLogit;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public NDArray evaluate(NDList label, NDList prediction) {
        NDArray pred = prediction.singletonOrThrow();
        if (!fromLogit) {
            pred = pred.logSoftmax(classAxis);
        }
        NDArray loss;
        NDArray lab = label.singletonOrThrow();
        if (sparseLabel) {
            NDIndex pickIndex =
                    new NDIndex()
                            .addAllDim(Math.floorMod(classAxis, pred.getShape().dimension()))
                            .addPickDim(lab);
            loss = pred.get(pickIndex).neg();
        } else {
            lab = lab.reshape(pred.getShape());
            int[] axes = new int[]{classAxis};
            NDArray max = pred.max(axes, true);
            NDArray predSubMax = pred.sub(max);
            loss = predSubMax.exp().sum(axes, true).log().sub(predSubMax.mul(lab).sum(axes, true));
        }
        if (weight != 1) {
            loss = loss.mul(weight);
        }
        return loss.mean();
    }
}

1reaction

enpasoscommented, Feb 12, 2021

… forget about my last comment on the gradient. Didn’t take acount the “mean” which makes the factor 2 in the gradient. Thanks again and sorry for the “gradient noise”.