Add eager mode gradients for ops.

From @nsthorat on January 18, 2018 15:56

The infrastructure for eager mode is now ready for gradient methods to be filled in!

Eager mode provides a new set of methods on NDArrayMath that allow the user to eagerly compute gradients. Most users will use an optimizer like this:

const weights = dl.variable(Array2D.randNormal([784, 10]));
const cost = optimizer.minimize(() => {
  const batch = data.nextTrainBatch(BATCH_SIZE);
  const y = math.matMul(batch.xs, weights);
  const loss = math.mean(math.softmaxCrossEntropyWithLogits(batch.labels, y));
  return loss;
});

You’ll notice that there is no use of the Graph; we simply call ops on NDArrayMath directly inside the optimizer.minimize() callback.

You can find a full example of training MNIST in eager mode here: https://github.com/PAIR-code/deeplearnjs/blob/master/demos/mnist_eager/model.ts

As part of NDArrayMath, we expose several new methods. The important ones are these (a short usage sketch follows the list):

  • math.gradients(f: () => cost, xs) which executes f() (which produces a scalar value) and returns the gradient of the output of f with respect to xs (which can be an NDArray or a string => NDArray map).
  • math.valueAndGradients(f: () => cost, xs) which is the same as math.gradients() but also returns the output of f().
  • math.vjp(f: () => y, x, dy) which computes a vector-Jacobian product. It is similar to gradients, but allows f() to produce a non-scalar value and lets the user provide a dy. This is useful for computing a subset of backpropagation, or for testing the gradient of a single op with a provided dy (this is how we unit test).
  • math.customGradient(f: () => {value, gradients}, xs) which allows the user to provide a custom gradient of an arbitrary function closure instead of using the default gradients of the ops in the function. We use this for numerical stability for ops like softmaxCrossEntropy, and for mean / sum so we can compute a faster gradient (instead of the combination of gradients of the kernels they use). Most of the time, you shouldn’t need to use this.
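
As a quick illustration of these signatures, here is a minimal sketch of math.gradients() and math.vjp(). The Scalar.new / Array1D.new constructors, the concrete values, and the map-in/map-out shape of the gradients() result are assumptions for illustration, based on the descriptions above rather than on the final API.

// Minimal sketch, assuming the eager API described above.
const a = Scalar.new(2);
const b = Scalar.new(3);

// f(a, b) = a * b, so df/da = b and df/db = a.
// Assumes gradients() returns a map keyed like the xs map that was passed in.
const grads = math.gradients(() => math.multiply(a, b), {a, b});
console.log(grads.a.get());  // 3
console.log(grads.b.get());  // 2

// vjp() lets f() return a non-scalar value and takes an explicit dy.
const x = Array1D.new([1, 2, 3]);
const dy = Array1D.new([1, 10, 100]);
// d/dx x^2 = 2x, scaled elementwise by dy => [2, 40, 600]
const dx = math.vjp(() => math.square(x), x, dy);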

Now that these methods exist and are relatively stable, we can flesh out gradients for kernels and ops!

To add gradients for kernels, we simply need to add a derivative function to the executeKernel calls inside NDArrayMath. An example, from matMul:

// Inside NDArrayMath.matMul, where a, b, aOrientation and bOrientation are in scope:
const der = (dy: Array2D<'float32'>, y: Array2D) => {
  return {
    // dL/da = dy · b^T
    a: () => this.matMul(dy, b, MatrixOrientation.REGULAR, MatrixOrientation.TRANSPOSED),
    // dL/db = a^T · dy
    b: () => this.matMul(a, dy, MatrixOrientation.TRANSPOSED, MatrixOrientation.REGULAR)
  };
};
return this.backendEngine.executeKernel(
    'MatMul', {inputs: {a, b}, args: {aOrientation, bOrientation}}, der);

The derivative is a function that takes dy and y and returns an object whose keys are the kernel's inputs (as defined by the inputs argument to executeKernel); each value is a function that returns the derivative with respect to that input. These derivative functions should not call executeKernel; they should call math ops directly, so that we can compute second-order gradients.
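
For a unary op the same pattern is even shorter. The following is a hypothetical sketch for exp; the kernel name 'Exp' and the input key x are assumed names for illustration, not the actual kernel registration.

// Hypothetical sketch for a unary kernel, following the pattern above.
// ('Exp' and the input key `x` are assumed names.)
const der = (dy: NDArray<'float32'>, y: NDArray) => {
  return {
    // d/dx exp(x) = exp(x) = y, so the chain rule gives dx = dy * y.
    // Note: calls this.multiply (a math op), not executeKernel.
    x: () => this.multiply(dy, y)
  };
};
return this.backendEngine.executeKernel('Exp', {inputs: {x}}, der);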

Two example PRs adding gradients: https://github.com/PAIR-code/deeplearnjs/pull/521 https://github.com/PAIR-code/deeplearnjs/pull/544

Note that we already have many gradients in the Graph layer; we just need to port them over to the gradient functions defined in eager mode.

Here is the list of ops, used to track which gradients have been implemented:

  • abs
  • acos
  • add
  • argmax
  • argmin (not important)
  • asin
  • atan
  • avgPool
  • batchNormalization
  • cast
  • ceil
  • clip
  • clone
  • concat
  • conv1D
  • conv2D
  • conv2DDerBias (would be a second order der)
  • conv2DDerFilter (would be a second order der)
  • conv2DDerInput (would be a second order der)
  • conv2DTranspose
  • cos
  • cosh
  • depthwiseConv2D
  • divide
  • elu
  • eluDer
  • equal
  • exp
  • floor
  • greater
  • greaterEqual
  • leakyRelu
  • less
  • lessEqual
  • localResponseNormalization
  • log
  • logicalOr
  • logicalAnd
  • matMul (needs derivatives when using transposed bit)
  • max
  • maximum
  • maxPool
  • maxPoolBackprop (would be second order der)
  • min
  • minimum
  • minPool
  • multinomial
  • multiply
  • neg
  • notEqual
  • oneHot
  • pad
  • pow (half implemented, needs broadcast + derB)
  • prelu
  • preluDer (would be second order der)
  • relu
  • reshape
  • resizeBilinear3D
  • reverse
  • selu
  • sigmoid
  • sin
  • sinh
  • slice
  • softmax
  • softmaxCrossEntropyWithLogits
  • sqrt
  • square
  • step
  • sub
  • sum
  • tan
  • where
  • tanh
  • tile
  • topK
  • transpose

Copied from original issue: tensorflow/tfjs-core#561

Top GitHub Comments

jgartman commented, Apr 26, 2018 (1 reaction)

I’ll try localResponseNormalization

generic-github-user commented, Dec 14, 2018 (0 reactions)

Is there an updated list of which functions have gradients implemented and which do not?
