Add eager mode gradients for ops.

From @nsthorat on January 18, 2018 15:56

The infrastructure for eager mode is now ready for gradient methods to be filled in!

Eager mode provides a new set of methods on NDArrayMath that allow the user to eagerly compute gradients. Most users will use an optimizer like this:

const weights = dl.variable(Array2D.randNormal([784, 10]));
const cost = optimizer.minimize(() => {
  const batch = data.nextTrainBatch(BATCH_SIZE);
  const y = math.matMul(batch.xs, weights);
  const loss = math.mean(math.softmaxCrossEntropyWithLogits(batch.labels, y));
  return loss;
});

You’ll notice that there is no use of the Graph; we simply call ops on NDArrayMath directly inside the optimizer.minimize() callback.

You can find a full example of training MNIST in eager mode here: https://github.com/PAIR-code/deeplearnjs/blob/master/demos/mnist_eager/model.ts

As part of NDArrayMath, we expose several new methods. The important ones are these (a short usage sketch follows the list):

  • math.gradients(f: () => cost, xs) which executes f() (which produces a scalar value) and returns the gradient of the output of f with respect to xs (which can be an NDArray or a string => NDArray map).
  • math.valueAndGradients(f: () => cost, xs) which is the same as math.gradients() but also returns the output of f().
  • math.vjp(f: () => y, x, dy) which computes a vector-Jacobian product. It is similar to gradients, but allows f() to produce a non-scalar value and lets the user provide a dy. This is useful for computing a subset of backpropagation, or for testing the gradient of a single op with a provided dy (this is how we unit test).
  • math.customGradient(f: () => {value, gradients}, xs) which allows the user to provide a custom gradient of an arbitrary function closure instead of using the default gradients of the ops in the function. We use this for numerical stability for ops like softmaxCrossEntropy, and for mean / sum so we can compute a faster gradient (instead of the combination of gradients of the kernels they use). Most of the time, you shouldn’t need to use this.
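
As a quick illustration of these signatures, here is a minimal sketch of math.gradients() and math.vjp(). The Scalar.new / Array1D.new constructors, the concrete values, and the map-in/map-out shape of the gradients() result are assumptions for illustration, based on the descriptions above rather than on the final API.

// Minimal sketch, assuming the eager API described above.
const a = Scalar.new(2);
const b = Scalar.new(3);

// f(a, b) = a * b, so df/da = b and df/db = a.
// Assumes gradients() returns a map keyed like the xs map that was passed in.
const grads = math.gradients(() => math.multiply(a, b), {a, b});
console.log(grads.a.get());  // 3
console.log(grads.b.get());  // 2

// vjp() lets f() return a non-scalar value and takes an explicit dy.
const x = Array1D.new([1, 2, 3]);
const dy = Array1D.new([1, 10, 100]);
// d/dx x^2 = 2x, scaled elementwise by dy => [2, 40, 600]
const dx = math.vjp(() => math.square(x), x, dy);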

Now that these methods exist and are relatively stable, we can flesh out gradients for kernels and ops!

To add gradients for kernels, we simply need to add a derivative function to the executeKernel calls inside NDArrayMath. An example, from matMul:

// Inside NDArrayMath.matMul, where a, b, aOrientation and bOrientation are in scope:
const der = (dy: Array2D<'float32'>, y: Array2D) => {
  return {
    // dL/da = dy · b^T
    a: () => this.matMul(dy, b, MatrixOrientation.REGULAR, MatrixOrientation.TRANSPOSED),
    // dL/db = a^T · dy
    b: () => this.matMul(a, dy, MatrixOrientation.TRANSPOSED, MatrixOrientation.REGULAR)
  };
};
return this.backendEngine.executeKernel(
    'MatMul', {inputs: {a, b}, args: {aOrientation, bOrientation}}, der);

The derivative is a function that takes dy and y and returns an object whose keys are the kernel's inputs (as defined by the inputs argument to executeKernel); each value is a function that returns the derivative with respect to that input. These derivative functions should not call executeKernel; they should call math ops directly, so that we can compute second-order gradients.
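
For a unary op the same pattern is even shorter. The following is a hypothetical sketch for exp; the kernel name 'Exp' and the input key x are assumed names for illustration, not the actual kernel registration.

// Hypothetical sketch for a unary kernel, following the pattern above.
// ('Exp' and the input key `x` are assumed names.)
const der = (dy: NDArray<'float32'>, y: NDArray) => {
  return {
    // d/dx exp(x) = exp(x) = y, so the chain rule gives dx = dy * y.
    // Note: calls this.multiply (a math op), not executeKernel.
    x: () => this.multiply(dy, y)
  };
};
return this.backendEngine.executeKernel('Exp', {inputs: {x}}, der);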

Two example PRs adding gradients: https://github.com/PAIR-code/deeplearnjs/pull/521 https://github.com/PAIR-code/deeplearnjs/pull/544

Note that we already have many gradients in the Graph layer; we just need to port them over to the gradient functions defined in eager mode.

Here is the list of ops, used to track which gradients have been implemented:

  • abs
  • acos
  • add
  • argmax
  • argmin (not important)
  • asin
  • atan
  • avgPool
  • batchNormalization
  • cast
  • ceil
  • clip
  • clone
  • concat
  • conv1D
  • conv2D
  • conv2DDerBias (would be a second order der)
  • conv2DDerFilter (would be a second order der)
  • conv2DDerInput (would be a second order der)
  • conv2DTranspose
  • cos
  • cosh
  • depthwiseConv2D
  • divide
  • elu
  • eluDer
  • equal
  • exp
  • floor
  • greater
  • greaterEqual
  • leakyRelu
  • less
  • lessEqual
  • localResponseNormalization
  • log
  • logicalOr
  • logicalAnd
  • matMul (needs derivatives when using transposed bit)
  • max
  • maximum
  • maxPool
  • maxPoolBackprop (would be second order der)
  • min
  • minimum
  • minPool
  • multinomial
  • multiply
  • neg
  • notEqual
  • oneHot
  • pad
  • pow (half implemented, needs broadcast + derB)
  • prelu
  • preluDer (would be second order der)
  • relu
  • reshape
  • resizeBilinear3D
  • reverse
  • selu
  • sigmoid
  • sin
  • sinh
  • slice
  • softmax
  • softmaxCrossEntropyWithLogits
  • sqrt
  • square
  • step
  • sub
  • sum
  • tan
  • where
  • tanh
  • tile
  • topK
  • transpose

Copied from original issue: tensorflow/tfjs-core#561

Top GitHub Comments

jgartman commented, Apr 26, 2018 (1 reaction)

I’ll try localResponseNormalization

generic-github-user commented, Dec 14, 2018 (0 reactions)

Is there an updated list of which functions have gradients implemented and which do not?
