Hessian diagonal computation
Hello,
I've been playing around with autograd and I'm having a blast. However, I'm having some difficulty extracting the diagonal of the Hessian.
This is my current code:
from autograd import hessian
import autograd.numpy as np

y_pred = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1]
], dtype=float)

weights = np.array([1, 1, 1, 1, 1], dtype=float)

def softmax(x, axis=1):
    z = np.exp(x)
    return z / np.sum(z, axis=axis, keepdims=True)

def loss(y_pred):
    y_true = np.array([
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1]
    ], dtype=float)
    ys = np.sum(y_true, axis=0)
    y_true = y_true / ys
    ln_p = np.log(softmax(y_pred))
    wll = np.sum(y_true * ln_p, axis=0)
    loss = -np.dot(weights, wll)
    return loss

hess = hessian(loss)(y_pred)
I understand that hessian is simply jacobian called twice, and that hess is an n * p * n * p array. I can extract the diagonal manually and obtain my expected output, which is:
[[0.24090069 0.12669198 0.12669198 0.12669198 0.12669198]
[0.12669198 0.24090069 0.12669198 0.12669198 0.12669198]
[0.12669198 0.12669198 0.12669198 0.24090069 0.12669198]
[0.12669198 0.12669198 0.24090069 0.12669198 0.12669198]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
[0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]]
I've checked this numerically and it's fine. The problem is that this still requires computing the full Hessian before accessing the diagonal part, which is really expensive. Is there any better way to proceed? I think this is a common use case in machine learning optimization that could deserve a dedicated convenience function.
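A minimal sketch of that manual extraction (assuming hess has the shape (7, 5, 7, 5) returned by autograd's hessian here):

# The diagonal entries of the (7, 5, 7, 5) Hessian are hess[i, j, i, j];
# einsum pulls them out in one call, without any Python loops.
diag_hess = np.einsum('ijij->ij', hess)   # shape (7, 5)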
Unfortunately, I don’t think it’s possible to compute the diagonal of the Hessian other than by taking N separate Hessian-vector products, equivalent to instantiating the full Hessian and then taking the diagonal. People resort to all sorts of tricks to estimate the trace of the Hessian (e.g. https://arxiv.org/abs/1802.03451) precisely because it’s expensive to evaluate the diagonal.
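For illustration, here is a minimal sketch of that approach (not from the thread; hessian_diag is a hypothetical helper name): the diagonal is assembled from one Hessian-vector product per input element, using double backprop with plain grad. This avoids materialising the full n * p * n * p array, but, as noted above, it still costs N backward passes.

from autograd import grad
import autograd.numpy as np

def hessian_diag(f, x):
    # Hessian-vector product by double backprop: H(x) v = d/dx [ grad f(x) . v ]
    g = grad(f)
    hvp = lambda x_, v: grad(lambda u: np.sum(g(u) * v))(x_)
    diag = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    for _ in it:
        e = np.zeros_like(x)
        e[it.multi_index] = 1.0                            # one-hot direction e_i
        diag[it.multi_index] = hvp(x, e)[it.multi_index]   # (H e_i)_i = H_ii
    return diag

# hessian_diag(loss, y_pred) should match np.einsum('ijij->ij', hessian(loss)(y_pred))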
Autograd's elementwise_grad has a very important caveat: it only applies to functions for which the Jacobian is diagonal. All it does is a vector-Jacobian product with a vector of ones, which gives you the sum of each row of the Jacobian. If the Jacobian is diagonal, then that's the same thing as the diagonal of the Jacobian.
That caveat was in the docstring of an earlier version of elementwise_grad. But at some point we deleted the function (because it can be misleading!) and then later we reinstated it, without the docstring. I just added the caveat back in. Sorry for the confusion.
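A toy example (not from the thread) that makes the caveat concrete: for a function with a dense Jacobian, elementwise_grad returns the ones-vector VJP, not the diagonal.

from autograd import elementwise_grad, jacobian
import autograd.numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
f = lambda x: np.dot(A, x)        # Jacobian of f is A, which is not diagonal

x = np.ones(2)
J = jacobian(f)(x)                # equals A

print(elementwise_grad(f)(x))     # [4. 6.]  -> ones-vector VJP (sums over the output index)
print(np.diag(J))                 # [1. 4.]  -> the actual diagonal, which is different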
How stupid of a work-around (not invoking jacobian / elementwise_grad, trying to avoid the diagonal-Jacobian restriction, and trying to avoid computing the second cross-partials) would it be to loop over the input arguments and create the pure second partials one at a time using grad? Supposing the input arguments are unpacked, and computing the j-th pure second partial as grad(grad(g, j), j)?
I'm assuming pretty stupid, but for example, compare to the "incorrect" answer provided by composing elementwise_grad in an attempt to get at the pure partials
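The comparison the comment refers to isn't reproduced here, but a toy sketch of the per-argument idea (with a hypothetical scalar-argument function g, not from the issue) would look like this:

from autograd import grad

def g(x0, x1):                     # hypothetical example function
    return x0 ** 3 * x1 + x1 ** 2

d2_dx0 = grad(grad(g, 0), 0)       # pure second partial w.r.t. the first argument
d2_dx1 = grad(grad(g, 1), 1)       # pure second partial w.r.t. the second argument

print(d2_dx0(2.0, 3.0))            # 36.0  (= 6 * x0 * x1)
print(d2_dx1(2.0, 3.0))            # 2.0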