Inconsistent handling of complex functions
My complex analysis is a bit rusty, and I'm getting confused by the handling of complex functions. Many functions are differentiable as complex functions (i.e. they are holomorphic) and their complex derivatives are implemented in autograd.
However there also seem to be functions, like `abs`, `var`, `angle` and others, which are not differentiable as complex functions, but they also have derivatives implemented. I'm assuming these derivatives treat the complex inputs as if they were 2D real-valued vectors? This seems inconsistent to me.
Users can fairly easily replicate the second behaviour without these derivatives being defined, by manually decomposing their numbers into real and imaginary parts, so I would tentatively propose removing these pseudo-derivatives…
Apologies if I’m making some dumb mistake here…
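For what it's worth, the 2D-real-vector reading is easy to check numerically. Here is a central-difference sketch in plain Python (`fd_partials` is just a name for this illustration, not an autograd function):

```python
def fd_partials(f, z, h=1e-6):
    # Central-difference partials of a real-valued f with respect to the
    # real and imaginary parts of z, i.e. treating a complex input as R^2.
    d_re = (f(z + h) - f(z - h)) / (2 * h)
    d_im = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return d_re, d_im

# |z| = sqrt(x**2 + y**2), so at z = 3 + 4j the R^2 gradient is
# (x/|z|, y/|z|) = (0.6, 0.8), which is what a pseudo-derivative of abs
# would have to encode.
d_re, d_im = fd_partials(abs, 3.0 + 4.0j)
print(d_re, d_im)  # ≈ 0.6 0.8
```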
Issue Analytics

- Created 7 years ago
- Comments: 13 (7 by maintainers)
I’ve done some scribbling and I think that the above approach won’t work for forward mode, because the incoming forward gradient won’t contain enough information to work out the next forward gradient…
However there is an analogous approach that will work, namely defining the forward derivative to be
I believe this will maintain the property that the derivative has the same type and shape as the output (which makes sense for forward mode).
Hi Jamie, thanks for bringing this up. It's something we thought through carefully but never properly documented. I'll have a go here. Consider a complex-to-complex function, `f`, expressed in terms of real-to-real components, `u` and `v`:

    f(x + iy) = u(x, y) + i v(x, y)

We define `grad(f)` as

    grad(f)(x + iy) = grad(u, 0)(x, y) - i grad(u, 1)(x, y)

(The second argument of `grad` specifies which argument we're differentiating with respect to.) So we throw out `v`, the imaginary part of `f`, entirely. And that's it.

Why does this make sense? Well, it covers three important cases:
- If `f` is holomorphic, then this gives the usual complex derivative of a holomorphic function (since `grad(u, 0) == grad(v, 1)` and `grad(u, 1) == -grad(v, 0)`).
- If `f` is a real-valued loss function of a complex parameter, `x`, then it gives a result that you can use in a gradient-based optimizer, by taking steps in the direction of the conjugate of `grad(f)(x)` (not to be confused with the conjugate gradient method!).
- If `f` is a real-to-real function that happens to use complex primitives internally, some of which must necessarily be non-holomorphic (maybe you use FFTs to implement convolutions, for example), then this gives the expected result.

Since optimization is Autograd's main intended application, I feel strongly that we should support these last two cases, even though it requires handling non-holomorphic functions.
Where our convention fails is if you have some non-holomorphic function and you care about all of du/dx, du/dy, dv/dx and dv/dy. But then the answer would have to contain four real values and there would be no way to express it as a single complex number. A guiding principle throughout the development of autograd has been that evaluating the gradient should give a result of the same type and shape as the input.
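Complex conjugation is a concrete example of that failure mode: its four partials genuinely don't fit into one complex number. A small sketch in plain Python (`real_jacobian` is an illustrative name, not part of autograd):

```python
def real_jacobian(f, z, h=1e-6):
    # All four partials of f: C -> C at z = x + iy, viewing C as R^2:
    # rows are (Re f, Im f), columns are (x, y); central differences.
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # du/dx + i dv/dx
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # du/dy + i dv/dy
    return [[dfdx.real, dfdy.real],
            [dfdx.imag, dfdy.imag]]

# conj is non-holomorphic: du/dx = 1 but dv/dy = -1, so the Cauchy-Riemann
# equations fail and no single complex number captures all four entries.
J = real_jacobian(lambda w: w.conjugate(), 1.0 + 2.0j)
print(J)  # ≈ [[1, 0], [0, -1]]
```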
Chapter 4 of my PhD thesis goes into a bit more detail about how we define the primitive vector-Jacobian products.
Does that make any sense?