Broadcast behavior in `linalg.cross`
The purpose of this issue is to determine how, or if, broadcasting behavior should be specified for `linalg.cross`.
Overview
At the moment, `linalg.cross` is specified to only compute the cross product between two arrays `x` and `y` having the same shape. In other words, the current `linalg.cross` behavior is specified to not support broadcasting.
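Under the current spec, then, a conforming call requires same-shape inputs. A minimal sketch, using NumPy as a stand-in implementation:

```python
import numpy as np

# Both inputs share the shape (4, 3); the cross product is taken
# along the last axis, so the result also has shape (4, 3).
x = np.arange(12.0).reshape(4, 3)
y = np.ones((4, 3))
z = np.cross(x, y)
```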
The origin of the specified behavior comes from PyTorch (<= v1.10.x) and TensorFlow which required (at the time of specification) input arrays of the same shape. Not supporting broadcasting was thus the lowest common denominator across array libraries.
TensorFlow has not changed its behavior; PyTorch has.
NumPy, in its `np.cross` API, supports a limited form of broadcasting, as well as 2-element vectors. The broadcasting is limited in that the computation dimension must have 2 or 3 elements and is not itself allowed to broadcast. E.g.,
```python
In [1]: x = np.random.randn(4,3)

In [2]: y = np.random.randn(1,3)

In [3]: z = np.random.randn(4,1)

In [3]: np.cross(x,y)
Out[3]:
array([[ 0.04541174,  0.07097804, -0.53846464],
       [ 0.01457143, -0.74247471, -0.1795357 ],
       [ 2.4163107 , -0.72177202, -5.98228811],
       [ 1.14862403, -1.86202792,  0.32601926]])

In [4]: np.cross(x,z)
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)

In [5]: np.add(x,z)
Out[5]:
array([[ 1.66159175,  0.92220278,  0.93708491],
       [ 0.26326781, -0.37688777,  1.26610177],
       [ 1.34535177,  1.13112439,  0.38157179],
       [-1.78861678, -1.34595513, -0.08110636]])
```
Even though `x` and `z` are broadcast compatible, `np.cross` raises an exception.
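The asymmetry can be reproduced deterministically. A small sketch, with the same shapes as the session above:

```python
import numpy as np

x = np.zeros((4, 3))
y = np.zeros((1, 3))  # batch dimension of size 1: broadcasts
z = np.zeros((4, 1))  # computation dimension of size 1: rejected

# Batch dimensions broadcast as usual.
assert np.cross(x, y).shape == (4, 3)

# But the last (computation) dimension must already have 2 or 3
# elements; it is never broadcast.
try:
    np.cross(x, z)
    raised = False
except ValueError:
    raised = True
assert raised
```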
Starting in PyTorch v1.11.0, `linalg.cross` supports broadcasting. From the docs,

> Also supports batches of vectors, for which it computes the product along the dimension `dim`. In this case, the output has the same batch dimensions as the inputs broadcast to a common shape. An error is raised if after broadcasting `input.size(dim) != 3` or `other.size(dim) != 3`.

Thus implying full broadcast support.
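The PyTorch-style semantics can be emulated in NumPy by broadcasting both inputs to a common shape *before* taking the cross product. A sketch, where `cross_full_broadcast` is a hypothetical helper, not a spec or NumPy API:

```python
import numpy as np

def cross_full_broadcast(x, y, axis=-1):
    # Hypothetical helper sketching PyTorch-style "full" broadcasting:
    # broadcast both inputs to a common shape first, then require the
    # computation axis to have size 3.
    xb, yb = np.broadcast_arrays(x, y)
    if xb.shape[axis] != 3:
        raise ValueError("cross dimension must have size 3 after broadcasting")
    return np.cross(xb, yb, axis=axis)

x = np.arange(12.0).reshape(4, 3)
z = np.full((4, 1), 2.0)

# np.cross(x, z) raises, but the fully-broadcasting variant treats
# each row of z as the vector [2, 2, 2].
out = cross_full_broadcast(x, z)
```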
Questions
- Should the spec add support for broadcasting to `linalg.cross`? If so, full broadcasting or NumPy-style (i.e., the computation dimension must have matching sizes, similar to how `vecdot` works)?
- Should the spec add support for size-2 vectors similar to NumPy?
Issue Analytics
- Created a year ago
- Comments: 10 (10 by maintainers)
Top GitHub Comments
For what it's worth, PyTorch now implements this behaviour. See https://github.com/pytorch/pytorch/pull/83798
My argument went something like this. Other linalg functions like `matmul` and `tensordot` do not broadcast in the contracted dimensions. `cross` doesn't do contraction, but it's a similar thing in the sense that the arrays combine with each other in a way that isn't just element-by-corresponding-element. If you have a (4, 3) array cross a (1, 3) array, what that means is that the first array is a stack of four 3-vectors, and the second is a single 3-vector. Broadcasting implicitly treats the latter as a stack of the same 3-vector four times, so that it can be applied to the first. On the other hand, in the simplest case, if you were to take a (1,) array and cross it with a (3,) array, broadcasting would mean treating the single-element array as a 3-vector with the same number for x, y, and z (e.g., [2] x [3, 4, 5] would mean [2, 2, 2] x [3, 4, 5]). This is morally quite a different thing from broadcasting the "stacking" dimensions.

Personally, when I think of broadcasting, I think of it in the stacking sense. An array is a stack of smaller arrays in each dimension, and broadcasting lets you implicitly treat a single array "stack" (in a dimension) as repeated as many times as necessary. It's a generalization of combining a single scalar with an array by repeating the scalar across every element of the array. You could always manually stack to make the dimensions match, but it's inefficient to repeat the data.

But if you were to broadcast contracted or crossed dimensions, this is no longer simple stacking. Instead you would be saying that, e.g., an `n x 1` column vector is the same thing as any `n x k` matrix with that column repeated `k` times, where `k` is whatever happens to match up with what you are multiplying it with, or that a single number `a` is the same as the 3-vector `<a, a, a>`. You'd be treating a column vector as something that can be stacked into a matrix, or a scalar as something that can be stacked into a vector. But it's rare to have a matrix with a column that is repeated, or a vector that is just a multiple of `<1, 1, 1>`. A much more likely scenario is that this represents a mistake by the user.

That's why I suggested at the meeting that this behavior is likely to be intentional in NumPy, although we can't know that for sure unless we dig into the original NumPy discussions, or ask people who might remember. I do know that the matmul case, which I see as being very similar, appears to be very intentional and is even mentioned in PEP 465: "For inputs with more than 2 dimensions, we treat the last two dimensions as being the dimensions of the matrices to multiply, and 'broadcast' across the other dimensions" (emphasis mine).
In fact, we had a very similar discussion about `tensordot` broadcasting, which PyTorch also broadcasts in the contracted dimensions, but we decided for the spec not to (similar to NumPy). See https://github.com/data-apis/array-api/issues/294.