Copy-view behaviour and mutating arrays
Context:
- in https://github.com/data-apis/array-api/pull/20#issuecomment-679497162 we discussed the differences between libraries in returning a copy vs. a view from function calls.
- in https://github.com/data-apis/array-api/issues/8 we were discussing how to deal with mutability
That issue and PR were about unrelated topics, so I’ll try to summarize the copy-view and mutation topic here and we can continue the discussion.
Note that the two topics are fairly coupled, because copy/view differences only matter (for semantics, not for performance) when mixed with mutation.
Mutating arrays
There are a number of things that may rely on mutation:

- In-place operators like `+=` and `*=`
- The `out=` keyword argument
- Element and slice assignment with `__setitem__`
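The three mutation mechanisms above can be illustrated with a minimal NumPy sketch:

```python
import numpy as np

x = np.arange(4, dtype=float)   # [0., 1., 2., 3.]

# In-place operator: mutates x's buffer directly.
x += 1                          # x is now [1., 2., 3., 4.]

# out= keyword: writes the result into an existing array.
y = np.empty_like(x)
np.multiply(x, 2, out=y)        # y is now [2., 4., 6., 8.]

# __setitem__: element and slice assignment.
x[0] = 10.0
x[1:3] = [0.0, 0.0]             # x is now [10., 0., 0., 4.]
```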
Summary of the issue with mutation by @shoyer was: Mutation can be challenging to support in some execution models (at least without another layer of indirection), which is why several projects currently don’t support it (TensorFlow and JAX) or only support it half-heartedly (e.g., Dask). The commonality between these libraries is that they build up abstract computations, which are then transformed (e.g., for autodiff) and/or executed in parallel. Even NumPy has “read only” arrays. I’m particularly concerned about new projects that implement this API, which might find the need to support mutation burdensome.
@alextp said: TensorFlow was planning to add mutability and didn’t see a real issue with supporting `out=`.
@shoyer said: It’s definitely always possible to support mutation at the Python level via some sort of wrapper layer. `dask.array` is perhaps a good example of this. It supports mutating operations and `out=` in some cases, but its support for mutation is still rather limited. For example, it doesn’t support assignment like `x[:2, :] = some_other_array`.
Working around limitations of no support for mutation can usually be done by one of:

1. Use `where` for selection, e.g., `where(arange(4) == 2, 1, 0)`
2. Calculate the “inverse” of the assignment operator in terms of indexing, e.g., `y = array([0, 1]); x = y[[0, 0, 1, 0]]` in this case

Some version of (2) always works, though it can be tricky to work out (especially with current APIs). The duality between indexing and assignment is the difference between specifying where elements come from and where they end up.
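Both workarounds can be written out concretely in NumPy; each produces the array that the mutating version `x = zeros(4); x[2] = 1` would:

```python
import numpy as np

# Workaround 1: `where` for selection instead of element assignment.
x = np.where(np.arange(4) == 2, 1, 0)   # [0, 0, 1, 0]

# Workaround 2: the "inverse" of assignment in terms of indexing --
# specify where each output element comes from, not where it ends up.
y = np.array([0, 1])
x2 = y[[0, 0, 1, 0]]                    # [0, 0, 1, 0]
```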
The JAX syntax for slice assignment is `x.at[idx].set(y)` vs. `x[idx] = y`. One advantage of the non-mutating version is that JAX can have reliable accumulating arithmetic on array slices with `x.at[idx].add(y)` (`x[idx] += y` doesn’t work if `x[idx]` returns a copy).

A disadvantage is that doing this sort of thing inside a loop is almost always a bad idea unless you have a JIT compiler, because every indexing assignment operation makes a full copy. So the naive translation of an efficient Python loop that fills out an array row by row would now make a copy in each step. Instead, you’d have to rewrite that loop to use something like `concatenate` instead (which in my experience is already about as efficient as using indexing assignment).
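The copy pitfall behind `x[idx] += y` can be seen in plain NumPy with repeated fancy indices: the indexed read returns a copy, so duplicate updates are lost, whereas an explicit scatter-add (the moral equivalent of JAX’s `x.at[idx].add(y)`) accumulates correctly. A minimal sketch:

```python
import numpy as np

idx = np.array([0, 0, 1])   # index 0 appears twice

x = np.zeros(3)
x[idx] += 1                 # fancy indexing reads a *copy*, so the two
                            # updates to index 0 collapse into one write
# x is [1., 1., 0.] -- not [2., 1., 0.]

y = np.zeros(3)
np.add.at(y, idx, 1)        # unbuffered scatter-add; analogous to
                            # JAX's y.at[idx].add(1)
# y is [2., 1., 0.]
```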
Copy-view behaviour
Libraries like NumPy and PyTorch return views where possible from function calls. It’s sometimes hard to predict when a view will be returned vs. a copy: it depends not only on the function in question, but also on whether the input array is contiguous, and sometimes even on the input dtype.
This is one place where it’s hard to avoid implementation choices leaking into the API:
- Static-graph-based implementations like TensorFlow and MXNet, or a functional implementation like JAX with immutable arrays, will return a copy for a function like `transpose()`.
- Implementations which support strides and/or use a dynamic graph are able to, and therefore often will, return a view when they can (which is the case for `transpose()`).
The above copy vs. view difference starts leaking into the API - i.e., the same code starts giving different results for different implementations - when it is combined with an operation that performs in-place mutation of an array (either the base array or the view on it). In the absence of that combination, views are simply a performance optimization that’s invisible to the user.
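The leak can be demonstrated in a library that returns views. A minimal NumPy sketch (a copy-returning implementation would leave `x` unchanged, so the same code gives different results across implementations):

```python
import numpy as np

x = np.zeros((2, 2))
y = x.T          # NumPy returns a view; a static-graph library returns a copy
y[0, 0] = 1.0    # in-place mutation of the view...

# ...is visible through the base array in NumPy: x[0, 0] is now 1.0.
# With copy semantics it would still be 0.0. Without the mutation,
# the view would have been an invisible performance optimization.
```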
The question is whether copy-view differences should be allowed, and if so how to deal with the semantics that vary between libraries.
To answer whether it should be allowed, let’s first ask how often the combination of views and mutation is used. A few observations:
- It is normally considered a bug if a library function (e.g. a SciPy or scikit-learn one) mutates any of its input arguments, unless the function is explicitly documented as doing so, which is rare. So the main concern is use inside functions, with arrays that are either created inside the function or are a copy of the input array.
- A search for patterns like `*=`, `+=` and `] =` in SciPy and scikit-learn `.py` files shows that in-place mutation inside functions is heavily used.
- There’s a significant difference between mutating a complete array (e.g. with `+= 1`) and mutating part of an array (e.g. with `x[:, :2] = y`). The former is a lot easier to support for array libraries employing static graphs or a JIT than the latter. See the discussion at https://github.com/data-apis/array-api/issues/8#issuecomment-673202302 for details.
- It’s harder to figure out how often the combination of mutating part of an array and that mutation affecting a view occurs. This could be tested, though, with a patched NumPy that raises an exception on mutations affecting a view, and then running the test suites of downstream libraries.
Options for how to standardize
In https://github.com/data-apis/array-api/issues/8 @shoyer listed the following options for how to deal with mutability:
1. Require support for in-place operations. Libraries that don’t support mutation fully will need to write a wrapper layer, even if it would be inefficient.
2. Make support for in-place operations optional. Arrays can indicate whether they support mutation via some standard API, e.g. like NumPy’s `ndarray.flags.writeable`. (From later discussion, see https://github.com/data-apis/array-api/issues/8#issuecomment-674514340 for the implications of that for users of the API.)
3. Don’t include support for in-place operations in the spec. This is a conservative choice, one which might have negative performance consequences (though it’s a little hard to say without looking carefully). At the very least, it might require a library like SciPy to retain a special path for `numpy.ndarray` objects.
To that I’d like to add a more granular option:

4. Require support for in-place operations that are unambiguous, and require raising an exception in case a view is mutated.

Rationale: (a) this would require libraries that don’t support mutation to write a wrapper layer, but the behaviour would be unambiguous and in most cases the wrapper would not be inefficient; (b) in case inefficient mutation is detected (e.g. mutating a large array row by row in a loop), a warning may be emitted.
A variant of this option would be:

5. Require support for in-place operations that are unambiguous and mutate the whole array at once (i.e. `+=` and `out=` must be supported; element/slice assignment must raise an exception), and require raising an exception in case a view is mutated.

The trade-off here is ease of implementation for libraries like Dask and JAX vs. putting a rewrite burden on SciPy et al. and a usability burden on end users (the alternative to element/slice assignment is unintuitive).
Issue Analytics

- Created: 3 years ago
- Comments: 22 (22 by maintainers)
Top GitHub Comments
@mattip is going to help with an experiment, creating a NumPy branch which sets `flags.writeable` to False on creation of a view (without setting it back to True if the view goes out of scope). That seems feasible, because it should happen in the same place where the `.base` attribute is set.

A quick test to set a baseline: here are the results of running the test suites of some SciPy modules after making `numpy.asarray` and `numpy.random.random` return readonly arrays:

- `stats`: 608 failed, 879 passed
- `ndimage`: 122 failed, 380 passed
- `optimize`: 345 failed, 1033 passed
- `interpolate`: 162 failed, 209 passed

Testing the impact of the `mutable()` variant is a lot more difficult, but I’ll see if I can find a volunteer for that.