expand_dims() sets numpy.ndarray.flags to read-only; after manually reverting the WRITEABLE flag, setting a single inner value with .loc sets all of the inner array values instead
I am using the expand_dims API that was recently updated in this PR: https://github.com/pydata/xarray/pull/2757. However, the flag-setting behaviour can also be observed using the old API syntax.
>>> import numpy as np
>>> import xarray as xr
>>> expanded_da = xr.DataArray(np.random.rand(3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y')) # Create a 2D DataArray
>>> expanded_da
<xarray.DataArray (x: 3, y: 3)>
array([[0.148579, 0.463005, 0.224993],
[0.633511, 0.056746, 0.28119 ],
[0.390596, 0.298519, 0.286853]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
>>> expanded_da.data.flags # Check current state of numpy flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> expanded_da.loc[0, 0] = 2.22 # Set a single value before expanding
>>> expanded_da # It works, the single value is set
<xarray.DataArray (x: 3, y: 3)>
array([[2.22 , 0.463005, 0.224993],
[0.633511, 0.056746, 0.28119 ],
[0.390596, 0.298519, 0.286853]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
>>> expanded_da = expanded_da.expand_dims({'z': 3}, -1) # Add a new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22 , 2.22 , 2.22 ],
[0.463005, 0.463005, 0.463005],
[0.224993, 0.224993, 0.224993]],
[[0.633511, 0.633511, 0.633511],
[0.056746, 0.056746, 0.056746],
[0.28119 , 0.28119 , 0.28119 ]],
[[0.390596, 0.390596, 0.390596],
[0.298519, 0.298519, 0.298519],
[0.286853, 0.286853, 0.286853]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
Dimensions without coordinates: z
>>> expanded_da['z'] = np.arange(3) # Add new coordinates to the new dimension 'z'
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22 , 2.22 , 2.22 ],
[0.463005, 0.463005, 0.463005],
[0.224993, 0.224993, 0.224993]],
[[0.633511, 0.633511, 0.633511],
[0.056746, 0.056746, 0.056746],
[0.28119 , 0.28119 , 0.28119 ]],
[[0.390596, 0.390596, 0.390596],
[0.298519, 0.298519, 0.298519],
[0.286853, 0.286853, 0.286853]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to set a single value, get 'read-only' error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 113, in __setitem__
self.data_array[pos_indexers] = value
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/dataarray.py", line 494, in __setitem__
self.variable[key] = value
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/variable.py", line 714, in __setitem__
indexable[index_tuple] = value
File "/Users/dhemming/.ve/unidata_notebooks/lib/python3.6/site-packages/xarray/core/indexing.py", line 1174, in __setitem__
array[key] = value
ValueError: assignment destination is read-only
>>> expanded_da.data.flags # Check flags on the DataArray, notice they have changed
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : False
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> expanded_da.data.setflags(write = 1) # Make array writeable again
>>> expanded_da.data.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> expanded_da.loc[0, 0, 0] # Check the value I want to overwrite
<xarray.DataArray ()>
array(2.22)
Coordinates:
x int64 0
y int64 0
z int64 0
>>> expanded_da.loc[0, 0, 0] = 9.99 # Attempt to overwrite single value, instead it overwrites all values in the array located at [0, 0]
>>> expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[9.99 , 9.99 , 9.99 ],
[0.463005, 0.463005, 0.463005],
[0.224993, 0.224993, 0.224993]],
[[0.633511, 0.633511, 0.633511],
[0.056746, 0.056746, 0.056746],
[0.28119 , 0.28119 , 0.28119 ]],
[[0.390596, 0.390596, 0.390596],
[0.298519, 0.298519, 0.298519],
[0.286853, 0.286853, 0.286853]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
Problem description
When applying the operation `expand_dims({'z': 3}, -1)` to a DataArray, the flags of the underlying NumPy array are changed: `C_CONTIGUOUS`, `WRITEABLE` and `OWNDATA` are all set to False. After changing `WRITEABLE` back to True, setting a single value in the DataArray with the `.loc` indexer instead sets all the values in the selected inner array.
I am new to xarray, so I can’t be entirely sure whether this is expected behaviour. Regardless, I would expect that adding a new dimension to the array would not make it read-only. I would also not expect `.loc` to behave differently from how it otherwise would.
It is also not congruent with NumPy’s `expand_dims` operation: when I call `np.expand_dims(np_arr, axis=-1)`, the `C_CONTIGUOUS` and `WRITEABLE` flags are not modified.
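The comparison with NumPy can be checked directly. The sketch below is plain NumPy with illustrative variable names. It shows that `np.expand_dims` only inserts an axis of length 1, which is an ordinary writeable view, whereas repeating the data along a new axis of length 3 without copying needs broadcasting. Here `np.broadcast_to` is used as a stand-in for what xarray appears to do internally; that is an assumption, not something confirmed in this report.

```python
import numpy as np

np_arr = np.random.rand(3, 3)

# np.expand_dims only inserts an axis of length 1: a plain reshaped view.
unit_axis = np.expand_dims(np_arr, axis=-1)
print(unit_axis.shape)                # (3, 3, 1)
print(unit_axis.flags.writeable)      # True
print(unit_axis.flags.c_contiguous)   # True

# Repeating the values along a new axis of length 3 without copying
# requires broadcasting, and NumPy marks that view read-only.
repeated = np.broadcast_to(np_arr[..., np.newaxis], (3, 3, 3))
print(repeated.shape)                 # (3, 3, 3)
print(repeated.flags.writeable)       # False
print(repeated.flags.c_contiguous)    # False
```

So the two operations are not quite equivalent: the flag changes appear only once the new axis is longer than 1.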
Expected Output
Here is a similar flow of operations that demonstrates the behaviour I would expect from the DataArray after applying `expand_dims`:
>>> non_expanded_da = xr.DataArray(np.random.rand(3,3,3), coords={'x': np.arange(3), 'y': np.arange(3)}, dims=('x', 'y', 'z')) # Create the new DataArray to be in the same state as I would expect it to be in after applying the operation 'expand_dims({'z': 3}, -1)'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
[0.678884, 0.512903, 0.737573],
[0.985872, 0.1373 , 0.4603 ]],
[[0.764227, 0.825059, 0.847694],
[0.482841, 0.708206, 0.486576],
[0.726265, 0.860627, 0.435101]],
[[0.117904, 0.40569 , 0.274288],
[0.079321, 0.647562, 0.847459],
[0.57494 , 0.578745, 0.125309]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
Dimensions without coordinates: z
>>> non_expanded_da.data.flags # Check flags
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> non_expanded_da['z'] = np.arange(3) # Set coordinate for dimension 'z'
>>> non_expanded_da
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[0.017221, 0.374267, 0.231979],
[0.678884, 0.512903, 0.737573],
[0.985872, 0.1373 , 0.4603 ]],
[[0.764227, 0.825059, 0.847694],
[0.482841, 0.708206, 0.486576],
[0.726265, 0.860627, 0.435101]],
[[0.117904, 0.40569 , 0.274288],
[0.079321, 0.647562, 0.847459],
[0.57494 , 0.578745, 0.125309]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
>>> non_expanded_da.loc[0, 0, 0] = 2.22 # Set value using .loc method
>>> non_expanded_da # The single value referenced is set which is what I expect to happen
<xarray.DataArray (x: 3, y: 3, z: 3)>
array([[[2.22 , 0.374267, 0.231979],
[0.678884, 0.512903, 0.737573],
[0.985872, 0.1373 , 0.4603 ]],
[[0.764227, 0.825059, 0.847694],
[0.482841, 0.708206, 0.486576],
[0.726265, 0.860627, 0.435101]],
[[0.117904, 0.40569 , 0.274288],
[0.079321, 0.647562, 0.847459],
[0.57494 , 0.578745, 0.125309]]])
Coordinates:
* x (x) int64 0 1 2
* y (y) int64 0 1 2
* z (z) int64 0 1 2
Output of xr.show_versions()
xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.5.0
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: 0.9.6.1.post1
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: None
setuptools: 39.0.1
pip: 10.0.1
conda: None
pytest: None
IPython: None
sphinx: None
Issue Analytics
- State:
- Created 4 years ago
- Comments: 6 (2 by maintainers)
Top GitHub Comments
As you’ve noticed, these arrays are “read only” because otherwise indexing can modify more than the original values, e.g., consider:
Both “y” values were set to 1!
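The snippet that accompanied this comment is not reproduced above. As a rough stand-in, the plain-NumPy sketch below shows the same kind of aliasing. It assumes the expanded array behaves like the broadcast view returned by `np.broadcast_to`; the names `base` and `y` are illustrative only.

```python
import numpy as np

base = np.zeros(1)                 # one real value in memory
y = np.broadcast_to(base, (2,))    # two "y" entries, both aliasing that value

# Force the read-only view to be writeable, as was done in the issue above.
y.setflags(write=True)

y[0] = 1                           # intended: change only the first entry
print(y)                           # [1. 1.]  (both entries changed)
print(base)                        # [1.]     (the single underlying value)
```

Because both positions of `y` point at the same memory, a write to one position is visible at the other, which is why the view is read-only by default.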
The workaround is to call `.copy()` on the array after calling `expand_dims()`; the correct result is then printed.
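A minimal sketch of this workaround, reusing the shapes from the issue above. The array contents and variable names are made up for illustration, and this is not the snippet from the original comment.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.full((3, 3), 2.22),
    coords={"x": np.arange(3), "y": np.arange(3)},
    dims=("x", "y"),
)

# Expand, then copy so that the result owns ordinary writeable memory.
expanded = da.expand_dims({"z": 3}, axis=-1).copy()
expanded["z"] = np.arange(3)          # coordinate for the new dimension

expanded.loc[0, 0, 0] = 9.99          # touches exactly one element
print(expanded.data.flags.writeable)  # True
print(expanded.loc[0, 0].values)      # [9.99 2.22 2.22]
```

The copy costs memory proportional to the expanded size, which is the trade-off for getting independent, writeable values.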
This is indeed intended behavior: by making the result read-only, we can expand dimensions without copying the original data, and without needing to worry about indexing modifying the wrong values.
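The "without copying" part can be made concrete in plain NumPy. The sketch assumes, again, that the expanded data is a broadcast view like the one `np.broadcast_to` produces. It checks that the view shares memory with the original and that an explicit copy does not.

```python
import numpy as np

original = np.random.rand(3, 3)
view = np.broadcast_to(original[..., np.newaxis], (3, 3, 3))

print(np.shares_memory(original, view))    # True: no data was copied
print(view.strides[-1])                    # 0: the new axis repeats one value
print(view.flags.owndata)                  # False
print(view.flags.writeable)                # False

copied = view.copy()                       # materialise 27 independent values
print(np.shares_memory(original, copied))  # False
print(copied.flags.writeable)              # True
```

A stride of 0 along the new axis means every position on that axis reads the same memory location, so the expanded array costs no extra storage until it is copied.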
That said, we could certainly improve the usability of this feature in xarray. Some options:
- Mention `.copy()` in the error message xarray prints when an array is read-only.
- Add a `copy` argument to `expand_dims`, so users can write `copy=True` if they want a writeable result.
- […] `expand_dims()` is sometimes but not always writeable?
I have just realized that `.copy(deep=True)` is a possible fix for datasets.
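For a Dataset, `copy()` is shallow by default, so `deep=True` has to be passed explicitly to copy the underlying arrays. A hedged sketch follows; the variable name `temperature` and the data are invented for illustration.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"temperature": (("x", "y"), np.zeros((3, 3)))},
    coords={"x": np.arange(3), "y": np.arange(3)},
)

# deep=True copies the underlying arrays, so each data variable gets its
# own writeable memory instead of a shared read-only view.
expanded = ds.expand_dims({"z": 3}, axis=-1).copy(deep=True)

expanded["temperature"][0, 0, 0] = 9.99      # positional assignment
print(expanded["temperature"].values[0, 0])  # [9.99 0.   0.  ]
```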