Avoid loading any data for reprs
What happened?
For “small” arrays, we load the data into memory when displaying the repr. For cloud-backed datasets with a large number of “small” variables, this can waste a lot of time sequentially loading O(100) variables just to render a repr.
What did you expect to happen?
Fast reprs!
Minimal Complete Verifiable Example
This dataset has 48 “small” variables:

import xarray as xr

dc1 = xr.open_dataset(
    's3://its-live-data/datacubes/v02/N40E080/ITS_LIVE_vel_EPSG32645_G0120_X250000_Y4750000.zarr',
    engine='zarr',
    storage_options={'anon': True},
)
dc1._repr_html_()
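To see where the time goes, one can time each small variable's load individually. This is a diagnostic sketch (not part of the original report) that assumes the `dc1` dataset opened above and uses the `1e5` cutoff mentioned below:

```python
import time

# Load each variable under the repr's size cutoff, one at a time,
# which approximates the sequential reads the repr performs.
for name, var in dc1.variables.items():
    if var.size < 1e5:
        t0 = time.perf_counter()
        var.load()  # blocking read from S3 for lazily-backed variables
        print(f"{name}: {time.perf_counter() - t0:.2f}s")
```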
On 2022.03.0 this repr takes 36.4s. If I comment out the `array.size` condition, I get 6μs.
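For reference, this is roughly what that check does. The following is a paraphrased, self-contained sketch of the behavior in xarray 2022.03.0's repr formatting (the real code lives in `xarray/core/formatting.py` and also short-circuits on arrays already in memory, so treat the exact names as approximate):

```python
import numpy as np

def short_data_repr_sketch(array):
    # Paraphrase of the size cutoff in xarray's repr formatting:
    # arrays under ~1e5 elements are loaded eagerly just to render text,
    # which is a blocking remote read for cloud-backed stores.
    if array.size < 1e5:
        return repr(np.asarray(array))
    # Larger arrays get a cheap placeholder instead of their values.
    return f"[{array.size} values with dtype={array.dtype}]"
```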
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
No response
Anything else we need to know?
No response
Environment
xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.2
distributed: None
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.3.2
pip: 22.1.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: 4.5.0
Comments (from maintainers)
I think the best thing to do is to not load anything unless asked to, i.e. delete the `array.size < 1e5` condition.

Is the print still slow if, somewhere just before the load, the array was masked to only show a few start and end elements, i.e. `array[[0, 1, -2, -1]]`?
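That masking idea can be sketched with a dask-backed array standing in for the zarr-backed variables above (the real change would live in xarray's repr formatting, so this is only an illustration of the indexing behavior):

```python
import dask.array as da
import xarray as xr

# A lazy million-element array standing in for a cloud-backed variable.
lazy = xr.DataArray(da.arange(1_000_000, chunks=10_000), dims="x")

n = lazy.sizes["x"]
edges = lazy[[0, 1, n - 2, n - 1]]   # still lazy: nothing has been read yet
print(edges.values)                  # computes only the chunks holding those 4 values
```

Because outer integer indexing stays lazy, only the chunks containing those few edge values should be read, so the cost of rendering the repr would no longer scale with variable size.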