Avoid loading any data for reprs


What happened?

For “small” datasets, we load the data into memory when displaying the repr. For cloud-backed datasets with a large number of “small” variables, this can waste a lot of time sequentially loading O(100) variables just to build a repr.

https://github.com/pydata/xarray/blob/6c8db5ed005e000b35ad8b6ea9080105e608e976/xarray/core/formatting.py#L548-L549
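The linked lines decide whether a variable's values are shown in the repr. Below is a hypothetical, pure-Python model of a size-threshold rule of that kind. It is not xarray's actual code: LazyArray, threshold_repr, and the placeholder text are illustrative assumptions, and the 1e5 cutoff is taken from the array.size < 1e5 condition quoted in the maintainer's comment. The point it illustrates is that every “small” lazy variable costs one load, so a 48-variable dataset triggers 48 reads just to print.

```python
class LazyArray:
    """Hypothetical stand-in for one lazily loaded, cloud-backed variable."""

    def __init__(self, values, dtype="float64"):
        self._values = list(values)
        self.dtype = dtype
        self.in_memory = False
        self.load_count = 0  # number of (simulated) remote reads

    @property
    def size(self):
        return len(self._values)

    def load(self):
        # A real backend would make a network request here.
        self.load_count += 1
        self.in_memory = True
        return self._values


def threshold_repr(array, threshold=1e5):
    """Model of a size-threshold rule: small arrays are loaded for display."""
    if array.in_memory:
        values = array._values
    elif array.size < threshold:
        values = array.load()  # I/O triggered purely to build the repr
    else:
        return f"[{array.size} values with dtype={array.dtype}]"
    head = " ".join(str(v) for v in values[:3])
    return head + (" ..." if array.size > 3 else "")


# 48 "small" lazy variables, mirroring the size of the dataset in the example.
variables = {f"var{i}": LazyArray(range(100)) for i in range(48)}
reprs = [threshold_repr(v) for v in variables.values()]
total_loads = sum(v.load_count for v in variables.values())
print(total_loads)  # 48
```

In this model the cost of a repr scales with the number of small variables, not with how much data is actually displayed.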

What did you expect to happen?

Fast reprs!

Minimal Complete Verifiable Example

This dataset has 48 “small” variables:

import xarray as xr

dc1 = xr.open_dataset(
    's3://its-live-data/datacubes/v02/N40E080/ITS_LIVE_vel_EPSG32645_G0120_X250000_Y4750000.zarr',
    engine='zarr',
    storage_options={'anon': True},
)
dc1._repr_html_()

On 2022.03.0, this repr takes 36.4 s. If I comment out the array.size condition, it takes 6 μs.
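The gap between 36.4 s and 6 μs is what you would expect if almost all repr time goes to sequential reads. If, as a rough assumption, nearly all of the 36.4 s were spent loading the 48 variables one after another, that is about 36.4 / 48 ≈ 0.76 s per variable. The offline simulation below is a hypothetical sketch: SlowLazyArray, LATENCY_S, and both repr functions are assumptions, and time.sleep stands in for a network round trip. It compares a rule that loads small arrays with one that only uses metadata.

```python
import time

LATENCY_S = 0.005  # assumed cost of one remote read; not a measured value


class SlowLazyArray:
    """Hypothetical remote variable whose load() blocks for LATENCY_S."""

    def __init__(self, size, dtype="float32"):
        self.size = size
        self.dtype = dtype
        self.in_memory = False

    def load(self):
        time.sleep(LATENCY_S)  # stand-in for a sequential network round trip
        self.in_memory = True


def eager_small_repr(array, threshold=1e5):
    # Loads "small" arrays before formatting, like a size-threshold rule.
    if not array.in_memory and array.size < threshold:
        array.load()
    return f"<{array.size} values>"


def lazy_repr(array):
    # Never loads; only metadata (size, dtype) is used.
    return f"[{array.size} values with dtype={array.dtype}]"


def time_repr(repr_func, n_vars=48):
    arrays = [SlowLazyArray(size=1_000) for _ in range(n_vars)]
    start = time.perf_counter()
    for array in arrays:
        repr_func(array)  # variables are formatted one after another
    return time.perf_counter() - start


eager_s = time_repr(eager_small_repr)
lazy_s = time_repr(lazy_repr)
print(f"eager: {eager_s:.3f} s, lazy: {lazy_s:.6f} s")
```

With the assumed latency, the eager rule takes at least 48 × LATENCY_S, while the metadata-only rule does no I/O at all.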

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:43:32) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.4
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.2
distributed: None
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 62.3.2
pip: 22.1.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: 4.5.0

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

dcherian commented, Jun 24, 2022 (3 reactions)

I think the best thing to do is not to load anything unless asked to, so delete the array.size < 1e5 condition.
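A minimal sketch of the “don't load unless asked” policy, assuming a hypothetical LazyArray type and no_load_repr function (neither is xarray API): values are shown only when the user has already loaded them, and otherwise the repr falls back to size and dtype, so printing never triggers I/O.

```python
class LazyArray:
    """Hypothetical lazily loaded variable; load() stands in for remote I/O."""

    def __init__(self, values, dtype="int64"):
        self._values = list(values)
        self.dtype = dtype
        self.in_memory = False
        self.load_count = 0

    @property
    def size(self):
        return len(self._values)

    def load(self):
        self.load_count += 1
        self.in_memory = True
        return self  # returning self allows small.load() inside an expression


def no_load_repr(array, max_items=3):
    """Never trigger I/O: show values only if the user already loaded them."""
    if array.in_memory:
        head = " ".join(str(v) for v in array._values[:max_items])
        return head + (" ..." if array.size > max_items else "")
    return f"[{array.size} values with dtype={array.dtype}]"


small = LazyArray(range(10))
print(no_load_repr(small))         # [10 values with dtype=int64]
print(no_load_repr(small.load()))  # 0 1 2 ...
```

The trade-off is that small remote arrays no longer show their values by default; the user has to load them explicitly first.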

Illviljan commented, Jun 26, 2022 (0 reactions)

Is the print still slow if, just before the load, the array is indexed to keep only a few start and end elements, e.g. array[[0, 1, -2, -1]]?
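One way to reason about this question is a hypothetical chunked-store model. ChunkedLazyArray, take, and the chunk sizes below are illustrative assumptions; the only real-world premise is that chunked stores such as Zarr typically read whole chunks. Indexing just the edge elements bounds the number of chunk reads, but a “small” variable that fits in a single chunk still costs one request, so in this model the per-variable round trip remains.

```python
class ChunkedLazyArray:
    """Hypothetical remote 1-D array stored as fixed-size chunks."""

    def __init__(self, n, chunk_size):
        self.n = n
        self.chunk_size = chunk_size
        self.requests = 0  # remote reads issued
        self._cache = {}

    def _read_chunk(self, chunk_id):
        # A whole chunk is fetched even if only one element is needed.
        if chunk_id not in self._cache:
            self.requests += 1
            start = chunk_id * self.chunk_size
            stop = min(start + self.chunk_size, self.n)
            self._cache[chunk_id] = list(range(start, stop))
        return self._cache[chunk_id]

    def take(self, indices):
        """Return elements at indices (negative allowed; assumed in range)."""
        out = []
        for i in indices:
            i = i % self.n  # map -1, -2 to the last positions
            chunk = self._read_chunk(i // self.chunk_size)
            out.append(chunk[i % self.chunk_size])
        return out


big = ChunkedLazyArray(n=1_000_000, chunk_size=10_000)
print(big.take([0, 1, -2, -1]), big.requests)  # [0, 1, 999998, 999999] 2

small = ChunkedLazyArray(n=100, chunk_size=1_000)
print(small.take([0, 1, -2, -1]), small.requests)  # [0, 1, 98, 99] 1
```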
