Truncate long lines in repr of Dataset.attrs


When loading from NetCDF, Dataset.attrs often has a few long strings, which may even have embedded newlines (e.g. a multi-paragraph summary or references section). It’s lovely that these are available, but they tend to make the repr very long and poorly formatted, to the point that many Jupyter notebooks begin by discarding the attrs, which makes it rather pointless to store or display metadata at all!

Given that these values are already truncated at 500 characters (including the indicative ..., but not the start point), I propose that they should instead be truncated to 80 characters including the indentation and key (as values are). For the sake of pretty-printing, this should also replace newlines or tabs with spaces and truncate early if an empty line is encountered.
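
The proposed rule can be sketched in a few lines of Python. This is only an illustration of the idea, not xarray's actual formatting code: the function name `summarize_attr` and its `indent` and `max_width` parameters are hypothetical, and the sketch assumes each newline or tab becomes a single space.

```python
def summarize_attr(key, value, indent=4, max_width=80):
    """Return a one-line summary of an attribute, at most max_width wide.

    Hypothetical helper illustrating the proposal; not part of xarray.
    """
    lines = str(value).splitlines()

    # Truncate early: keep only the text before the first empty line.
    kept = []
    for line in lines:
        if not line.strip():
            break
        kept.append(line)
    cut_early = len(kept) < len(lines)

    # Replace newlines and tabs with single spaces.
    flat = " ".join(kept).replace("\t", " ")

    # The width limit covers the indentation and the key as well.
    row = f"{' ' * indent}{key}: {flat}"
    if len(row) > max_width or cut_early:
        row = row[: max_width - 3].rstrip() + "..."
    return row


print(summarize_attr("comment", "first\tline\nsecond\n\nthird"))
```

Short values pass through unchanged, while anything longer than the limit, or anything cut at an empty line, ends with the indicative `...`.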

Another solution would be to add appropriate indentation following newlines or wrapping, so that the structure remains clear. However, I think that it is better to print a fairly minimal representation of the metadata by default.

>>> xr.open_dataset('http://dapds00.nci.org.au/thredds/dodsC/uc0/rs0_dev/20170215-stacked_sample/LS7_ETM_NBART_3577_15_-40.ncml')

<xarray.Dataset>
Dimensions:  (time: 246, x: 4000, y: 4000)
Coordinates:
  * y        (y) float64 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 ...
  * x        (x) float64 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 ...
  * time     (time) datetime64[ns] 1999-07-16T23:49:39 1999-07-25T23:43:07 ...
Data variables:
    crs      int32 ...
    blue     (time, y, x) float64 ...
    green    (time, y, x) float64 ...
    red      (time, y, x) float64 ...
    nir      (time, y, x) float64 ...
    swir1    (time, y, x) float64 ...
    swir2    (time, y, x) float64 ...
Attributes:
    date_created: 2017-03-07T11:57:26.511217
    Conventions: CF-1.6, ACDD-1.3
    history: 2017-03-07T11:57:26.511307+11:00 adh547 datacube-ncml (1.2.2+23.gd1f3512.dirty) ls7_nbart_albers.yaml, 1.0.6a, /short/v10/datacube/002/LS7_ETM_NBART/LS7_ETM_NBART_3577_15_-40.ncml, (15, -40)  # Created NCML file to aggregate multiple NetCDF files along the time dimension
    geospatial_bounds: POLYGON ((148.49626113888138 -34.828378308133452,148.638689676063308 -35.720318326735864,149.734176111491877 -35.599556747691196,149.582601578289143 -34.708911907843387,148.49626113888138 -34.828378308133452))
    geospatial_bounds_crs: EPSG:4326
    geospatial_lat_min: -35.7203183267
    geospatial_lat_max: -34.7089119078
    geospatial_lat_units: degrees_north
    geospatial_lon_min: 148.496261139
    geospatial_lon_max: 149.734176111
    geospatial_lon_units: degrees_east
    comment: -	Ground Control Points (GCP): new GCP chips released by USGS in Dec 2015 are used for re-processing
-	Geometric QA: each product undergoes geometric assessment and the assessment result will be recorded within v2 AGDC for filtering/masking purposes.
-	Processing parameter settings: the minimum number of GCPs for Ortho-rectified product generation has been reduced from 30 to 10.
-	DEM: 1 second SRTM DSM is used for Ortho-rectification.
-	Updated Calibration Parameter File (CPF): the latest/cu...
    product_suite: Surface Reflectance NBAR+T 25m
    publisher_email: earth.observation@ga.gov.au
    keywords_vocabulary: GCMD
    product_version: 2
    cdm_data_type: Grid
    references: -	Berk, A., Anderson, G.P., Acharya, P.K., Hoke, M.L., Chetwynd, J.H., Bernstein, L.S., Shettle, E.P., Matthew, M.W., and Adler-Golden, S.M. (2003) Modtran 4 Version 3 Revision 1 User s manual. Airforce Research Laboratory, Hanscom, MA, USA.
-	Chander, G., Markham, B.L., and Helder, D.L. (2009) Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment 113, 893-903.
-	Edberg, R., and Oliver, S. (2013) Projection-Indep...
    platform: LANDSAT-7
    keywords: AU/GA,NASA/GSFC/SED/ESD/LANDSAT,REFLECTANCE,ETM+,TM,OLI,EARTH SCIENCE
    publisher_name: Section Leader, Operations Section, NEMO, Geoscience Australia
    institution: Commonwealth of Australia (Geoscience Australia)
    acknowledgment: Landsat data is provided by the United States Geological Survey (USGS) through direct reception of the data at Geoscience Australias satellite reception facility or download.
    license: CC BY Attribution 4.0 International License
    title: Surface Reflectance NBAR+T 25 v2
    summary: Surface Reflectance (SR) is a suite of Earth Observation (EO) products from GA. The SR product suite provides standardised optical surface reflectance datasets using robust 
physical models to correct for variations in image radiance values due to atmospheric properties, and sun and sensor geometry. The resulting stack of surface reflectance
grids are consistent over space and time which is instrumental in identifying and quantifying environmental change. SR is based on radiance data from the...
    instrument: ETM
    source: LANDSAT 7 ETM+ surface observation
    publisher_url: http://www.ga.gov.au

Issue Analytics

  • State:closed
  • Created 6 years ago
  • Reactions:1
  • Comments:5 (5 by maintainers)

Top GitHub Comments

1 reaction
Zac-HD commented, Mar 24, 2017

Sure, I’d be happy to. The above example will look much nicer, especially in wrapping environments:

<xarray.Dataset>
Dimensions:  (time: 246, x: 4000, y: 4000)
Coordinates:
  * y        (y) float64 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 ...
  * x        (x) float64 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 ...
  * time     (time) datetime64[ns] 1999-07-16T23:49:39 1999-07-25T23:43:07 ...
Data variables:
    crs      int32 ...
    blue     (time, y, x) float64 ...
    green    (time, y, x) float64 ...
    red      (time, y, x) float64 ...
    nir      (time, y, x) float64 ...
    swir1    (time, y, x) float64 ...
    swir2    (time, y, x) float64 ...
Attributes:
    date_created:           2017-03-07T11:57:26.511217
    Conventions:            CF-1.6, ACDD-1.3
    history:                2017-03-07T11:57:26.511307+11:00 adh547 datacube...
    geospatial_bounds:      POLYGON ((148.49626113888138 -34.828378308133452...
    geospatial_bounds_crs:  EPSG:4326
    geospatial_lat_min:     -35.7203183267
    geospatial_lat_max:     -34.7089119078
    geospatial_lat_units:   degrees_north
    geospatial_lon_min:     148.496261139
    geospatial_lon_max:     149.734176111
    geospatial_lon_units:   degrees_east
    comment:                -    Ground Control Points (GCP): new GCP chips ...
    product_suite:          Surface Reflectance NBAR+T 25m
    publisher_email:        earth.observation@ga.gov.au
    keywords_vocabulary:    GCMD
    product_version:        2
    cdm_data_type:          Grid
    references:             -    Berk, A., Anderson, G.P., Acharya, P.K., Ho...
    platform:               LANDSAT-7
    keywords:               AU/GA,NASA/GSFC/SED/ESD/LANDSAT,REFLECTANCE,ETM+...
    publisher_name:         Section Leader, Operations Section, NEMO, Geosci...
    institution:            Commonwealth of Australia (Geoscience Australia)
    acknowledgment:         Landsat data is provided by the United States Ge...
    license:                CC BY Attribution 4.0 International License
    title:                  Surface Reflectance NBAR+T 25 v2
    summary:                Surface Reflectance (SR) is a suite of Earth Obs...
    instrument:             ETM
    source:                 LANDSAT 7 ETM+ surface observation
    publisher_url:          http://www.ga.gov.au
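
The aligned layout above, with keys padded to a common column and long values cut off, could be produced by something like the following. This is a rough sketch rather than the code that ended up in xarray: `format_attrs` and its parameters are hypothetical names, and collapsing every run of whitespace to one space is a simplifying assumption.

```python
def format_attrs(attrs, indent=4, max_width=80):
    """Format a mapping of attributes as aligned single-line rows.

    Hypothetical helper for illustration; not part of xarray.
    """
    pad = " " * indent
    # Column width: the longest key plus its trailing colon.
    col = max((len(str(key)) for key in attrs), default=0) + 1

    rows = []
    for key, value in attrs.items():
        label = f"{key}:"
        # Collapse newlines, tabs and repeated spaces into single spaces.
        flat = " ".join(str(value).split())
        row = f"{pad}{label:<{col}}  {flat}"
        if len(row) > max_width:
            row = row[: max_width - 3] + "..."
        rows.append(row)
    return "\n".join(rows)


example = {
    "title": "Surface Reflectance NBAR+T 25 v2",
    "geospatial_bounds_crs": "EPSG:4326",
    "history": "2017-03-07T11:57:26.511307+11:00 adh547 datacube-ncml " * 3,
}
print(format_attrs(example))
```

Padding to the longest key keeps every value starting in the same column, so the block stays readable even in environments that wrap long lines.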

0 reactions
shoyer commented, Mar 23, 2017

Sounds like there is support here.

@Zac-HD Any interest in putting together a pull request? See here for the existing logic: https://github.com/pydata/xarray/blob/b3fc6c4e4fafdf4f075b791594633970a787ad79/xarray/core/formatting.py#L255
