Flat-table structure for GeoJSON-structured JSON

Since #1316 is closed, I am opening this issue with the real use-case example/process that I had not yet had time to work out when I opened the previous issue. It is, by the way, slightly off-topic compared to #1316.

Assume we want to make a bar chart using three variables (country, population, and registered road vehicles per 1000 inhabitants):

[figure: complicated_chart]

We've prepared our data in a pandas DataFrame in Python (eventually all of this should be possible through Altair/Vega-Lite). The DataFrame looks as follows:

	country	gdp	population	reg_veh_per_1000_inh
0	Belgium	389300.0	10414336.0	678
1	Luxembourg	39370.0	491775.0	508
2	Netherlands	672000.0	16715999.0	477

It is converted to row-oriented JSON for use in Vega:

[  
   {  
      "country":"Belgium",
      "gdp":389300,
      "population":10414336,
      "reg_veh_per_1000_inh":508
   },
   {  
      "country":"Luxembourg",
      "gdp":39370,
      "population":491775,
      "reg_veh_per_1000_inh":678
   },
   {  
      "country":"Netherlands",
      "gdp":672000,
      "population":16715999,
      "reg_veh_per_1000_inh":477
   }
]
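For readers outside pandas: the row-oriented ("records") layout is just one object per row. pandas produces it with DataFrame.to_dict(orient="records") or DataFrame.to_json(orient="records"). The sketch below reproduces the same reshaping in plain Python from a column-oriented dict, so it runs without pandas; the helper name columns_to_records is hypothetical.

```python
import json


def columns_to_records(columns):
    """Turn {column: [values...]} into [{column: value, ...}, ...], one dict per row."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    # Every column must have the same length, as in a DataFrame.
    if any(len(columns[name]) != n_rows for name in names):
        raise ValueError("all columns must have the same length")
    return [{name: columns[name][i] for name in names} for i in range(n_rows)]


# Values copied from the row-oriented JSON above.
table = {
    "country": ["Belgium", "Luxembourg", "Netherlands"],
    "gdp": [389300, 39370, 672000],
    "population": [10414336, 491775, 16715999],
    "reg_veh_per_1000_inh": [508, 678, 477],
}

records = columns_to_records(table)
print(json.dumps(records[0]))
# {"country": "Belgium", "gdp": 389300, "population": 10414336, "reg_veh_per_1000_inh": 508}
```

Each record keeps every attribute at the top level, which is why a plain field name such as "country" works in the Vega encoding.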

Part of my generated Vega specification then includes the following:

...
"encode": {
  "enter": {
    "x": {
      "scale": "xscale",
      "field": "country"
    },
    "y": {
      "scale": "yscale",
      "field": "population"
    },   
    "fill": {
      "scale": "color",
      "field": "reg_veh_per_1000_inh"
    },
...

But in pandas the same DataFrame might also contain a geometry column:

	country	gdp	population	reg_veh_per_1000_inh	geometry
0	Belgium	389300.0	10414336.0	678	POLYGON ((3.3 51.3, 4 51.3, 5 51.5, 5.6 51, 6....
1	Luxembourg	39370.0	491775.0	508	POLYGON ((6 50.1, 6.2 49.9, 6.2 49.5, 5.9 49.4...
2	Netherlands	672000.0	16715999.0	477	POLYGON ((6.1 53.5, 6.9 53.5, 7.1 53.1, 6.8 52...

In this case the (Geo)DataFrame uses the geospatial format GeoJSON as the data interchange format for Vega, which looks as follows:

{  
    "type":"FeatureCollection",
    "crs":{  
       "type":"name",
       "properties":{  
          "name":"urn:ogc:def:crs:OGC:1.3:CRS84"
       }
    },
    "features":[  
       { "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
       { "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
       { "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }
    ]
 }
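The key structural change is that every attribute now sits one level down, under "properties", next to "type" and "geometry". GeoPandas performs this serialization itself (for example via GeoDataFrame.to_json); the sketch below only illustrates the resulting layout in plain Python. The helper to_feature_collection is hypothetical, and it omits the optional "crs" member shown above.

```python
def to_feature_collection(records, geometries):
    """Wrap each attribute row in a GeoJSON Feature: attributes go under "properties"."""
    if len(records) != len(geometries):
        raise ValueError("need exactly one geometry per record")
    features = [
        {"type": "Feature", "properties": dict(row), "geometry": geom}
        for row, geom in zip(records, geometries)
    ]
    return {"type": "FeatureCollection", "features": features}


records = [{"country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678}]
geometries = [{
    "type": "Polygon",
    "coordinates": [[[6.0, 50.1], [6.2, 49.9], [6.2, 49.5], [5.9, 49.4], [5.7, 49.5], [5.8, 50.1], [6.0, 50.1]]],
}]

fc = to_feature_collection(records, geometries)
print(fc["features"][0]["properties"]["country"])  # Luxembourg
print("country" in fc["features"][0])  # False: the attribute is nested one level down
```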

And you feel it coming… to create the same chart, parts of the Vega specification now have to be defined as follows:

...
"encode": {
  "enter": {
    "x": {
      "scale": "xscale",
      "field": "properties.country"
    },
    "y": {
      "scale": "yscale",
      "field": "properties.population"
    },
    "fill": {
      "scale": "color",
      "field": "properties.reg_veh_per_1000_inh"
    }
...

provided that "format": {"type": "json", "property": "features"} is set in the data definition.

Without knowing the underlying difference between row-oriented JSON and GeoJSON, this is very confusing, especially when the geoshape capabilities of Vega/Vega-Lite are not used at all.
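A simplified model of what happens may make the difference concrete: "property": "features" selects the nested array to treat as rows, and a field such as "properties.country" is then resolved by walking nested keys. This is a sketch only; Vega's own field parser also supports bracket notation and escaped dots, which are ignored here, and the helper names get_path and load_rows are hypothetical.

```python
def get_path(obj, path):
    """Follow a dot-separated path of object keys, e.g. "properties.country"."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def load_rows(doc, prop=None):
    """Mimic a JSON format with an optional "property": select the nested array of rows."""
    return get_path(doc, prop) if prop else doc


# Geometries omitted (null) for brevity.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"country": "Belgium", "population": 10414336.0}, "geometry": None},
        {"type": "Feature", "properties": {"country": "Luxembourg", "population": 491775.0}, "geometry": None},
    ],
}
flat_rows = [{"country": "Belgium", "population": 10414336}, {"country": "Luxembourg", "population": 491775}]

rows = load_rows(feature_collection, "features")
# Same chart, different field paths depending on the input layout:
print([get_path(r, "properties.country") for r in rows])  # ['Belgium', 'Luxembourg']
print([get_path(r, "country") for r in load_rows(flat_rows)])  # ['Belgium', 'Luxembourg']
```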

@iliatimofeev came up with an interesting idea in https://github.com/altair-viz/altair/pull/818: register the values of the "properties" member of each Feature object as top-level foreign members, so that the array of features in the GeoJSON example becomes:

[
      {
        "country": "Belgium",
        "gdp": 389300,
        "population": 10414336,
        "reg_veh_per_1000_inh": 508,
        "type": "Feature",
        "geometry": {
          "type": "Polygon",
          "coordinates": [ [ [3.3, 51.3], [4, 51.3], [5, 51.5], [5.6, 51], [6.2, 50.8], [6, 50.1], [5.8, 50.1], [5.7, 49.5], [4.8, 50], [4.3, 49.9], [3.6, 50.4], [3.1, 50.8], [2.7, 50.8], [2.5, 51.1], [3.3, 51.3] ] ]
        }
      },
      {
        "country": "Luxembourg",
        "gdp": 39370,
        "population": 491775,
        "reg_veh_per_1000_inh": 678,
        "type": "Feature",
        "geometry": {
          "type": "Polygon",
          "coordinates": [ [ [6, 50.1], [6.2, 49.9], [6.2, 49.5], [5.9, 49.4], [5.7, 49.5], [5.8, 50.1], [6, 50.1] ] ]
        }
      },
      {
        "country": "Netherlands",
        "gdp": 672000,
        "population": 16715999,
        "reg_veh_per_1000_inh": 477,
        "type": "Feature",
        "geometry": {
          "type": "Polygon",
          "coordinates": [ [ [6.1, 53.5], [6.9, 53.5], [7.1, 53.1], [6.8, 52.2], [6.6, 51.9], [6, 51.9], [6.2, 50.8], [5.6, 51], [5, 51.5], [4, 51.3], [3.3, 51.3], [3.8, 51.6], [4.7, 53.1], [6.1, 53.5] ] ]
        }
      }
]

While I think this is a great (mis?)use of GeoJSON, I still hope we can use conventional GeoJSON as the data interchange format and have this transformation done within Vega and Vega-Lite.
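The foreign-member flattening from altair-viz/altair#818 amounts to a small per-feature transformation. The sketch below is an illustration, not the code from that PR: the name flatten_feature is hypothetical, and rejecting a property whose name collides with a feature-level member (instead of overwriting it) is a choice made for this sketch.

```python
def flatten_feature(feature):
    """Lift the entries of "properties" to the top level of a GeoJSON Feature."""
    properties = feature.get("properties") or {}
    flat = dict(properties)  # properties first, matching the layout shown above
    for key, value in feature.items():
        if key == "properties":
            continue
        # Refuse to silently overwrite an attribute with "type", "geometry", etc.
        if key in flat:
            raise ValueError(f"property {key!r} collides with a top-level member")
        flat[key] = value
    return flat


luxembourg = {
    "type": "Feature",
    "properties": {"country": "Luxembourg", "gdp": 39370, "population": 491775, "reg_veh_per_1000_inh": 678},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[6, 50.1], [6.2, 49.9], [6.2, 49.5], [5.9, 49.4], [5.7, 49.5], [5.8, 50.1], [6, 50.1]]],
    },
}

print(list(flatten_feature(luxembourg)))
# ['country', 'gdp', 'population', 'reg_veh_per_1000_inh', 'type', 'geometry']
```

Applied to every element of "features", this yields the array shown above.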

I didn't know we could use nested field names in the project transform. So, given standard GeoJSON as input data, we can register the values within the "properties" member as "foreign" members like this:

{
  "$schema": "https://vega.github.io/schema/vega/v4.json",
  "width": 400,
  "height": 200,
  "padding": 5,
  "data": [
    {
      "name": "table",
      "values": {  
        "type":"FeatureCollection",
        "crs":{  
          "type":"name",
          "properties":{  
              "name":"urn:ogc:def:crs:OGC:1.3:CRS84"
          }
        },
        "features":[  
          { "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
          { "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
          { "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }]
      },
      "format": {"type": "json", "property": "features"},
      "transform": [
        {
          "type": "project",
          "fields": ["properties.country", "properties.gdp", "properties.population", "properties.reg_veh_per_1000_inh", "type", "geometry"],
          "as": ["country", "gdp", "population", "reg_veh_per_1000_inh", "type", "geometry"]
        }
      ]         
    }
  ]
}

While this works in Vega (awesome, by the way), it requires the user to know upfront all nested field names within "properties" (both those used and those not used) and to project them, plus "type" and "geometry".
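As a stopgap outside Vega, that field list can be generated from the data itself. The helper below is hypothetical (it is not part of Vega or Altair): it collects the union of property names across all features and emits a project transform whose "fields" and "as" arrays have matching lengths.

```python
def project_transform_for(features, keep=("type", "geometry")):
    """Build a Vega "project" transform that lifts every property key to the top level.

    Property names containing "." or "[" would need escaping in Vega field
    strings; this sketch does not handle that.
    """
    names = []
    for feature in features:
        for key in feature.get("properties") or {}:
            if key not in names:  # union of keys, in first-seen order
                names.append(key)
    fields = [f"properties.{name}" for name in names] + list(keep)
    return {"type": "project", "fields": fields, "as": names + list(keep)}


# Geometries omitted (null) for brevity; note the features do not share all keys.
features = [
    {"type": "Feature", "properties": {"country": "Belgium", "gdp": 389300.0}, "geometry": None},
    {"type": "Feature", "properties": {"country": "Netherlands", "population": 16715999.0}, "geometry": None},
]

transform = project_transform_for(features)
print(transform["as"])  # ['country', 'gdp', 'population', 'type', 'geometry']
```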

To conclude, I wish a format or transform existed that does this automatically given GeoJSON as input data, perhaps just "format": {"type": "geojson"}, since this particular kind of project transform only applies to GeoJSON-structured JSON.

Issue Analytics

State:
Created 5 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

mattijn commented, Jun 28, 2018

I deliberately opened this issue in the Vega repo, since it aims to improve the parsing of the GeoJSON data format¹ into Vega's basic data model.

GeoJSON¹ is the standard data interchange format for geographic features and is supported by numerous spatial software packages. Improved parsing support at the Vega level, within the format property, would provide consistency at higher abstraction levels such as Vega-Lite.

Within Altair, @iliatimofeev has shown that it is possible to derive a valid flat JSON table containing geographic features that could work as a data interchange format. But given the wide adoption of the GeoJSON¹ standard elsewhere, I would prefer to adopt this standard as the data interchange format and improve the parsing on the Vega side.

Therefore I support the proposed structure:

"format": {
      "type": "geojson",
      "feature": "features", // like topojson or jpath 
      "properties": "foreign" // default value "nested"
}

Here, the reserved keys ("type", "bbox", "coordinates", "geometries", "geometry", "properties", "features") might remain nested.
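One possible reading of this proposal, written as a parser. This is a hypothetical sketch of proposed behaviour, not an existing Vega format: the function name parse_geojson is invented, "feature" is treated as a plain top-level key (no jpath), and properties whose names are reserved stay under "properties".

```python
RESERVED = {"type", "bbox", "coordinates", "geometries", "geometry", "properties", "features"}


def parse_geojson(doc, feature="features", properties="nested"):
    """Hypothetical reading of the proposed {"type": "geojson"} format."""
    rows = doc[feature]  # only a plain top-level key in this sketch
    if properties == "nested":
        return rows  # current behaviour: attributes stay under "properties"
    if properties != "foreign":
        raise ValueError("properties must be 'nested' or 'foreign'")
    parsed = []
    for f in rows:
        props = f.get("properties") or {}
        # Non-reserved attributes become top-level (foreign) members.
        row = {k: v for k, v in props.items() if k not in RESERVED}
        # Feature-level members ("type", "geometry", ...) win on a name clash.
        row.update({k: v for k, v in f.items() if k != "properties"})
        kept = {k: v for k, v in props.items() if k in RESERVED}
        if kept:
            row["properties"] = kept  # reserved names remain nested
        parsed.append(row)
    return parsed


doc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"country": "Belgium", "type": "kingdom"}, "geometry": None},
    ],
}
print(parse_geojson(doc, properties="foreign")[0]["country"])  # Belgium
```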

Please feel free to move this issue to wherever you feel it belongs.

¹ and its extension TopoJSON.

iliatimofeev commented, Jun 26, 2018

I guess the key point here is that Vega interprets GeoJSON as a graphics format for describing shapes to draw, whereas GeoJSON has become a data interchange format for geo-related applications, with additional support for shapes. Generally, GeoJSON is like a folder tree with objects inside, and we may want to display data (and the associated shapes) that lie deep in the hierarchy. That is why I suggested jpath as the format for the "feature" property. The current implementation of "property" in the JSON format description does not allow selecting nested property values of all objects in an array.
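To illustrate why a path, rather than a single property name, helps here: the toy selector below walks a dot-separated path in which "*" fans out over every element of an array, so objects buried several levels deep are collected into one flat list of rows. It is hypothetical and is not the jpath syntax the comment refers to; the nested "regions"/"layer" document is invented for the example.

```python
def select(obj, path):
    """Collect every value reached by a dot-separated path; "*" expands a list."""
    current = [obj]
    for step in path.split("."):
        nxt = []
        for item in current:
            if step == "*":
                if isinstance(item, list):
                    nxt.extend(item)  # fan out over all elements of the array
            elif isinstance(item, dict) and step in item:
                nxt.append(item[step])  # missing keys are skipped silently
        current = nxt
    return current


def feature(name):
    return {"type": "Feature", "properties": {"country": name}, "geometry": None}


doc = {
    "regions": [
        {"name": "Benelux-south", "layer": {"features": [feature("Belgium"), feature("Luxembourg")]}},
        {"name": "Benelux-north", "layer": {"features": [feature("Netherlands")]}},
    ]
}

print(select(doc, "regions.*.layer.features.*.properties.country"))
# ['Belgium', 'Luxembourg', 'Netherlands']
```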
