flat-table structure for geojson structured json

Since #1316 is closed, and I had not yet had the time to work out the real use-case example/process for which I opened the previous issue, I am doing so here. This is, by the way, slightly off-topic compared to #1316.

Assume we want to make a bar chart using three variables (country, population and registered road vehicles per 1000 inhabitants):

[image: complicated_chart]

We've prepared our data in a pandas DataFrame in Python (eventually all of this should be possible through Altair/Vega-Lite). This DataFrame looks as follows:

   country      gdp       population  reg_veh_per_1000_inh
0  Belgium      389300.0  10414336.0  508
1  Luxembourg   39370.0   491775.0    678
2  Netherlands  672000.0  16715999.0  477

This is converted to row-oriented JSON for use in Vega:

[  
   {  
      "country":"Belgium",
      "gdp":389300,
      "population":10414336,
      "reg_veh_per_1000_inh":508
   },
   {  
      "country":"Luxembourg",
      "gdp":39370,
      "population":491775,
      "reg_veh_per_1000_inh":678
   },
   {  
      "country":"Netherlands",
      "gdp":672000,
      "population":16715999,
      "reg_veh_per_1000_inh":477
   }
]

Part of my generated Vega specification then includes the following:

...
"encode": {
  "enter": {
    "x": {
      "scale": "xscale",
      "field": "country"
    },
    "y": {
      "scale": "yscale",
      "field": "population"
    },   
    "fill": {
      "scale": "color",
      "field": "reg_veh_per_1000_inh"
    },
...

But in pandas the same DataFrame might also contain a geometry column:

   country      gdp       population  reg_veh_per_1000_inh  geometry
0  Belgium      389300.0  10414336.0  508                   POLYGON ((3.3 51.3, 4 51.3, 5 51.5, 5.6 51, 6....
1  Luxembourg   39370.0   491775.0    678                   POLYGON ((6 50.1, 6.2 49.9, 6.2 49.5, 5.9 49.4...
2  Netherlands  672000.0  16715999.0  477                   POLYGON ((6.1 53.5, 6.9 53.5, 7.1 53.1, 6.8 52...

In this case the (Geo)DataFrame uses the geospatial format GeoJSON as the data interchange format for Vega, which looks as follows:

{  
    "type":"FeatureCollection",
    "crs":{  
       "type":"name",
       "properties":{  
          "name":"urn:ogc:def:crs:OGC:1.3:CRS84"
       }
    },
    "features":[  
       { "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
       { "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
       { "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }
    ]
 }
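To make the change in shape explicit, here is a minimal sketch in plain Python of how a table with a geometry column maps onto a FeatureCollection. The function name `rows_to_feature_collection` and the shortened coordinates are assumptions for illustration only; in practice the GeoDataFrame does this conversion itself.

```python
import json

# Rows as they might come out of a table with a geometry column.
# The coordinates are shortened placeholders, not the real outlines.
rows = [
    {"country": "Belgium", "population": 10414336,
     "geometry": {"type": "Polygon",
                  "coordinates": [[[3.3, 51.3], [4.0, 51.3], [5.0, 51.5], [3.3, 51.3]]]}},
    {"country": "Luxembourg", "population": 491775,
     "geometry": {"type": "Polygon",
                  "coordinates": [[[6.0, 50.1], [6.2, 49.9], [6.2, 49.5], [6.0, 50.1]]]}},
]


def rows_to_feature_collection(rows, geometry_key="geometry"):
    """Wrap each row as a GeoJSON Feature (hypothetical helper).

    Every non-geometry column ends up one level deeper, under "properties".
    """
    features = []
    for row in rows:
        properties = {k: v for k, v in row.items() if k != geometry_key}
        features.append({
            "type": "Feature",
            "properties": properties,
            "geometry": row[geometry_key],
        })
    return {"type": "FeatureCollection", "features": features}


collection = rows_to_feature_collection(rows)
print(json.dumps(collection["features"][0]["properties"]))
```

The columns that were top-level fields of a row are now members of the nested "properties" object, which is what changes the field names in the specification.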

And you can feel it coming: to create the same chart, parts of the Vega specification now have to be defined as follows:

...
"encode": {
  "enter": {
    "x": {
      "scale": "xscale",
      "field": "properties.country"
    },
    "y": {
      "scale": "yscale",
      "field": "properties.population"
    },
    "fill": {
      "scale": "color",
      "field": "properties.reg_veh_per_1000_inh"
    }
...

This applies if "format": {"type": "json", "property": "features"} is set in the data property.

Without knowing the underlying difference between row-oriented JSON and GeoJSON, this is very confusing, especially if the geoshape capabilities of Vega/Vega-Lite are not used at all.
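The difference can be sketched in a few lines of Python. The helper `get_field` is a hypothetical stand-in for a dotted field lookup and is not Vega's own accessor; the feature's geometry is left out for brevity.

```python
from functools import reduce

# The same record, once as a row of row-oriented JSON and once as a GeoJSON Feature.
row = {"country": "Belgium", "population": 10414336}
feature = {
    "type": "Feature",
    "properties": {"country": "Belgium", "population": 10414336},
    "geometry": None,  # geometry omitted for brevity
}


def get_field(record, path):
    """Resolve a dotted field path such as "properties.population" (hypothetical helper)."""
    return reduce(lambda obj, key: obj[key], path.split("."), record)


print(get_field(row, "population"))
print(get_field(feature, "properties.population"))

# The plain field name that works for the row does not exist on the feature.
try:
    get_field(feature, "population")
except KeyError as missing:
    print("not a top-level field:", missing)
```

The same value lives under two different field names, depending only on how the data happened to be serialized.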

@iliatimofeev came up with an interesting idea in https://github.com/altair-viz/altair/pull/818: register the values of the "properties" member of each Feature object as top-level Foreign Members, so that the array of features in the GeoJSON example becomes as follows:

[
      {
        "country": "Belgium",
        "gdp": 389300,
        "population": 10414336,
        "reg_veh_per_1000_inh": 508,
        "type": "Feature",
        "geometry": {
          "type": "Polygon",
          "coordinates": [ [ [3.3, 51.3], [4, 51.3], [5, 51.5], [5.6, 51], [6.2, 50.8], [6, 50.1], [5.8, 50.1], [5.7, 49.5], [4.8, 50], [4.3, 49.9], [3.6, 50.4], [3.1, 50.8], [2.7, 50.8], [2.5, 51.1], [3.3, 51.3] ] ]
        }
      },
      {
        "country": "Luxembourg",
        "gdp": 39370,
        "population": 491775,
        "reg_veh_per_1000_inh": 678,
        "type": "Feature",
        "geometry": {
          "type": "Polygon",
          "coordinates": [ [ [6, 50.1], [6.2, 49.9], [6.2, 49.5], [5.9, 49.4], [5.7, 49.5], [5.8, 50.1], [6, 50.1] ] ]
        }
      },
      {
        "country": "Netherlands",
        "gdp": 672000,
        "population": 16715999,
        "reg_veh_per_1000_inh": 477,
        "type": "Feature",
        "geometry": {
          "type": "Polygon",
          "coordinates": [ [ [6.1, 53.5], [6.9, 53.5], [7.1, 53.1], [6.8, 52.2], [6.6, 51.9], [6, 51.9], [6.2, 50.8], [5.6, 51], [5, 51.5], [4, 51.3], [3.3, 51.3], [3.8, 51.6], [4.7, 53.1], [6.1, 53.5] ] ]
        }
      }
]

While I think this is a great (mis?)usage of GeoJSON, I still hope we can use conventional GeoJSON as a data interchange format and somehow get this data transform within Vega and Vega-lite.
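For reference, the flattening itself is small. The following is a rough Python sketch of the idea, not the code from the Altair pull request; `lift_properties` is a hypothetical name and the geometry is shortened.

```python
def lift_properties(feature):
    """Return a copy of a GeoJSON Feature with its "properties" members at top level."""
    flat = dict(feature.get("properties") or {})
    # Copy every other member (type, geometry, ...) after the properties,
    # so the GeoJSON members win if a property happens to share their name.
    for key, value in feature.items():
        if key != "properties":
            flat[key] = value
    return flat


feature = {
    "type": "Feature",
    "properties": {"country": "Luxembourg", "population": 491775},
    # Shortened placeholder ring, not the real outline.
    "geometry": {"type": "Polygon",
                 "coordinates": [[[6.0, 50.1], [6.2, 49.9], [6.2, 49.5], [6.0, 50.1]]]},
}

flat = lift_properties(feature)
print(list(flat))
```

The result is still readable as a Feature (it keeps "type" and "geometry"), while the attribute values can be addressed as plain field names.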

I didn't know we could use nested field names in the project transform. Given standard GeoJSON as input data, we can register the values within the "properties" member as "foreign" members like this:

{
  "$schema": "https://vega.github.io/schema/vega/v4.json",
  "width": 400,
  "height": 200,
  "padding": 5,
  "data": [
    {
      "name": "table",
      "values": {  
        "type":"FeatureCollection",
        "crs":{  
          "type":"name",
          "properties":{  
              "name":"urn:ogc:def:crs:OGC:1.3:CRS84"
          }
        },
        "features":[  
          { "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
          { "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
          { "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }]
      },
      "format": {"type": "json", "property": "features"},
      "transform": [
        {
          "type": "project",
          "fields": ["properties.country", "properties.gdp", "properties.population", "properties.reg_veh_per_1000_inh", "type", "geometry"],
          "as": ["country", "gdp", "population", "reg_veh_per_1000_inh", "type", "geometry"]
        }
      ]         
    }
  ]
}

While this works in Vega (awesome, by the way), it does require the user to know upfront all nested field names within "properties" (both those being used and those not being used) and to project these together with type and geometry.
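Until something like that exists, the transform can at least be generated rather than typed by hand. A minimal Python sketch, assuming the features are available on the Python side before the specification is written; `build_project_transform` is a hypothetical helper, and property names that themselves contain dots are not handled.

```python
def build_project_transform(features, keep=("type", "geometry")):
    """Build a Vega project transform that lifts every "properties" member.

    Property names are collected in order of first appearance across all features.
    """
    names = []
    for feature in features:
        for name in (feature.get("properties") or {}):
            if name not in names:
                names.append(name)
    return {
        "type": "project",
        "fields": [f"properties.{name}" for name in names] + list(keep),
        "as": names + list(keep),
    }


features = [
    {"type": "Feature", "geometry": None,
     "properties": {"country": "Belgium", "gdp": 389300.0}},
    {"type": "Feature", "geometry": None,
     "properties": {"country": "Luxembourg", "population": 491775.0}},
]

transform = build_project_transform(features)
print(transform["fields"])
print(transform["as"])
```

Generating both lists from the same names keeps "fields" and "as" the same length, which is easy to get wrong when writing them out manually.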

To conclude, I wish a format or transform existed that could do this automagically given GeoJSON as input data. Maybe just "format": {"type": "geojson"}, as this specific kind of project transform only applies to GeoJSON-structured JSON.

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
mattijn commented, Jun 28, 2018

I deliberately opened this issue in the Vega repo, since it aims to improve how the GeoJSON data format¹ is parsed into the basic data model of Vega.

GeoJSON¹ is the standard data interchange format for information containing geographic features and has been adopted by numerous spatial software packages. Improved parsing support at the Vega level, within the format property, will provide consistency at higher abstraction levels such as Vega-Lite.

Within Altair, @iliatimofeev has shown that it is possible to derive a valid flat JSON table containing geographic features that might work as a data interchange format. But given the wide adoption of the GeoJSON¹ standard elsewhere, I would prefer to adopt this standard as the data interchange format and improve the parsing on the Vega side.

Therefore I support the proposed structure of

"format": {
      "type": "geojson",
      "feature": "features", // like topojson or jpath 
      "properties": "foreign" // default value "nested"
}

Where reserved keys ("type", "bbox", "coordinates", "geometries", "geometry", "properties", "features") might remain nested.
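What such a format option could do can be sketched in Python. This is only an illustration of the proposal, not an existing Vega feature: `parse_geojson` and its parameters are hypothetical, and keeping a property nested when its name collides with a reserved key is one possible reading of "might remain nested".

```python
RESERVED = {"type", "bbox", "coordinates", "geometries",
            "geometry", "properties", "features"}


def parse_geojson(data, feature="features", properties="nested"):
    """Emulate the proposed format options (hypothetical, not part of Vega)."""
    records = data[feature]
    if properties == "nested":
        return [dict(record) for record in records]
    if properties != "foreign":
        raise ValueError("properties must be 'nested' or 'foreign'")
    result = []
    for record in records:
        props = record.get("properties") or {}
        # Lift ordinary properties; those named like a reserved key stay nested.
        flat = {k: v for k, v in props.items() if k not in RESERVED}
        kept = {k: v for k, v in props.items() if k in RESERVED}
        for key, value in record.items():
            if key != "properties":
                flat[key] = value
        if kept:
            flat["properties"] = kept
        result.append(flat)
    return result


collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         # The "type" property is made up here to show a name collision.
         "properties": {"country": "Belgium", "population": 10414336, "type": "country"},
         "geometry": None},
    ],
}

records = parse_geojson(collection, properties="foreign")
print(records[0])
```

With the default "nested" the features come back unchanged, so existing specifications that use "properties.country" would keep working.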

Please feel free to move this issue to wherever you feel it belongs.

¹ and its extension TopoJSON.

0 reactions
iliatimofeev commented, Jun 26, 2018

I guess the key point here is that Vega interprets GeoJSON as a graphic format for describing shapes to draw. But GeoJSON has become a data interchange format for geo-related applications, with additional support for shapes. Generally, GeoJSON is a folder tree with some objects inside, and we could be interested in displaying some data (and associated shapes) that lie deep in the hierarchy; that is why I suggested jpath as the format for the "feature" property. The current implementation of "property" in the json format description does not allow selecting nested property values of all objects in an array.
