flat-table structure for geojson structured json
#1316 is closed, and I had not yet had the time to work out a real use-case example/process for the previous issue I opened, which is btw slightly off-topic compared to #1316.
Assuming we want to make a bar chart using three variables (country, population and registered road vehicles per 1000 inhabitants):

And we’ve prepared our data in a pandas DataFrame within Python (eventually all of this should be possible through altair/vega-lite).
This DataFrame then looks as follows:
| | country | gdp | population | reg_veh_per_1000_inh |
|---|---|---|---|---|
| 0 | Belgium | 389300.0 | 10414336.0 | 678 |
| 1 | Luxembourg | 39370.0 | 491775.0 | 508 |
| 2 | Netherlands | 672000.0 | 16715999.0 | 477 |
This is converted to row-oriented JSON for use within Vega:
[
{
"country":"Belgium",
"gdp":389300,
"population":10414336,
"reg_veh_per_1000_inh":508
},
{
"country":"Luxembourg",
"gdp":39370,
"population":491775,
"reg_veh_per_1000_inh":678
},
{
"country":"Netherlands",
"gdp":672000,
"population":16715999,
"reg_veh_per_1000_inh":477
}
]
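As a rough illustration of what "row-oriented" means here, the sketch below turns column-oriented data into one object per row using only the Python standard library. In pandas the equivalent step is `DataFrame.to_json(orient="records")`; the `to_records` helper and the trimmed three-column data are made up for this example.

```python
import json

# Column-oriented data, mirroring three of the DataFrame columns above.
columns = {
    "country": ["Belgium", "Luxembourg", "Netherlands"],
    "gdp": [389300, 39370, 672000],
    "population": [10414336, 491775, 16715999],
}


def to_records(cols):
    """Turn {column: [values, ...]} into a list with one dict per row."""
    names = list(cols)
    return [dict(zip(names, row)) for row in zip(*cols.values())]


records = to_records(columns)
print(json.dumps(records, indent=2))
```

Each element of `records` is a flat object, which is why the Vega spec can refer to fields by their plain column names.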
Part of my generated Vega specification includes the following:
...
"encode": {
"enter": {
"x": {
"scale": "xscale",
"field": "country"
},
"y": {
"scale": "yscale",
"field": "population"
},
"fill": {
"scale": "color",
"field": "reg_veh_per_1000_inh"
},
...
But in pandas the same DataFrame might also contain a geometry column:
| | country | gdp | population | reg_veh_per_1000_inh | geometry |
|---|---|---|---|---|---|
| 0 | Belgium | 389300.0 | 10414336.0 | 678 | POLYGON ((3.3 51.3, 4 51.3, 5 51.5, 5.6 51, 6.... |
| 1 | Luxembourg | 39370.0 | 491775.0 | 508 | POLYGON ((6 50.1, 6.2 49.9, 6.2 49.5, 5.9 49.4... |
| 2 | Netherlands | 672000.0 | 16715999.0 | 477 | POLYGON ((6.1 53.5, 6.9 53.5, 7.1 53.1, 6.8 52... |
In this case the (Geo)DataFrame uses the geospatial data format GeoJSON as the interchange format for use in Vega, which here looks as follows:
{
"type":"FeatureCollection",
"crs":{
"type":"name",
"properties":{
"name":"urn:ogc:def:crs:OGC:1.3:CRS84"
}
},
"features":[
{ "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }
]
}
And you can feel it coming… now, to create the same chart, parts of the Vega specification have to be defined as follows:
...
"encode": {
"enter": {
"x": {
"scale": "xscale",
"field": "properties.country"
},
"y": {
"scale": "yscale",
"field": "properties.population"
},
"fill": {
"scale": "color",
"field": "properties.reg_veh_per_1000_inh"
}
...
if "format": {"type": "json", "property": "features"} is set in the data property.
Without knowing the underlying difference between row-oriented JSON and GeoJSON, this is very confusing, especially if the geoshape capabilities of Vega/Vega-lite are not used at all.
@iliatimofeev came up with an interesting idea in https://github.com/altair-viz/altair/pull/818 to register the values of the member "properties" in the Feature object as top-level Foreign Members, so the array of features in the GeoJSON example becomes as follows:
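A rough Python analogy for why the field names change: once "property": "features" is applied, each datum is a whole Feature object, so the attributes sit one level down under "properties". The `get_field` helper below is hypothetical and only mimics a dotted field lookup; it is not how Vega is implemented.

```python
from functools import reduce


def get_field(datum, path):
    """Resolve a dotted field path such as 'properties.country'."""
    return reduce(lambda obj, key: obj[key], path.split("."), datum)


# A datum from the row-oriented JSON.
row = {"country": "Belgium", "population": 10414336}

# A datum from the GeoJSON "features" array (geometry omitted for brevity).
feature = {
    "type": "Feature",
    "properties": {"country": "Belgium", "population": 10414336},
    "geometry": None,
}

flat_value = get_field(row, "country")
nested_value = get_field(feature, "properties.country")
# The plain name "country" is not a top-level member of the Feature.
has_flat_name = "country" in feature
```

Both lookups return the same value, but only the nested path works on the Feature, which is exactly the difference a user has to know about.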
[
{
"country": "Belgium",
"gdp": 389300,
"population": 10414336,
"reg_veh_per_1000_inh": 508,
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [ [ [3.3, 51.3], [4, 51.3], [5, 51.5], [5.6, 51], [6.2, 50.8], [6, 50.1], [5.8, 50.1], [5.7, 49.5], [4.8, 50], [4.3, 49.9], [3.6, 50.4], [3.1, 50.8], [2.7, 50.8], [2.5, 51.1], [3.3, 51.3] ] ]
}
},
{
"country": "Luxembourg",
"gdp": 39370,
"population": 491775,
"reg_veh_per_1000_inh": 678,
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [ [ [6, 50.1], [6.2, 49.9], [6.2, 49.5], [5.9, 49.4], [5.7, 49.5], [5.8, 50.1], [6, 50.1] ] ]
}
},
{
"country": "Netherlands",
"gdp": 672000,
"population": 16715999,
"reg_veh_per_1000_inh": 477,
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [ [ [6.1, 53.5], [6.9, 53.5], [7.1, 53.1], [6.8, 52.2], [6.6, 51.9], [6, 51.9], [6.2, 50.8], [5.6, 51], [5, 51.5], [4, 51.3], [3.3, 51.3], [3.8, 51.6], [4.7, 53.1], [6.1, 53.5] ] ]
}
}
]
While I think this is a great (mis?)usage of GeoJSON, I still hope we can use conventional GeoJSON as a data interchange format and somehow get this data transform within Vega and Vega-lite.
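The flattening described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code from the Altair PR: the function names are made up, and letting the GeoJSON members ("type", "geometry", ...) win over same-named properties is one possible design choice.

```python
def flatten_feature(feature):
    """Return a copy of a GeoJSON Feature with its properties hoisted
    to the top level as foreign members."""
    flat = dict(feature.get("properties") or {})
    for key, value in feature.items():
        if key != "properties":
            # Assumption: GeoJSON members override same-named properties.
            flat[key] = value
    return flat


def flatten_collection(collection):
    """Flatten every Feature of a FeatureCollection into a list of rows."""
    return [flatten_feature(f) for f in collection["features"]]


collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"country": "Luxembourg", "gdp": 39370},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[6, 50.1], [6.2, 49.9], [5.9, 49.4], [6, 50.1]]],
            },
        }
    ],
}

rows = flatten_collection(collection)
```

Each row keeps "type" and "geometry", so it can still be drawn as a shape, while "country" and "gdp" become addressable without the "properties." prefix.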
I didn’t know we could use nested field names in the project transform. Given standard GeoJSON as input data, we can register the values within the "properties" member as "foreign" members like this:
{
"$schema": "https://vega.github.io/schema/vega/v4.json",
"width": 400,
"height": 200,
"padding": 5,
"data": [
{
"name": "table",
"values": {
"type":"FeatureCollection",
"crs":{
"type":"name",
"properties":{
"name":"urn:ogc:def:crs:OGC:1.3:CRS84"
}
},
"features":[
{ "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }]
},
"format": {"type": "json", "property": "features"},
"transform": [
{
"type": "project",
"fields": ["properties.country", "properties.gdp", "properties.population", "properties.reg_veh_per_1000_inh", "type", "geometry"],
"as": ["country", "gdp", "population", "reg_veh_per_1000_inh", "type", "geometry"]
}
]
}
]
}
While this works in Vega (awesome btw), it requires the user to know upfront all nested field names within "properties" (both those being used and those not being used) and to project these plus type and geometry.
To conclude, I wish that a format or transform existed that could do this automagically given GeoJSON as input data. Maybe just "format": {"type": "geojson"}, as this specific type of project transform only applies to GeoJSON-structured JSON.
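Until something like that exists, the project transform can be generated on the client side instead of written by hand. The sketch below is a workaround under my own assumptions, not a Vega or Altair feature: `build_project_transform` is a hypothetical helper, and property names that themselves contain dots are not handled.

```python
def build_project_transform(collection, keep=("type", "geometry")):
    """Build a Vega 'project' transform that hoists every property found
    in the FeatureCollection and keeps the listed top-level members."""
    names = []
    for feature in collection["features"]:
        for name in feature.get("properties") or {}:
            if name not in names:  # preserve first-seen order
                names.append(name)
    return {
        "type": "project",
        "fields": [f"properties.{name}" for name in names] + list(keep),
        "as": names + list(keep),
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"country": "Belgium", "gdp": 389300},
         "geometry": None},
        {"type": "Feature",
         "properties": {"country": "Luxembourg", "population": 491775},
         "geometry": None},
    ],
}

transform = build_project_transform(collection)
```

The resulting dict could be placed in the "transform" array of the data entry; "fields" and "as" always have the same length because both are derived from the same list of names.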
I consciously opened this issue at the Vega repo, since it aims to improve the serializing of the GeoJSON data format¹ into the basic data model of Vega.
GeoJSON¹ is the standard data interchange format for information containing geographic features and has been adopted by numerous spatial software packages. Improved parsing support at the Vega abstraction level within the format property will provide consistency at higher abstraction levels like Vega-lite.
Within Altair, @iliatimofeev has shown that it is possible to derive a valid flat JSON table containing geographic features that might work as a data interchange format. But given the wide adoption of the GeoJSON¹ standard elsewhere, I would prefer the ability to adopt this standard as the data interchange format and improve the serializing on the Vega side.
Therefore I support the proposed structure of
Where reserved keys ("type", "bbox", "coordinates", "geometries", "geometry", "properties", "features") might remain nested.
Please feel free to move this issue to the location where you feel it belongs.
¹ and its extension TopoJSON.
I guess the key point here is that Vega interprets GeoJSON as a graphic format for describing shapes to draw. But GeoJSON has become a data interchange format for geo-related applications, with additional support for shapes. Generally, GeoJSON is a folder tree with some objects inside, and we may be interested in displaying some data (and associated shapes) that lie deep in the hierarchy; that is why I suggested jpath as the format for the "feature" property. The current implementation of "property" in the json format description does not allow selecting nested property values of all objects in an array.