flat-table structure for geojson structured json
Since #1316 is closed and I had not yet found the time to work out a real use-case example/process for that previous issue, I am doing so here. It is, by the way, slightly off-topic compared to #1316.
Assuming we want to make a bar chart using three variables (country, population and registered road vehicles per 1000 inhabitants):
And we’ve prepared our data in a pandas DataFrame within python (eventually all of this should be possible through altair/vega-lite).
This DataFrame then looks as follows:
| | country | gdp | population | reg_veh_per_1000_inh |
|---|---|---|---|---|
| 0 | Belgium | 389300.0 | 10414336.0 | 678 |
| 1 | Luxembourg | 39370.0 | 491775.0 | 508 |
| 2 | Netherlands | 672000.0 | 16715999.0 | 477 |
which is converted to row-oriented JSON for usage within vega:
[
{
"country":"Belgium",
"gdp":389300,
"population":10414336,
"reg_veh_per_1000_inh":508
},
{
"country":"Luxembourg",
"gdp":39370,
"population":491775,
"reg_veh_per_1000_inh":678
},
{
"country":"Netherlands",
"gdp":672000,
"population":16715999,
"reg_veh_per_1000_inh":477
}
]
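As a rough illustration of what "row-oriented" means here, the sketch below turns column-oriented data into a list of records using only the Python standard library. In pandas itself, `DataFrame.to_json(orient="records")` produces this layout; the `columns` dict and the `to_records` helper are hypothetical names introduced for this example, and the values are copied from the JSON listing above.

```python
import json

# Column-oriented data; values copied from the JSON listing above.
columns = {
    "country": ["Belgium", "Luxembourg", "Netherlands"],
    "gdp": [389300, 39370, 672000],
    "population": [10414336, 491775, 16715999],
    "reg_veh_per_1000_inh": [508, 678, 477],
}


def to_records(cols):
    """Turn {column: [values]} into a list of {column: value} rows."""
    names = list(cols)
    return [dict(zip(names, row)) for row in zip(*cols.values())]


records = to_records(columns)
print(json.dumps(records, indent=2))
```

Each element of `records` is one datum, so a field name such as `population` can be referenced directly in an encoding.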
Where part of my generated vega specification includes the following:
...
"encode": {
"enter": {
"x": {
"scale": "xscale",
"field": "country"
},
"y": {
"scale": "yscale",
"field": "population"
},
"fill": {
"scale": "color",
"field": "reg_veh_per_1000_inh"
},
...
But in pandas the same DataFrame might also contain a geometry column:
| | country | gdp | population | reg_veh_per_1000_inh | geometry |
|---|---|---|---|---|---|
| 0 | Belgium | 389300.0 | 10414336.0 | 678 | POLYGON ((3.3 51.3, 4 51.3, 5 51.5, 5.6 51, 6.... |
| 1 | Luxembourg | 39370.0 | 491775.0 | 508 | POLYGON ((6 50.1, 6.2 49.9, 6.2 49.5, 5.9 49.4... |
| 2 | Netherlands | 672000.0 | 16715999.0 | 477 | POLYGON ((6.1 53.5, 6.9 53.5, 7.1 53.1, 6.8 52... |
In this case the (Geo)DataFrame will use the geospatial data format GeoJSON as the interchange format for usage in vega, which looks as follows:
{
"type":"FeatureCollection",
"crs":{
"type":"name",
"properties":{
"name":"urn:ogc:def:crs:OGC:1.3:CRS84"
}
},
"features":[
{ "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }
]
}
And you feel it coming… now, to create the same chart, parts of the vega specification have to be defined as follows:
...
"encode": {
"enter": {
"x": {
"scale": "xscale",
"field": "properties.country"
},
"y": {
"scale": "yscale",
"field": "properties.population"
},
"fill": {
"scale": "color",
"field": "properties.reg_veh_per_1000_inh"
}
...
if "format": {"type": "json", "property": "features"} is set in the data property.
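To make the difference concrete, here is a minimal sketch of resolving a dotted field name against a datum. It is a simplification for illustration only and is not how Vega implements field access; `get_field`, `row` and `feature` are hypothetical names.

```python
def get_field(datum, field):
    """Resolve a dotted field name such as 'properties.country' in a nested dict."""
    value = datum
    for key in field.split("."):
        value = value[key]
    return value


# One datum from the row-oriented JSON and the matching GeoJSON Feature
# (geometry omitted for brevity).
row = {"country": "Belgium", "population": 10414336}
feature = {
    "type": "Feature",
    "properties": {"country": "Belgium", "population": 10414336},
    "geometry": None,
}

# The same value lives one level deeper once the data arrives as GeoJSON.
print(get_field(row, "country"))
print(get_field(feature, "properties.country"))
```

After extracting `features`, every datum is a Feature object, so the attribute columns are only reachable through the `properties.` prefix.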
Without knowing the underlying difference between row-oriented JSON and GeoJSON this is very confusing, especially if the geoshape capabilities in Vega/Vega-lite are not used at all.
@iliatimofeev came up with an interesting idea in https://github.com/altair-viz/altair/pull/818 to register the values of the member "properties" in the Feature object as top-level Foreign Members, so the array of features in the GeoJSON example becomes as follows:
[
{
"country": "Belgium",
"gdp": 389300,
"population": 10414336,
"reg_veh_per_1000_inh": 508,
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [ [ [3.3, 51.3], [4, 51.3], [5, 51.5], [5.6, 51], [6.2, 50.8], [6, 50.1], [5.8, 50.1], [5.7, 49.5], [4.8, 50], [4.3, 49.9], [3.6, 50.4], [3.1, 50.8], [2.7, 50.8], [2.5, 51.1], [3.3, 51.3] ] ]
}
},
{
"country": "Luxembourg",
"gdp": 39370,
"population": 491775,
"reg_veh_per_1000_inh": 678,
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [ [ [6, 50.1], [6.2, 49.9], [6.2, 49.5], [5.9, 49.4], [5.7, 49.5], [5.8, 50.1], [6, 50.1] ] ]
}
},
{
"country": "Netherlands",
"gdp": 672000,
"population": 16715999,
"reg_veh_per_1000_inh": 477,
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [ [ [6.1, 53.5], [6.9, 53.5], [7.1, 53.1], [6.8, 52.2], [6.6, 51.9], [6, 51.9], [6.2, 50.8], [5.6, 51], [5, 51.5], [4, 51.3], [3.3, 51.3], [3.8, 51.6], [4.7, 53.1], [6.1, 53.5] ] ]
}
}
]
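The flattening shown above can be sketched in a few lines of Python. This is an illustrative approximation under my own assumptions, not the code from the Altair pull request; `flatten_feature` and `flatten_collection` are hypothetical helpers, and the sample polygon is abbreviated.

```python
def flatten_feature(feature):
    """Hoist a Feature's properties to the top level, next to type and geometry."""
    flat = dict(feature.get("properties") or {})
    # Assumption: the structural members win if a property shares their name.
    flat["type"] = feature["type"]
    flat["geometry"] = feature.get("geometry")
    return flat


def flatten_collection(collection):
    """Return one flat record per Feature in a FeatureCollection."""
    return [flatten_feature(f) for f in collection["features"]]


collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"country": "Luxembourg", "population": 491775},
            "geometry": {
                "type": "Polygon",
                # Abbreviated ring, for illustration only.
                "coordinates": [[[6, 50.1], [6.2, 49.9], [6.2, 49.5], [6, 50.1]]],
            },
        }
    ],
}

flat = flatten_collection(collection)
print(sorted(flat[0]))  # keys of the flattened record
```

Note that a property literally named `type` or `geometry` would be overwritten by this approach, which is one reason to treat it as a sketch rather than a general solution.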
While I think this is a great (mis?)usage of GeoJSON, I still hope we can use conventional GeoJSON as a data interchange format and somehow get this data transform within Vega and Vega-lite.
I didn’t know we could use nested field names in the project transform, so given standard GeoJSON as input data we can register the object values within the "properties" member as "foreign" members as such:
{
"$schema": "https://vega.github.io/schema/vega/v4.json",
"width": 400,
"height": 200,
"padding": 5,
"data": [
{
"name": "table",
"values": {
"type":"FeatureCollection",
"crs":{
"type":"name",
"properties":{
"name":"urn:ogc:def:crs:OGC:1.3:CRS84"
}
},
"features":[
{ "type": "Feature", "properties": { "country": "Belgium", "gdp": 389300.0, "population": 10414336.0, "reg_veh_per_1000_inh": 508 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 3.3, 51.3 ], [ 4.0, 51.3 ], [ 5.0, 51.5 ], [ 5.6, 51.0 ], [ 6.2, 50.8 ], [ 6.0, 50.1 ], [ 5.8, 50.1 ], [ 5.7, 49.5 ], [ 4.8, 50.0 ], [ 4.3, 49.9 ], [ 3.6, 50.4 ], [ 3.1, 50.8 ], [ 2.7, 50.8 ], [ 2.5, 51.1 ], [ 3.3, 51.3 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Luxembourg", "gdp": 39370.0, "population": 491775.0, "reg_veh_per_1000_inh": 678 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.0, 50.1 ], [ 6.2, 49.9 ], [ 6.2, 49.5 ], [ 5.9, 49.4 ], [ 5.7, 49.5 ], [ 5.8, 50.1 ], [ 6.0, 50.1 ] ] ] } },
{ "type": "Feature", "properties": { "country": "Netherlands", "gdp": 672000.0, "population": 16715999.0, "reg_veh_per_1000_inh": 477 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 6.1, 53.5 ], [ 6.9, 53.5 ], [ 7.1, 53.1 ], [ 6.8, 52.2 ], [ 6.6, 51.9 ], [ 6.0, 51.9 ], [ 6.2, 50.8 ], [ 5.6, 51.0 ], [ 5.0, 51.5 ], [ 4.0, 51.3 ], [ 3.3, 51.3 ], [ 3.8, 51.6 ], [ 4.7, 53.1 ], [ 6.1, 53.5 ] ] ] } }]
},
"format": {"type": "json", "property": "features"},
"transform": [
{
"type": "project",
"fields": ["properties.country", "properties.gdp", "properties.population", "properties.reg_veh_per_1000_inh", "type", "geometry"],
"as": ["country", "gdp", "population", "reg_veh_per_1000_inh", "type", "geometry"]
}
]
}
]
}
While this works in Vega (awesome btw), it requires the user to know upfront all nested field names (both those being used and those not being used) within "properties" and to project these plus type and geometry.
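Until something like that exists, the field list could be generated outside Vega. The sketch below, assuming the GeoJSON is available as a Python dict before the specification is written, collects every key found under "properties" and builds the matching project transform. `build_project_transform` is a hypothetical helper, and property names that themselves contain dots are not handled.

```python
def build_project_transform(collection):
    """Build a Vega project transform that hoists all Feature properties."""
    names = []
    for feature in collection["features"]:
        for key in feature.get("properties") or {}:
            if key not in names:  # keep first-seen order, skip duplicates
                names.append(key)
    return {
        "type": "project",
        "fields": ["properties." + name for name in names] + ["type", "geometry"],
        "as": names + ["type", "geometry"],
    }


collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"country": "Belgium", "gdp": 389300}, "geometry": None},
        {"type": "Feature", "properties": {"country": "Luxembourg", "gdp": 39370}, "geometry": None},
    ],
}

transform = build_project_transform(collection)
print(transform)
```

Building "fields" and "as" from the same list keeps the two arrays the same length, which is easy to get wrong when writing them by hand.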
To conclude, I wish that a certain format or transform existed that could do this automagically given GeoJSON as input data. Maybe just with "format": {"type": "geojson"}, as this specific type of project transform only applies to GeoJSON structured JSON.
Issue Analytics
- Created 5 years ago
- Comments:5 (2 by maintainers)
Top GitHub Comments
I consciously opened this issue at the Vega repo, since it aims to improve the serializing of the GeoJSON data format¹ into the basic data model of Vega.
GeoJSON¹ is the standard data interchange format for information containing geographic features and has been adopted by numerous spatial software packages. Improved parsing support at the Vega abstraction level within the format property will provide consistency at higher abstraction levels like Vega-lite.
Within Altair, @iliatimofeev has shown that it is possible to derive a valid flat JSON table containing geographic features that might work as a data interchange format. But given the wide adoption of the GeoJSON¹ standard elsewhere, I would prefer the ability to adopt this standard as the data interchange format and improve the serializing on the Vega side.
Therefore I support the proposed structure of
Where reserved keys ("type", "bbox", "coordinates", "geometries", "geometry", "properties", "features") might remain nested.
Please feel free to move this issue to wherever you feel it belongs.
¹ and its extension TopoJSON.
I guess the key point here is that Vega interprets GeoJSON as a graphic format for describing shapes to draw. But GeoJSON has become a data interchange format for geo-related applications, with additional support for shapes. Generally, GeoJSON is a folder tree with some objects inside, and we could be interested in displaying some data (and associated shapes) that lie deep in the hierarchy; that is why I suggested jpath as the format for the "feature" property. The current implementation of "property" in the json format description does not allow selecting nested property values of all objects in an array.
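To illustrate the kind of selection meant here, the following is a tiny wildcard path walker in Python. It is a hypothetical sketch and does not implement JSONPath or any existing Vega option; the `select` function and the list-based path syntax (with "*" meaning "every element of an array") are assumptions made for this example.

```python
def select(node, path):
    """Collect every value reached by following `path` from `node`.

    `path` is a list of keys; the special key "*" fans out over a list.
    """
    if not path:
        return [node]
    head, rest = path[0], path[1:]
    if head == "*":
        items = node if isinstance(node, list) else []
        return [match for item in items for match in select(item, rest)]
    if isinstance(node, dict) and head in node:
        return select(node[head], rest)
    return []  # path does not exist in this branch


doc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"country": "Belgium"}, "geometry": None},
        {"type": "Feature", "properties": {"country": "Luxembourg"}, "geometry": None},
    ],
}

# "property": "features" stops at the array; a wildcard step can reach inside it.
countries = select(doc, ["features", "*", "properties", "country"])
print(countries)
```

A plain dotted "property" path can only descend through objects, which is why it cannot express the fan-out over all array elements shown in the last call.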