BUG: Loading in geojson through read_file misses certain entries in the output GDF

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of geopandas.

  • (optional) I have confirmed this bug exists on the master branch of geopandas.



Code Sample, a copy-pastable example


import json
# json version 2.0.9
import urllib.request  # "import urllib" alone does not guarantee urllib.request is available

import geopandas as gpd


# I have uploaded a file with 65 entries
url = "https://github.com/FDenker/GeoPandas-Geojson-Issue/raw/main/geopandas_not_found.geojson"


# This is the heart of the issue.
# It is not caused by the download (a local copy gives the same result).
# The resulting GeoDataFrame is empty.
empty_gdf = gpd.read_file(url)


# However, if we load the file as plain JSON first...
with urllib.request.urlopen(url) as file:
    loaded_json = json.load(file)

# ...this returns a GeoDataFrame with the right information.
correct_gdf = gpd.GeoDataFrame.from_features(loaded_json["features"])

Problem description

When reading a specific kind of GeoJSON (the output of an osmium-tool export, to be exact), the read_file function skips certain elements. It does not raise an error; it simply returns an empty GeoDataFrame. This affects only a small number of entries, and the GeoJSON linked above contains only entries for which read_file fails. When I import GeoJSON files exported from osmium-tool normally, about 99% of the entries appear in the GeoDataFrame.

At the same time, if I load the GeoJSON as plain JSON and pass its ‘features’ to the from_features function, it returns a proper GeoDataFrame with all the data in the GeoJSON.
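One way to make this kind of silent loss visible is to compare the number of features in the raw GeoJSON with the number of rows the reader returned. The following is a minimal sketch using only the standard library; `count_features` and `report_missing` are hypothetical helper names, not part of geopandas, and the sample collection is made up for illustration.

```python
import json


def count_features(geojson_text: str) -> int:
    """Return the number of features in a GeoJSON FeatureCollection string."""
    collection = json.loads(geojson_text)
    return len(collection.get("features", []))


def report_missing(expected: int, loaded: int) -> str:
    """Describe how many features failed to make it into the loaded frame."""
    missing = expected - loaded
    if missing > 0:
        return f"{missing} of {expected} features were dropped"
    return f"all {expected} features loaded"


# Illustrative two-feature collection (not the file from the report).
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"id": 1},
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
        {"type": "Feature", "properties": {"id": 2},
         "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}},
    ],
})

# Pretend the reader returned an empty frame (0 rows), as described above.
print(report_missing(count_features(sample), loaded=0))
```

In the situation described here, `loaded` would be `len(gpd.read_file(url))` and the expected count would come from the downloaded file's text.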

The error occurs both on my local Windows machine (running Python 3.8.3) and on an Ubuntu 18.04 machine (running Python 3.7.10 and the GitHub version of geopandas). I have therefore posted the system info for both below.

Expected Output

GeoDataFrame with 65 rows containing attributes and valid geometries.

Output of geopandas.show_versions()

Windows machine:

SYSTEM INFO

python     : 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
executable : C:\Users$USER\anaconda3\python.exe
machine    : Windows-10-10.0.21390-SP0

GEOS, GDAL, PROJ INFO

GEOS          : None
GEOS lib      : None
GDAL          : 3.3.0
GDAL data dir : None
PROJ          : 7.2.1
PROJ data dir : C:\Users$USER\anaconda3\lib\site-packages\pyproj\proj_dir\share\proj

PYTHON DEPENDENCIES

geopandas   : 0.9.0
pandas      : 1.0.5
fiona       : 1.8.20
numpy       : 1.18.5
shapely     : 1.7.1
rtree       : 0.9.4
pyproj      : 3.1.0
matplotlib  : None
mapclassify : None
geopy       : None
psycopg2    : 2.8.6 (dt dec pq3 ext lo64)
geoalchemy2 : None
pyarrow     : 2.0.0
pygeos      : 0.10

Linux machine:

SYSTEM INFO

python     : 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
executable : /opt/tljh/user/bin/python
machine    : Linux-4.15.0-143-generic-x86_64-with-debian-buster-sid

GEOS, GDAL, PROJ INFO

GEOS          : 3.8.0
GEOS lib      : /usr/lib/x86_64-linux-gnu/libgeos_c.so
GDAL          : 2.4.4
GDAL data dir : /opt/tljh/user/lib/python3.7/site-packages/fiona/gdal_data
PROJ          : 7.0.1
PROJ data dir : /opt/tljh/user/lib/python3.7/site-packages/pyproj/proj_dir/share/proj

PYTHON DEPENDENCIES

geopandas   : 0.9.0+36.gcb88dd4
pandas      : 0.25.3
fiona       : 1.8.17
numpy       : 1.19.1
shapely     : 1.7.1
rtree       : 0.9.7
pyproj      : 2.6.1.post1
matplotlib  : 3.3.2
mapclassify : None
geopy       : 2.1.0
psycopg2    : 2.8.6 (dt dec pq3 ext lo64)
geoalchemy2 : None
pyarrow     : 0.17.1
pygeos      : 0.10

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
jdmcbr commented, Aug 8, 2021

@FDenker @nguyenlienviet This issue arises because read_file uses GDAL’s GeoJSON driver to read the file (unlike the from_features route that you showed working as expected). There’s an environment variable, OGR_GEOJSON_MAX_OBJ_SIZE, that sets the maximum size of individual features (https://gdal.org/drivers/vector/geojson.html). Some of the features in your dataset are complex enough that they bump up against whatever that is set to on your system. I can reproduce the behavior you describe by setting that environment variable to a lower value. For you, this should work:

import geopandas as gpd 
import fiona 
url = "https://github.com/FDenker/GeoPandas-Geojson-Issue/raw/main/geopandas_not_found.geojson" 
 
with fiona.Env(OGR_GEOJSON_MAX_OBJ_SIZE=2000):  
    no_longer_empty_gdf = gpd.read_file(url)

I won’t tell you how long it took me to get to the bottom of this one. 😅
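GDAL configuration options can generally also be supplied as ordinary process environment variables, so a similar effect might be achieved without fiona.Env. The sketch below is one way to scope such a setting to a block of code using only the standard library. `temporary_env` is a hypothetical helper, and whether a given reader picks up the variable at read time is an assumption worth verifying in your own setup.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable inside a with-block, then restore the old state."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the prior value, or remove the variable if it was unset before.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


with temporary_env("OGR_GEOJSON_MAX_OBJ_SIZE", "2000"):
    # A GDAL-backed read such as gpd.read_file(url) would go here (assumption:
    # the driver consults the environment when the file is opened).
    print(os.environ["OGR_GEOJSON_MAX_OBJ_SIZE"])
```

Restoring the previous value in `finally` keeps the raised limit from leaking into unrelated reads later in the same process.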

0 reactions
jorisvandenbossche commented, Aug 8, 2021

@jdmcbr Thanks a lot for getting to the bottom of this!
