BUG: Loading GeoJSON through read_file misses certain entries in the output GeoDataFrame
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of geopandas.
- (optional) I have confirmed this bug exists on the master branch of geopandas.
Code Sample, a copy-pastable example
import json
## json module version 2.0.9
import urllib.request  # urllib.request must be imported explicitly for urlopen
import geopandas as gpd

## I have uploaded a file with 65 entries
url = "https://github.com/FDenker/GeoPandas-Geojson-Issue/raw/main/geopandas_not_found.geojson"

## This is the heart of the issue.
## It is not caused by the download (reading the file locally gives the same result).
## This returns an empty GeoDataFrame:
empty_gdf = gpd.read_file(url)

## However, if we load the file as plain JSON first...
with urllib.request.urlopen(url) as file:
    loaded_json = json.load(file)

## ...from_features returns a GeoDataFrame with the right information
correct_gdf = gpd.GeoDataFrame.from_features(loaded_json["features"])
Problem description
When reading a specific kind of GeoJSON (the output of an osmium-tool export, to be exact), the read_file function skips over certain elements. It does not raise an error; it silently leaves those entries out, so for the file linked above it returns an empty GeoDataFrame. This only affects a small number of entries, and the GeoJSON I have linked above contains only entries for which read_file does not work. When I import typical GeoJSON files exported from osmium-tool, about 99 % of the entries end up in the GeoDataFrame.
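To narrow down which entries are affected, one rough check is to rank the features by size, since GDAL's GeoJSON driver caps the size of individual features (the OGR_GEOJSON_MAX_OBJ_SIZE option discussed in the maintainer's reply). The sketch below is stdlib-only and assumes the file has already been loaded into a dict with json.load. The helper name rank_features_by_size is hypothetical, and re-serialized byte length is only a proxy: GDAL may measure feature size differently (for example after parsing), so treat the ranking as a hint rather than an exact prediction.

```python
import json


def rank_features_by_size(feature_collection, top=5):
    """Return (index, size_in_bytes) pairs for the largest features, biggest first.

    Size is the length of each feature re-serialized as compact UTF-8 JSON.
    This is only a rough proxy for how GDAL judges a feature's size.
    """
    sizes = []
    for index, feature in enumerate(feature_collection.get("features", [])):
        # Compact separators so whitespace in the source file does not matter
        text = json.dumps(feature, separators=(",", ":"))
        sizes.append((index, len(text.encode("utf-8"))))
    # Largest features first, keep only the requested number of entries
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top]
```

For example, loading the linked file with json.load and passing the result to rank_features_by_size lists the feature indices that are the first candidates for the entries read_file drops.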
At the same time, if I load the GeoJSON as plain JSON and then pass the 'features' to the from_features function, it returns a proper GeoDataFrame with all the data that is in the GeoJSON.
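The json-then-from_features workaround above can be wrapped in a small reusable loader. This is a sketch, not part of geopandas: load_features and to_geodataframe are hypothetical helper names, and the crs="EPSG:4326" default is an assumption based on RFC 7946, which specifies WGS 84 longitude/latitude for GeoJSON (the original snippet does not pass a CRS). Because this route parses the file with Python's json module and builds geometries with shapely, it does not go through GDAL's GeoJSON driver at all.

```python
import json
import urllib.request


def load_features(source):
    """Return the 'features' list of a GeoJSON FeatureCollection.

    `source` may be a local path or an http(s) URL. Parsing uses the json
    module only, so GDAL's GeoJSON driver is not involved.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            collection = json.load(response)
    else:
        with open(source, encoding="utf-8") as f:
            collection = json.load(f)
    return collection["features"]


def to_geodataframe(features, crs="EPSG:4326"):
    """Build a GeoDataFrame from GeoJSON feature dicts (requires geopandas)."""
    import geopandas as gpd  # imported here so load_features works without it

    # Assumed CRS: RFC 7946 GeoJSON coordinates are WGS 84 lon/lat
    return gpd.GeoDataFrame.from_features(features, crs=crs)
```

Usage with the file from the report would be to_geodataframe(load_features(url)), which should match correct_gdf apart from the CRS being set.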
The error occurs both on my local Windows machine (running Python 3.8.3) and on an Ubuntu 18.04 machine (running Python 3.7.10 and the GitHub version of geopandas). I have therefore posted the system info for both machines below.
Expected Output
GeoDataFrame with 65 rows containing attributes and valid geometries.
Output of geopandas.show_versions()
Windows machine:
SYSTEM INFO
python     : 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
executable : C:\Users\$USER\anaconda3\python.exe
machine    : Windows-10-10.0.21390-SP0
GEOS, GDAL, PROJ INFO
GEOS       : None
GEOS lib   : None
GDAL       : 3.3.0
GDAL data dir: None
PROJ       : 7.2.1
PROJ data dir: C:\Users\$USER\anaconda3\lib\site-packages\pyproj\proj_dir\share\proj
PYTHON DEPENDENCIES
geopandas  : 0.9.0
pandas     : 1.0.5
fiona      : 1.8.20
numpy      : 1.18.5
shapely    : 1.7.1
rtree      : 0.9.4
pyproj     : 3.1.0
matplotlib : None
mapclassify: None
geopy      : None
psycopg2   : 2.8.6 (dt dec pq3 ext lo64)
geoalchemy2: None
pyarrow    : 2.0.0
pygeos     : 0.10
Linux machine:
SYSTEM INFO
python     : 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
executable : /opt/tljh/user/bin/python
machine    : Linux-4.15.0-143-generic-x86_64-with-debian-buster-sid
GEOS, GDAL, PROJ INFO
GEOS       : 3.8.0
GEOS lib   : /usr/lib/x86_64-linux-gnu/libgeos_c.so
GDAL       : 2.4.4
GDAL data dir: /opt/tljh/user/lib/python3.7/site-packages/fiona/gdal_data
PROJ       : 7.0.1
PROJ data dir: /opt/tljh/user/lib/python3.7/site-packages/pyproj/proj_dir/share/proj
PYTHON DEPENDENCIES
geopandas  : 0.9.0+36.gcb88dd4
pandas     : 0.25.3
fiona      : 1.8.17
numpy      : 1.19.1
shapely    : 1.7.1
rtree      : 0.9.7
pyproj     : 2.6.1.post1
matplotlib : 3.3.2
mapclassify: None
geopy      : 2.1.0
psycopg2   : 2.8.6 (dt dec pq3 ext lo64)
geoalchemy2: None
pyarrow    : 0.17.1
pygeos     : 0.10
@FDenker @nguyenlienviet This issue arises because read_file uses GDAL's GeoJSON driver to read the file (as opposed to the from_features route that you showed working as expected). There's an environment variable, OGR_GEOJSON_MAX_OBJ_SIZE, that sets the maximum size of individual features (https://gdal.org/drivers/vector/geojson.html). Some of the features in the dataset you have here are complex enough that they exceed whatever limit that variable is set to on your system. I'm able to reproduce the behavior you see by setting that environment variable to a lower value. For you, this should work:
I won't tell you how long it took me to get to the bottom of this one. 😅
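The exact snippet from this reply is not preserved here, so the following is only a sketch of the approach it describes: raise OGR_GEOJSON_MAX_OBJ_SIZE before calling read_file. It assumes that the GDAL build bundled with fiona honors this option (the GDAL driver page linked above documents it; older GDAL releases may not), that the value is interpreted in megabytes as that page describes, and that GDAL picks it up from the process environment, which os.environ updates in CPython. The helper names and the 1000 MB default are hypothetical choices, not values from the thread.

```python
import os
from contextlib import contextmanager

ENV_KEY = "OGR_GEOJSON_MAX_OBJ_SIZE"


@contextmanager
def geojson_max_obj_size(megabytes):
    """Temporarily set GDAL's per-feature GeoJSON size limit (in MB, assumed)."""
    previous = os.environ.get(ENV_KEY)
    os.environ[ENV_KEY] = str(megabytes)
    try:
        yield
    finally:
        # Restore the previous value, or remove the variable if it was unset
        if previous is None:
            os.environ.pop(ENV_KEY, None)
        else:
            os.environ[ENV_KEY] = previous


def read_large_geojson(path, megabytes=1000):
    """Read a GeoJSON file with geopandas under a raised size limit."""
    import geopandas as gpd  # imported lazily; only needed for the actual read

    with geojson_max_obj_size(megabytes):
        return gpd.read_file(path)
```

With the URL from the report, read_large_geojson(url) would be the call to try; if it still returns fewer than 65 rows, the limit may need to be higher or the installed GDAL may not support the option.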
@jdmcbr Thanks a lot for getting to the bottom of this!