BUG: `convert_dtypes()` converts GeoDataFrame to DataFrame
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of geopandas.
- (optional) I have confirmed this bug exists on the master branch of geopandas.
Code Sample, a copy-pastable example
import pandas as pd
import fiona  # AttributeError if not importing fiona before gpd
import geopandas as gpd
from geopandas.testing import assert_geodataframe_equal


def df():
    return pd.DataFrame(
        {
            "City": ["Buenos Aires", "Brasilia", "Santiago", "Bogota", "Caracas"],
            "Country": ["Argentina", "Brazil", "Chile", "Colombia", "Venezuela"],
            "Latitude": [-34.58, -15.78, -33.45, 4.60, 10.48],
            "Longitude": [-58.66, -47.91, -70.66, -74.08, -66.86],
        }
    )


def gdf_points_from_xy(df):
    return gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))


def test_convert_dtypes_before_gdf():
    result = df().convert_dtypes().pipe(gdf_points_from_xy)
    assert isinstance(result, gpd.GeoDataFrame)
    # --> no error


def test_convert_dtypes_after_gdf():
    result = df().pipe(gdf_points_from_xy).convert_dtypes()
    assert isinstance(result, gpd.GeoDataFrame)
    # --> AssertionError


def test_convert_dtypes_expectation():
    expected = df().convert_dtypes().pipe(gdf_points_from_xy)
    result = df().pipe(gdf_points_from_xy).convert_dtypes()
    assert_geodataframe_equal(result, expected)
    # --> AssertionError
Problem description
Calling `.convert_dtypes()` on a GeoDataFrame turns it into a regular DataFrame.
AssertionError: assert isinstance(result, GeoDataFrame)
Expected Output
I expect `convert_dtypes` to not change frame type.
The problem may be with the original method. I suspect it is because of `return concat(results, axis=1, copy=False)`. As the pandas repo does not know about other frame types, I suspect the fix should lie in this repo.
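The suspicion about `concat` can be checked without geopandas. The sketch below uses a hypothetical `TaggedFrame` subclass (not part of pandas or geopandas) as a stand-in for GeoDataFrame and mimics a convert-each-column-then-concat pattern. It only illustrates the mechanism and is not the actual pandas implementation.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass standing in for GeoDataFrame."""

    @property
    def _constructor(self):
        return TaggedFrame


frame = TaggedFrame({"City": ["Bogota", "Caracas"], "Latitude": [4.60, 10.48]})

# Convert each column on its own, then glue the columns back together.
# The columns are plain pandas Series, so concat cannot know that they
# originally came from a subclass.
columns = [col.convert_dtypes() for _, col in frame.items()]
rebuilt = pd.concat(columns, axis=1)

print(type(frame).__name__)    # TaggedFrame
print(type(rebuilt).__name__)  # DataFrame
```

Any method that rebuilds its result this way would lose the subclass unless it wraps the result in the caller's own type afterwards.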
Output of geopandas.show_versions()
SYSTEM INFO
python     : 3.8.4 (default, Jan 11 2021, 16:58:12) [Clang 12.0.0 (clang-1200.0.32.28)]
executable : /Users/adriantofting/Library/Caches/pypoetry/virtualenvs/gpd-test-qc5veAeh-py3.8/bin/python
machine    : macOS-11.1-x86_64-i386-64bit
GEOS, GDAL, PROJ INFO
GEOS       : 3.9.1
GEOS lib   : /usr/local/Cellar/geos/3.9.1/lib/libgeos_c.dylib
GDAL       : 3.2.1
GDAL data dir: None
PROJ       : 7.2.1
PROJ data dir: /usr/local/share/proj
PYTHON DEPENDENCIES
geopandas  : 0.9.0
pandas     : 1.2.3
fiona      : 1.8.18
numpy      : 1.20.1
shapely    : 1.7.1
rtree      : None
pyproj     : 3.0.0.post1
matplotlib : None
mapclassify: None
geopy      : None
psycopg2   : None
geoalchemy2: None
pyarrow    : None
pygeos     : None
Environment (poetry)
#pyproject.toml
[tool.poetry]
name = "gpd_test"
version = "0.1.0"
description = ""
[tool.poetry.dependencies]
python = "^3.8"
geopandas = "0.9.0"
[tool.poetry.dev-dependencies]
jupyterlab = "^3.0.9"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Issue Analytics
- Created 3 years ago
- Comments:12 (6 by maintainers)
Indeed, only a `__finalize__` won’t be enough. But if we could somehow have pandas call the finalize of the class of `self`, that could solve it (to avoid that the return type of `convert_dtypes` depends on the return type of `concat`, since the use of `concat` should be an implementation detail).
Although maybe just passing it to the constructor
return self._constructor(result).__finalize__(self, method="convert_dtypes")
might be easier (where `result` is the output of concat).
I opened an issue for this on the pandas side: https://github.com/pandas-dev/pandas/issues/43668
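The constructor-based idea from this comment can be sketched as an override in the subclass. The `TaggedFrame` class and its `tag` attribute are hypothetical and only stand in for GeoDataFrame and its metadata; this is one possible shape of a short-term fix under those assumptions, not the code that geopandas or pandas actually ships.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical subclass carrying one piece of custom metadata."""

    _metadata = ["tag"]  # attribute names that __finalize__ copies over
    tag = None

    @property
    def _constructor(self):
        return TaggedFrame

    def convert_dtypes(self, *args, **kwargs):
        # Let pandas do the conversion, whatever frame type it hands back.
        result = super().convert_dtypes(*args, **kwargs)
        # Re-wrap in our own class and copy the metadata from self.
        return self._constructor(result).__finalize__(self, method="convert_dtypes")


frame = TaggedFrame({"City": ["Bogota", "Caracas"], "Latitude": [4.60, 10.48]})
frame.tag = "demo"

converted = frame.convert_dtypes()
print(type(converted).__name__, converted.tag)
```

Wrapping after the call keeps the override independent of how pandas builds the result internally, which matches the point that the use of `concat` should stay an implementation detail.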
I think this can actually also be considered a bug in pandas itself, as it should use `_constructor` to recreate the resulting dataframe. Of course, in the short term, we can also fix it in geopandas by overriding the method. PR welcome for that!
@damanad regarding pandas denoting the method as `@final`, pandas uses that to indicate that the method doesn’t get overridden internally in pandas itself. We have several methods in geopandas that we override from pandas (that already might have this decorator, didn’t check). So in general I think we can ignore this. If it would give problems with typing (validating type annotations in geopandas), we might need to ask pandas to not use it like that (but I’m no typing expert).
Although, when putting the geometry column first, it now returns a GeoDataFrame, it’s apparently still an “invalid” GeoDataFrame, in the sense that the `_geometry_column_name` has not been set properly. I think this can be seen as a separate bug in `pd.concat([geoseries, series, ..], axis=1)`.
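The observation that column order changes the result type is consistent with `pd.concat(..., axis=1)` picking the frame class from the first Series it receives. A small sketch with hypothetical `TaggedSeries` and `TaggedFrame` subclasses (stand-ins for GeoSeries and GeoDataFrame, not real geopandas classes) shows the effect. It does not reproduce the `_geometry_column_name` problem itself.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical stand-in for GeoDataFrame."""

    @property
    def _constructor(self):
        return TaggedFrame


class TaggedSeries(pd.Series):
    """Hypothetical stand-in for GeoSeries."""

    @property
    def _constructor(self):
        return TaggedSeries

    @property
    def _constructor_expanddim(self):
        # Frame type to build when this series is expanded to two dimensions.
        return TaggedFrame


special = TaggedSeries([1, 2], name="special")
plain = pd.Series(["a", "b"], name="plain")

# Same two columns, different order.
special_first = pd.concat([special, plain], axis=1)
plain_first = pd.concat([plain, special], axis=1)

print(type(special_first).__name__)
print(type(plain_first).__name__)
```

Because the outcome depends on which column happens to come first, relying on it inside `convert_dtypes` would be fragile, which is another argument for re-wrapping the result explicitly.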