Value error using sjoin with pandas v0.23

I use the sjoin function to add the region name (from a polygon layer) to every point that falls within a region. Some points are not in any region, so I filter those points out and buffer them step by step, which makes the layer of unmatched points smaller and smaller. When only one row is left, I get the following error with pandas v0.23, which I did not get before (pandas < v0.23). I am using geopandas v0.3.0.

My call:

new = gpd.sjoin(rest_points, polygons, how='left', op='intersects')

Error message:

ValueError: You are trying to merge on object and int64 columns.
If you wish to proceed you should use pd.concat

Location in geopandas: class GeoDataFrame, method merge(self, *args, **kwargs), at the line result = DataFrame.merge(self, *args, **kwargs)

I do not understand the error, why it happens only with the last point (last row), or why it appears only with the newest pandas version. I looked at the pandas “What’s New” page but could not find anything relevant.

Full message:

  File "virtualenv/lib/python3.5/site-packages/geopandas/tools/sjoin.py", line 140, in sjoin
    suffixes=('_%s' % lsuffix, '_%s' % rsuffix))
  File "virtualenv/lib/python3.5/site-packages/geopandas/geodataframe.py", line 418, in merge
    result = DataFrame.merge(self, *args, **kwargs)
  File "virtualenv/lib/python3.5/site-packages/pandas/core/frame.py", line 6379, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "virtualenv/lib/python3.5/site-packages/pandas/core/reshape/merge.py", line 60, in merge
    validate=validate)
  File "virtualenv/lib/python3.5/site-packages/pandas/core/reshape/merge.py", line 554, in __init__
    self._maybe_coerce_merge_keys()
  File "virtualenv/lib/python3.5/site-packages/pandas/core/reshape/merge.py", line 980, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and int64 columns.
If you wish to proceed you should use pd.concat

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

3 reactions
bnaul commented, Jul 2, 2018

Does anyone have a proposed solution for this? I would say geopandas is basically incompatible with the latest pandas since this bug affects a common core use case.

One approach I can see is to just add a temporary column w/ the right dtype enforced to use for the join. Kinda gross but would get the job done:

# Fragment from inside sjoin(); left_df, right_df, result, index_left,
# lsuffix and rsuffix are defined earlier in that function.
result = result.set_index('_key_left')
joined = left_df.merge(result, left_index=True, right_index=True, how='left')

# Temporary key on the right frame, cast to the dtype of the left key column
right_df['_key'] = right_df.index.values.astype(joined._key_right.dtype)
joined = joined.merge(
    right_df.drop(right_df.geometry.name, axis=1),
    how='left', left_on='_key_right', right_on='_key',
    suffixes=('_%s' % lsuffix, '_%s' % rsuffix)
)
right_df.drop('_key', axis=1, inplace=True)  # remove the temporary key again
joined = joined.set_index(index_left).drop(['_key_right'], axis=1)

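
The temporary-key idea can be tried in isolation with plain pandas. This is only a sketch with made-up data: the frames joined and right_df and the columns name and region are illustrative stand-ins, not the real sjoin internals.

```python
import numpy as np
import pandas as pd

# Stand-in for the intermediate join result: the first point matched
# polygon 0, the second matched nothing, so the key column is float64.
joined = pd.DataFrame({'name': ['p1', 'p2'], '_key_right': [0.0, np.nan]})

# Stand-in for the polygon attributes, indexed by integers.
right_df = pd.DataFrame({'region': ['A', 'B']}, index=[0, 1])

# Temporary key with the same dtype as the left key column.
right_df['_key'] = right_df.index.values.astype(joined['_key_right'].dtype)

merged = joined.merge(right_df, how='left',
                      left_on='_key_right', right_on='_key')

# Clean up the helper columns on both sides.
merged = merged.drop(['_key', '_key_right'], axis=1)
right_df.drop('_key', axis=1, inplace=True)

print(merged)
```

Both merge keys share a dtype here, so the dtype check that raised the ValueError has nothing to object to, and the unmatched point simply ends up with a missing region.
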
2 reactions
jorisvandenbossche commented, Jul 2, 2018

I think a fix would be:

--- a/geopandas/tools/sjoin.py
+++ b/geopandas/tools/sjoin.py
@@ -114,7 +114,7 @@ def sjoin(left_df, right_df, how='inner', op='intersects',
 
     else:
         # when output from the join has no overlapping geometries
-        result = pd.DataFrame(columns=['_key_left', '_key_right'])
+        result = pd.DataFrame(columns=['_key_left', '_key_right'], dtype=float)
 
     if op == "within":
         # within implemented as the inverse of contains; swap names

Can you check if that solves the issue for you?
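
For context, here is a minimal sketch of what the one-line change alters, using plain pandas only (no geopandas needed). It just compares the column dtypes of the empty placeholder frame with and without dtype=float; the variable names untyped and typed are made up for illustration.

```python
import pandas as pd

# Placeholder as in the original code: no dtype given, so the
# empty columns default to object dtype.
untyped = pd.DataFrame(columns=['_key_left', '_key_right'])

# Placeholder with the proposed change: the columns are float64.
typed = pd.DataFrame(columns=['_key_left', '_key_right'], dtype=float)

print(untyped.dtypes)
print(typed.dtypes)
```

The object-typed key column is what later gets merged against the integer index of the right frame, which matches the "object and int64" wording in the error message.
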
