DOC: Breaking change for join, int to float type coercion
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html
Documentation problem
The behaviour of join appears to have changed somewhere between pandas versions 1.2.4 and 1.4.2.
If I perform an inner join of a float column with an int column, then in 1.2.4 the result is an int64 column. In 1.4.2 it’s a float64. The test below passes in 1.2.4 but fails in 1.4.2.
I couldn’t find any documentation of this behaviour. Although it seems reasonable, it caused me a bit of pain to get to the bottom of, and it is a breaking change. Is this behaviour documented anywhere, and if not, could it be?
import pandas as pd
from pandas.testing import assert_frame_equal

def test_join_type_coercion():
    left = [
        {"id": 1, "name": "Chase"},
        {"id": 2, "name": "Sky"},
        {"id": 3, "name": "Marshall"},
        {"id": None, "name": "Daring Danny X"},
    ]
    right = [
        {"id": 1, "color": "Blue"},
        {"id": 2, "color": "Pink"},
        {"id": 3, "color": "Red"},
    ]
    left_df = pd.DataFrame(left)
    right_df = pd.DataFrame(right)
    joined = (
        left_df.set_index("id")
        .join(right_df.set_index("id"), how="inner")
        .reset_index()
    )
    assert_frame_equal(pd.DataFrame([
        {"id": 1, "name": "Chase", "color": "Blue"},
        {"id": 2, "name": "Sky", "color": "Pink"},
        {"id": 3, "name": "Marshall", "color": "Red"},
    ]), joined)
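One way to make a test like this independent of the pandas version is to normalise the key dtype before joining. The following is only a minimal sketch, not an official pandas recommendation: it assumes that rows with a missing id can be dropped (an inner join would discard them anyway) and that the remaining ids are whole numbers. The names `left_clean` and `expected` are illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left_df = pd.DataFrame([
    {"id": 1, "name": "Chase"},
    {"id": 2, "name": "Sky"},
    {"id": None, "name": "Daring Danny X"},
])
right_df = pd.DataFrame([
    {"id": 1, "color": "Blue"},
    {"id": 2, "color": "Pink"},
])

# Inspect the key dtypes before joining; the missing id on the left
# prevents that column from being stored as a plain integer column.
print(left_df["id"].dtype, right_df["id"].dtype)

# Assumption: rows without an id may be dropped and the rest are integral,
# so the key can be cast to int64 explicitly before the join.
left_clean = left_df.dropna(subset=["id"]).astype({"id": "int64"})

joined = (
    left_clean.set_index("id")
    .join(right_df.set_index("id"), how="inner")
    .reset_index()
)

expected = pd.DataFrame([
    {"id": 1, "name": "Chase", "color": "Blue"},
    {"id": 2, "name": "Sky", "color": "Pink"},
])
# Both indexes were int64 going in, so no coercion is left to the join.
assert_frame_equal(expected, joined)
```

With both keys cast to the same dtype up front, the result no longer depends on how a given pandas release resolves a float/int index join.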
Suggested fix for documentation
I’m not too sure what the pattern is here, but maybe add a note describing the change in behaviour and the version in which it was introduced.
Issue Analytics
- State:
- Created a year ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Sorry for misreading that. This change happened between 1.2.5 and 1.3.0, and nothing in the 1.3.0 change log would obviously have caused it. So, that’s the right place to include a note on the breaking change. I’m happy to put together the right message and submit a PR (doing a little investigation first to understand when the change happens, e.g. I know it only occurs when you use indices in the underlying merge).
How has this changed in pandas 1.2.4 vs 1.4.2? My point is really that the behaviour of join has changed in a breaking way. This (the change in behaviour) should surely be documented somewhere?
To hopefully add clarity: although the None does cause the first DataFrame to coerce the id to a float, that’s unrelated to what I’m trying to demonstrate here. I could (and perhaps should) have typed out the ids as floats.
Either way, a join on a float with an int used to result in an int; now it doesn’t.
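The reduced case described in this comment, with the ids typed out as floats and no None involved, can be sketched as follows. This is an illustrative sketch rather than the reporter's code: it prints the resulting dtype instead of asserting it, since per this thread the answer differs between pandas releases, and then casts the column explicitly. The cast assumes every id is a whole number.

```python
import pandas as pd

# Left ids are written as floats on purpose; right ids are ints.
left_df = pd.DataFrame({"id": [1.0, 2.0, 3.0], "name": ["Chase", "Sky", "Marshall"]})
right_df = pd.DataFrame({"id": [1, 2, 3], "color": ["Blue", "Pink", "Red"]})

joined = (
    left_df.set_index("id")
    .join(right_df.set_index("id"), how="inner")
    .reset_index()
)

# Report what this pandas version produced for the joined key.
print(pd.__version__, joined["id"].dtype)

# Assumption: all ids are integral, so an explicit cast is lossless and
# gives the same dtype whichever way the join resolved it.
joined["id"] = joined["id"].astype("int64")
```

Running the snippet under two pandas versions and comparing the printed dtype is a quick way to confirm where the behaviour changed.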