BUG: Unexpected change of behavior for float32 DataFrames between pandas versions.
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
import json


def check_error(x, r):
    df = pd.DataFrame(data=x, dtype="float32")
    for i in df.index:
        for j in range(len(list(df.iloc[i]))):
            # Here is the difference: in pandas 1.3.5 the str is converted to float32.
            # In pandas 1.4.0 this no longer happens; the original value is kept instead.
            df.iloc[i][j] = r[j][int(df.iloc[i][j])]
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])
result = check_error(x, r).to_json()
result_json = json.loads(result)

if pd.__version__ == "1.3.5":
    ## Pandas 1.3.5
    expected_json_pandas_1_3_5 = json.loads("""
    {"0":{"0":10.0,"1":20.0,"2":10.0,"3":20.0},"1":{"0":50.0,"1":50.0,"2":40.0,"3":40.0}}
    """)
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_3_5.items()))
else:
    ## Pandas 1.4.0
    expected_json_pandas_1_4_0 = json.loads("""
    {"0": {"0": 0.0, "1": 1.0, "2": 0.0, "3": 1.0}, "1": {"0": 0.0, "1": 0.0, "2": 1.0, "3": 1.0}}
    """)
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_4_0.items()))
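A minimal sketch of a version-independent variant, under an assumption the report does not state: the difference comes from the chained `df.iloc[i][j] = ...` pattern, which assigns into the row object returned by `df.iloc[i]`, and pandas does not guarantee that such an assignment reaches `df` (see the "view versus copy" notes in the pandas indexing documentation). The sketch reads and writes each cell through a single `df.iloc[i, j]` indexer and converts the str explicitly with `np.float32`, so it does not depend on implicit casting. It is expected to produce the pandas 1.3.5 output shown above.

```python
import numpy as np
import pandas as pd


def check_error(x, r):
    """Replace each cell value v in column j with r[j][v], stored as float32."""
    df = pd.DataFrame(data=x, dtype="float32")
    for i in range(df.shape[0]):
        for j in range(df.shape[1]):
            # Read through a single .iloc[i, j] indexer (no intermediate row object).
            key = int(df.iloc[i, j])
            # Write through the same single indexer, so the target is df itself.
            # The str is converted explicitly instead of relying on implicit casting.
            df.iloc[i, j] = np.float32(r[j][key])
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])
result = check_error(x, r)
print(result.to_json())
```

The explicit `np.float32(...)` conversion keeps the columns float32 rather than leaving the outcome to however a given pandas version casts an assigned str.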
Issue Description
There is an undocumented change of behavior when assigning values into a DataFrame with dtype float32, which could cause issues in existing libraries. I haven't checked whether the same happens with DataFrames of other dtypes.
When assigning into a float32 DataFrame, pandas 1.3.5 and 1.4.0 behave differently. In 1.3.5, the str value is converted to a float and stored. In 1.4.0 this no longer happens; instead, the assignment silently has no effect. No error is raised either, so, as in the example, the same code produces DataFrames with different contents depending on the pandas version.
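Read this way, the 1.3.5 result is a lookup: cell (i, j) becomes `r[j][x[i][j]]` converted to float. As a hedged alternative that avoids per-cell assignment altogether, the lookup can be done with NumPy fancy indexing and the DataFrame built once at the end. `lookup_vectorized` is a hypothetical helper name, and the sketch assumes every value in `x` is a valid integer position into the corresponding row of `r`.

```python
import numpy as np
import pandas as pd


def lookup_vectorized(x, r):
    """Return a float32 DataFrame whose cell (i, j) is r[j][x[i][j]] (hypothetical helper)."""
    df = pd.DataFrame(data=x, dtype="float32")
    table = np.asarray(r).astype("float32")  # explicit str -> float32 conversion
    idx = df.to_numpy().astype(int)          # cell values used as positions into r
    cols = np.arange(df.shape[1])            # column j selects row j of the table
    # Broadcasting cols (shape (1, ncols)) against idx (shape (nrows, ncols))
    # gives values[i, j] = table[j, idx[i, j]].
    values = table[cols[np.newaxis, :], idx]
    return pd.DataFrame(values, index=df.index, columns=df.columns)


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])
result = lookup_vectorized(x, r)
print(result)
```

Because nothing is assigned into an existing DataFrame, this form sidesteps the view/copy question entirely; the trade-off is that it requires all indices to be known up front rather than computed cell by cell.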
Expected Behavior
The change should be called out in the 1.4.0 documentation, or an error should be raised to alert users about incompatible types during assignment. It might also be worth checking whether DataFrames of other dtypes are affected.
Installed Versions
Pandas 1.3.5
Pandas 1.4.0
Pandas 1.4.1
Top GitHub Comments
Yes
OK, removing from the 1.4 milestone. I will leave this open, as there is some discussion about warnings.