BUG: Unexpected change of behavior on DataFrame type float32 between pandas versions.

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
import json

def check_error(x, r):
    df = pd.DataFrame(data=x, dtype="float32")
    for i in df.index:
        for j in range(len(list(df.iloc[i]))):
            # Here is the difference: pandas 1.3.5 converts the str to float32 on assignment.
            # In pandas 1.4.0 this no longer happens; the original value is kept intact.
            df.iloc[i][j] = r[j][int(df.iloc[i][j])]
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])

result = check_error(x, r).to_json()
result_json = json.loads(result)
## Pandas 1.3.5
if pd.__version__ == "1.3.5":
    expected_json_pandas_1_3_5 = json.loads("""
    {"0":{"0":10.0,"1":20.0,"2":10.0,"3":20.0},"1":{"0":50.0,"1":50.0,"2":40.0,"3":40.0}}
    """)
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_3_5.items()))
else:
    expected_json_pandas_1_4_0 = json.loads("""
    {"0": {"0": 0.0, "1": 1.0, "2": 0.0, "3": 1.0}, "1": {"0": 0.0, "1": 0.0, "2": 1.0, "3": 1.0}}
    """)
    ## Pandas 1.4.0
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_4_0.items()))

Issue Description

This behavior change is not mentioned in the documentation and could cause issues in existing libraries that assign values into a DataFrame with dtype float32. I haven't checked whether the same occurs with other dtypes.

When a DataFrame has dtype float32, assignment does not behave the same way in pandas 1.3.5 and 1.4.0. In 1.3.5 a str value is converted to float on assignment. In 1.4.0 this no longer happens: the assignment does not take effect, and no error is raised either. As the example shows, the same code therefore produces DataFrames with different contents depending on the pandas version.
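For comparison, here is a minimal sketch of a variant that does not rely on implicit str-to-float conversion or on chained indexing. It is not part of the original report: the function name remap_explicit is hypothetical, and the sketch assumes that converting the string with float() and writing through a single .iloc[i, j] call is an acceptable substitute for the loop in check_error.

```python
import numpy as np
import pandas as pd


def remap_explicit(x, r):
    """Hypothetical variant of check_error: explicit conversion, one-step .iloc write."""
    df = pd.DataFrame(data=x, dtype="float32")
    for i in range(df.shape[0]):
        for j in range(df.shape[1]):
            idx = int(df.iloc[i, j])          # current cell value, used as the lookup index
            df.iloc[i, j] = float(r[j][idx])  # convert the string before assigning
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])
print(remap_explicit(x, r))
```

With the inputs from the report, this should give the same values the report lists for pandas 1.3.5, since the conversion no longer depends on what pandas does with a str assigned into a float32 column.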

Expected Behavior

The change should be called out in the 1.4.0 documentation, or an error should be raised to alert users to incorrect types during assignment. It may also be worth exploring whether other DataFrame dtypes are affected.
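Until pandas itself warns or raises, user code could approximate the requested behavior with a small guard. The sketch below is only an illustration: set_checked is a hypothetical helper, not a pandas API, and it assumes that rejecting every non-numeric value is the desired policy.

```python
import numbers

import numpy as np
import pandas as pd


def set_checked(df, i, j, value):
    """Hypothetical helper: positional assignment that rejects non-numeric values."""
    is_bool = isinstance(value, (bool, np.bool_))
    is_number = isinstance(value, (numbers.Real, np.number))
    if is_bool or not is_number:
        raise TypeError(
            f"refusing to assign {type(value).__name__} into a numeric DataFrame"
        )
    df.iloc[i, j] = value  # single-step write, no chained indexing


df = pd.DataFrame([[0.0, 1.0]], dtype="float32")
set_checked(df, 0, 0, 5.0)                  # accepted
try:
    set_checked(df, 0, 1, np.str_("10"))    # a string, as in the report
except TypeError as exc:
    print("rejected:", exc)
```

The guard fails loudly on the str values from the report instead of depending on version-specific conversion rules.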

Installed Versions

Pandas 1.3.5

Pandas 1.4.0

Pandas 1.4.1

Issue Analytics

State:
Created a year ago
Comments: 18 (17 by maintainers)

Top GitHub Comments

1 reaction

jbrockmendel commented, Jun 16, 2022

This changes na, because we are doing a Series.__setitem__ call which is a view on the DataFrame column. Is this expected?

Yes

0 reactions

simonjayhawkins commented, Aug 30, 2022

ok. removing from 1.4 milestone. I will leave open as some discussion about warnings.
