BUG: Unexpected change of behavior on DataFrame type float32 between pandas versions.

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
import json

def check_error(x, r):
    df = pd.DataFrame(data=x, dtype="float32")
    for i in df.index:
        for j in range(len(list(df.iloc[i]))):
            # This is where the behavior differs: pandas 1.3.5 converts the str value to float32.
            # pandas 1.4.0 no longer does this; the original value is left intact.
            df.iloc[i][j] = r[j][int(df.iloc[i][j])]
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])

result = check_error(x, r).to_json()
result_json = json.loads(result)
if pd.__version__ == "1.3.5":
    # pandas 1.3.5: the str values were converted and assigned
    expected_json_pandas_1_3_5 = json.loads("""
    {"0":{"0":10.0,"1":20.0,"2":10.0,"3":20.0},"1":{"0":50.0,"1":50.0,"2":40.0,"3":40.0}}
    """)
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_3_5.items()))
else:
    # pandas 1.4.0 (any version other than 1.3.5 takes this branch): the values are unchanged
    expected_json_pandas_1_4_0 = json.loads("""
    {"0": {"0": 0.0, "1": 1.0, "2": 0.0, "3": 1.0}, "1": {"0": 0.0, "1": 0.0, "2": 1.0, "3": 1.0}}
    """)
    print(sorted(result_json.items()) == sorted(expected_json_pandas_1_4_0.items()))

Issue Description

There is an undocumented change of behavior when assigning values into a DataFrame created with dtype float32, and it could cause issues in existing libraries. I haven’t checked whether the same occurs with other dtypes.

In short, assignment into a float32 DataFrame does not behave the same way in pandas 1.3.5 and 1.4.0. In 1.3.5 a str value is converted to float on assignment. In 1.4.0 this conversion no longer happens and the assignment silently has no effect: no error is raised, so, as in the example, the same code produces DataFrames with different contents depending on the pandas version.
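For anyone hitting this in their own code, here is a minimal sketch of a rewrite that does not depend on the behavior described above. It assumes the intent of the example is to look up a replacement in r and store it as a number. The function name remap_values is hypothetical; the two changes are converting the str to float explicitly and assigning through a single df.iloc[i, j] call instead of the chained df.iloc[i][j].

```python
import numpy as np
import pandas as pd


def remap_values(x, r):
    """Hypothetical rewrite of check_error from the example above."""
    df = pd.DataFrame(data=x, dtype="float32")
    for i in range(df.shape[0]):
        for j in range(df.shape[1]):
            # Read the current cell and use it as an index into row j of r.
            idx = int(df.iloc[i, j])
            # Convert the str explicitly and assign in one indexing step,
            # so the write goes to df itself rather than to an intermediate row.
            df.iloc[i, j] = float(r[j][idx])
    return df


x = [[0, 0.], [1., 0.], [0., 1.], [1., 1.]]
r = np.array([["10", "20"], ["50", "40"]])

fixed = remap_values(x, r)
print(fixed)
```

With this version the result should not rely on implicit str-to-float conversion or on whether a chained assignment reaches the original frame.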

Expected Behavior

The change should be called out in the 1.4.0 documentation, or the assignment should raise an error to alert users about incorrect types. It may also be worth exploring whether other DataFrame dtypes are affected.
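Until something like that exists in pandas, a user-side guard can give the same early failure. The sketch below is only an illustration: set_numeric is a hypothetical helper, not a pandas API, and it assumes that any non-numeric value should be rejected before it reaches the DataFrame.

```python
import numbers

import pandas as pd


def set_numeric(df, i, j, value):
    """Hypothetical helper: assign value at (i, j), rejecting non-numeric input."""
    if not isinstance(value, numbers.Real):
        raise TypeError(
            f"expected a real number at ({i}, {j}), got {type(value).__name__}"
        )
    df.iloc[i, j] = value


df = pd.DataFrame([[0.0, 1.0]], dtype="float32")
set_numeric(df, 0, 0, 10.0)  # numeric value: assigned

try:
    set_numeric(df, 0, 1, "20")  # str value: rejected instead of silently ignored
    rejected = False
except TypeError as exc:
    rejected = True
    print(exc)
```

The check runs before the assignment, so a rejected value leaves the DataFrame untouched.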

Installed Versions

Pandas 1.3.5

Pandas 1.4.0

Pandas 1.4.1

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 18 (17 by maintainers)

Top GitHub Comments

1 reaction
jbrockmendel commented, Jun 16, 2022

This changes na, because we are doing a Series.setitem call which is a view on the DataFrame column. Is this expected?

Yes

0 reactions
simonjayhawkins commented, Aug 30, 2022

ok. removing from 1.4 milestone. I will leave open as some discussion about warnings.


