BUG: uint8 silently converted to int8 during dataframe creation

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

# !pip install -U pandas
import pandas as pd
import numpy as np

# Create arbitrary 3-channel 8-bit image
data = np.random.randint(0, 256, size=(50, 50, 3), dtype='uint8')

# Replicate this image and treat each image as a row of pixel features
rows = [data.reshape(-1)]*2
assert all(r.dtype == np.uint8 for r in rows)

print(pd.DataFrame(rows).dtypes)
# int8???

print(pd.DataFrame(np.vstack(rows)).dtypes)
# uint8 (expected)

Issue Description

Apologies if this issue already exists. Type-related issues are hard for me to navigate on GitHub because there are too many to search through easily.

uint8 subarray dtypes are silently converted to int8 when constructing a dataframe from lists.
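
As a workaround until the fix is available, the reproducible example above suggests stacking the rows into a single 2-D array before handing them to pandas. The following is a minimal sketch of that idea; `frame_from_rows` is a hypothetical helper name, not part of pandas, and it assumes all rows are 1-D arrays of equal length and the same dtype.

```python
import numpy as np
import pandas as pd


def frame_from_rows(rows):
    """Hypothetical helper: build a DataFrame from equal-length 1-D arrays.

    Stacking first gives pandas one 2-D ndarray, so the construction
    follows the same path as the np.vstack call in the example above.
    """
    stacked = np.vstack(rows)  # keeps the common dtype of the inputs
    return pd.DataFrame(stacked)


# Values above 127 would not fit in int8, so they make a conversion visible.
rows = [np.array([0, 127, 128, 255], dtype=np.uint8)] * 2
df = frame_from_rows(rows)
print(df.dtypes)
```

This trades one extra copy of the data for a predictable dtype, which is usually acceptable for image-sized rows.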

I tried testing on master, but the build fails on Windows for me (error output below):

building 'pandas._libs.algos' extension
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  ----------------------------------------
  ERROR: Failed building wheel for pandas
Failed to build pandas
ERROR: Could not build wheels for pandas which use PEP 517 and cannot be installed directly

Expected Behavior

Every element in each column is a uint8, so I would expect the resulting columns to be uint8 as well.

Installed Versions

INSTALLED VERSIONS

commit : 73c68257545b5f8530b7044f56647bd2db92e2ba
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.3.3
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.23.1
pandas_datareader : None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.15
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
ntjess commented, Oct 2, 2021

@mroeschke I have updated to only rely on numpy and pandas per your request. The issue still persists on my end.

0 reactions
mroeschke commented, Aug 1, 2022

I think this was solved by https://github.com/pandas-dev/pandas/pull/47475, which will be a part of the 1.5 release, so closing.
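
To see whether a given installation still shows the behavior, a small self-check along the following lines may help. This is only a sketch: `list_of_arrays_keeps_dtype` is a hypothetical function name, and the check simply repeats the list-of-arrays construction from the report and compares the resulting column dtypes against the input dtype.

```python
import numpy as np
import pandas as pd


def list_of_arrays_keeps_dtype(dtype=np.uint8):
    """Hypothetical check: does DataFrame(list of arrays) keep the input dtype?"""
    # Same construction path as the report: a Python list of 1-D arrays.
    rows = [np.array([0, 200, 255], dtype=dtype)] * 2
    result = pd.DataFrame(rows)
    return all(col_dtype == dtype for col_dtype in result.dtypes)


# Print the installed version next to the outcome for easy reporting.
print(pd.__version__, list_of_arrays_keeps_dtype())
```

If the check prints `False`, stacking the rows with `np.vstack` before constructing the DataFrame, as in the reproducible example, avoids the conversion.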
