BUG: uint8 silently converted to int8 during dataframe creation
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
# !pip install -U pandas
import pandas as pd
import numpy as np
# Create an arbitrary 3-channel 8-bit image
data = np.random.randint(0, 256, size=(50, 50, 3), dtype='uint8')
# Replicate this image and treat each image as a row of pixel features
rows = [data.reshape(-1)] * 2
assert all(r.dtype == np.uint8 for r in rows)
# From a list of uint8 arrays: prints int8 (unexpected)
print(pd.DataFrame(rows).dtypes)
# From a stacked 2-D uint8 array: prints uint8 (expected)
print(pd.DataFrame(np.vstack(rows)).dtypes)
Issue Description
Apologies if this issue already exists; type-related issues are hard for me to navigate on GitHub since there are too many to easily parse.
uint8 subarray dtypes are silently converted to int8 when constructing a dataframe from lists.
I tried testing on master, but the build fails for me on Windows (error output below):
building 'pandas._libs.algos' extension
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
----------------------------------------
ERROR: Failed building wheel for pandas
Failed to build pandas
ERROR: Could not build wheels for pandas which use PEP 517 and cannot be installed directly
Expected Behavior
The dtype of every element in each column is uint8, so I would expect the resulting columns to be uint8.
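On versions where the list constructor still changes the dtype, one possible workaround is to stack the rows into a single 2-D array before handing them to pandas, which is what the second print in the example does. A minimal sketch, assuming only numpy and pandas; `frame_from_rows` is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd


def frame_from_rows(rows):
    """Build a DataFrame from equal-length 1-D arrays of one dtype.

    Hypothetical helper, for illustration only.
    """
    # np.vstack returns one 2-D ndarray with a single dtype, so pandas
    # receives an array instead of a list of separate arrays.
    return pd.DataFrame(np.vstack(rows))


# Two small uint8 rows standing in for flattened images
rng = np.random.default_rng(0)
rows = [rng.integers(0, 256, size=12, dtype=np.uint8) for _ in range(2)]

df = frame_from_rows(rows)
print(df.dtypes.unique())
```

Stacking copies the data once, so for very large inputs it may be worth building the 2-D array up front rather than keeping a list of rows.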
Installed Versions
INSTALLED VERSIONS
commit : 73c68257545b5f8530b7044f56647bd2db92e2ba
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.3.3
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.4.3
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.23.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : 1.4.15
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (4 by maintainers)
@mroeschke I have updated the example to rely only on numpy and pandas, per your request. The issue still persists on my end.
I think this was solved by https://github.com/pandas-dev/pandas/pull/47475, which will be a part of the 1.5 release, so closing.
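To see whether a particular installation still shows the behavior, the reproduction can be condensed into a small check. This is only a sketch that reuses the inputs from the example above; `list_constructor_keeps_uint8` is a hypothetical name, and the result depends on the installed pandas version:

```python
import numpy as np
import pandas as pd


def list_constructor_keeps_uint8():
    """Return True if DataFrame(list of uint8 arrays) yields uint8 columns."""
    # Same input as the reproducible example: one flattened 8-bit image, twice
    data = np.random.randint(0, 256, size=(50, 50, 3), dtype="uint8")
    rows = [data.reshape(-1)] * 2
    df = pd.DataFrame(rows)
    # Compare each column dtype individually against uint8
    return all(dt == np.uint8 for dt in df.dtypes)


print(pd.__version__, list_constructor_keeps_uint8())
```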