df.to_stata fails when a column of type object contains only None

Code Sample, a copy-pastable example if possible

import pandas as pd

# A mix of strings and None is written without error.
df = pd.DataFrame({'a': ['a', None]})
df.to_stata('test.dta')
df = pd.DataFrame({'a': [None, 'a']})
df.to_stata('test.dta')

# A column containing only None fails.
df = pd.DataFrame({'a': [None, None]})
df.to_stata('test.dta')
# ValueError: Writing general object arrays is not supported

Problem description

The df.to_stata() method writes columns containing None without error when there is at least one string value in the column, but fails if the column contains only None. It’s unclear what data type to write a column of None as, so maybe that’s why this isn’t supported? I would propose that a column with values of only None be written as str1 with empty strings.

I came across this error because I read in a Parquet file with pd.read_parquet() and was unable to write the file to Stata format. In the Parquet schema, the column had type BYTE_ARRAY UTF8, but since the column had only missing values, it was read into Pandas as only None.
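Until something like the proposal above is supported, one possible workaround is to blank out the all-None object columns by hand before writing. The sketch below is only an illustration: the helper name `blank_all_none_object_columns` is hypothetical (not part of pandas), and it assumes that an empty string is an acceptable stand-in for a missing string value in the resulting file.

```python
import pandas as pd


def blank_all_none_object_columns(df):
    """Return a copy of df where object columns holding only nulls are set to ''.

    Hypothetical helper for illustration; not a pandas API.
    """
    out = df.copy()
    # Columns where every value is null AND the dtype is object.
    # The parentheses matter: & binds more tightly than == in Python.
    mask = out.isnull().all() & (out.dtypes == object)
    for col in out.columns[mask]:
        out[col] = ""  # assumption: '' stands in for a missing string
    return out


df = pd.DataFrame({"a": [None, None], "b": [1.0, None]})
fixed = blank_all_none_object_columns(df)

# The write itself is left commented out here; it is the step that
# raised ValueError for the untouched frame in the report above.
# fixed.to_stata("test.dta")
```

Numeric columns are left alone on purpose, since an all-missing numeric column is a separate case from an all-None object column.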

Expected Output

Stata file written to disk with missing values for the column with None.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.3.2
pip: 18.0
setuptools: 40.0.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.11.1
xarray: None
IPython: 6.5.0
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

Issue Analytics

Created 5 years ago
Comments: 20 (17 by maintainers)

Top GitHub Comments

1 reaction

kylebarron commented, Nov 12, 2018

So, just to be clear, you’re only talking about raising in the all None case, right?

Above I was referencing the all None case.

No strong opinion on writing out a column full of None with df.to_stata().

I agree with @bashtage that raising an error is the best solution.

0 reactions

mgao6767 commented, May 23, 2021

Safer would be
finalDF.to_stata('myfile.dta', version=117, convert_strl = finalDF.columns[finalDF.isnull().all() & finalDF.dtypes==object].tolist())
so that you don’t try to write all missing value numeric columns.

Just want to add that there should be brackets around finalDF.dtypes==object since & has a higher priority than ==.

finalDF.to_stata('myfile.dta', version=117, convert_strl = finalDF.columns[finalDF.isnull().all() & (finalDF.dtypes==object)].tolist())
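To see what the parenthesized expression selects, here is a small self-contained sketch. The frame and its column names (`all_none`, `text`, `num`) are made up for illustration; only the selection expression comes from the comment above.

```python
import pandas as pd

# Made-up example frame: one all-None object column, one string
# column with a missing value, and one all-missing numeric column.
finalDF = pd.DataFrame({
    "all_none": [None, None],
    "text": ["a", None],
    "num": [float("nan"), float("nan")],
})

all_null = finalDF.isnull().all()      # True for all_none and num
is_object = finalDF.dtypes == object   # False for the float column num

# Equivalent to the parenthesized one-liner above: & is applied to two
# boolean Series, so only the all-null object column is selected.
strl_cols = finalDF.columns[all_null & is_object].tolist()
print(strl_cols)

# As suggested in the thread (not executed in this sketch):
# finalDF.to_stata('myfile.dta', version=117, convert_strl=strl_cols)
```

Naming the two masks separately sidesteps the precedence question altogether, at the cost of two extra lines.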
