df.to_stata fails when a column of type object contains only None

Code Sample, a copy-pastable example if possible

import pandas as pd

# Works: the column has at least one string alongside None
df = pd.DataFrame({'a': ['a', None]})
df.to_stata('test.dta')
df = pd.DataFrame({'a': [None, 'a']})
df.to_stata('test.dta')

# Fails: the column contains only None
df = pd.DataFrame({'a': [None, None]})
df.to_stata('test.dta')
# ValueError: Writing general object arrays is not supported

Problem description

The df.to_stata() method writes columns containing None without error when there is at least one string value in the column, but fails if the column contains only None. It’s unclear what data type to write a column of None as, so maybe that’s why this isn’t supported? I would propose that a column with values of only None be written as str1 with empty strings.
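
Until the writer handles this case, the proposal above can be approximated by hand before calling df.to_stata(). What follows is only a minimal sketch, not pandas behavior: the helper name blank_all_null_object_columns is hypothetical, and it assumes that empty strings are an acceptable stand-in for missing values in those columns.

```python
import pandas as pd


def blank_all_null_object_columns(df):
    """Return a copy of df in which object columns holding only nulls become empty strings.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    out = df.copy()
    # Parentheses matter here: & binds more tightly than ==.
    mask = out.isnull().all() & (out.dtypes == object)
    for col in out.columns[mask.to_numpy()]:
        out[col] = ""  # scalar assignment fills every row of the column
    return out


df = pd.DataFrame({
    "empty": [None, None],   # object column with only None
    "text": ["a", None],     # string column with one missing value
    "num": [1.0, 2.0],       # numeric column, left alone
})
cleaned = blank_all_null_object_columns(df)
print(cleaned["empty"].tolist())
```

The returned frame could then be passed to to_stata() in place of the original. Numeric columns that are entirely missing are deliberately left untouched, since they do not trigger this error.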

I came across this error because I read in a Parquet file with pd.read_parquet() and was unable to write the file to Stata format. In the Parquet schema, the column had type BYTE_ARRAY UTF8, but since the column had only missing values, it was read into Pandas as only None.

Expected Output

Stata file written to disk with missing values for the column with None.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.3.2
pip: 18.0
setuptools: 40.0.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.11.1
xarray: None
IPython: 6.5.0
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 20 (17 by maintainers)

Top GitHub Comments

1 reaction
kylebarron commented, Nov 12, 2018

So, just to be clear, you’re only talking about raising in the all None case, right?

Above I was referencing the all None case.

No strong opinion on writing out a column full of None with df.to_stata().

I agree with @bashtage that raising an error is the best solution.

0 reactions
mgao6767 commented, May 23, 2021

Safer would be

finalDF.to_stata('myfile.dta', version=117, convert_strl = finalDF.columns[finalDF.isnull().all() & finalDF.dtypes==object].tolist())

so that you don’t try to write all missing value numeric columns.

Just want to add that finalDF.dtypes==object needs parentheses around it, since & has a higher priority than == in Python.

finalDF.to_stata('myfile.dta', version=117, convert_strl = finalDF.columns[finalDF.isnull().all() & (finalDF.dtypes==object)].tolist())
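
The precedence point can be checked in isolation. This is a small sketch with made-up column names (empty_obj, text, empty_num); it only builds the column list and does not call to_stata() itself.

```python
import pandas as pd

# Plain integers show the grouping: & is evaluated before ==.
print(1 & 2 == 2)    # parsed as (1 & 2) == 2, which is False
print(1 & (2 == 2))  # parsed as 1 & True, which is 1

df = pd.DataFrame({
    "empty_obj": [None, None],                   # object dtype, all missing
    "text": ["a", None],                         # strings, partly missing
    "empty_num": [float("nan"), float("nan")],   # float dtype, all missing
})

all_null = df.isnull().all()      # True for columns with no values at all
is_object = df.dtypes == object   # compared first, then combined with &
mask = all_null & is_object

# Columns that are both entirely missing and of object dtype.
strl_columns = df.columns[mask.to_numpy()].tolist()
print(strl_columns)  # ['empty_obj']
```

Naming the two conditions before combining them sidesteps the precedence question, and the all-missing numeric column stays out of the list, which is what the parenthesized version is meant to guarantee.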