df.to_stata fails when a column of type object contains only None
Code Sample, a copy-pastable example if possible
import pandas as pd
# Works: None mixed with at least one string value
df = pd.DataFrame({'a': ['a', None]})
df.to_stata('test.dta')
df = pd.DataFrame({'a': [None, 'a']})
df.to_stata('test.dta')
# Fails: object column containing only None
df = pd.DataFrame({'a': [None, None]})
df.to_stata('test.dta')
# ValueError: Writing general object arrays is not supported
Problem description
The df.to_stata() method writes columns containing None without error when there is at least one string value in the column, but fails if the column contains only None. It's unclear what data type a column of None should be written as, so maybe that's why this isn't supported? I would propose that a column with values of only None be written as str1 with empty strings.
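Until the behaviour is settled, the proposal above can be approximated on the caller's side. The following is only a sketch of a workaround, not anything pandas does itself: the helper name fill_all_null_object_columns is hypothetical, and it assumes that replacing an all-None object column with empty strings is acceptable for your data.

```python
import pandas as pd


def fill_all_null_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which object columns holding only nulls become empty strings.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Only touch object-dtype columns in which every value is missing.
        if col.dtype == object and col.isna().all():
            out[name] = ""
    return out


df = pd.DataFrame({"a": [None, None], "b": ["x", None]})
cleaned = fill_all_null_object_columns(df)
# cleaned.to_stata("test.dta")  # column 'a' now holds strings, so the writer can infer a type
```

The original frame is left untouched, so the None values remain available if the missingness needs to be tracked separately.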
I came across this error because I read in a Parquet file with pd.read_parquet() and was unable to write the file to Stata format. In the Parquet schema, the column had type BYTE_ARRAY UTF8, but since the column had only missing values, it was read into pandas as only None.
Expected Output
Stata file written to disk with missing values for the column with None.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-38-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.3.2
pip: 18.0
setuptools: 40.0.0
Cython: 0.28.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.11.1
xarray: None
IPython: 6.5.0
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.1.14
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- State:
- Created 5 years ago
- Comments:20 (17 by maintainers)
Top GitHub Comments
Above I was referencing the all-None case. I agree with @bashtage that raising an error is the best solution.
Just want to add that there should be brackets around finalDF.dtypes==object since & has a higher priority than ==.
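To illustrate the precedence point, here is a small sketch that finds object columns containing only missing values. The frame df and the second condition (df.isna().all()) are assumptions made for the example; the snippet the comment refers to is not shown in this thread.

```python
import pandas as pd

df = pd.DataFrame({"a": [None, None], "b": ["x", None], "c": [1.0, 2.0]})

# Parentheses are required: without them Python evaluates
# `object & df.isna().all()` first, because & binds tighter than ==.
mask = (df.dtypes == object) & df.isna().all()

# Names of the columns where both conditions hold.
cols = list(mask[mask].index)
```

Here cols contains only 'a', the one column that cannot be written by to_stata as it stands.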