Support writing unicode characters in df.to_stata()
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame({'a': ['丆']})
df.to_stata('test.dta')
# UnicodeEncodeError: 'latin-1' codec can't encode character '\u4e06' in position 0: ordinal not in range(256)
I picked an arbitrary CJK character to test this with.
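The traceback above can be reproduced without pandas. The following is a minimal sketch, assuming only that the writer encodes strings with the 'latin-1' codec, as the error message indicates. It shows that the character has no latin-1 representation but encodes to UTF-8 without trouble; UTF-8 is the encoding used by dta format 118.

```python
text = "丆"  # U+4E06, the character used in the report

# latin-1 only covers code points 0-255, so this character cannot be encoded.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent any Unicode code point; this one takes three bytes.
utf8_bytes = text.encode("utf-8")

print(latin1_ok, utf8_bytes)
```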
Problem description
It would be possible to write Unicode strings to a Stata file by implementing a writer according to version 118 of the dta format.
I’d be interested in trying to submit a PR for this. (Edit: I don’t use Stata anymore)
Expected Output
Stata file written to disk.
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 7.0.1
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- State:
- Created 5 years ago
- Reactions: 3
- Comments: 13 (11 by maintainers)
Top GitHub Comments
Yes. Someone could write a format 118 or 119 writer that supports unicode.
Yeah, it should be reopened. What got closed was an issue referenced within this issue, rather than this issue itself.
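For readers on a later pandas release, here is a hedged sketch of what the requested behaviour looks like. It assumes a pandas version in which to_stata accepts version=118 (this is not the case for 0.23.4, the version in the report), and the file name test_118.dta is only an illustration.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": ["丆"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_118.dta")  # illustrative file name
    # Assumption: the installed pandas supports version=118 in to_stata.
    df.to_stata(path, version=118, write_index=False)
    # Read the file back to check that the character survived the round trip.
    roundtrip = pd.read_stata(path)

print(roundtrip["a"].tolist())
```

On a pandas version without a format 118 writer, the to_stata call is expected to fail rather than write the file.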