Support writing unicode characters in df.to_stata()

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'a': ['丆']})
df.to_stata('test.dta')
# UnicodeEncodeError: 'latin-1' codec can't encode character '\u4e06' in position 0: ordinal not in range(256)

I picked an arbitrary CJK character to test this with.
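
The error above can be reproduced without pandas. The following is a minimal sketch using only the standard library; the variable names are illustrative. It shows that the `latin-1` codec covers only code points below 256, so U+4E06 cannot be encoded, while UTF-8 encodes the same character as three bytes.

```python
ch = "丆"  # U+4E06, the character from the example above

try:
    ch.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError as exc:
    # Same failure mode as in the traceback: ordinal not in range(256)
    latin1_ok = False
    print(exc)

# UTF-8 can represent the character; it needs three bytes here
utf8_bytes = ch.encode("utf-8")
print(latin1_ok, utf8_bytes)
```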

Problem description

It would be possible to write Unicode strings to a Stata file by implementing a writer according to version 118 of the dta format.

~~I’d be interested in trying to submit a PR for this.~~ (Edit: I don’t use Stata anymore)

Expected Output

Stata file written to disk.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 7.0.1
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None

Issue Analytics

State:
Created 5 years ago
Reactions: 3
Comments: 13 (11 by maintainers)

Top GitHub Comments

1 reaction

bashtage commented, Dec 11, 2019

Yes. Someone could write a format 118 or 119 writer that supports unicode.
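
To illustrate one piece of what such a writer would have to do, here is a rough sketch of encoding a string column as fixed-width UTF-8 fields. It is not pandas' implementation and not a complete dta writer: the helper name `encode_fixed_width` is hypothetical, and the null-byte padding is an assumption about how fixed-length string fields are laid out. The point it makes is that the field width has to be measured in encoded bytes, not in characters.

```python
from typing import List, Tuple


def encode_fixed_width(values: List[str], encoding: str = "utf-8") -> Tuple[int, List[bytes]]:
    """Hypothetical helper: encode strings and pad them to a common byte width."""
    encoded = [v.encode(encoding) for v in values]
    # Width is the longest *encoded* value; a single CJK character takes 3 bytes in UTF-8
    width = max((len(b) for b in encoded), default=1)
    # Assumption: shorter values are right-padded with null bytes
    padded = [b.ljust(width, b"\x00") for b in encoded]
    return width, padded


width, rows = encode_fixed_width(["a", "丆"])
print(width, rows)
```

Sizing the field by character count instead would truncate multi-byte characters, which is why the sketch measures the encoded length.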

1 reaction

bashtage commented, Dec 18, 2018

Yeah, it should be reopened. It closed an issue mentioned in this issue, rather than this issue itself.
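
For readers on a newer pandas release, `to_stata` accepts a `version` argument, and formats 118 and 119 support Unicode. The sketch below assumes pandas 1.0 or later (the 0.23.4 install listed above does not have this option) and writes to a temporary directory so it leaves nothing behind. `write_index=False` is used only so that the file read back has the same single column.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": ["丆"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dta")
    # version=118 selects a Unicode-capable dta format (assumes pandas >= 1.0)
    df.to_stata(path, version=118, write_index=False)
    # Read the file back to check that the character survived
    roundtrip = pd.read_stata(path)

print(roundtrip["a"].tolist())
```

Files written this way need Stata 14 or later to open, since earlier Stata releases do not read the 118 format.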

Top Results From Across the Web

pandas - Python cannot export to Stata due to unicode problem?

I'm trying to export a dataframe in ...

pandas.DataFrame.to_stata — pandas 1.5.2 documentation

Versions 118 and 119 support Unicode characters, and version 119 supports more than 32,767 variables. Version 119 should usually only be used when...

Unicode support - Stata

Did you know Stata supports Unicode? ... Unicode encodes all the world's characters, meaning we can write Hello, Здравствуйте, こんにちは, and a lot...

Unicode HOWTO — Python 3.11.1 documentation

Release 1.12. This HOWTO discusses Python's support for the Unicode specification for representing textual data, and explains various problems that people ...

Stata 14, Unicode, and extended ASCII. - Statalist

But there is a problem with the switch to Unicode for languages that so far used extended ASCII for some characters (German,