
Support writing unicode characters in df.to_stata()

Code Sample

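import pandas as pd
df = pd.DataFrame({'a': ['丆']})
# With pandas 0.23.4, to_stata writes dta format 114 by default and encodes strings as Latin-1
df.to_stata('test.dta')
# Raises UnicodeEncodeError: 'latin-1' codec can't encode character '\u4e06' in position 0: ordinal not in range(256)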

I picked an arbitrary CJK character to test this with.
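The failure is an encoding problem rather than a Stata-specific one. Latin-1 covers only code points 0 to 255, so a character such as U+4E06 cannot be represented, while UTF-8 can encode any code point. This standard-library sketch reproduces the same error message without pandas and shows the UTF-8 bytes that a Unicode-capable writer would need to store instead:

```python
# Reproduce the encoding step that fails inside to_stata, using only the standard library.
text = "丆"  # U+4E06, the CJK character from the report

try:
    text.encode("latin-1")  # Latin-1 covers only code points 0-255
except UnicodeEncodeError as exc:
    latin1_error = str(exc)
else:
    latin1_error = None

utf8_bytes = text.encode("utf-8")  # UTF-8 can represent every code point

print(latin1_error)  # 'latin-1' codec can't encode character '\u4e06' in position 0: ordinal not in range(256)
print(utf8_bytes)    # b'\xe4\xb8\x86'
```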

Problem description

Unicode strings could be written to a Stata file by implementing a writer for version 118 of the dta format, which stores strings as UTF-8.
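For reference, later pandas releases (1.0 and newer, to our knowledge) accept version=118 in to_stata. The sketch below is the original example run against such a release. It assumes a pandas version with that option, which pandas 0.23.4 (listed below) does not have. It writes to a temporary directory and reads the file back to check that the character survives the round trip:

```python
import os
import tempfile

import pandas as pd

# Same data as the report: one string column holding a CJK character.
df = pd.DataFrame({"a": ["丆"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dta")
    # Assumption: pandas >= 1.0, where to_stata accepts version=118 (UTF-8 strings).
    df.to_stata(path, version=118, write_index=False)
    roundtrip = pd.read_stata(path)

print(roundtrip["a"].tolist())
```

Passing write_index=False keeps the round-tripped frame to the single column "a", which makes the comparison with the input straightforward.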

I’d be interested in trying to submit a PR for this. (Edit: I don’t use Stata anymore)

Expected Output

Stata file written to disk.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.18.7.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 7.0.1
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 13 (11 by maintainers)

Top GitHub Comments

bashtage commented, Dec 11, 2019

Yes. Someone could write a format 118 or 119 writer that supports unicode.

bashtage commented, Dec 18, 2018

Yeah, it should be reopened. It closed an issue in this issue, rather than the issue.
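bashtage's Dec 11, 2019 comment points out that Unicode output requires a format 118 or 119 writer. Before writing, it can be useful to check whether a frame would fit through the Latin-1 path at all. The following is a minimal sketch of one way to do that. needs_unicode_format and _fits_latin1 are hypothetical helpers, not part of pandas, and they only inspect column names and string cells (value labels and other metadata are not checked):

```python
import pandas as pd


def _fits_latin1(text: str) -> bool:
    """Return True if text can be encoded as Latin-1 (code points 0-255)."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def needs_unicode_format(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if any column name or string cell is outside Latin-1.

    Value labels and other metadata are not checked.
    """
    for col in df.columns:
        if isinstance(col, str) and not _fits_latin1(col):
            return True
        for value in df[col]:
            if isinstance(value, str) and not _fits_latin1(value):
                return True
    return False


df = pd.DataFrame({"a": ["丆"], "b": ["café"]})
print(needs_unicode_format(df))         # True: '丆' (U+4E06) is outside Latin-1
print(needs_unicode_format(df[["b"]]))  # False: 'é' (U+00E9) fits in Latin-1

# Assumption: only pandas releases that accept version=118 can act on a True result.
version = 118 if needs_unicode_format(df) else 114
```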
