to_stata: Fixed width strings in Stata .dta files are limited to 244 (or fewer)
Code Sample, a copy-pastable example if possible
import pandas
frame = pandas.DataFrame({'A':['h'*250,'hi','hola']})
frame.to_excel("text.xlsx", index=False)
frame.to_stata("test.dta")
Problem description
Raises the following error:
ValueError: Fixed width strings in Stata .dta files are limited to 244 (or fewer) characters. Column ‘A’ does not satisfy this restriction.
However, this restriction does not seem to exist in Stata itself, since the Excel file can be imported correctly. Open Stata and import the Excel file:
import excel "C:\data\tesi\software\text.xlsx", sheet("Sheet1") firstrow clear
Now we can check the type of the data in column ‘A’. As you can see, it is str250, so Stata can store strings longer than 244 characters:
. describe A
A str250 %250s A
Expected Output
File gets exported with the correct format and without problems
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: 1.5.5
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1
Issue Analytics
- Created 6 years ago
- Comments: 14 (11 by maintainers)
Top GitHub Comments
A mostly working implementation is here:
https://github.com/pandas-dev/pandas/compare/master...bashtage:strl-support
I noticed one larger potential advantage of this: when writing largish data files with strings longer than 8 characters, StrLs can reduce file size significantly if there are many repeated values. They can also reduce file size when writing sparse strings, again as long as the maximum string length is greater than 8 characters (this happens because blank strings are replaced with an 8-byte unsigned integer).
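To make the file-size argument concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not the writer's actual accounting. The helper names `fixed_width_bytes` and `strl_bytes` are hypothetical, and `entry_overhead=16` is an assumed per-entry header size for each stored string rather than a figure from this thread. The only number taken from the comment above is the 8-byte reference stored per row.

```python
def fixed_width_bytes(values):
    """Fixed-width storage: every row is padded to the longest value."""
    width = max((len(v.encode("utf-8")) for v in values), default=0)
    return width * len(values)


def strl_bytes(values, ref_bytes=8, entry_overhead=16):
    """Rough strL estimate (assumed layout, for illustration only).

    Each row stores an 8-byte reference; each distinct non-empty value is
    stored once, with an assumed header plus a terminating null byte.
    """
    distinct = {v for v in values if v != ""}
    stored = sum(entry_overhead + len(v.encode("utf-8")) + 1 for v in distinct)
    return ref_bytes * len(values) + stored


# A sparse column: 1000 copies of one 100-character value, 1000 blanks.
sparse = ["x" * 100] * 1000 + [""] * 1000
print(fixed_width_bytes(sparse), strl_bytes(sparse))

# Short strings: the 8-byte reference alone already exceeds the fixed width.
short = ["ab"] * 10
print(fixed_width_bytes(short), strl_bytes(short))
```

Under these assumptions the sparse column needs far less space as strL, while the short-string column gets larger, which matches the "longer than 8 characters" condition in the comment.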
Thanks for bumping. Let’s re-open it.
Are there numbers on which versions of Stata are actually used? Should we care at all about anything older than Stata 15?
This also seems sensible. If it isn’t too much additional effort to implement and maintain, then that’s best. Otherwise, it’s best to just make a clean break.
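For anyone landing here with the same error: pandas releases newer than the 0.19.2 reported above accept `version` and `convert_strl` arguments on `DataFrame.to_stata`. The sketch below assumes such a release is installed and reuses the frame from the original report. It writes to a temporary directory (an illustrative choice, not part of the report) and reads the file back to check that the long string survives.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({"A": ["h" * 250, "hi", "hola"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dta")
    # version=117 selects the newer dta format, which is not bound by the
    # 244-character limit; convert_strl asks for column A to be stored as strL.
    frame.to_stata(path, write_index=False, version=117, convert_strl=["A"])
    roundtrip = pd.read_stata(path)

# Length of the longest string read back from the file.
longest = max(len(value) for value in roundtrip["A"])
print(longest)
```

Files written this way need a Stata version that can read format 117, which is the trade-off raised in the comments above about supporting older Stata releases.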