read_sas crashes when reading time data
read_sas works just fine with my other tables. I suspect this may have something to do with converting timeseries or timedeltas, given the nature of the error message and the fact that this only happens with a table that contains time series.
> df = pd.read_sas(filename,encoding='utf-8')
File "<ipython-input-119-402090ec994d>", line 1, in <module>
vehicle = pd.read_sas('C:/Users/aliceell/Documents/schemas/new_mexico/dr527_veh_1214.sas7bdat',format='sas7bdat',encoding='ISO-8859-1')
File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\sas\sasreader.py", line 61, in read_sas
return reader.read()
File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\sas\sas7bdat.py", line 589, in read
rslt = self._chunk_to_dataframe()
File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\sas\sas7bdat.py", line 634, in _chunk_to_dataframe
rslt[name] = epoch + pd.to_timedelta(rslt[name], unit='d')
File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\tseries\timedeltas.py", line 90, in to_timedelta
values = _convert_listlike(arg._values, box=False, unit=unit)
File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\tseries\timedeltas.py", line 78, in _convert_listlike
_ensure_object(arg), unit=unit, errors=errors)
File "pandas\tslib.pyx", line 2933, in pandas.tslib.array_to_timedelta64 (pandas\tslib.c:52393)
File "pandas\tslib.pyx", line 3253, in pandas.tslib.convert_to_timedelta64 (pandas\tslib.c:55946)
File "pandas\tslib.pyx", line 3598, in pandas.tslib.cast_from_unit (pandas\tslib.c:61059)
OverflowError: int too big to convert
Output of pd.show_versions()
warnings.warn(_use_error_msg)
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 37 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
pandas: 0.18.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None
I’m on Windows 7 64-bit.
Google’s giving me nothing. It’s difficult for me to provide sample data because I don’t actually have SAS and so anything I provide would have to be run through pandas or R or something similar to convert it to a CSV, potentially overwriting whatever’s causing the bug. But if you think it would be helpful, I can do that.
Issue Analytics
- Created 6 years ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
Thanks! I had looked at the docs, but didn’t even know where to start…
Here’s the problem. I found the column where it breaks, it’s not the column I was thinking of (unsurprisingly). Apparently the data contains more than one datetime column, and it has several invalid dates like 2316-01-05, 3015-12-10, and 9988-09-07. So I guess I’ll just read it into R for now…
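A minimal sketch of why dates that far in the future could trigger the OverflowError. It assumes, as the traceback suggests, that the column holds day counts from the SAS epoch (1960-01-01) and that pd.to_timedelta(..., unit='d') stores the result as a signed 64-bit count of nanoseconds. The names sas_days and overflows_ns_timedelta are illustrative, not pandas functions, and the code uses only the standard library.

```python
from datetime import date

# SAS stores dates as a number of days since 1960-01-01.
SAS_EPOCH = date(1960, 1, 1)

# Assumption: the timedelta is a signed 64-bit integer of nanoseconds,
# so the largest whole number of days it can hold is bounded by 2**63 - 1.
NS_PER_DAY = 86400 * 10**9
MAX_DAYS = (2**63 - 1) // NS_PER_DAY


def sas_days(d):
    """Return the SAS day count (days since 1960-01-01) for a date."""
    return (d - SAS_EPOCH).days


def overflows_ns_timedelta(d):
    """Return True if the day count for d cannot fit in 64-bit nanoseconds."""
    return sas_days(d) > MAX_DAYS


# The three dates reported in the comment above.
reported = [date(2316, 1, 5), date(3015, 12, 10), date(9988, 9, 7)]
too_far = [d for d in reported if overflows_ns_timedelta(d)]

for d in reported:
    print(d, sas_days(d), overflows_ns_timedelta(d))
```

Under these assumptions all three reported dates exceed the limit, while an ordinary date such as 2014-12-31 does not. The check only covers the upper bound; dates far in the past would need a matching lower-bound test.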
Thanks for your help! I would not have been able to find that problematic column without your help on the debugger.
Commands are here, but basically press `u` a few times to go up frames to the interesting ones. I think the frame around
File "C:\Users\aliceell\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\sas\sas7bdat.py", line 634, in _chunk_to_dataframe
should be informative. Press `l` to show the lines of code around there, and `p <var>` to print variables, like `p name` to see the column and `p rslt` to see the dataframe. Make sure not to paste anything here that’s sensitive 😉
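For anyone who would rather not step through the debugger by hand, the same lookup can be sketched non-interactively: catch the exception, walk its traceback to the frame of interest, and read a local variable from it. This is only an illustration. The _chunk_to_dataframe below is a hypothetical stand-in for the pandas method, not the real one, and local_at_frame, failing_column and DEMO_LIMIT are made-up names.

```python
import traceback


def local_at_frame(exc, func_name, var_name):
    """Return var_name from the first traceback frame running func_name."""
    for frame, _lineno in traceback.walk_tb(exc.__traceback__):
        if frame.f_code.co_name == func_name:
            return frame.f_locals.get(var_name)
    return None


# Rough, assumed threshold used only to make the stand-in fail.
DEMO_LIMIT = 110000


def _chunk_to_dataframe(columns):
    """Hypothetical stand-in: fail on the first column with a huge day count."""
    for name, values in columns.items():
        for value in values:
            if value > DEMO_LIMIT:
                raise OverflowError("int too big to convert")


def failing_column(columns):
    """Return the column name being processed when the conversion failed."""
    try:
        _chunk_to_dataframe(columns)
    except OverflowError as exc:
        # Equivalent to going up to the frame and running "p name" in pdb.
        return local_at_frame(exc, "_chunk_to_dataframe", "name")
    return None


print(failing_column({"crash_date": [19000.0], "bad_date": [2931000.0]}))
```

With the real reader the call inside the try block would be pd.read_sas(...), and the frame and variable names would be the ones shown in the traceback.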