Can't open large csv file, error in "astropy/io/ascii/cparser.pyx" file

I cannot read a moderately large (200 MB) .csv file (survey_results_public.csv, from the 2018 Stack Overflow Developer Survey):

from astropy.io import ascii
data = ascii.read('survey_results_public.csv', format='fast_csv', guess=False)

Traceback:

Traceback (most recent call last):
  File "astropy/io/ascii/cparser.pyx", line 611, in astropy.io.ascii.cparser.CParser._convert_data
  File "astropy/io/ascii/cparser.pyx", line 688, in astropy.io.ascii.cparser.CParser._convert_int
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "astropy/io/ascii/cparser.pyx", line 626, in astropy.io.ascii.cparser.CParser._convert_data
  File "astropy/io/ascii/cparser.pyx", line 743, in astropy.io.ascii.cparser.CParser._convert_float
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "SO_developer_survey.py", line 6, in <module>
    data = ascii.read('/home/gabriel/Descargas/survey_results_public.csv', format='fast_csv', guess=False)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/astropy/io/ascii/ui.py", line 409, in read
    dat = reader.read(table)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/astropy/io/ascii/fastbasic.py", line 128, in read
    data, comments = self.engine.read(try_int, try_float, try_string)
  File "astropy/io/ascii/cparser.pyx", line 397, in astropy.io.ascii.cparser.CParser.read
  File "astropy/io/ascii/cparser.pyx", line 635, in astropy.io.ascii.cparser.CParser._convert_data
  File "astropy/io/ascii/cparser.pyx", line 794, in astropy.io.ascii.cparser.CParser._convert_str
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
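
The final error in the traceback says that a string field contained the byte 0xc3, which is outside the ASCII range (0xc3 is, among other things, a possible lead byte of a two-byte UTF-8 sequence). The sketch below uses only the Python standard library to reproduce that message and to locate the first non-ASCII byte in a file. The helper names `ascii_error_message` and `first_non_ascii` are hypothetical and are not part of astropy.

```python
def ascii_error_message(raw):
    """Return the UnicodeDecodeError text for `raw` under the ASCII codec, or None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def first_non_ascii(path, chunk_size=1 << 20):
    """Return (offset, byte_value) of the first non-ASCII byte in a file, or None.

    The file is read in binary chunks, so memory use stays bounded
    regardless of file size.
    """
    offset = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return None  # reached end of file: everything was ASCII
            try:
                chunk.decode("ascii")
            except UnicodeDecodeError as exc:
                # exc.start is the index of the offending byte within this chunk
                return offset + exc.start, chunk[exc.start]
            offset += len(chunk)


if __name__ == "__main__":
    # "résumé" encoded as UTF-8: the second byte is 0xc3
    print(ascii_error_message("résumé".encode("utf-8")))
```

Calling `first_non_ascii('survey_results_public.csv')` would report where the first such byte sits, which helps confirm whether the file is simply non-ASCII text rather than corrupt.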

Packages in conda:

$ conda list
# packages in environment at /home/gabriel/miniconda3/envs/py3:
#
# Name                    Version                   Build  Channel
asn1crypto                0.24.0                   py37_0  
astropy                   3.1.1            py37h7b6447c_0  
atomicwrites              1.2.1                    py37_0  
attrs                     18.2.0           py37h28b3542_0  
backcall                  0.1.0                    py37_0    anaconda
blas                      1.0                         mkl  
bleach                    3.0.2                    py37_0    anaconda
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2018.12.5                     0  
celerite                  0.3.0           py37h637b7d7_1000    conda-forge
certifi                   2018.11.29               py37_0  
cffi                      1.11.5           py37he75722e_1  
chardet                   3.0.4                    py37_1  
cryptography              2.4.2            py37h1ba5d50_0  
cycler                    0.10.0                   py37_0  
dbus                      1.13.2               h714fa37_1  
decorator                 4.3.0                    py37_0    anaconda
emcee                     2.2.1              pyh24bf2e0_4    astropy
entrypoints               0.2.3                    py37_2    anaconda
expat                     2.2.6                he6710b0_0  
fontconfig                2.13.0               h9420a91_0  
freetype                  2.9.1                h8a8886c_1  
glib                      2.56.2               hd408876_0  
gmp                       6.1.2                hb3b607b_0    anaconda
gst-plugins-base          1.14.0               hbbd80ab_1  
gstreamer                 1.14.0               hb453b48_1  
icu                       58.2                 h9c2bf20_1  
idna                      2.8                      py37_0  
intel-openmp              2019.0                      118  
ipykernel                 5.1.0            py37h39e3cac_0    anaconda
ipython                   7.2.0            py37h39e3cac_0    anaconda
ipython_genutils          0.2.0                    py37_0    anaconda
ipywidgets                7.4.2                    py37_0    anaconda
jedi                      0.13.1                   py37_0    anaconda
jinja2                    2.10                     py37_0    anaconda
jpeg                      9b                   h024ee3a_2  
jsonschema                2.6.0                    py37_0    anaconda
jupyter                   1.0.0                    py37_7    anaconda
jupyter_client            5.2.3                    py37_0    anaconda
jupyter_console           6.0.0                    py37_0    anaconda
jupyter_core              4.4.0                    py37_0    anaconda
kiwisolver                1.0.1            py37hf484d3e_0  
libedit                   3.1.20170329         h6b74fdf_2  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 8.2.0                hdf63c60_1  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.35               hbc83047_0  
libsodium                 1.0.16               h1bed415_0    anaconda
libstdcxx-ng              8.2.0                hdf63c60_1  
libuuid                   1.0.3                h1bed415_2  
libxcb                    1.13                 h1bed415_1  
libxml2                   2.9.8                h26e45fe_1  
markupsafe                1.1.0            py37h7b6447c_0    anaconda
matplotlib                3.0.1            py37h5429711_0  
mistune                   0.8.4            py37h7b6447c_0    anaconda
mkl                       2019.1                      144  
mkl_fft                   1.0.10           py37ha843d7b_0  
mkl_random                1.0.2            py37h637b7d7_2    conda-forge
more-itertools            4.3.0                    py37_0  
nbconvert                 5.3.1                    py37_0    anaconda
nbformat                  4.4.0                    py37_0    anaconda
ncurses                   6.1                  hf484d3e_0  
notebook                  5.7.2                    py37_0    anaconda
numpy                     1.15.4           py37h7e9f1db_0  
numpy-base                1.15.4           py37hde5b4d6_0  
openssl                   1.1.1a               h7b6447c_0  
pandoc                    2.2.3.2                       0    anaconda
pandocfilters             1.4.2                    py37_1    anaconda
parso                     0.3.1                    py37_0    anaconda
pcre                      8.42                 h439df22_0  
pexpect                   4.6.0                    py37_0    anaconda
pickleshare               0.7.5                    py37_0    anaconda
pip                       10.0.1                   py37_0  
pluggy                    0.8.0                    py37_0  
prometheus_client         0.4.2                    py37_0    anaconda
prompt_toolkit            2.0.7                    py37_0    anaconda
psutil                    5.4.7            py37h14c3975_0  
ptyprocess                0.6.0                    py37_0    anaconda
py                        1.7.0                    py37_0  
pybind11                  2.2.4                     <pip>
pycparser                 2.19                     py37_0  
pygments                  2.2.0                    py37_0    anaconda
pyopenssl                 18.0.0                   py37_0  
pyparsing                 2.2.2                    py37_0  
pyqt                      5.6.0            py37h22d08a2_6    anaconda
pysocks                   1.6.8                    py37_0  
pytest                    3.9.1                    py37_0  
pytest-arraydiff          0.2              py37h39e3cac_0  
pytest-astropy            0.4.0                    py37_0  
pytest-doctestplus        0.1.3                    py37_0  
pytest-openfiles          0.3.0                    py37_0  
pytest-remotedata         0.3.0                    py37_0  
python                    3.7.1                h0371630_7  
python-dateutil           2.7.3                    py37_0  
pytz                      2018.5                   py37_0  
pyzmq                     17.1.2           py37h14c3975_0    anaconda
qt                        5.6.3                h8bf5577_3    anaconda
qtconsole                 4.4.2                    py37_0    anaconda
readline                  7.0                  h7b6447c_5  
requests                  2.21.0                   py37_0  
scipy                     1.1.0            py37hfa4b5c9_1  
send2trash                1.5.0                    py37_0    anaconda
setuptools                40.4.3                   py37_0  
sip                       4.18.1           py37hf484d3e_2    anaconda
six                       1.11.0                   py37_1  
sqlite                    3.25.3               h7b6447c_0    anaconda
terminado                 0.8.1                    py37_1    anaconda
testpath                  0.4.2                    py37_0    anaconda
tk                        8.6.8                hbc83047_0  
tornado                   5.1.1            py37h7b6447c_0  
traitlets                 4.3.2                    py37_0    anaconda
urllib3                   1.24.1                   py37_0  
wcwidth                   0.1.7                    py37_0    anaconda
webencodings              0.5.1                    py37_1    anaconda
wheel                     0.32.2                   py37_0  
widgetsnbextension        3.4.2                    py37_0    anaconda
xz                        5.2.4                h14c3975_4  
zeromq                    4.2.5                hf484d3e_1    anaconda
zlib                      1.2.11               ha838bed_2  

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

Gabriel-p commented, Jan 25, 2019 (1 reaction)

With fast_reader=False, the read immediately consumes all the available RAM (~6 GB) and hangs the entire system.
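
When loading the whole table exhausts memory like this, one way to inspect the file first is to stream it row by row. The following is a minimal sketch using only the standard-library `csv` module, not astropy. The function name `summarize_csv` is hypothetical, and the default `encoding="utf-8"` is an assumption: the 0xc3 byte in the traceback is consistent with UTF-8 but does not prove it.

```python
import csv


def summarize_csv(path, encoding="utf-8"):
    """Stream a CSV file and return (header, number_of_data_rows).

    Only one row is held in memory at a time, so this works on files
    much larger than the available RAM.
    """
    # newline="" lets the csv module handle line endings inside quoted fields
    with open(path, newline="", encoding=encoding) as fh:
        reader = csv.reader(fh)
        header = next(reader, None)  # None if the file is empty
        n_rows = sum(1 for _ in reader)
    return header, n_rows
```

A quick check such as `header, n = summarize_csv('survey_results_public.csv')` shows how many columns and rows the file has before committing to a full in-memory read.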

Gabriel-p commented, Jan 25, 2019 (0 reactions)

OK, I just wanted to know if astropy was expected to handle this. Thanks everyone.

