Can't open large csv file, error in "astropy/io/ascii/cparser.pyx" file
Cannot read a semi-large (200 MB) .csv file (the survey_results_public.csv
file from the 2018 SO survey):
from astropy.io import ascii
data = ascii.read('survey_results_public.csv', format='fast_csv', guess=False)
Traceback:
Traceback (most recent call last):
File "astropy/io/ascii/cparser.pyx", line 611, in astropy.io.ascii.cparser.CParser._convert_data
File "astropy/io/ascii/cparser.pyx", line 688, in astropy.io.ascii.cparser.CParser._convert_int
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "astropy/io/ascii/cparser.pyx", line 626, in astropy.io.ascii.cparser.CParser._convert_data
File "astropy/io/ascii/cparser.pyx", line 743, in astropy.io.ascii.cparser.CParser._convert_float
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "SO_developer_survey.py", line 6, in <module>
data = ascii.read('/home/gabriel/Descargas/survey_results_public.csv', format='fast_csv', guess=False)
File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/astropy/io/ascii/ui.py", line 409, in read
dat = reader.read(table)
File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/astropy/io/ascii/fastbasic.py", line 128, in read
data, comments = self.engine.read(try_int, try_float, try_string)
File "astropy/io/ascii/cparser.pyx", line 397, in astropy.io.ascii.cparser.CParser.read
File "astropy/io/ascii/cparser.pyx", line 635, in astropy.io.ascii.cparser.CParser._convert_data
File "astropy/io/ascii/cparser.pyx", line 794, in astropy.io.ascii.cparser.CParser._convert_str
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
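The last error in the chain suggests the file contains non-ASCII text: 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, "é" is encoded as 0xc3 0xa9). The sketch below is independent of astropy and uses a made-up byte string, not data from the survey file. It only illustrates that decoding such bytes as ASCII fails with this message, while decoding them as UTF-8 succeeds.

```python
# Hypothetical sample value (not taken from the survey file):
# a field containing "é", encoded as UTF-8.
raw = "Né".encode("utf-8")  # the bytes b'N\xc3\xa9'

# Decoding as ASCII fails on the first byte above 0x7f.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

print(ascii_error)

# Decoding the same bytes as UTF-8 recovers the original text.
print(raw.decode("utf-8"))
```

Whether the survey file is actually UTF-8 is an assumption here; the traceback only shows that it is not pure ASCII.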
Packages in conda:
$ conda list
# packages in environment at /home/gabriel/miniconda3/envs/py3:
#
# Name Version Build Channel
asn1crypto 0.24.0 py37_0
astropy 3.1.1 py37h7b6447c_0
atomicwrites 1.2.1 py37_0
attrs 18.2.0 py37h28b3542_0
backcall 0.1.0 py37_0 anaconda
blas 1.0 mkl
bleach 3.0.2 py37_0 anaconda
bzip2 1.0.6 h14c3975_1002 conda-forge
ca-certificates 2018.12.5 0
celerite 0.3.0 py37h637b7d7_1000 conda-forge
certifi 2018.11.29 py37_0
cffi 1.11.5 py37he75722e_1
chardet 3.0.4 py37_1
cryptography 2.4.2 py37h1ba5d50_0
cycler 0.10.0 py37_0
dbus 1.13.2 h714fa37_1
decorator 4.3.0 py37_0 anaconda
emcee 2.2.1 pyh24bf2e0_4 astropy
entrypoints 0.2.3 py37_2 anaconda
expat 2.2.6 he6710b0_0
fontconfig 2.13.0 h9420a91_0
freetype 2.9.1 h8a8886c_1
glib 2.56.2 hd408876_0
gmp 6.1.2 hb3b607b_0 anaconda
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
icu 58.2 h9c2bf20_1
idna 2.8 py37_0
intel-openmp 2019.0 118
ipykernel 5.1.0 py37h39e3cac_0 anaconda
ipython 7.2.0 py37h39e3cac_0 anaconda
ipython_genutils 0.2.0 py37_0 anaconda
ipywidgets 7.4.2 py37_0 anaconda
jedi 0.13.1 py37_0 anaconda
jinja2 2.10 py37_0 anaconda
jpeg 9b h024ee3a_2
jsonschema 2.6.0 py37_0 anaconda
jupyter 1.0.0 py37_7 anaconda
jupyter_client 5.2.3 py37_0 anaconda
jupyter_console 6.0.0 py37_0 anaconda
jupyter_core 4.4.0 py37_0 anaconda
kiwisolver 1.0.1 py37hf484d3e_0
libedit 3.1.20170329 h6b74fdf_2
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libgfortran-ng 7.3.0 hdf63c60_0
libpng 1.6.35 hbc83047_0
libsodium 1.0.16 h1bed415_0 anaconda
libstdcxx-ng 8.2.0 hdf63c60_1
libuuid 1.0.3 h1bed415_2
libxcb 1.13 h1bed415_1
libxml2 2.9.8 h26e45fe_1
markupsafe 1.1.0 py37h7b6447c_0 anaconda
matplotlib 3.0.1 py37h5429711_0
mistune 0.8.4 py37h7b6447c_0 anaconda
mkl 2019.1 144
mkl_fft 1.0.10 py37ha843d7b_0
mkl_random 1.0.2 py37h637b7d7_2 conda-forge
more-itertools 4.3.0 py37_0
nbconvert 5.3.1 py37_0 anaconda
nbformat 4.4.0 py37_0 anaconda
ncurses 6.1 hf484d3e_0
notebook 5.7.2 py37_0 anaconda
numpy 1.15.4 py37h7e9f1db_0
numpy-base 1.15.4 py37hde5b4d6_0
openssl 1.1.1a h7b6447c_0
pandoc 2.2.3.2 0 anaconda
pandocfilters 1.4.2 py37_1 anaconda
parso 0.3.1 py37_0 anaconda
pcre 8.42 h439df22_0
pexpect 4.6.0 py37_0 anaconda
pickleshare 0.7.5 py37_0 anaconda
pip 10.0.1 py37_0
pluggy 0.8.0 py37_0
prometheus_client 0.4.2 py37_0 anaconda
prompt_toolkit 2.0.7 py37_0 anaconda
psutil 5.4.7 py37h14c3975_0
ptyprocess 0.6.0 py37_0 anaconda
py 1.7.0 py37_0
pybind11 2.2.4 <pip>
pycparser 2.19 py37_0
pygments 2.2.0 py37_0 anaconda
pyopenssl 18.0.0 py37_0
pyparsing 2.2.2 py37_0
pyqt 5.6.0 py37h22d08a2_6 anaconda
pysocks 1.6.8 py37_0
pytest 3.9.1 py37_0
pytest-arraydiff 0.2 py37h39e3cac_0
pytest-astropy 0.4.0 py37_0
pytest-doctestplus 0.1.3 py37_0
pytest-openfiles 0.3.0 py37_0
pytest-remotedata 0.3.0 py37_0
python 3.7.1 h0371630_7
python-dateutil 2.7.3 py37_0
pytz 2018.5 py37_0
pyzmq 17.1.2 py37h14c3975_0 anaconda
qt 5.6.3 h8bf5577_3 anaconda
qtconsole 4.4.2 py37_0 anaconda
readline 7.0 h7b6447c_5
requests 2.21.0 py37_0
scipy 1.1.0 py37hfa4b5c9_1
send2trash 1.5.0 py37_0 anaconda
setuptools 40.4.3 py37_0
sip 4.18.1 py37hf484d3e_2 anaconda
six 1.11.0 py37_1
sqlite 3.25.3 h7b6447c_0 anaconda
terminado 0.8.1 py37_1 anaconda
testpath 0.4.2 py37_0 anaconda
tk 8.6.8 hbc83047_0
tornado 5.1.1 py37h7b6447c_0
traitlets 4.3.2 py37_0 anaconda
urllib3 1.24.1 py37_0
wcwidth 0.1.7 py37_0 anaconda
webencodings 0.5.1 py37_1 anaconda
wheel 0.32.2 py37_0
widgetsnbextension 3.4.2 py37_0 anaconda
xz 5.2.4 h14c3975_4
zeromq 4.2.5 hf484d3e_1 anaconda
zlib 1.2.11 ha838bed_2
Issue Analytics
- State:
- Created 5 years ago
- Comments:14 (14 by maintainers)
Top GitHub Comments
With fast_reader=False it immediately eats up all the available RAM (~6 GB) and hangs the entire system.
Ok, just wanted to know if astropy was expected to handle this. Thanks everyone.
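For anyone who only needs a few columns and hits the same memory limit, one possible workaround is to stream the file row by row with the standard library's csv module. This is a sketch under assumptions, not something proposed in the thread: it does not use astropy at all, it assumes the file is UTF-8 encoded, and the helper name iter_rows and the sample columns are made up for illustration.

```python
import csv
import os
import tempfile


def iter_rows(path, encoding="utf-8"):
    """Yield one dict per CSV row without loading the whole file.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    with open(path, newline="", encoding=encoding) as handle:
        yield from csv.DictReader(handle)


# Tiny stand-in for the real survey file (made-up columns and values).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write("Respondent,Country\n1,México\n2,Perú\n")
    sample_path = tmp.name

# Keep only the column of interest; rows are read one at a time.
countries = [row["Country"] for row in iter_rows(sample_path)]
os.remove(sample_path)

print(countries)
```

Because each row is discarded after use, memory stays roughly proportional to what is kept (here a single column) rather than to the size of the file. All values arrive as strings, so any numeric conversion has to be done explicitly.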