
Unable to read in big csv file

See original GitHub issue

This is most probably related to #7628

I have a large (19 GB) CSV file that I'm unable to read using astropy alone.

Using the defaults naively, I run into a segfault pretty quickly (within a minute):

from astropy.table import Table
testd = Table.read('test_set.csv')   # defaults: segfaults within a minute on this file

Opting out of the fast reader and turning off guessing etc., it's still running after 50 minutes and has already eaten up a substantial amount of memory, so I'm killing it:

119752 bsipocz   20   0  204.2g 197.6g  14628 R  99.3 39.2  49:00.38 ipython   
117282 bsipocz   20   0   33.9g  29.7g  19440 S   0.0  5.9   9:48.62 ipython     
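For reference, this is a minimal sketch of forcing the pure-Python code path described above (the exact keywords used in the original session aren't shown, so format='ascii.csv' is an assumption):

from astropy.table import Table

# Disable format guessing and the C fast reader, falling back to the
# pure-Python reader: no segfault, but very slow and memory-hungry here.
testd = Table.read('test_set.csv', format='ascii.csv',
                   guess=False, fast_reader=False)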

The second session above is the one where the file was read in with pandas; converting that DataFrame to a Table then of course works nicely.
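A sketch of that workaround, assuming pandas' default CSV parsing is acceptable for this file:

import pandas as pd
from astropy.table import Table

# Read with pandas (the second session in the top output above),
# then convert the DataFrame to an astropy Table.
df = pd.read_csv('test_set.csv')
testd = Table.from_pandas(df)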

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 14 (13 by maintainers)

Top GitHub Comments

3 reactions
saimn commented, Oct 11, 2018

So with a small fix I’m able to load the file (PR to come):

❯ python test.py
324.1676182746887 sec.
453653104 lines

Quite fast, actually! The issue is that the array storing the column lengths uses an int for its size. This file has a lot of lines, and that size gets too big for a 32-bit integer.
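A back-of-the-envelope check of why a 32-bit counter is not enough here; the per-row field width below is made up for illustration, and the real bookkeeping lives in the C tokenizer:

INT32_MAX = 2**31 - 1             # signed 32-bit ints top out around 2.1e9

n_lines = 453653104               # line count reported above
field_width = 10                  # hypothetical characters per field

# Size of a fixed-width character buffer for a single column:
print(n_lines * field_width)              # ~4.5e9
print(n_lines * field_width > INT32_MAX)  # True -> a 32-bit size overflows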

0 reactions
ajtanskanen commented, Oct 11, 2018

This is what trying to open test_set.csv looks like on a MacBook Pro (2015, 16 GB, macOS 10.14):

Python(3685,0x11259b5c0) malloc: can't allocate region
*** mach_vm_map(size=18446744073608888320) failed (error code=3)
Python(3685,0x11259b5c0) malloc: *** set a breakpoint in malloc_error_break to debug
Segmentation fault: 11

That is quite a large malloc size.
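The number itself points the same way: interpreted as a signed 64-bit value, the requested size is negative, which is consistent with an overflowed (negative) length being handed to malloc as an unsigned size_t. A sketch of the arithmetic, not the actual call site:

requested = 18446744073608888320   # size from the malloc error above

# As a signed 64-bit value this is a small negative number (~ -1e8),
# not a genuine request for 18 exabytes.
print(requested - 2**64)           # -100663296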


Top Results From Across the Web

Issue while reading large CSV file - UiPath Community Forum
Hi, I am having problem in reading large CSV file. The file size is 260 MB. There are approximately 300,00 records. Some time...
Read more >
Reading a big csv file in php, can't read all the file
A file that big can't fit into memory, especially not in PHP, which stores a lot of additional data with every variable created....
Read more >
Optimized ways to Read Large CSVs in Python - Medium
Problem: Importing (reading) a large CSV file leads Out of Memory error. Not enough RAM to read the entire CSV at once crashes...
Read more >
How To Open Large CSV Files - Gigasheet
How to open big CSV files if the data set is too large for Excel. Gigasheet makes working with large files as easy...
Read more >
Issues reading big CSV file despite using CSV.Row - Data
If you didn't expect this maybe the delimiter has been incorrectly identified by CSV.jl - I believe the first 10 rows are used...
Read more >
