Large memory footprint in astropy.io.votable.parse_single_table
I was trying to read two columns from a ~1 GB VOTable file (demo Gaia DR2 data). The file itself contains ~96 columns. The code I used was:
from astropy.io.votable import parse_single_table
columns = ['phot_g_mean_mag', 'parallax']
table = parse_single_table("async_20190630210155.vot", columns=columns)
print("Done reading table")
Here’s the file info:
$ ls -alh async_20190630210155.vot
-rw-rw-rw- 1 msinha 1195219923 1.1G Jul 1 14:01 async_20190630210155.vot
Looking at the memory footprint, I saw that Python was taking ~12 GB during the read, so I cancelled the kernel (this is within a Jupyter notebook). (A screenshot of the memory usage was attached to the original issue.)
While I know that there is significant Python overhead, it still seems like a lot of memory for reading only 2 of the 96 columns. By my math, the minimum possible size is 2/96 × 1 GB ≈ 0.02 GB.
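The back-of-envelope estimate above can be checked with a quick sketch (all numbers are taken from the post; the 12 GB figure is the observed usage):

```python
# Back-of-envelope check of the numbers quoted above.
file_size_gb = 1.1        # size of the .vot file from `ls -alh`
total_columns = 96
wanted_columns = 2

# Naive lower bound: the fraction of the file the two columns occupy.
min_size_gb = wanted_columns / total_columns * file_size_gb
print(f"minimum expected: {min_size_gb:.3f} GB")  # roughly 0.02 GB

observed_gb = 12          # memory seen during the read
print(f"observed/expected ratio: {observed_gb / min_size_gb:.0f}x")
```

So the observed usage is several hundred times the naive lower bound, which is why this looks like more than ordinary interpreter overhead.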
Since I am new to both astropy and VOTables, perhaps I am doing something incorrectly. Happy to provide further info or help debug, as necessary. In case there is something inherently wrong with the file itself, here’s a Dropbox link to the file.
Cheers, Manodeep
Issue Analytics
- Created: 4 years ago
- Comments: 19 (19 by maintainers)
Top GitHub Comments
Probably wouldn’t be the case in the immigrant library 😉
I tried memory profiling by extracting the code that I thought was relevant into its own file for the profiler to crawl through. Rename this to run_profiler.py: run_profiler.py.txt

Then I ran the command

python -m memory_profiler run_profiler.py

using memory-profiler 0.55.0. It took a good few hours! Here are the results that I got, but I’ll have to come back and contemplate this later:

- VOTableFile.parse()
- Higher level parse()
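For a quicker (if coarser) measurement than memory_profiler's line-by-line report, the stdlib tracemalloc module can report peak allocation for a single call. This is only a sketch: the `read_table` stand-in below allocates a throwaway list in place of the actual `parse_single_table` call, since the 1 GB file isn't available here.

```python
import tracemalloc

def read_table():
    # Stand-in for parse_single_table(...); in the real script this
    # would be the astropy call whose memory use is being measured.
    return list(range(1_000_000))

tracemalloc.start()
data = read_table()
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current_bytes / 1e6:.1f} MB, peak: {peak_bytes / 1e6:.1f} MB")
```

Unlike memory_profiler, tracemalloc only sees allocations made through Python's allocator, so it can undercount memory held by C extensions, but it runs in minutes rather than hours on a workload like this.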