Arrow: large memory usage, error when opening files
I'm trying to open a rather large (14 GB) Arrow IPC stream file:
>>> import vaex
>>> df = vaex.open("of.arrow")
# Python is now using 5-6 GB RAM
>>> df.head()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/me/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 3431, in head
    return self[:min(n, len(self))]
  File "/home/me/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 4604, in __getitem__
    df = self.trim()
  File "/home/me/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 3839, in trim
    df = self if inplace else self.copy()
  File "/home/me/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 5011, in copy
    df.add_column(name, column, dtype=self._dtypes_override.get(name))
  File "/home/me/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 6019, in add_column
    super(DataFrameArrays, self).add_column(name, data, dtype=dtype)
  File "/home/me/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 2928, in add_column
    raise ValueError("array is of length %s, while the length of the DataFrame is %s" % (len(ar), self.length_original()))
ValueError: array is of length 206, while the length of the DataFrame is 5627352
Issue Analytics
- State:
- Created 3 years ago
- Comments:7 (7 by maintainers)
Feather files can contain compressed data, and with the current implementation of how we read Feather, the data is decompressed into memory. You could try saving to the Arrow IPC format, or HDF5, instead.
The memory usage is odd; you could try #517 if you feel like living on the edge. The next major version (or maybe sooner) will include this branch.