Variable deletion consumes a lot of memory

Hi team,

I have been having issues with pandas memory management. Specifically, there is a (for me, at least) unavoidable memory peak when removing variables from a data set. Removing them should be (almost) free! I am discarding part of the data, yet pandas still needs to allocate a large amount of memory, which produces MemoryErrors.

To give you a bit of context, I am working with a DataFrame that contains 33M rows and 500 columns (a big one!), almost all of them numeric, on a machine with 360GB of RAM. The whole data set fits in memory, and I can successfully apply some transformations to the variables. The problem comes when I need to drop 10% of the columns in the table: it produces a large memory peak that leads to a MemoryError. Before performing this operation, there is more than 80GB of memory available!
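A rough estimate helps show why the peak matters at this scale. The sketch below assumes every value is stored as float64 (8 bytes) and that the removal ends up copying the columns that are kept; both are assumptions for illustration, not something confirmed in this report.

```python
# Back-of-the-envelope estimate of the memory involved.
# Assumptions (not from the report): float64 storage, and a removal
# that copies the kept columns into a new allocation.
ROWS = 33_000_000
COLUMNS = 500
BYTES_PER_VALUE = 8  # assumed float64
GB = 10**9

total_bytes = ROWS * COLUMNS * BYTES_PER_VALUE

kept_columns = COLUMNS - COLUMNS // 10  # drop 10% of the columns
kept_bytes = ROWS * kept_columns * BYTES_PER_VALUE

free_bytes = 80 * GB  # "more than 80GB" available before the operation

print(f"full frame:        {total_bytes / GB:.1f} GB")
print(f"copy of kept cols: {kept_bytes / GB:.1f} GB")
print(f"fits in free RAM:  {kept_bytes <= free_bytes}")
```

Under these assumptions the full frame is about 132 GB and a copy of the kept columns is about 119 GB, which is more than the roughly 80GB that is free.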

I tried to use the following methods for removing the columns and all of them failed.

  • drop() with or without inplace parameter
  • pop()
  • reindex()
  • reindex_axis()
  • del df[column] in a loop over the columns to be removed
  • __delitem__(column) in a loop over the columns to be removed
  • pop() and drop() in a loop over the columns to be removed.
  • I also tried to reassign the columns, overwriting the data frame using indexing with loc and iloc, but it does not help.
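For readers who want to reproduce the attempts, the approaches in the list above can be sketched on a small toy frame. The frame size, column names, and the `make_frame` helper are hypothetical. `reindex_axis()` is left out because it was removed in later pandas releases.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real table: 1,000 rows x 20 numeric columns.
columns = [f"c{i}" for i in range(20)]
to_remove = columns[:2]  # hypothetical 10% of the columns
keep = [c for c in columns if c not in to_remove]


def make_frame() -> pd.DataFrame:
    """Build a fresh frame so each approach starts from the same data."""
    rng = np.random.default_rng(0)
    return pd.DataFrame(rng.random((1_000, len(columns))), columns=columns)


# drop() returning a new frame
dropped = make_frame().drop(columns=to_remove)

# drop() with inplace=True
inplace = make_frame()
inplace.drop(columns=to_remove, inplace=True)

# del df[column] in a loop
deleted = make_frame()
for col in to_remove:
    del deleted[col]

# pop() in a loop
popped = make_frame()
for col in to_remove:
    popped.pop(col)

# reindex() onto the columns to keep
reindexed = make_frame().reindex(columns=keep)

# Overwriting the frame with a loc selection of the columns to keep
selected = make_frame()
selected = selected.loc[:, keep]

results = [dropped, inplace, deleted, popped, reindexed, selected]
```

All of these yield the same set of columns on the toy data; how much memory each one needs at peak is the open question in this issue.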

I found that drop() with inplace is the most efficient one, but it still generates a huge peak.

I would like to discuss whether there is any way of implementing (or whether it is already implemented, by any chance) a method for removing variables more efficiently, without generating additional memory consumption…

Thank you,
Iván

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 6
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

2 reactions
alvarouc commented, Nov 13, 2019

Is there any update on this issue? So far, two contradicting solutions have been proposed.

You are much more likely though to release memory if you use a more idiomatic

df = df.drop(…, axis=1)

This removes the top-level reference to the original frame. Note that none of this actually will garbage collect (and nothing will release the memory back to the OS).

and

You can do this.

df =…

df2 = df.drop(…, axis=1)
del df

What is the best way to delete a column without running out of memory?
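The two quoted suggestions can be put side by side in a minimal sketch. The frame, the `unwanted` list, and the `make_frame` helper are hypothetical. In both forms `drop()` builds its result while the original frame is still referenced, so the old and new frames coexist for a moment. Only afterwards does the original lose its last named reference. The weak reference is there only to check whether the original frame object is still alive at the end.

```python
import gc
import weakref

import numpy as np
import pandas as pd

unwanted = ["c0", "c1"]  # hypothetical columns to remove


def make_frame() -> pd.DataFrame:
    return pd.DataFrame(np.zeros((1_000, 10)), columns=[f"c{i}" for i in range(10)])


# Suggestion 1: rebind the same name to the result of drop().
df = make_frame()
old_ref_1 = weakref.ref(df)       # watch the original frame object
df = df.drop(unwanted, axis=1)    # right-hand side runs before rebinding

# Suggestion 2: bind the result to a new name, then delete the old name.
df_big = make_frame()
old_ref_2 = weakref.ref(df_big)
df2 = df_big.drop(unwanted, axis=1)
del df_big

gc.collect()  # clear any reference cycles before checking

# A dead weak reference returns None when called.
original_1_alive = old_ref_1() is not None
original_2_alive = old_ref_2() is not None
print(original_1_alive, original_2_alive)
```

Seen this way, the two suggestions differ mainly in naming. Neither one avoids holding both frames during the `drop()` call itself, and neither guarantees that memory is returned to the operating system.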

0 reactions
ianozsvald commented, May 14, 2021

@giangdaotr I’ve made a demo to show the cost of using del df[col] vs df.drop(...). The del solution in my example is indeed very expensive. I wonder if the block manager is duplicating RAM under certain conditions (which @jreback notes above). Demo here https://github.com/ianozsvald/ipython_memory_usage/blob/master/src/ipython_memory_usage/examples/example_usage_np_pd.ipynb (see In[16] onwards).

Personally, I’m keen to know more, because reasoning about memory usage in pandas (and when/if you get a view or a copy) is pretty tricky. I’m using my ipython_memory_usage tool to try to build up some demos, and I’m happy to collect use cases here: https://github.com/ianozsvald/ipython_memory_usage/issues/30
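A rough way to run a similar comparison without the ipython_memory_usage tool is the standard library's `tracemalloc` module. The sketch below is not the linked demo. The helper names and the frame size are hypothetical, and it relies on NumPy reporting its array allocations to `tracemalloc`. The numbers will vary with the pandas version and its copy behaviour, so no expected output is given.

```python
import tracemalloc

import numpy as np
import pandas as pd


def with_drop(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Remove columns by building a new frame with drop()."""
    return df.drop(columns=cols)


def with_del(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Remove columns one at a time with del, mutating the frame."""
    for col in cols:
        del df[col]
    return df


def peak_bytes(remove, n_rows: int = 100_000, n_cols: int = 20):
    """Return (peak traced bytes during removal, resulting frame)."""
    df = pd.DataFrame(
        np.random.rand(n_rows, n_cols),
        columns=[f"c{i}" for i in range(n_cols)],
    )
    cols = list(df.columns[:2])

    # Start tracing only after the frame exists, so the peak reflects
    # allocations made by the removal itself.
    tracemalloc.start()
    result = remove(df, cols)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, result


drop_peak, drop_result = peak_bytes(with_drop)
del_peak, del_result = peak_bytes(with_del)

print(f"drop(): peak {drop_peak / 1e6:.1f} MB")
print(f"del:    peak {del_peak / 1e6:.1f} MB")
```

`tracemalloc` only sees allocations made through Python's allocator hooks, so it estimates extra memory requested during the operation rather than the process's resident memory.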


