memory leak in MultiIndex

Code Sample

import pandas as pd
import numpy as np
import gc
import psutil

def totmem(p):
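    # VMS + RSS of the process in MB (1 MB = 1e6 bytes); a rough figure,
    # but growth between iterations still shows memory that is not released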
    info = p.memory_info()
    return 1e-6*(info.vms + info.rss)

dat_size = 6000000
# make dataset with no data
# uncomment for regular index
# dat = pd.DataFrame(index=np.arange(dat_size))
# uncomment for MultiIndex index
dat = pd.DataFrame(index=pd.MultiIndex.from_arrays((np.arange(dat_size), np.arange(dat_size))))
# make bool vector for subsetting
sub = np.ones(dat_size, dtype=bool)
# init psutil stuff
p = psutil.Process()
gc.collect()
ram = totmem(p)

for i in range(10):
    dat.iloc[sub, :]     # leak happens here
    gc.collect()
    print(int(totmem(p) - ram))
    ram = totmem(p)
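The sample measures growth with psutil (VMS + RSS), which also moves with allocator behaviour unrelated to the leak. As an alternative, here is a minimal sketch, assuming Python 3.6+, that uses only the standard library's `tracemalloc` module to record how much traced memory each iteration leaves behind. The helper name `traced_deltas` is hypothetical, and it only sees allocations reported to tracemalloc; NumPy reports its data buffers to tracemalloc in its own domain, so they should be included in the totals, but treat that as an assumption to verify on your versions.

```python
import gc
import tracemalloc
from typing import Callable, List


def traced_deltas(step: Callable[[], object], n_iter: int = 10) -> List[int]:
    """Call `step` n_iter times; return the change in traced memory (bytes)
    left behind by each call, measured after a full garbage collection.

    Hypothetical helper: a stdlib alternative to the psutil-based totmem().
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        deltas: List[int] = []
        gc.collect()
        baseline, _peak = tracemalloc.get_traced_memory()
        for _ in range(n_iter):
            step()                      # result discarded, like dat.iloc[sub, :]
            gc.collect()                # free anything kept alive only by cycles
            current, _peak = tracemalloc.get_traced_memory()
            deltas.append(current - baseline)
            baseline = current
        return deltas
    finally:
        if started_here:
            tracemalloc.stop()          # leave tracing as we found it
```

With `dat` and `sub` built as in the sample, `traced_deltas(lambda: dat.iloc[sub, :])` would return ten per-iteration deltas in bytes; on an affected NumPy/pandas combination they would be expected to stay well above zero, in line with the psutil readings.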

Problem description

This program leaks about 191 MB per iteration and eventually runs out of memory if the loop runs indefinitely. It uses a boolean vector to subset a DataFrame. When the index is a MultiIndex, the printed memory deltas show a leak; with a regular single-level index, no leak is observed.

Expected Output

  • The program should not run out of memory.
  • Every output line should be zero.
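"Every output line should be zero" is the ideal; in practice allocator and cache effects can make the first iteration, and occasional later ones, slightly non-zero. A minimal sketch of a tolerance-based check, assuming a leak shows up as positive average growth once a warm-up iteration is discarded: the helper `looks_like_leak` and its `tol_mb` and `warmup` parameters are hypothetical, not part of pandas.

```python
from typing import Sequence


def looks_like_leak(deltas_mb: Sequence[float], tol_mb: float = 1.0,
                    warmup: int = 1) -> bool:
    """Return True if memory grows by more than tol_mb per iteration on
    average, once the first `warmup` readings are discarded.

    Hypothetical helper, not a pandas API.
    """
    steady = list(deltas_mb)[warmup:]   # drop warm-up readings (caches, first allocations)
    if not steady:
        return False                    # not enough data to call it a leak
    # Sustained growth (e.g. 191 MB every cycle) keeps the average high;
    # noise around zero keeps it below the tolerance.
    return sum(steady) / len(steady) > tol_mb
```

For the reported 191 MB per cycle, `looks_like_leak([191] * 10)` returns True, while a run whose readings settle at zero after the first iteration returns False.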

Output of pd.show_versions()

INSTALLED VERSIONS

python: 3.6.6.final.0
python-bits: 64
OS: Linux
pandas: 0.23.4
numpy: 1.15.1

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
mroeschke commented, Jun 23, 2021

NumPy's minimum version will be 1.17.3 in the next pandas release (1.3), which shouldn't have this bug. Going to close, but happy to reopen if this resurfaces.

0 reactions
Meta95 commented, Jul 8, 2019

Can anyone confirm this issue is fixed for NumPy 1.15.3 and above?

Update: I installed 1.15.3. OP's script still reports the same memory leak. I'll try some other NumPy versions and see whether it got fixed anywhere.

Update 2: It turns out my environment was a mess. A proper upgrade to 1.15.3 does indeed solve the issue! My pandas version is 0.23.4 and my Python version is 3.6.6, like the others above. Please feel free to verify yourselves and close this ticket.
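If a script needs to warn when it runs on a NumPy version without the fix, a minimal sketch is to compare version tuples. It assumes, as the comments above report, that the leak is gone from NumPy 1.15.3 onward; the helper names are hypothetical, and the parser ignores pre-release suffixes such as `rc1`, so a 1.15.3 release candidate would be treated as fixed.

```python
import re
from typing import Tuple

# Assumption taken from the thread: NumPy 1.15.3 is the first release without the leak.
FIXED_IN: Tuple[int, ...] = (1, 15, 3)


def parse_version(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '1.15.1' -> (1, 15, 1).

    Hypothetical helper: any suffix such as 'rc1' or '.dev0' is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def numpy_has_fix(version: str) -> bool:
    """True if `version` is at or above FIXED_IN (element-wise tuple comparison)."""
    return parse_version(version) >= FIXED_IN
```

For example, `numpy_has_fix(numpy.__version__)` would return False for the numpy 1.15.1 listed in the reporter's `pd.show_versions()` output.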
