
Leaks memory when input is not a numpy array

See original GitHub issue

If you run the following program you see that nansum leaks all the memory it is given when passed a Pandas object. If it is passed the ndarray underlying the Pandas object instead, there is no leak:

import gc
import os

import bottleneck
import numpy as np
import pandas as pd
import psutil

def f():
    x = np.zeros(10*1024*1024, dtype='f4')

    # Leaks 40MB/iteration
    bottleneck.nansum(pd.Series(x))
    # No leak:
    #bottleneck.nansum(x)

process = psutil.Process(os.getpid())

def _get_usage():
    gc.collect()
    # .private is only available on Windows; use .rss on other platforms
    return process.memory_info().private / (1024*1024)

last_usage = _get_usage()
print(last_usage)

for _ in range(10):
    f()
    usage = _get_usage()
    print(usage - last_usage)
    last_usage = usage

This affects not just nansum, but apparently all the reduction functions (with or without axis specified), and at least some other functions like move_max.
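Since the report above says the leak disappears when the underlying ndarray is passed instead of the Series, a minimal sketch of that workaround looks like this (the names `x`, `s`, and `arr` are just for illustration; `np.nansum` stands in for `bottleneck.nansum` so the sketch runs without Bottleneck installed):

```python
import numpy as np
import pandas as pd

x = np.zeros(10 * 1024 * 1024, dtype='f4')
s = pd.Series(x)

# Extract the backing ndarray; in recent pandas, to_numpy() (or .values)
# typically returns it without copying, so this is cheap.
arr = s.to_numpy()

# Passing arr instead of s side-steps the leak, e.g.:
#     bottleneck.nansum(arr)   # instead of bottleneck.nansum(s)
total = np.nansum(arr)
```

The same trick applies to the other affected reduction and moving-window functions: convert the Pandas object to its ndarray first, then call the Bottleneck function on that.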

I’m not completely sure why this happens, but maybe it’s because PyArray_FROM_O is allocating a new array in this case, and the ref count of that is not being decremented by anyone? https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/src/reduce_template.c#L1237
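To illustrate the hypothesis in Python terms: when the input is not already an ndarray, the conversion can hand back a freshly allocated array, and if the C code never releases its reference (the missing Py_DECREF), that temporary lives forever. A toy sketch, where `Wrapper` is a hypothetical stand-in for a pandas Series (Python's own refcounting does the cleanup here that the C code would have to do explicitly):

```python
import gc
import weakref
import numpy as np

class Wrapper:
    """Toy stand-in for a pandas Series: conversion yields a fresh array."""
    def __init__(self, data):
        self._data = data
    def __array__(self, dtype=None, copy=None):
        return np.array(self._data, dtype=dtype)  # new array on every conversion

w = Wrapper([1.0, 2.0, 3.0])
converted = np.asarray(w)   # roughly what PyArray_FROM_O does at the C level
ref = weakref.ref(converted)

del converted               # the release that the C code would need to perform
gc.collect()
assert ref() is None        # the temporary array was freed
```

If the C code skips the equivalent of that `del`, each call keeps one full-size temporary alive, matching the ~40MB-per-iteration growth seen above.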

I’m using Bottleneck 1.2.1 with Pandas 0.23.1. sys.version is 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)].

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 2
  • Comments: 15 (3 by maintainers)

Top GitHub Comments

1 reaction
kwgoodman commented, Jan 5, 2019

OK, I merged the memory leak fix into master.

0 reactions
tensionhead commented, Nov 16, 2022

Sorry, it’s actually fine. In case someone stumbles over this again, I’ll add it here: I underestimated how much memory np.sum and np.nansum use temporarily. Here is a profile for both sum operations, with either only numpy arrays or a mix of one array and one h5py.Dataset, as in np.sum([arr, dset]). A single array/dataset is 256MB, and we always create/operate on two of those:

[Screenshot from 2022-11-16: memory profile of the two sum operations]
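The temporary memory in that comment is expected behavior rather than a leak: np.sum on a list first materializes the whole list as one array, so summing two 256 MB inputs briefly allocates another ~512 MB (and reads the h5py.Dataset fully into memory) before the reduction runs. A small sketch with stand-in arrays:

```python
import numpy as np

# Tiny arrays standing in for the two 256 MB buffers in the comment above.
a = np.ones(4, dtype='f4')
b = np.ones(4, dtype='f4')

# np.sum([a, b]) first does roughly np.asarray([a, b]): a temporary (2, n)
# block holding copies of both inputs.
stacked = np.asarray([a, b])
assert stacked.shape == (2, 4)

# Summing each input separately gives the same result without the large temporary:
total = a.sum() + b.sum()
assert total == np.sum([a, b])
```

So for large inputs, reducing each array on its own and combining the partial results keeps peak memory close to the size of the inputs themselves.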
