
parallel=True in nopython mode seems to cause a memory leak

See original GitHub issue

Reporting a bug

So I am doing a ‘multiplication’ of some sort. It works fine without parallel=True, and even though it gives me the right results when run in parallel, it eats up my RAM quite quickly.

  • macOS 12.4 (MacBook Pro 16 with M1 Pro)
  • Python 3.9.13
  • numba 0.56.0, numpy 1.19.5, llvmlite 0.39.0

The first function, multiplication, works perfectly fine, but since I call it over many iterations in my full code, I need it to be much faster. So I tried to run it in parallel, but that seems to create a memory leak that makes my (previously working) program crash after some number of iterations.

Here is my simplified code:

import numpy as np
from numba import njit, get_num_threads, prange, float64
from numba.typed import List, Dict

@njit
def multiplication(left, right, operator, length):
    result_functional = np.zeros(shape=length, dtype=float64)
    for i, left_value in enumerate(left):
        for j, right_value in enumerate(right):
            for k, count in operator[i][j].items():
                result_functional[k] += count * left_value * right_value
    return result_functional

@njit(parallel=True)
def pmultiplication(left, right, operator, length):
    result_functional = np.zeros(shape=length, dtype=float64)
    num_loads = get_num_threads()
    load_size = len(left) / num_loads
    for n in prange(num_loads):
        result_functional += multiplication(left[round(n * load_size): round((n+1) * load_size)], right,
                                            operator[round(n * load_size): round((n+1) * load_size)], length)
    return result_functional

@njit
def random_operator(dim, trunc):
    side_length = np.sum(dim ** np.arange(trunc + 1))
    full_length = np.sum(dim ** np.arange(2*trunc + 1))
    numba_shuffle_operator = List()
    for i in range(side_length):
        numba_list = List()
        for j in range(side_length):
            num = np.random.randint(low=1, high=(i+j+2))
            keys = np.random.randint(low=0, high=full_length, size=num)
            values = np.random.randint(low=1, high=num+1, size=num)
            numba_dict = Dict()
            for k in range(num):
                numba_dict[keys[k]] = values[k]
            numba_list.append(numba_dict)
        numba_shuffle_operator.append(numba_list)
    return numba_shuffle_operator


def prepare_multiplication(dim, trunc):
    operator = random_operator(dim, trunc)
    left = np.random.random(size=len(operator))
    right = np.random.random(size=len(operator[0]))
    length = np.sum(dim ** np.arange(2 * trunc + 1))
    return left, right, operator, length

# compilation
left, right, operator, length = prepare_multiplication(dim=2, trunc=2)
multiplication(left, right, operator, length)
pmultiplication(left, right, operator, length)

# parameters that will be used
left, right, operator, length = prepare_multiplication(dim=3, trunc=4)
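
A restructuring that is sometimes suggested for prange reductions is to give each chunk its own output buffer and sum the buffers at the end, instead of accumulating with `result_functional += ...` on the temporary array returned by the inner call. Below is a minimal NumPy-only sketch of that pattern; the plain matrix product and the names `chunked_multiply`/`op` are stand-ins for the operator contraction, and under Numba the chunk loop would become `prange` inside `@njit(parallel=True)`. Whether this sidesteps the leak reported here is not confirmed.

```python
import numpy as np

def chunked_multiply(left, op, length, num_chunks=4):
    # One row of `partials` per chunk/thread: each iteration writes only its
    # own row, so no cross-iteration `+=` reduction on temporaries is needed.
    partials = np.zeros((num_chunks, length))
    load_size = len(left) / num_chunks
    for n in range(num_chunks):  # prange(num_chunks) in a jitted version
        lo, hi = round(n * load_size), round((n + 1) * load_size)
        partials[n] = left[lo:hi] @ op[lo:hi]
    return partials.sum(axis=0)

rng = np.random.default_rng(0)
left = rng.random(10)
op = rng.random((10, 6))  # dense stand-in for the sparse `operator`
assert np.allclose(chunked_multiply(left, op, 6), left @ op)
```

The final `partials.sum(axis=0)` happens once, outside the parallel region, which keeps the per-chunk work fully independent.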

And here is my code to show how the memory is eaten up:

import psutil
import pandas as pd
from IPython.display import display

num_iter = 100
process = psutil.Process()

cached_rss = np.array([process.memory_info().rss])
for i in range(num_iter):
    multiplication(left, right, operator, length)
    cached_rss = np.append(cached_rss, process.memory_info().rss)

cached_rss_parallel = np.array([process.memory_info().rss])
for i in range(num_iter):
    pmultiplication(left, right, operator, length)
    cached_rss_parallel = np.append(cached_rss_parallel, process.memory_info().rss)

memory = pd.DataFrame()
memory["diff since start"] = cached_rss - cached_rss[0]
# prepend the first sample so the first diff is 0 and the lengths match;
# this also avoids pandas chained-assignment warnings from slice writes
memory["diff since last"] = np.diff(cached_rss, prepend=cached_rss[0])
memory["diff since start (parallel)"] = cached_rss_parallel - cached_rss_parallel[0]
memory["diff since last (parallel)"] = np.diff(cached_rss_parallel, prepend=cached_rss_parallel[0])

pd.options.display.float_format = '{:,.0f}'.format
display(memory.tail(5).astype(float))
print(f'mean: {np.mean(memory["diff since last (parallel)"]):,.2f}')
print(f'standard dev: {np.std(memory["diff since last (parallel)"]):,.2f}')

     diff since start  diff since last  diff since start (parallel)  diff since last (parallel)
96                  0                0                    3,342,336                      98,304
97                  0                0                    3,342,336                           0
98                  0                0                    3,440,640                      98,304
99                  0                0                    3,440,640                           0
100                 0                0                    3,522,560                      81,920

mean: 34,876.83
standard dev: 86,869.42
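
Since the per-call growth is bursty (the table shows a jump only every few calls), a least-squares slope over the whole RSS trace gives a steadier leak-rate estimate than the mean of the per-call diffs. A sketch with stand-in data; in a real run you would pass the `cached_rss_parallel` array collected above:

```python
import numpy as np

# Stand-in RSS trace: linear growth of ~35 kB per call, like the run above.
cached_rss_parallel = 100_000_000 + 34_877 * np.arange(101)

iters = np.arange(len(cached_rss_parallel))
slope, intercept = np.polyfit(iters, cached_rss_parallel, 1)
print(f"estimated leak rate: {slope:,.0f} bytes/iteration")
```

A clearly positive slope that persists as the iteration count grows is the leak signature; a flat slope with occasional steps is more likely allocator or cache behavior.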

Issue Analytics

  • State: open
  • Created a year ago
  • Comments:8 (1 by maintainers)

Top GitHub Comments

1 reaction
guilhermeleobas commented, Nov 7, 2022

@louisamand, a patch containing the fix is still under review

0 reactions
louisamand commented, Nov 7, 2022

This leak seems to still be present on 0.56.3


