A mystery to be debugged soon:

import pandas as pd
import numpy as np

arr = np.random.randn(100000, 5)

def leak():
    for i in range(10000):  # Python 3: range replaces xrange
        df = pd.DataFrame(arr.copy())
        result = df.xs(1000)
        # result = df.loc[5000]  # .ix has been removed from pandas; .loc is the label-based equivalent

if __name__ == '__main__':
    leak()
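One way to put a number on the leak is to compare memory before and after a batch of iterations. The sketch below is a stdlib-only harness under stated assumptions: Holder, make_cycle and measure_growth are hypothetical names, not pandas APIs, and Holder only imitates an object (such as a DataFrame) that ends up in a reference cycle. It uses tracemalloc to report how much memory is still held after the loop, once with the automatic cyclic GC disabled and once with gc.collect() forced on every iteration (CPython behaviour assumed).

```python
import gc
import tracemalloc


class Holder:
    """Hypothetical stand-in for an object (e.g. a DataFrame) caught in a reference cycle."""

    def __init__(self, payload_size):
        self.payload = bytearray(payload_size)  # the memory we expect to get back
        self.self_ref = self  # the cycle: refcounting alone can never free this object


def make_cycle(payload_size=100_000):
    # Create the object and drop the only external reference right away.
    Holder(payload_size)


def measure_growth(workload, iterations, collect_each_time):
    """Return how many bytes tracemalloc still sees allocated after the loop."""
    was_enabled = gc.isenabled()
    gc.collect()
    gc.disable()  # keep automatic cyclic GC runs from blurring the comparison
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            workload()
            if collect_each_time:
                gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
        if was_enabled:
            gc.enable()
        gc.collect()  # reclaim whatever the run left behind
    return after - before


if __name__ == "__main__":
    print("growth without gc.collect():", measure_growth(make_cycle, 50, False))
    print("growth with gc.collect():   ", measure_growth(make_cycle, 50, True))
```

To try the real case, the loop body from the snippet above could be passed in as the workload. tracemalloc only sees allocations that are reported to it, so for memory held by native code or kept by the C allocator, a process-level (RSS) measurement may be more telling.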

Issue Analytics

  • State: closed
  • Created 11 years ago
  • Reactions: 9
  • Comments: 20 (5 by maintainers)

Top GitHub Comments

16 reactions
alanjds commented, Aug 22, 2018

For the record, we (+@sbneto) have been using this in production for some time, and it is working very well:

# monkeypatches.py

# Solving memory leak problem in pandas
# https://github.com/pandas-dev/pandas/issues/2659#issuecomment-12021083
import sys

import pandas as pd
from ctypes import cdll, CDLL

# malloc_trim() only exists in glibc; fall back gracefully everywhere else.
try:
    cdll.LoadLibrary("libc.so.6")
    libc = CDLL("libc.so.6")
    libc.malloc_trim(0)
except (OSError, AttributeError):
    libc = None

__old_del = getattr(pd.DataFrame, '__del__', None)

def __new_del(self):
    # Run any existing finalizer, then ask glibc to return freed heap memory to the OS.
    if __old_del:
        __old_del(self)
    libc.malloc_trim(0)

if libc:
    print('Applying monkeypatch for pd.DataFrame.__del__', file=sys.stderr)
    pd.DataFrame.__del__ = __new_del
else:
    print('Skipping monkeypatch for pd.DataFrame.__del__: libc or malloc_trim() not found', file=sys.stderr)
9 reactions
wesm commented, Jan 8, 2013

OK, this is, in a word, f*cked up. If I add gc.collect() to that for loop, it stops leaking memory:

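import pandas as pd
import numpy as np
import gc

arr = np.random.randn(100000, 5)

def leak():
    # The original called pd.util.testing.set_trace() here to drop into a debugger;
    # pd.util.testing is deprecated or removed in recent pandas, so use breakpoint() instead.
    for i in range(10000):
        df = pd.DataFrame(arr.copy())
        result = df.xs(1000)
        gc.collect()  # forcing a cyclic GC pass each iteration stops the growth
        # result = df.loc[5000]

if __name__ == '__main__':
    leak()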

There are objects here that only get garbage collected when the cyclic GC runs. What's the solution here: break the cycle explicitly in __del__ so the Python memory allocator stops screwing us?
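The behaviour described in this comment, objects that are freed only when the cyclic GC runs, can be reproduced without pandas. In this stdlib-only sketch (Node, close and freed_immediately are hypothetical names; CPython's reference counting is assumed), an object that refers back to itself survives del until gc.collect() runs, while breaking the cycle explicitly before dropping the last reference lets reference counting free it at once. It illustrates the general idea, not how pandas resolved the issue. Note that a __del__ hook can only break a cycle if the object defining it is not itself part of that cycle, because a cyclic object's reference count never reaches zero on its own.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that holds a reference back to itself."""

    def __init__(self):
        self.cache = {"owner": self}  # node -> cache dict -> node: a reference cycle

    def close(self):
        # Break the cycle explicitly so plain reference counting can free the object.
        self.cache.clear()


def freed_immediately(break_cycle):
    """Return True if the Node is gone right after its last reference is dropped."""
    was_enabled = gc.isenabled()
    gc.disable()  # stop the cyclic GC from running at an arbitrary moment
    try:
        node = Node()
        ref = weakref.ref(node)
        if break_cycle:
            node.close()
        del node
        alive = ref() is not None
        gc.collect()  # now let the cyclic GC reclaim anything still in a cycle
        return not alive
    finally:
        if was_enabled:
            gc.enable()


print(freed_immediately(break_cycle=False))  # False: needs the cyclic GC
print(freed_immediately(break_cycle=True))   # True: refcounting frees it
```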
