Nuitka: memory leak in dataclasses_json library (type(name, bases, dict) changes __module__ value after init()).

  • Nuitka version:
    0.6.19.4
    Commercial: None
    Python: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
    Flavor: Unknown
    Executable: C:\Temp\type3ex.venv\Scripts\python.exe
    OS: Windows
    Arch: x86_64
    WindowsRelease: 10

  • How did you install Nuitka and Python: pip install nuitka

  • The specific PyPI names and versions (python -m pip freeze):
    dataclasses-json==0.5.6
    marshmallow==3.14.1
    marshmallow-enum==1.5.1
    mypy-extensions==0.4.3
    Nuitka==0.6.19.4
    typing-extensions==4.0.1
    typing-inspect==0.7.1

  • Provide in your issue the Nuitka options used: nuitka --standalone type3ex.py

  • Not a regression: previous versions seem to have the issue as well

  • Short example type3ex.py:

import gc
from dataclasses import dataclass
from dataclasses_json import dataclass_json, Undefined
from marshmallow import class_registry

@dataclass_json(undefined=Undefined.EXCLUDE)
@dataclass
class Test:
    name: str

    def _init_(self):
        print(self.__module__)
        super().__init__()

for i in range(0, 5):
    Test.schema()

    gc.collect()
    print(len(gc.get_objects()))  # constantly increasing value in nuitka

    print(i, [each.__module__ for each in getattr(class_registry, '_registry')['TestSchema']])  # "leaking" refs

Running example:

  • by Python:
(.venv) C:\Temp\type3ex>python type3ex.py
14220
0 ['marshmallow.schema']
14249
1 ['marshmallow.schema']
14249
2 ['marshmallow.schema']
14249
3 ['marshmallow.schema']
14249
  • Packed by nuitka:
(.venv) C:\Temp\type3ex>type3ex.dist\type3ex.exe       
14773
0 ['dataclasses_json.mm']
14801
1 ['dataclasses_json.mm', 'dataclasses_json.mm']
14827
2 ['dataclasses_json.mm', 'dataclasses_json.mm', 'dataclasses_json.mm']
14853
3 ['dataclasses_json.mm', 'dataclasses_json.mm', 'dataclasses_json.mm', 'dataclasses_json.mm']
14879
4 ['dataclasses_json.mm', 'dataclasses_json.mm', 'dataclasses_json.mm', 'dataclasses_json.mm', 'dataclasses_json.mm']

Investigation points to a possible issue in the BUILTIN_TYPE3() function in nuitka/build/static_src/HelpersBuiltin.c:

PyObject *BUILTIN_TYPE3(PyObject *module_name, PyObject *name, PyObject *bases, PyObject *dict) {
    PyObject *pos_args = PyTuple_New(3);
    PyTuple_SET_ITEM(pos_args, 0, name);
    Py_INCREF(name);
    PyTuple_SET_ITEM(pos_args, 1, bases);
    Py_INCREF(bases);
    PyTuple_SET_ITEM(pos_args, 2, dict);
    Py_INCREF(dict);

    PyObject *result = PyType_Type.tp_new(&PyType_Type, pos_args, NULL);

    if (unlikely(result == NULL)) {
        Py_DECREF(pos_args);
        return NULL;
    }

    PyTypeObject *type = Py_TYPE(result);

    if (likely(PyType_IsSubtype(type, &PyType_Type))) {
        if (NuitkaType_HasFeatureClass(type) && type->tp_init != NULL) {
            int res = type->tp_init(result, pos_args, NULL);

            if (unlikely(res < 0)) {
                Py_DECREF(pos_args);
                Py_DECREF(result);
                return NULL;
            }
        }
    }

    Py_DECREF(pos_args);

    int res = PyObject_SetAttr(result, const_str_plain___module__, module_name);

    if (res < 0) {
        return NULL;
    }

    return result;
}

PyObject_SetAttr(result, const_str_plain___module__, module_name) is called after type->tp_init(result, pos_args, NULL), so the __module__ value seen inside the __init__() method of the parent object can differ from the __module__ value after object creation. Any code that uses type(name, bases, dict) and relies on the __module__ value in its __init__() function may therefore encounter unexpected behavior (in this case, a memory leak). Moreover, the final __module__ value under Nuitka differs from the one under Python.
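To illustrate why this ordering can turn into a leak, here is a minimal pure-Python sketch. It is not marshmallow's or Nuitka's actual code: `RegisteringMeta`, `registry`, `make_class` and the module names are hypothetical stand-ins. It assumes a registry that deduplicates classes by comparing `__module__` during the metaclass `__init__`, and it imitates the compiled behavior by overwriting `__module__` after the class has been created.

```python
# Hypothetical, simplified registry: at most one class per (name, __module__).
registry = {}


class RegisteringMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        module = cls.__module__  # the value visible while __init__ runs
        entries = registry.setdefault(name, [])
        for index, existing in enumerate(entries):
            if existing.__module__ == module:
                entries[index] = cls  # same module: replace the old entry
                break
        else:
            entries.append(cls)  # no entry with this module: keep both


class Base(metaclass=RegisteringMeta):
    pass


def make_class(overwrite_module_after_init):
    # __module__ is passed explicitly so the sketch is deterministic.
    cls = type("Generated", (Base,), {"__module__": "original.module"})
    if overwrite_module_after_init:
        # Imitates setting __module__ only after tp_init has already run.
        cls.__module__ = "some.other.module"
    return cls


def count_after(n, overwrite):
    """Create the class n times and return how many copies stay registered."""
    registry.pop("Generated", None)
    for _ in range(n):
        make_class(overwrite)
    return len(registry["Generated"])


print(count_after(5, overwrite=False))  # 1: each new class replaces the old one
print(count_after(5, overwrite=True))   # 5: no entry ever matches, so all are kept
```

In the second case every registered class ends up with a `__module__` that no later registration will ever compare equal to, so the list grows by one entry per call, which matches the shape of the output from the packed executable above.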

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

kayhayen commented, Feb 3, 2022 (2 reactions)

The memory leak might come from not releasing the result when the __module__ value cannot be set. Setting it does seem to succeed here, but a Py_DECREF was missing on that error path. With this code it works and does not leak:

    if (HAS_ATTR_BOOL(result, const_str_plain___module__) == false) {
        int res = SET_ATTRIBUTE(result, const_str_plain___module__, module_name);

        if (res < 0) {
            Py_DECREF(result);
            return NULL;
        }
    }

This will not overwrite existing values, and it also releases the result in case of an error. Thanks for your report. This will be in the next hotfix of Nuitka and on factory shortly.

VyacheslavVlasenko commented, Feb 7, 2022 (0 reactions)

Good news! Thanks for the quick solution.
