free() invalid pointer

Description When I shut down Triton Inference Server, one unexpected line is printed: free(): invalid pointer

Triton Information What version of Triton are you using? 21.12

Are you using the Triton container or did you build it yourself? Here’s the dockerfile:

FROM nvcr.io/nvidia/tritonserver:21.12-py3
LABEL maintainer="NVIDIA"
LABEL repository="tritonserver"

RUN apt-get update && apt-get -y install swig && apt-get -y install python3-dev && apt-get install -y cmake
RUN pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
RUN pip3 install -v kaldifeat

Here’s the model.py.

import kaldifeat

class TritonPythonModel:

    def initialize(self, args):
        pass

    def execute(self, requests):
        pass

    def finalize(self):
        """`finalize` is called only once when the model is being unloaded.
        Implementing `finalize` function is OPTIONAL. This function allows
        the model to perform any necessary clean ups before exit.
        """
        print('Cleaning up...')

config.pbtxt

name: "model"
backend: "python"
max_batch_size: 64

input [
  {
    name: "wav"
    data_type: TYPE_FP32
    dims: [-1]
  },
  {
    name: "wav_lens"
    data_type: TYPE_INT32
    dims: [1]
  }
]

output [
  {
    name: "speech"
    data_type: TYPE_FP16
    dims: [-1, 80]  # 80
  },
  {
    name: "speech_lengths"
    data_type: TYPE_INT32
    dims: [1]
  }
]

dynamic_batching {
  preferred_batch_size: [ 16, 32 ]
}

instance_group [
  {
    count: 1
    kind: KIND_GPU
  }
]

To Reproduce

  1. Build a Docker image from the Dockerfile above.
  2. Start Triton with a model repository that contains the model.py above.
  3. Shut down Triton with Ctrl-C.

Expected behavior Expect no such line.

I tested on two different machines. Both print this message (is it an error or a warning?). One machine does not generate a core file; the other does.
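
The difference between the two machines may simply come from their core-dump settings rather than from Triton; this is an assumption, since the thread does not say. On Linux, a soft RLIMIT_CORE of 0 (ulimit -c 0) suppresses core files. A minimal Python sketch to check the limit of the current process (the helper names core_dump_limit and describe are made up for illustration):

```python
import resource


def core_dump_limit():
    """Return the (soft, hard) RLIMIT_CORE values for the current process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Render a limit value in a human-readable form."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = core_dump_limit()
    print("soft:", describe(soft), "| hard:", describe(hard))
    if soft == 0:
        # Assumption: this is why one machine writes no core file.
        print("core files are disabled for this process (soft limit is 0)")
```

Running it inside the container on each machine shows whether the limits differ. The kernel's core_pattern setting also affects where, and whether, a core file is written.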

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 22 (12 by maintainers)

Top GitHub Comments

tanmayv25 commented, Feb 2, 2022 (3 reactions)

The kaldifeat module has many leaks and invalid reads/writes on import.

This can be verified using:

valgrind python3 -c "import kaldifeat; print(kaldifeat.__version__)"

However, we do not see the free() invalid pointer error in this case. Running Triton in valgrind with --trace-children=yes gives more details about the invalid free:

==16111== Invalid free() / delete / delete[] / realloc()
==16111==    at 0x483CFBF: operator delete(void*) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==16111==    by 0x13E724: pybind11::finalize_interpreter() (in /tmp/host/model_repo/test_model/triton_python_backend_stub)
==16111==    by 0x11C363: main (in /tmp/host/model_repo/test_model/triton_python_backend_stub)
==16111==  Address 0x44eedf48 is 24 bytes inside a block of size 65 alloc'd
==16111==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==16111==    by 0x5219378: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16111==    by 0x521A271: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16111==    by 0x521A327: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16111==    by 0x521A5E1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16111==    by 0x4B03EF77: ??? (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16111==    by 0x4B041147: ??? (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16111==    by 0x4B0301D8: ??? (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16111==    by 0x4B02941F: PyInit__kaldifeat (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16111==    by 0x4D7C095: _PyImport_LoadDynamicModuleWithSpec (in /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0)
==16111==    by 0x4D7E104: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0)
==16111==    by 0x4E34526: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0)
==16111== 
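
Long memcheck reports like the one above are easier to scan once reduced to the list of binaries and shared objects they mention. The following is a rough sketch, not part of the original thread: parse_frames and objects_involved are hypothetical helpers, and the regular expression assumes the "at/by 0xADDR: symbol (in path)" frame layout seen in this trace.

```python
import re

# Matches frame lines such as:
#   ==16111==    by 0x13E724: pybind11::finalize_interpreter() (in /path/to/binary)
# The "in " prefix is optional so that "(file.cpp:12)" locations also match.
FRAME_RE = re.compile(
    r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (.+) \((?:in )?([^()]+)\)\s*$"
)

# Three frames copied from the trace above, used as sample input.
SAMPLE = """\
==16111== Invalid free() / delete / delete[] / realloc()
==16111==    at 0x483CFBF: operator delete(void*) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==16111==    by 0x13E724: pybind11::finalize_interpreter() (in /tmp/host/model_repo/test_model/triton_python_backend_stub)
==16111==    by 0x11C363: main (in /tmp/host/model_repo/test_model/triton_python_backend_stub)
"""


def parse_frames(report):
    """Return (symbol, location) pairs for every stack frame in the report."""
    frames = []
    for line in report.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames


def objects_involved(report):
    """Return the distinct file names seen in the frames, in first-seen order."""
    seen = []
    for _, location in parse_frames(report):
        name = location.rsplit("/", 1)[-1]
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    for symbol, location in parse_frames(SAMPLE):
        print(symbol, "<-", location)
    print(objects_involved(SAMPLE))
```

Applied to the full report, this makes it quick to see that the freeing side is the Python backend stub while the allocating side goes through _kaldifeat and libstdc++.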

The trace demonstrates that the free() invalid pointer error originates in the pybind11::finalize_interpreter() clean-up. The issue comes up when importing kaldifeat with pybind11. A simple reproducer is described below:

main.cpp :

#include <pybind11/embed.h> // everything needed for embedding
#include <iostream>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{}; // start the interpreter and keep it alive
    py::module_ kaldifeat = py::module_::import("kaldifeat");
    std::cerr << "Module Loaded" << std::endl;
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.17)
project(example)

include(FetchContent)

FetchContent_Declare(
  pybind11
  GIT_REPOSITORY "https://github.com/pybind/pybind11"
  GIT_TAG "v2.6"
  GIT_SHALLOW ON
)
FetchContent_MakeAvailable(pybind11)


add_executable(example main.cpp)
target_link_libraries(example PRIVATE pybind11::embed)

In the directory containing these files, run the following commands:

cmake .
make example
./example

Running the example produces the following error:

./example 
Module Loaded
free(): invalid pointer
Aborted (core dumped)
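
The "Aborted (core dumped)" line is the shell reporting that the process was killed by SIGABRT, which glibc raises after printing the free() message. A small sketch for detecting this from a script, for example in CI; run_and_classify is a hypothetical helper, and it relies on the POSIX behavior of Python's subprocess module, which reports death by signal N as return code -N:

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd and describe how it ended: exit code or terminating signal.

    Returns a (status, stderr) tuple. POSIX only: a negative return code
    from subprocess means the child was terminated by that signal number.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        name = signal.Signals(-proc.returncode).name
        return "killed by " + name, proc.stderr
    return "exited with code %d" % proc.returncode, proc.stderr
```

Calling run_and_classify(["./example"]) on the reproducer above should report a SIGABRT termination whenever the abort reproduces, and a normal exit code otherwise.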

The valgrind backtrace for the invalid free in the example:

==16174== Invalid free() / delete / delete[] / realloc()
==16174==    at 0x483CFBF: operator delete(void*) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==16174==    by 0x129EA3: void __gnu_cxx::new_allocator<std::_Fwd_list_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::destroy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (in /tmp/host/py_invalid_free/example)
==16174==    by 0x1251CC: void std::allocator_traits<std::allocator<std::_Fwd_list_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::destroy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(std::allocator<std::_Fwd_list_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) (in /tmp/host/py_invalid_free/example)
==16174==    by 0x120B06: std::_Fwd_list_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_erase_after(std::_Fwd_list_node_base*, std::_Fwd_list_node_base*) (in /tmp/host/py_invalid_free/example)
==16174==    by 0x11CF41: std::_Fwd_list_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::~_Fwd_list_base() (in /tmp/host/py_invalid_free/example)
==16174==    by 0x11C9DB: std::forward_list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::~forward_list() (in /tmp/host/py_invalid_free/example)
==16174==    by 0x11246C: pybind11::detail::internals::~internals() (in /tmp/host/py_invalid_free/example)
==16174==    by 0x11C05E: pybind11::finalize_interpreter() (in /tmp/host/py_invalid_free/example)
==16174==    by 0x11C14B: pybind11::scoped_interpreter::~scoped_interpreter() (in /tmp/host/py_invalid_free/example)
==16174==    by 0x10E5A5: main (in /tmp/host/py_invalid_free/example)
==16174==  Address 0x20f9cce8 is 24 bytes inside a block of size 65 alloc'd
==16174==    at 0x483BE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==16174==    by 0x4EA2378: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16174==    by 0x4EA3271: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16174==    by 0x4EA3327: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16174==    by 0x4EA35E1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==16174==    by 0x47D5AF77: ??? (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16174==    by 0x47D5D147: ??? (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16174==    by 0x47D4C1D8: ??? (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16174==    by 0x47D4541F: PyInit__kaldifeat (in /usr/local/lib/python3.8/dist-packages/_kaldifeat.cpython-38-x86_64-linux-gnu.so)
==16174==    by 0x4A05095: _PyImport_LoadDynamicModuleWithSpec (in /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0)
==16174==    by 0x4A07104: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0)
==16174==    by 0x4ABD526: ??? (in /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0)

As you can see, the free() invalid pointer error is raised even when running outside the Triton Python backend. In both cases it comes from pybind11::finalize_interpreter(). I also tried the latest pybind11, v2.9.0, and it shows the same issue.

Closing the issue, as it is reproducible outside Triton and is shown to manifest when importing kaldifeat within a pybind11 interpreter.

csukuangfj commented, Feb 14, 2022 (1 reaction)

I just created a GitHub repo to reproduce the core dump issue by changing import kaldifeat to import torch. Please see https://github.com/csukuangfj/memory-leak-example

You can see the output from GitHub actions at https://github.com/csukuangfj/memory-leak-example/runs/5179267107?check_suite_focus=true

kaldifeat uses the PyTorch C++ API, and it is PyTorch's responsibility to manage the memory.

[edited]: So memory issues with kaldifeat should be reproducible by replacing kaldifeat with torch.
