[DISCUSS][RFC] Static Destruction Order Problem

The process crashes on exit if at least one Vulkan device has been constructed. This appears to be caused by the way static destruction order interacts with library unloading order on Windows (a similar issue appeared previously with NNVM).

The core of the previous issue, and seemingly of this one, is that static destructors in libraries make it easy to end up with one library calling into another after that other library has been unloaded. I suspect that in this case the Vulkan library is unloaded before TVM’s static destructors are invoked, leading to a crash when trying to destroy Vulkan devices.

The crash occurs here, on the call to vkDestroyDevice: https://github.com/dmlc/tvm/blob/master/src/runtime/vulkan/vulkan_device_api.cc#L16

We can check that this is due to the destruction/library unload order by forcing this destructor to be called before any libraries are unloaded, by calling it at the end of main(). This requires a small modification to the destructor so that it won’t fail when called twice:

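VulkanWorkspace::~VulkanWorkspace() {
  // Destroy every logical device, then drop the handles so that a second
  // invocation finds nothing left to destroy.
  for (VulkanContext& ctx : context_) {
    vkDestroyDevice(ctx.device, nullptr);
  }
  context_.clear();
  // Destroy the instance once and null it out so a repeat call is a no-op.
  if (instance_ != nullptr) {
    vkDestroyInstance(instance_, nullptr);
    instance_ = nullptr;
  }
}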

With that change, making the following call at the end of main()

  TVMContext vulkan_ctx{ (DLDeviceType)kDLVulkan, 0 };
  tvm::runtime::DeviceAPI::Get(vulkan_ctx)->~DeviceAPI();

stops the issue from occurring entirely.

However, this is quite a hacky solution, and doing this from Python is even more cumbersome, especially if the process doesn’t exit cleanly.

Is there a better way of handling this cleanup that can prevent these issues?

More generally, relying on the lifetimes of static variables may be a wider problem that keeps manifesting in this kind of issue. Is there a better way of handling library-wide lifetimes that could prevent this as the project grows? The most direct solution I can think of is requiring library clients to call a cleanup function when they are done with the library, e.g. TVMDestroy (for C++ this could have a RAII wrapper that is constructed in main()). This is more onerous, but would guarantee that the cleanup code always runs before any libraries have been unloaded in the course of shutting down the process. It could also be used to allow other libraries like NNVM and TOPI to register their own cleanup functions with TVM, so that TVM can ensure that their cleanup happens first.
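
A minimal sketch of what such a cleanup API could look like. Every name here (CleanupRegistry, LibraryGuard, Register, RunAll) is hypothetical and invented for illustration; none of it exists in TVM. It assumes TVM registers its own cleanup first and dependent libraries register afterwards, so running the callbacks in reverse order cleans up the dependents before TVM itself.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical registry, not part of the TVM API.
class CleanupRegistry {
 public:
  // Function-local static: constructed on first use.
  static CleanupRegistry& Global() {
    static CleanupRegistry inst;
    return inst;
  }

  // Libraries built on top (e.g. NNVM, TOPI) would register a callback here.
  void Register(std::function<void()> fn) {
    assert(fn && "cleanup callback must be callable");
    std::lock_guard<std::mutex> lock(mu_);
    callbacks_.push_back(std::move(fn));
  }

  // Runs the callbacks in reverse registration order and then forgets them,
  // so calling RunAll() a second time does nothing.
  void RunAll() {
    std::vector<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mu_);
      pending.swap(callbacks_);
    }
    for (auto it = pending.rbegin(); it != pending.rend(); ++it) {
      (*it)();
    }
  }

 private:
  std::mutex mu_;
  std::vector<std::function<void()>> callbacks_;
};

// Hypothetical RAII wrapper meant to live on the stack of main().
// Its destructor runs when main() returns, before static destructors
// and before the process starts unloading libraries.
class LibraryGuard {
 public:
  LibraryGuard() = default;
  LibraryGuard(const LibraryGuard&) = delete;
  LibraryGuard& operator=(const LibraryGuard&) = delete;
  ~LibraryGuard() { CleanupRegistry::Global().RunAll(); }
};
```

A client would declare `LibraryGuard guard;` as the first statement of main(). The sketch does not help when the process exits without unwinding main(), which is the same limitation noted above for the Python case.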

I’d be interested to hear any thoughts on how this specific issue can be resolved easily and, more generally, how this type of destructor/library unload issue can be avoided in the future.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
alex-weaver commented, Jul 1, 2018

Ah - I wasn’t specific before, but the error raised when calling vkDestroyDevice is an access violation, not a catchable exception. This was because the Vulkan API library had been unloaded by the OS before TVM’s static destructors were called.

This means that any cross-library function call in a static destructor can potentially cause an uncatchable error, and I’m not sure there is a safe way to determine whether a library you depend on has been unloaded when the static destructor runs.

As far as I can see, the only way to prevent this is to require an eager cleanup call for any static object whose destructor may make cross-library calls (which could be as subtle as releasing a reference to a registered shared_ptr).

1 reaction
tqchen commented, Jul 1, 2018

Unfortunately, the static destruction order problem does occur in certain cases. There are several ways to alleviate it:

  • For resources that we can control, always hold a shared_ptr to any global singleton you depend on (so that its destruction happens after yours).
  • I do agree that having an eager destructor could help in certain cases, as long as the destructor itself is idempotent (if it is called twice, the second call has no effect). Let us say this function is called TVMFinalize. In cases where this function is not called, the destructor will still be called automatically.
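
The two suggestions above could be combined roughly as follows. This is only a sketch: Backend and Workspace are hypothetical stand-ins rather than TVM types, and the counter stands in for real release work such as vkDestroyDevice.

```cpp
#include <memory>

// Hypothetical singleton that other objects depend on; not a TVM type.
class Backend {
 public:
  // The function-local static keeps one reference; every dependent
  // object takes its own reference on top of it.
  static std::shared_ptr<Backend> Global() {
    static std::shared_ptr<Backend> inst = std::make_shared<Backend>();
    return inst;
  }
};

// Hypothetical dependent object; not a TVM type.
class Workspace {
 public:
  // Holding a shared_ptr means Backend cannot be destroyed before this
  // object releases its reference.
  Workspace() : backend_(Backend::Global()) {}

  // Eager cleanup in the spirit of TVMFinalize. Idempotent: only the
  // first call releases anything, later calls return immediately.
  void Finalize() {
    if (!backend_) return;
    ++release_count_;  // stands in for the real release work
    backend_.reset();
  }

  // Fallback for clients that never call Finalize().
  ~Workspace() { Finalize(); }

  bool holds_backend() const { return static_cast<bool>(backend_); }
  int release_count() const { return release_count_; }

 private:
  std::shared_ptr<Backend> backend_;
  int release_count_ = 0;
};
```

The shared_ptr only orders destruction between objects the library itself owns. It does nothing to keep a third-party library such as the Vulkan loader mapped in the process, which is why the eager Finalize() call is still needed for the crash described in this issue.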