Deadlock when objects emit messages at garbage collection
Hello 👋
I have found an issue whereby loguru can hit a deadlock due to garbage collection. Specifically, if there is an object in memory that has the following properties:
- Is part of a Cyclic Isolate (i.e. it will be garbage collected by the generational garbage collector, not the reference counter)
- Implements a `__del__` finalizer which explicitly or implicitly triggers a log statement
Then any memory assignment during log handling has a chance to trigger a deadlock.
Reproducible test case
Python version: 3.8.12
Loguru version: 0.6.0
The following is a script which demonstrates the deadlock:
```python
import sys

import loguru


class CyclicReference:
    """A minimal cyclic reference.

    Cyclical references are garbage collected using the generational collector rather than
    via reference counting. This is important here, because the generational collector runs
    periodically, meaning that it is hard to predict when the stack will be overtaken by a
    garbage collection process - but it will almost always be when allocating memory of some
    kind.

    When this object is garbage-collected, a log will be emitted.
    """

    def __init__(self, _other: "CyclicReference" = None):
        self.other = _other or CyclicReference(_other=self)

    def __del__(self):
        loguru.logger.info("Goodbye!")


class Sink:
    def write(self, message):
        # This simulates assigning some memory as part of log handling. Eventually this will
        # trigger the generational garbage collector to take over here.
        # You could replace it with `json.dumps(message.record)` to get the same effect.
        _ = [dict() for _ in range(20)]

        # Actually write the message somewhere
        sys.stderr.write(message)

    def flush(self):
        sys.stderr.flush()


def main():
    # Set up logger with sink:
    loguru.logger.add(Sink(), level="INFO")

    # Keep logging until a cyclic reference is garbage collected, at which point deadlock
    # should be reached:
    for index in range(10000):
        CyclicReference()  # Here we create cyclic isolates, simulating code which uses them
        loguru.logger.info("Hello", index=index)


if __name__ == "__main__":
    main()
```
Expected behaviour
A series of `Hello` logs are emitted, interspersed with `Goodbye!` logs.
Actual behaviour
The program reaches a deadlock, requiring a SIGTERM to exit.
Explanation of the issue
The `emit` method of loguru's handler uses a lock to avoid contention for the log handler. It is for this reason that emitting additional log messages from within custom sinks is not supported and will result in deadlocks (see #390).
This seems to be a relatively sensible limitation; however, the nature of the generational garbage collector in CPython can cause this to happen inadvertently and unpredictably in code which has cyclic isolates and performs some level of processing during log handling.
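To make the failure mode concrete, here is a minimal standalone sketch (not loguru's actual code) of why a plain, non-reentrant lock around a handler deadlocks when the same thread re-enters it, which is what happens when the garbage collector runs a logging `__del__` finalizer while a sink call already holds the lock. Note that running it will hang by design:

```python
import threading

handler_lock = threading.Lock()  # plain non-reentrant lock, as used around a handler


def emit(message):
    # The handler serializes sink access with the lock.
    with handler_lock:
        write_to_sink(message)


def write_to_sink(message):
    # Suppose the generational GC runs here and finalizes an object whose
    # __del__ logs a message: the same thread re-enters emit()...
    emit("Goodbye!")  # ...and blocks forever waiting for a lock it already holds


emit("Hello")  # never returns
```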
Real world example
The reproducible example above seems contrived, and one might suggest that the answer is simply to avoid reference cycles, so I will give an actual example of this that I've seen. The Python Elasticsearch client contains reference cycles as a means for sharing connections between interfaces.
As a transport it uses aiohttp, which implements a `__del__` garbage collection finalizer. This in turn invokes the asyncio exception handler, which defaults to emitting a log statement. Since this is a stdlib `logging` call, it doesn't cause a problem on its own, but it is likely that users of loguru will set up some kind of intercept handler to forward third-party stdlib logs to loguru, as sketched below.
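For context, such an intercept handler typically follows the pattern suggested in loguru's documentation; a simplified version is shown below. Once it is installed, the stdlib `logging` call made by asyncio's default exception handler is forwarded straight into loguru's handler, completing the path into the locked `emit`:

```python
import logging

from loguru import logger


class InterceptHandler(logging.Handler):
    """Forward standard-library log records to loguru (simplified from the loguru docs)."""

    def emit(self, record):
        # Map the stdlib level name onto a loguru level if one exists.
        try:
            level = logger.level(record.levelname).name
        except ValueError:
            level = record.levelno
        logger.opt(exception=record.exc_info).log(level, record.getMessage())


# Route all stdlib logging, including asyncio's default exception handler,
# through loguru's handlers.
logging.basicConfig(handlers=[InterceptHandler()], level=0)
```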
Additional notes
It took me some time to figure out a reproducible example for this, as I was only familiar with the reference counting part of the garbage collector. I found this article useful for understanding the generational collector, which is what led me to isolate the circumstances needed to reproduce the issue.
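As a quick standalone illustration of that distinction (not part of the original report): an object caught in a reference cycle is not freed when its last external reference disappears; it is only finalized when the cyclic (generational) collector runs, which in real programs happens at some arbitrary later allocation:

```python
import gc


class Cycle:
    def __init__(self):
        self.self_ref = self  # reference cycle: the refcount never drops to zero

    def __del__(self):
        print("finalized by the cyclic (generational) collector")


Cycle()  # no external reference is kept, yet the object survives this statement
print("not collected yet")
gc.collect()  # the collector finds the cycle and only now runs __del__
```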
Top GitHub Comments
Hi @jacksmith15!
This is a bug report of very great quality, thanks a lot for that! You did all the tedious work of investigation, it was a tremendous help to understand the problem. I can only imagine how hard this bug was to track down.
Your analysis is very interesting. I didn't realize the possibility of a deadlock if the logger was used in a `__del__` and the garbage collector happened to run within a handler… 😬
I saw that you went ahead and implemented a solution before opening a PR, thanks once again for your work. 👍
Regarding the chosen solution of making handlers re-entrant: some time ago, I had to solve a bug causing a deadlock when a process was forked using the `multiprocessing` module (see https://github.com/Delgan/loguru/issues/231#issuecomment-622925398). During my investigation, I came to the conclusion that using `Lock` instead of `RLock` had greatly simplified the resolution of the problem and avoided other complications. For this reason, I was glad that Loguru was initially implemented with a simple `Lock`, and I felt I had good reasons to never switch to `RLock`. I thought about it again today, and I'm not sure what I meant at the time. My analysis of the problem must have been incorrect, because I can't find a scenario in which using an `RLock` would cause subtle deadlocks. It seems acceptable after all.
However, what I don't like about using an `RLock` is that it prevents guaranteeing the atomicity of the operations performed by the handler. Suddenly, the implementation must take into account that the sink is re-entrant. Although it avoids deadlocks, it can cause other undesirable and even hazardous problems.
I thought for a long time that there was no way to cleanly prevent the user from facing a deadlock, but I finally realized: why not just check that the lock is not already in use by the current thread when calling the handler?
The implementation could roughly look like this (exception handling not addressed):
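(The code block that accompanied this comment did not survive extraction; the following is a reconstructed sketch of the idea described above, with hypothetical names, not the code from the eventual PR. It uses a thread-local flag to detect that the current thread is already inside the handler.)

```python
import threading


class Handler:
    def __init__(self, sink):
        self._sink = sink
        self._lock = threading.Lock()
        self._local = threading.local()  # per-thread "already inside the handler" flag

    def emit(self, message):
        if getattr(self._local, "acquired", False):
            # The current thread is already inside the handler (e.g. a __del__
            # finalizer logging during a sink call): raise instead of deadlocking.
            raise RuntimeError(
                "Deadlock avoided: the logger was re-used while the handler was "
                "already processing a message in the same thread"
            )
        with self._lock:
            self._local.acquired = True
            try:
                self._sink.write(message)
            finally:
                self._local.acquired = False
```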
If I’m not mistaken, I don’t think there is any possible data race or deadlock with this solution.
Well, this implies that the message logged in `__del__` won't be emitted and will generate an error instead, but this is hard to solve without making the handler re-entrant (unless we implement a system to postpone the message?).
This also has the advantage of avoiding deadlocks or infinite loops in case the logger is used inside a sink. That is the most common problem, and it would allow us to warn the user with an exception explaining the problem explicitly.
That's basically the first solution you proposed. What do you think of it? It's not ideal as some messages might be discarded, but I basically consider logging inside `__del__` or `signal` to be very risky and best avoided.
I've raised a PR here: https://github.com/Delgan/loguru/pull/720
Feel free to disregard if you already know how you want to go about implementing it 👍