Deadlock when objects emit messages at garbage collection
Hello 👋
I have found an issue whereby loguru can hit a deadlock due to garbage collection. Specifically, if there is an object in memory that has the following properties:
- Is part of a Cyclic Isolate (i.e. it will be garbage collected by the generational garbage collector, not the reference counter)
- Implements a `__del__` finalizer which explicitly or implicitly triggers a log statement
Then any memory assignment during log handling has a chance to trigger a deadlock.
Reproducible test case
Python version: 3.8.12
Loguru version: 0.6.0
The following is a script which demonstrates the deadlock:
```python
import sys

import loguru


class CyclicReference:
    """A minimal cyclic reference.

    Cyclical references are garbage collected using the generational collector rather than
    via reference counting. This is important here, because the generational collector runs
    periodically, meaning that it is hard to predict when the stack will be overtaken by a
    garbage collection process - but it will almost always be when allocating memory of some
    kind.

    When this object is garbage-collected, a log will be emitted.
    """

    def __init__(self, _other: "CyclicReference" = None):
        self.other = _other or CyclicReference(_other=self)

    def __del__(self):
        loguru.logger.info("Goodbye!")


class Sink:
    def write(self, message):
        # This simulates assigning some memory as part of log handling. Eventually this will
        # trigger the generational garbage collector to take over here.
        # You could replace it with `json.dumps(message.record)` to get the same effect.
        _ = [dict() for _ in range(20)]

        # Actually write the message somewhere
        sys.stderr.write(message)

    def flush(self):
        sys.stderr.flush()


def main():
    # Set up logger with sink:
    loguru.logger.add(Sink(), level="INFO")

    # Keep logging until a cyclic reference is garbage collected, at which point deadlock
    # should be reached:
    for index in range(10000):
        CyclicReference()  # Here we create cyclic isolates, simulating code which uses them
        loguru.logger.info("Hello", index=index)


if __name__ == "__main__":
    main()
```
Expected behaviour
A series of `Hello` logs are emitted, interspersed with `Goodbye!` logs.
Actual behaviour
The program reaches a deadlock, requiring a SIGTERM to exit.
Explanation of the issue
The `emit` method of loguru's handler uses a lock to avoid contention for the log handler. It is for this reason that emitting additional log messages from within custom sinks is not supported and will result in deadlocks (see #390).
This seems to be a relatively sensible limitation; however, the nature of the generational garbage collector in CPython can cause this to happen inadvertently and unpredictably in code which has cyclic isolates and performs some level of processing during log handling.
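To make the failure mode concrete, here is a minimal standalone sketch (not loguru's actual code) of why a plain, non-reentrant lock around a handler deadlocks when the same thread re-enters it, which is what happens when the garbage collector runs a logging `__del__` finalizer while a sink call already holds the lock. Note that running it will hang by design:

```python
import threading

handler_lock = threading.Lock()  # plain non-reentrant lock, as used around a handler


def emit(message):
    # The handler serializes sink access with the lock.
    with handler_lock:
        write_to_sink(message)


def write_to_sink(message):
    # Suppose the generational GC runs here and finalizes an object whose
    # __del__ logs a message: the same thread re-enters emit()...
    emit("Goodbye!")  # ...and blocks forever waiting for a lock it already holds


emit("Hello")  # never returns
```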
Real world example
The reproducible example above seems contrived, and one might suggest that the answer is simply to avoid reference cycles, so I will give an actual example of this that I've seen. The Python Elasticsearch client contains reference cycles as a means for sharing connections between interfaces.
As a transport it uses aiohttp, which implements a `__del__` garbage collection finalizer. This in turn invokes the asyncio exception handler, which defaults to emitting a log statement. Since this is a stdlib `logging` call, it doesn't cause a problem on its own, but it is likely that users of loguru will set up some kind of intercept handler to forward third-party stdlib logs to loguru, as sketched below.
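For context, such an intercept handler typically follows the pattern suggested in loguru's documentation; a simplified version is shown below. Once it is installed, the stdlib `logging` call made by asyncio's default exception handler is forwarded straight into loguru's handler, completing the path into the locked `emit`:

```python
import logging

from loguru import logger


class InterceptHandler(logging.Handler):
    """Forward standard-library log records to loguru (simplified from the loguru docs)."""

    def emit(self, record):
        # Map the stdlib level name onto a loguru level if one exists.
        try:
            level = logger.level(record.levelname).name
        except ValueError:
            level = record.levelno
        logger.opt(exception=record.exc_info).log(level, record.getMessage())


# Route all stdlib logging, including asyncio's default exception handler,
# through loguru's handlers.
logging.basicConfig(handlers=[InterceptHandler()], level=0)
```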
Additional notes
It took me some time to figure out a reproducible example for this, as I was only familiar with the reference counting part of the garbage collector. I found this article useful for understanding the generational collector, which is what led me to isolate the circumstances needed to reproduce the issue.
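As a quick standalone illustration of that distinction (not part of the original report): an object caught in a reference cycle is not freed when its last external reference disappears; it is only finalized when the cyclic (generational) collector runs, which in real programs happens at some arbitrary later allocation:

```python
import gc


class Cycle:
    def __init__(self):
        self.self_ref = self  # reference cycle: the refcount never drops to zero

    def __del__(self):
        print("finalized by the cyclic (generational) collector")


Cycle()  # no external reference is kept, yet the object survives this statement
print("not collected yet")
gc.collect()  # the collector finds the cycle and only now runs __del__
```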
Top GitHub Comments
Hi @jacksmith15!
This is a bug report of very great quality, thanks a lot for that! You did all the tedious work of investigation, it was a tremendous help to understand the problem. I can only imagine how hard this bug was to track down.
Your analysis is very interesting. I didn't realize the possibility of a deadlock if the logger was used in a `__del__` and the garbage collector happened to run within a handler… 😬
I saw that you went ahead and implemented a solution before opening a PR, thanks once again for your work. 👍
Regarding the chosen solution of making handlers re-entrant: some time ago, I had to solve a bug causing a deadlock when a process was forked using the `multiprocessing` module (see https://github.com/Delgan/loguru/issues/231#issuecomment-622925398). During my investigation, I came to the conclusion that using `Lock` instead of `RLock` had greatly simplified the resolution of the problem and avoided other complications. For this reason, I was glad that Loguru was initially implemented with a simple `Lock`, and I felt I had good reasons to never switch to `RLock`. I thought about it again today, and I'm not sure what I meant at the time. My analysis of the problem must have been incorrect, because I can't find a scenario in which using an `RLock` would cause subtle deadlocks. It seems acceptable after all.
However, what I don't like about using an `RLock` is that it prevents guaranteeing the atomicity of the operations performed by the handler. Suddenly, the implementation must take into account that the sink is re-entrant. Although it avoids deadlocks, it can cause other undesirable and even hazardous problems.
I thought for a long time that there was no way to cleanly prevent the user from facing a deadlock, but I finally realized: why not just check that the lock is not already in use by the current thread when calling the handler?
The implementation could roughly look like this (exception handling not addressed):
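(The code block that accompanied this comment did not survive extraction; the following is a reconstructed sketch of the idea described above, with hypothetical names, not the code from the eventual PR. It uses a thread-local flag to detect that the current thread is already inside the handler.)

```python
import threading


class Handler:
    def __init__(self, sink):
        self._sink = sink
        self._lock = threading.Lock()
        self._local = threading.local()  # per-thread "already inside the handler" flag

    def emit(self, message):
        if getattr(self._local, "acquired", False):
            # The current thread is already inside the handler (e.g. a __del__
            # finalizer logging during a sink call): raise instead of deadlocking.
            raise RuntimeError(
                "Deadlock avoided: the logger was re-used while the handler was "
                "already processing a message in the same thread"
            )
        with self._lock:
            self._local.acquired = True
            try:
                self._sink.write(message)
            finally:
                self._local.acquired = False
```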
If I’m not mistaken, I don’t think there is any possible data race or deadlock with this solution.
Well, this implies that the message logged in `__del__` won't be emitted and will generate an error instead, but this is hard to solve without making the handler re-entrant (unless we implement a system to postpone the message?).
This also has the advantage of avoiding deadlocks or infinite loops in case the logger is used inside a sink. That is the most common problem, and it would allow us to warn the user with an exception explaining the problem explicitly.
That's basically the first solution you proposed. What do you think of it? It's not ideal as some messages might be discarded, but I basically consider logging inside `__del__` or `signal` to be very risky and best avoided.
I've raised a PR here: https://github.com/Delgan/loguru/pull/720
Feel free to disregard if you already know how you want to go about implementing it 👍