Memory leak

We have been noticing a memory leak with watchtower in long-running processes.
Logs are delivered to CloudWatch, but memory usage keeps going up. This is with Python 2.7.10 and watchtower 0.3.3.

Snippet of code to reproduce

import watchtower, logging, time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(str(time.time()))
wt = watchtower.CloudWatchLogHandler(use_queues=True, send_interval=60, max_batch_count=10)
logger.addHandler(wt)
logger.propagate = False

for _ in xrange(50000):
    logger.info('Junk data | {}'.format("0" * 10000))
    time.sleep(0.05)

Issue Analytics

Created: 7 years ago
Reactions: 1
Comments: 6 (1 by maintainers)

Top GitHub Comments

6 reactions

jcerjak commented, Feb 8, 2018

I ran the snippet above under memory_profiler with Python 2.7.6 and the latest watchtower master.

When using queues (use_queues=True):

[Image: watchtower_memory_leak_with_queues]

Without using queues (use_queues=False):

[Image: watchtower_memory_leak_no_queues]

Upon investigation, the issue seems to be related to botocore, and might be further amplified by the way watchtower uses threads: https://github.com/boto/botocore/issues/805

This seems to be confirmed by doing manual garbage collection after pushing logs to CloudWatch:

import gc

...
response = self.cwl_client.put_log_events(**kwargs)
gc.collect()

When using queues (use_queues=True) and gc:

[Image: watchtower_memory_leak_with_queues_and_gc]
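
For anyone who wants to try the same idea without patching watchtower's source, here is a minimal sketch of a wrapper handler that forces a collection every N records. It is written for Python 3, uses only the standard library, and `GcEveryNHandler`, `inner` and `every` are hypothetical names that are not part of watchtower. Note that it collects at a different moment than the patch above: it runs in the logging thread after a record is handed off, not right after `put_log_events` returns.

```python
import gc
import logging


class GcEveryNHandler(logging.Handler):
    """Hypothetical wrapper: forward records to `inner`, then force a
    garbage collection after every `every` records."""

    def __init__(self, inner, every=100):
        super().__init__(level=inner.level)
        self.inner = inner        # the real handler being wrapped
        self.every = every        # records between forced collections
        self._count = 0
        self.collections = 0      # number of forced collections so far

    def emit(self, record):
        # Hand the record to the wrapped handler first.
        self.inner.handle(record)
        self._count += 1
        if self._count % self.every == 0:
            # gc.collect() is process-wide, so it also reclaims cycles
            # created by background threads.
            gc.collect()
            self.collections += 1

    def flush(self):
        self.inner.flush()

    def close(self):
        self.inner.close()
        super().close()
```

With the reproduction script above, this would be attached as `logger.addHandler(GcEveryNHandler(wt))` in place of `logger.addHandler(wt)`. Whether it flattens the memory curve the way the in-place patch did is untested here.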

0 reactions

kislyuk commented, Feb 11, 2022

We use watchtower in production and do not observe any memory leaks. Also, watchtower does not share boto3 clients across threads unless explicitly passed a configuration that forces it to do so.

I am going to close this issue for now. When I run the reproduction script above, it stabilizes at a steady-state memory consumption of 60 MB. Network outages and associated retries may temporarily increase this footprint. If somebody has a specific concern, please post a complete reproduction and an explanation of why it is different from what you expect.
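
To make a reproduction easier to compare against the steady-state claim, it may help to record memory at intervals and report the samples. The sketch below is one way to do that in Python 3 with the standard library's `tracemalloc`. The function names, parameters, and the 10% tolerance are assumptions chosen for illustration. Also note that `tracemalloc` only sees allocations made through Python's allocator, so its numbers will not match the process-level figures that memory_profiler reports.

```python
import logging
import tracemalloc


def sample_traced_memory(logger, iterations=2000, sample_every=200,
                         payload_size=10000):
    """Log `iterations` messages and return the traced heap size in bytes,
    sampled after every `sample_every` messages."""
    samples = []
    tracemalloc.start()
    try:
        for i in range(1, iterations + 1):
            logger.info("Junk data | %s", "0" * payload_size)
            if i % sample_every == 0:
                current, _peak = tracemalloc.get_traced_memory()
                samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


def looks_stable(samples, tolerance=0.10):
    """Rough heuristic (an assumption, not a standard test): the last
    sample is at most `tolerance` above the sample at the midpoint."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    midpoint = samples[len(samples) // 2]
    return samples[-1] <= midpoint * (1 + tolerance)
```

Pass in the logger that has the handler under test attached, let it run long enough to cover several send intervals, and include the raw samples in the report so that others can judge whether usage plateaus or keeps climbing.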