Memory leak when utilizing Store SCP behind a load balancer with health checks
Describe the bug
We are currently running 8 Store SCP processes across 8 servers (1 process per server), all behind a loadbalancer.org load balancer. We noticed that, over time, the Store SCP processes' memory usage would balloon to the point where it started affecting other processes on the machine and we had to restart them. After testing and investigating, we traced the cause to the load balancer's health checks, which were set to "Connect to port - Attempt to make a connection to the specified port".
With the health checking enabled, this was the resulting memory snapshot of one of the Store SCP processes after approximately 65 hours of idle time:
Aug 23 09:53:48 joints-io-1 (2418585) stord [INFO] Top 10 memory offenders for stord process:
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/site-packages/pydicom/uid.py:69: size=88.2 MiB, count=1229712, average=75 B
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/threading.py:909: size=16.0 MiB, count=1, average=16.0 MiB
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/_weakrefset.py:84: size=8247 KiB, count=645, average=12.8 KiB
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:290: size=6507 KiB, count=83291, average=80 B
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:216: size=5846 KiB, count=44114, average=136 B
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:180: size=3271 KiB, count=44538, average=75 B
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/threading.py:238: size=2888 KiB, count=9661, average=306 B
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copyreg.py:88: size=2744 KiB, count=43893, average=64 B
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:238: size=1582 KiB, count=19891, average=81 B
Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:275: size=698 KiB, count=10147, average=70 B
top output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2418586 joints 20 0 3637100 1.6g 8796 S 7.6 13.5 302:44.39 /usr/local/bin/stord
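For reference, a per-line "top memory offenders" snapshot like the one above can be produced with Python's built-in tracemalloc module. This is a generic sketch, not the exact instrumentation used in the stord process:

```python
import tracemalloc

# Begin tracing Python memory allocations.
tracemalloc.start()

# Stand-in allocation so there is something to measure.
payload = ["x" * 100 for _ in range(1000)]

# Group traced allocations by source line, largest first, and print the
# top 10 -- the same shape as the log output above.
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:10]:
    print(stat)
```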
The memory ballooning is alleviated if health checks from our load balancer are disabled.
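For anyone trying to reproduce this, the load balancer's "Connect to port" check can be approximated with a plain TCP connect followed by an immediate disconnect, with no DICOM data sent. The host, port, and timeout below are illustrative placeholders:

```python
import socket


def simulate_health_check(host="127.0.0.1", port=11112):
    """Approximate a "Connect to port" health check: open a TCP connection
    to the SCP port, then close it immediately without sending any data.
    Returns True if the connection succeeded, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=2):
            pass  # connected successfully; close straight away
        return True
    except OSError:
        return False
```

Repeating this in a loop against an idle Store SCP should mimic the load balancer traffic that triggered the memory growth.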
Expected behavior
Health check port connections should be handled properly so that they do not balloon the process's memory to harmful levels.
Steps To Reproduce
Example Python code being run:
import logging

from pynetdicom import AE, AllStoragePresentationContexts, evt

logger = logging.getLogger('stord')


class CSTOREHandler:
    """Minimal stand-in for the real C-STORE handler (storage logic omitted)."""
    def handleEvent(self, event):
        return 0x0000  # Success
# Transfer Syntaxes we support
IMPLICIT_VR_LE = '1.2.840.10008.1.2'
EXPLICIT_VR_LE = '1.2.840.10008.1.2.1'
EXPLICIT_VR_BE = '1.2.840.10008.1.2.2'
IMPLICIT_VR_LE_JPEG_BASELINE = '1.2.840.10008.1.2.4.50'
IMPLICIT_VR_LE_JPEG_EXTENDED = '1.2.840.10008.1.2.4.51'
IMPLICIT_VR_LE_JPEG_PROG_10_12 = '1.2.840.10008.1.2.4.55'
IMPLICIT_VR_LE_JPEG_LOSSLESS = '1.2.840.10008.1.2.4.57'
IMPLICIT_VR_LE_JPEG_LOSSLESS_FO = '1.2.840.10008.1.2.4.70'
IMPLICIT_VR_LE_JPEGLS_LOSSLESS = '1.2.840.10008.1.2.4.80'
IMPLICIT_VR_LE_JPEGLS_LOSSY = '1.2.840.10008.1.2.4.81'
IMPLICIT_VR_LE_JPEG2K_LOSSLESS = '1.2.840.10008.1.2.4.90'
IMPLICIT_VR_LE_JPEG2K_LOSSY = '1.2.840.10008.1.2.4.91'
IMPLICIT_VR_LE_RLE_LOSSLESS = '1.2.840.10008.1.2.5'
ACCEPTED_TRANSFER_SYNTAXES = (
IMPLICIT_VR_LE,
EXPLICIT_VR_LE,
EXPLICIT_VR_BE,
IMPLICIT_VR_LE_JPEG_BASELINE,
IMPLICIT_VR_LE_JPEG_EXTENDED,
IMPLICIT_VR_LE_JPEG_PROG_10_12,
IMPLICIT_VR_LE_JPEG_LOSSLESS,
IMPLICIT_VR_LE_JPEG_LOSSLESS_FO,
IMPLICIT_VR_LE_JPEGLS_LOSSLESS,
IMPLICIT_VR_LE_JPEGLS_LOSSY,
IMPLICIT_VR_LE_JPEG2K_LOSSLESS,
IMPLICIT_VR_LE_JPEG2K_LOSSY,
IMPLICIT_VR_LE_RLE_LOSSLESS,
)
ae = AE('MEDSTRAT')
port = 11112
contexts = AllStoragePresentationContexts
transferSyntaxes = list(ACCEPTED_TRANSFER_SYNTAXES)
storeHandler = CSTOREHandler()
evtHandlers = (
(evt.EVT_C_STORE, storeHandler.handleEvent),
)
for context in contexts:
context.transfer_syntax = transferSyntaxes
ae.supported_contexts = contexts
ae.maximum_pdu_size = 0
ae.maximum_associations = 40
logger.info(f'Starting Storage service on port {port} (max threads: {ae.maximum_associations})')
scp = ae.start_server(('', port), block=False, evt_handlers=evtHandlers)
Your environment
[root@joints-io-1 ~]$ python -c "import platform; print(platform.platform())"
Linux-5.4.106-1-pve-x86_64-with-centos-8.3.2011
[root@joints-io-1 ~]$ python -c "import sys; print('Python ', sys.version)"
Python 3.7.9 (default, May 30 2021, 14:22:29)
[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]
[root@joints-io-1 ~]$ python -c "import pydicom; print('pydicom ', pydicom.__version__)"
pydicom 1.4.2
[root@joints-io-1 ~]$ python -c "import pynetdicom; print('pynetdicom ', pynetdicom.__version__)"
pynetdicom 1.5.6
We're using loadbalancer.org's Virtual ADC (https://www.loadbalancer.org/products/virtual/enterprise-va-1g/). The version we're currently running is v8.5.2, with Layer 4 load balancing set to "Direct Routing" as the Forwarding Method and the Health Check Type set to "Connect to Port".
Issue Analytics
- Created 2 years ago
- Comments: 6 (4 by maintainers)

Top GitHub Comments
OK, should be fixed now. As far as I can tell the problem was due to dead Association threads not being garbage collected automatically, so I've added a call to gc.collect() that should run every 30 s or so.

Here is the packet dump from Wireshark:
tcpdump104.txt
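The fix described in the comment above can be sketched roughly as follows. This is an illustrative approximation using a daemon Timer; the actual implementation inside pynetdicom may schedule the collection differently:

```python
import gc
import threading


def collect_garbage():
    """Run one garbage-collection pass so dead Association thread objects
    are reclaimed; returns the number of unreachable objects found."""
    return gc.collect()


def start_periodic_gc(interval=30.0):
    """Re-run collect_garbage() roughly every `interval` seconds using a
    daemon timer, so the schedule does not block interpreter shutdown."""
    collect_garbage()
    timer = threading.Timer(interval, start_periodic_gc, args=(interval,))
    timer.daemon = True
    timer.start()
```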