Memory leak when utilizing Store SCP behind a load balancer with health checks

Describe the bug

We are running 8 Store SCP processes across 8 servers (1 process per server), all behind a loadbalancer.org load balancer. Over time, the memory usage of the Store SCP processes balloons until it starts affecting other processes on the machine, and we have to restart them. After testing and investigating, we found the cause to be the load balancer’s health checks, which were set to "Connect to port" (described as "Attempt to make a connection to the specified port").
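For anyone who wants to reproduce this without the load balancer, a "Connect to port" check can be approximated with a plain TCP connect that is closed straight away. This is only a sketch: the function names are hypothetical, and it assumes the check opens a connection and sends no DICOM data, which is how the option is described above.

```python
import socket
import threading


def tcp_connect_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    The connection is closed immediately and no DICOM data is sent.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_demo() -> bool:
    """Probe a throwaway local listener so the sketch runs on its own."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def accept_once() -> None:
        conn, _ = listener.accept()
        conn.close()

    worker = threading.Thread(target=accept_once, daemon=True)
    worker.start()
    ok = tcp_connect_check("127.0.0.1", port)
    worker.join(timeout=2.0)
    listener.close()
    return ok
```

Calling `tcp_connect_check()` in a loop against a Store SCP host (port 11112 in the code further down) should produce a similar connection pattern, though the real health check may differ in timing and in how it closes the socket.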

With the health checking enabled, this was the resulting memory snapshot of one of the Store SCP processes after approximately 65 hours of idle time:

 Aug 23 09:53:48 joints-io-1 (2418585) stord [INFO] Top 10 memory offenders for stord process:
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/site-packages/pydicom/uid.py:69: size=88.2 MiB, count=1229712, average=75 B
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/threading.py:909: size=16.0 MiB, count=1, average=16.0 MiB
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/_weakrefset.py:84: size=8247 KiB, count=645, average=12.8 KiB
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:290: size=6507 KiB, count=83291, average=80 B
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:216: size=5846 KiB, count=44114, average=136 B
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:180: size=3271 KiB, count=44538, average=75 B
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/threading.py:238: size=2888 KiB, count=9661, average=306 B
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copyreg.py:88: size=2744 KiB, count=43893, average=64 B
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:238: size=1582 KiB, count=19891, average=81 B
 Aug 23 09:53:51 joints-io-1 (2418585) stord [INFO] /usr/local/lib/python3.7/copy.py:275: size=698 KiB, count=10147, average=70 B
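The `file:line: size=..., count=..., average=...` entries above look like the string form of `tracemalloc` statistics. How the report was actually produced is not stated in the issue, so the following is only a sketch of one way to get a comparable "top 10" list; the helper name `top_memory_offenders` is hypothetical.

```python
import tracemalloc


def top_memory_offenders(limit: int = 10) -> list:
    """Return the `limit` source lines holding the most traced memory."""
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics("lineno")  # group allocations by file and line
    return [str(stat) for stat in stats[:limit]]


tracemalloc.start()

# Allocate something traceable so the report is not empty.
payload = [str(i) * 10 for i in range(10_000)]

report = top_memory_offenders(limit=10)
for line in report:
    print(line)

tracemalloc.stop()
```

In a long-running service, `tracemalloc.start()` would be called once at startup and the report logged periodically, so that growth at a given line can be compared between snapshots.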

top output:

    PID       USER     PR  NI    VIRT       RES    SHR  S  %CPU    %MEM     TIME+      COMMAND   
    2418586  joints    20   0    3637100   1.6g   8796  S   7.6    13.5     302:44.39  /usr/local/bin/stord

The memory ballooning is alleviated if health checks from our load balancer are disabled.

Expected behavior

The hope is that health check port connections can be handled properly, so that they do not balloon the process’s memory to harmful levels.

Steps To Reproduce

Example Python code being run:

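    import logging

    from pynetdicom import AE, AllStoragePresentationContexts, evt

    # The original snippet uses `logger` without defining it
    logger = logging.getLogger(__name__)

    # NOTE: CSTOREHandler is the reporter's own class and is not shown in the issue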

    # Transfer Syntaxes we support
    IMPLICIT_VR_LE                  = '1.2.840.10008.1.2'
    EXPLICIT_VR_LE                  = '1.2.840.10008.1.2.1'
    EXPLICIT_VR_BE                  = '1.2.840.10008.1.2.2'        
    IMPLICIT_VR_LE_JPEG_BASELINE    = '1.2.840.10008.1.2.4.50'
    IMPLICIT_VR_LE_JPEG_EXTENDED    = '1.2.840.10008.1.2.4.51'
    IMPLICIT_VR_LE_JPEG_PROG_10_12  = '1.2.840.10008.1.2.4.55'     
    IMPLICIT_VR_LE_JPEG_LOSSLESS    = '1.2.840.10008.1.2.4.57'     
    IMPLICIT_VR_LE_JPEG_LOSSLESS_FO = '1.2.840.10008.1.2.4.70'
    IMPLICIT_VR_LE_JPEGLS_LOSSLESS  = '1.2.840.10008.1.2.4.80'
    IMPLICIT_VR_LE_JPEGLS_LOSSY     = '1.2.840.10008.1.2.4.81'
    IMPLICIT_VR_LE_JPEG2K_LOSSLESS  = '1.2.840.10008.1.2.4.90'
    IMPLICIT_VR_LE_JPEG2K_LOSSY     = '1.2.840.10008.1.2.4.91'
    IMPLICIT_VR_LE_RLE_LOSSLESS     = '1.2.840.10008.1.2.5'
  
    ACCEPTED_TRANSFER_SYNTAXES = (
      IMPLICIT_VR_LE,
      EXPLICIT_VR_LE,
      EXPLICIT_VR_BE,
      IMPLICIT_VR_LE_JPEG_BASELINE,
      IMPLICIT_VR_LE_JPEG_EXTENDED,
      IMPLICIT_VR_LE_JPEG_PROG_10_12,
      IMPLICIT_VR_LE_JPEG_LOSSLESS,
      IMPLICIT_VR_LE_JPEG_LOSSLESS_FO,
      IMPLICIT_VR_LE_JPEGLS_LOSSLESS,
      IMPLICIT_VR_LE_JPEGLS_LOSSY,
      IMPLICIT_VR_LE_JPEG2K_LOSSLESS,
      IMPLICIT_VR_LE_JPEG2K_LOSSY,
      IMPLICIT_VR_LE_RLE_LOSSLESS,
    )

    ae               = AE('MEDSTRAT')
    port             = 11112
    contexts         = AllStoragePresentationContexts
    transferSyntaxes = list(ACCEPTED_TRANSFER_SYNTAXES)
    storeHandler     = CSTOREHandler()
    evtHandlers      = (
      (evt.EVT_C_STORE, storeHandler.handleEvent),
    )

    for context in contexts:
      context.transfer_syntax = transferSyntaxes

    ae.supported_contexts   = contexts
    ae.maximum_pdu_size     = 0   # 0 means no limit on the PDU size
    ae.maximum_associations = 40

    logger.info(f'Starting Storage service on port {port} (max threads: {ae.maximum_associations})')

    # Non-blocking: start_server() returns the server instance immediately
    scp = ae.start_server(('', port), block=False, evt_handlers=evtHandlers)
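The code above references a `CSTOREHandler` class that the issue does not show. To run the example, a stand-in along these lines should be enough. It is a hypothetical sketch, not the reporter's handler: it only relies on pynetdicom passing an event with a `dataset` attribute to `EVT_C_STORE` handlers and expecting a status code back.

```python
from types import SimpleNamespace


class CSTOREHandler:
    """Hypothetical stand-in for the reporter's handler (not shown in the issue)."""

    SUCCESS = 0x0000  # DICOM status code for Success

    def __init__(self) -> None:
        self.received = 0
        self.last_dataset = None

    def handleEvent(self, event) -> int:
        # For EVT_C_STORE, event.dataset is the decoded dataset sent by the peer.
        self.last_dataset = event.dataset
        self.received += 1
        # A real handler would persist the dataset here.
        return self.SUCCESS


# Exercise the handler with a fake event so the sketch runs without a peer.
handler = CSTOREHandler()
status = handler.handleEvent(SimpleNamespace(dataset={"PatientID": "TEST"}))
```

A bare TCP connect sends no C-STORE request, so the handler is presumably never called during health checks. That would be consistent with the growth coming from connection handling and not from the handler itself.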

Your environment

[root@joints-io-1 ~]$ python -c "import platform; print(platform.platform())"
Linux-5.4.106-1-pve-x86_64-with-centos-8.3.2011

[root@joints-io-1 ~]$ python -c "import sys; print('Python ', sys.version)"
Python  3.7.9 (default, May 30 2021, 14:22:29) 
[GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]

[root@joints-io-1 ~]$ python -c "import pydicom; print('pydicom ', pydicom.__version__)"
pydicom  1.4.2

[root@joints-io-1 ~]$ python -c "import pynetdicom; print('pynetdicom ', pynetdicom.__version__)"
pynetdicom  1.5.6

We’re using loadbalancer.org’s Virtual ADC (https://www.loadbalancer.org/products/virtual/enterprise-va-1g/). We’re currently running v8.5.2 with Layer 4 load balancing, Direct Routing as the Forwarding Method, and Connect to Port as the Health Check Type.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

scaramallion commented, Dec 23, 2021

OK, should be fixed now. As far as I can tell the problem was due to dead Association threads not being garbage collected automatically, so I’ve added a call to gc.collect() that should run every 30 s or so.
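The idea behind the fix can be sketched as a background thread that forces a collection at a fixed interval. This is not pynetdicom's actual code; the class name, the attributes, and the 30 s default are assumptions taken from the comment above.

```python
import gc
import threading


class PeriodicCollector:
    """Run gc.collect() on a background thread at a fixed interval."""

    def __init__(self, interval: float = 30.0) -> None:
        self.interval = interval
        self.runs = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval):
            gc.collect()
            self.runs += 1

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Objects caught in reference cycles are only freed when the cyclic collector runs, so an explicit `gc.collect()` puts an upper bound on how long such objects can linger. Until a fixed release is installed, a collector like this could be started next to the server as a stopgap.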

bfraub commented, Aug 26, 2021

Here is the packet dump from Wireshark:

tcpdump104.txt
