ChannelFull exception crashing workers during backlog
We recently encountered ChannelFull worker crashes while clearing roughly 15 minutes' worth of backed-up messages (after the runworker processes had been temporarily offline).
At the time there were approximately 20,000 keys in the relevant Redis database, and we were using the default channel capacity of 100.
After some research, the suggested solution for clearing a backlog appears to be increasing the number of workers, so we did. The new workers proceeded to crash with the stack trace included below and didn't help process messages any faster.
We found ourselves in a situation where the only way to get things operating correctly was to pull the plug on new incoming WebSocket connections. This is because the ChannelFull error was crashing workers, which in turn meant the channels weren't actually being cleared (leading to more crashes, and so on).
At the time we had 32 worker processes across a number of machines attempting to catch up.
Is it expected behaviour for the workers to crash like this, and how could we mitigate similar problems in the future?
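For reference, our understanding is that the per-channel capacity can be raised on the channel layer itself rather than only adding workers. A rough sketch of the settings we would try is below; the hostname, routing path, and capacity numbers are placeholders, and the plain Redis backend is shown for brevity (the sentinel layer appears to accept the same capacity options):

# settings.py (sketch) -- raising per-channel capacity on the channel layer.
# Hostname, routing path, and capacity values below are placeholders.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgi_redis.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("redis.example.internal", 6379)],
            # The default per-channel capacity is 100; raise it globally...
            "capacity": 500,
            # ...and/or per channel-name glob for the channels that back up most.
            "channel_capacity": {
                "websocket.send!*": 1000,
                "websocket.receive": 500,
            },
        },
        "ROUTING": "myproject.routing.channel_routing",
    },
}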
Setup
- Nginx proxying to upstream daphne running in containers
- Using channels to service only WebSocket requests (a routing sketch follows this list)
- asgi_redis.RedisSentinelChannelLayer backend
- Running runworker via supervisor on a number of machines
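To illustrate the WebSocket-only routing, here is a simplified sketch in the channels 1.x style; the consumer names and the echo behaviour are illustrative placeholders, not our actual project code:

# routing.py (sketch) -- WebSocket-only routing in the channels 1.x style.
# Consumer names and the echo behaviour are illustrative placeholders.
from channels.routing import route


def ws_connect(message):
    # Accept the WebSocket handshake.
    message.reply_channel.send({"accept": True})


def ws_receive(message):
    # Placeholder behaviour: echo incoming text frames back to the client.
    message.reply_channel.send({"text": message.content.get("text", "")})


def ws_disconnect(message):
    # Nothing to clean up in this sketch.
    pass


channel_routing = [
    route("websocket.connect", ws_connect),
    route("websocket.receive", ws_receive),
    route("websocket.disconnect", ws_disconnect),
]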
Versions
asgi-redis==1.3.0
channels==1.1.3
daphne==1.2.0
django==1.11.1
Twisted==17.1.0
Traceback
Traceback (most recent call last):
  File "/home/team/releases/current/manage.py", line 9, in <module>
    execute_from_command_line(sys.argv)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 363, in execute_from_command_line
    utility.execute()
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 355, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/management/commands/runworker.py", line 83, in handle
    worker.run()
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/worker.py", line 151, in run
    consumer_finished.send(sender=self.__class__)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 193, in send
    for receiver in self._live_receivers(sender)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/message.py", line 93, in send_and_flush
    sender.send(message, immediately=True)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/channel.py", line 44, in send
    self.channel_layer.send(self.name, content)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/asgi_redis/core.py", line 177, in send
    raise self.ChannelFull
asgiref.base_layer.ChannelFull
Comments
Ah yes, I see what's happening: the atomic message handling is not correctly dealing with ChannelFull. I'll work on a fix for it soon.
This issue started affecting me in production and development too; thanks for the super quick fix.
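For anyone who hits this before picking up the fix, one possible stop-gap (not the upstream fix itself) is to catch ChannelFull around application-level sends and retry with a short backoff instead of letting it propagate. A minimal sketch, assuming the channels 1.x APIs; the helper name, channel name, payload, and retry numbers are placeholders:

# Stop-gap sketch (not the upstream fix): retry an application-level send when
# the target channel is momentarily full instead of letting ChannelFull crash
# the process. Helper name, channel name, payload, and numbers are placeholders.
import time

from channels import Channel


def send_with_retry(channel_name, payload, attempts=5, delay=0.5):
    channel = Channel(channel_name)
    for _ in range(attempts):
        try:
            # immediately=True sends right away, so ChannelFull surfaces here
            # rather than later in the deferred consumer_finished flush.
            channel.send(payload, immediately=True)
            return True
        except channel.channel_layer.ChannelFull:
            time.sleep(delay)  # back off briefly, then try again
    return False


# Example: best-effort push to a client's reply channel (placeholder name).
send_with_retry("websocket.send!placeholder", {"text": "hello"})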