ChannelFull exception crashing workers during backlog

We recently encountered ChannelFull worker crashes while clearing ~15 minutes of messages (after the runworker processes had been temporarily offline).

At the time there were approximately 20,000 keys in the appropriate redis db and we were using the default channel capacity of 100.

After research it seems like the suggested solution to clear the backlog is to increase the number of workers – so we did. They proceeded to crash with the included stack trace and didn’t help to process messages any faster.

We found ourselves in a situation where the only way to get things operating correctly was to pull the plug on new incoming WebSocket connections. This is because the ChannelFull error was crashing workers which, in turn, meant that the channels weren’t actually being cleared (leading to more crashes and so on).

At the time we had 32 worker processes across a number of machines attempting to catch up.

Is it expected behaviour for the workers to crash like this, and how could we mitigate similar problems in the future?

Setup

  • Nginx proxying to upstream daphne running in containers
  • Using channels to service only WebSocket requests
  • asgi_redis.RedisSentinelChannelLayer backend
  • Running runworker via supervisor on a number of machines
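
For reference, a minimal sketch of how the channel layer for a setup like this might be declared in settings.py, with the default capacity of 100 written out explicitly. The sentinel host name, the routing path and the per-channel value of 1000 are placeholders, not values from our deployment. Whether raising capacity actually helps with this crash is an assumption; it only changes the point at which ChannelFull is raised.

```python
# settings.py (sketch only; host name and routing path are hypothetical)
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "asgi_redis.RedisSentinelChannelLayer",
        "CONFIG": {
            # Sentinel host(s); placeholder address
            "hosts": [("sentinel-1.example.internal", 26379)],
            # Default capacity applied to every channel (100 is the default)
            "capacity": 100,
            # Per-channel overrides; 1000 is an illustrative number
            "channel_capacity": {
                "websocket.receive": 1000,
            },
        },
        # Hypothetical routing module
        "ROUTING": "myproject.routing.channel_routing",
    },
}
```

A larger per-channel capacity gives a backlog more room before sends start failing, at the cost of more keys held in Redis.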

Versions

asgi-redis==1.3.0
channels==1.1.3
daphne==1.2.0
django==1.11.1
Twisted==17.1.0

Traceback

Traceback (most recent call last):
  File "/home/team/releases/current/manage.py", line 9, in <module>
    execute_from_command_line(sys.argv)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 363, in execute_from_command_line
    utility.execute()
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 355, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 283, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 330, in execute
    output = self.handle(*args, **options)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/management/commands/runworker.py", line 83, in handle
    worker.run()
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/worker.py", line 151, in run
    consumer_finished.send(sender=self.__class__)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 193, in send
    for receiver in self._live_receivers(sender)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/message.py", line 93, in send_and_flush
    sender.send(message, immediately=True)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/channels/channel.py", line 44, in send
    self.channel_layer.send(self.name, content)
  File "/home/team/releases/current/virtualenv/local/lib/python2.7/site-packages/asgi_redis/core.py", line 177, in send
    raise self.ChannelFull
asgiref.base_layer.ChannelFull
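
The traceback shows ChannelFull propagating out of a send call all the way up to runworker, which ends the process. As a rough illustration of the general idea of not letting that exception escape, here is a sketch of a send wrapper that retries with exponential backoff. It is not the change made in Channels; `ChannelFull` and `FakeLayer` below are local stand-ins so the example runs on its own, and `send_with_backoff` is a hypothetical helper name.

```python
import time


class ChannelFull(Exception):
    """Local stand-in for the layer's ChannelFull exception."""


def send_with_backoff(send, channel, message, retries=5, base_delay=0.05,
                      sleep=time.sleep):
    """Call send(channel, message), backing off exponentially on ChannelFull.

    Returns True if the message was sent, False if the channel was still
    full after all retries. The caller decides whether to drop or log.
    """
    for attempt in range(retries):
        try:
            send(channel, message)
            return True
        except ChannelFull:
            # Wait longer after each failed attempt: 0.05s, 0.1s, 0.2s, ...
            sleep(base_delay * (2 ** attempt))
    return False


class FakeLayer(object):
    """Toy in-memory layer used only to exercise the helper."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []

    def send(self, channel, message):
        if len(self.queue) >= self.capacity:
            raise ChannelFull()
        self.queue.append((channel, message))


# Usage: a layer with room for one message; sleeping is disabled for the demo.
layer = FakeLayer(capacity=1)
first = send_with_backoff(layer.send, "websocket.receive", {"text": "a"},
                          sleep=lambda seconds: None)
second = send_with_backoff(layer.send, "websocket.receive", {"text": "b"},
                           retries=3, sleep=lambda seconds: None)
```

Returning a boolean instead of re-raising keeps the worker loop alive; whether dropping a message is acceptable depends on the application.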

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

4 reactions
andrewgodwin commented, May 26, 2017

Ah yes, I see what’s happening, the atomic message handling is not correctly dealing with ChannelFull. I’ll work on a fix for it soon.

0 reactions
matteing commented, Jun 6, 2017

This issue started affecting me in production and development too; thanks for the super quick fix.
