PubSub Architecture
Continuing from django/channels_redis#251 - I have a few concerns around the architecture for pubsub. While the reconnect logic is admirable, I'm not sure it's appropriate for pub/sub, because it allows messages to be missed silently.
Let's say the connection is lost between `_do_keepalive()` loops and a publisher elsewhere sends a message. `RedisSingleShardConnection` will silently reconnect, having missed the message. In the current blocking-list architecture this would be fine, because we could reconnect to the same key and continue popping items; they would simply queue up like a log (Redis Streams would work well here). But in pub/sub, those messages are lost.
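To make the failure mode concrete, here is a minimal, hypothetical sketch (not channels_redis code) using redis-py's asyncio client against a local Redis: a pub/sub subscriber that loses its subscription cannot recover anything published in the gap, whereas a stream consumer can resume from the last ID it processed. The channel name "events" and stream key "events-stream" are made up for illustration.

```python
import asyncio

import redis.asyncio as redis


async def main():
    r = redis.Redis()
    await r.delete("events-stream")  # start the demo from a clean stream

    # --- pub/sub: anything published while we are not subscribed is gone ---
    pubsub = r.pubsub()
    await pubsub.subscribe("events")
    await pubsub.unsubscribe("events")              # stand-in for a dropped connection
    await r.publish("events", "sent during the outage")
    await pubsub.subscribe("events")                # "reconnect"
    msg = None
    for _ in range(3):                              # a few polls; only subscribe confirmations arrive
        msg = await pubsub.get_message(ignore_subscribe_messages=True, timeout=1.0)
        if msg is not None:
            break
    print("pub/sub after reconnect:", msg)          # None: the missed message never shows up

    # --- stream: the consumer simply resumes from the last ID it handled ---
    last_seen_id = "0"                              # last ID processed before the outage
    await r.xadd("events-stream", {"body": "sent during the outage"})
    entries = await r.xread({"events-stream": last_seen_id}, block=1000)
    print("stream after reconnect:", entries)       # the entry is still there

    await pubsub.close()
    await r.close()


asyncio.run(main())
```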
This is very relevant to our usage. In the event of a network blip or Sentinel failover, the websocket consumers disconnect and the frontend attempts to gracefully recover by reconnecting its websockets and performing a full state refresh to ensure no data has been missed.
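For illustration only, the recovery loop described above could look like the following asyncio sketch (our real frontend does the equivalent in the browser; the `websockets` and `aiohttp` packages, the URLs, and the `apply_state`/`apply_update` hooks are placeholders, not actual project code):

```python
import asyncio

import aiohttp
import websockets


async def run(ws_url, state_url, apply_state, apply_update):
    while True:
        try:
            async with websockets.connect(ws_url) as ws:
                # On every (re)connect, pull a full snapshot first so that
                # anything missed while disconnected is covered.
                async with aiohttp.ClientSession() as http:
                    async with http.get(state_url) as resp:
                        apply_state(await resp.json())
                # Then consume real-time updates until the socket drops.
                async for message in ws:
                    apply_update(message)
        except (OSError, websockets.ConnectionClosed):
            await asyncio.sleep(1)  # brief back-off, then reconnect
```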
@LiteWait @acu192 Absolutely you should not be using Daphne in prod (we also use Uvicorn). In terms of dropping messages between the webserver and the client, WebSockets run over TCP, so you get the same delivery guarantees there that TCP gives you. That said, you should expect network issues everywhere. If you are using WebSockets as a source of truth, I think that's a mistake, as you'd need to implement some sort of 2PC on top. Distributed systems are difficult, which is why we treat Redis/Channels as a nice-to-have real-time sync that we expect to break; we fall back to syncing via an API backed by our Postgres cluster and Postgres' decades of battle testing to overcome these exact issues.
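A rough sketch of that pattern with Django Channels might look like the consumer below. The "updates" group name and the load_snapshot() helper are hypothetical; the point is that the channel layer only carries best-effort update hints, while the authoritative state always comes from the database and is re-sent on every (re)connect.

```python
from channels.generic.websocket import AsyncJsonWebsocketConsumer


class StateSyncConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.channel_layer.group_add("updates", self.channel_name)
        await self.accept()
        # Send an authoritative snapshot (from Postgres) on every (re)connect,
        # so nothing depends on the channel layer having delivered every event.
        await self.send_json({"type": "full_state", "data": await self.load_snapshot()})

    async def disconnect(self, code):
        await self.channel_layer.group_discard("updates", self.channel_name)

    async def broadcast_update(self, event):
        # Invoked via channel_layer.group_send({"type": "broadcast.update", ...});
        # these messages are treated as best-effort hints, not a source of truth.
        await self.send_json({"type": "update", "data": event["data"]})

    async def load_snapshot(self):
        # Placeholder: query the database (e.g. via database_sync_to_async)
        # and return a JSON-serialisable snapshot of the current state.
        return {}
```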
@acu192 Btw - you may have something similar in-house, but to stress our infra for consistency at scale we built this little tool: https://github.com/zumalabs/sockbasher. It's in dire need of some external docs, but it might be useful for you in its current form.