DEFAULT_PENDING_SIZE is not enforced
Context
I’m working on a project where a single application is responsible for communicating with Bluetooth Low Energy devices and NATS; to put it simply, it acts as a gateway.
The publish rate of our devices can be pretty high at times, and we observed multiple times that NATS flushed much more data than expected (more than DEFAULT_PENDING_SIZE). At the same time, the coroutine _flush_pending() of nats.aio.Client was called many times, but the flusher had not yet had a chance to execute.
Reproduction
I created a script which reproduces the issue: https://github.com/charbonnierg/nats.py/blob/debug/publish_pending_queue_size/examples/many_publish.py
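That script is not reproduced here, but a minimal sketch along the same lines could look like the following (the server URL, payload size and message count are illustrative assumptions, and the "Kicking the flusher" debug lines shown further down come from the extra logging added in the debug branch, not from a stock client):

import asyncio

import nats


async def main():
    # Sketch only: publish many small messages in a tight loop without ever
    # awaiting a flush in between, so the client's pending buffer keeps
    # growing faster than the flusher task gets a chance to drain it.
    nc = await nats.connect("nats://127.0.0.1:4222")
    payload = b"x" * 512
    for _ in range(100_000):
        # publish() appends to the pending buffer and kicks the flusher,
        # but does not wait for the data to actually be written.
        await nc.publish("test.subject", payload)
    await nc.flush()
    await nc.close()


if __name__ == "__main__":
    asyncio.run(main())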
Here is an example of the (long) logs:
2022-01-14 00:49:49,017 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1070681, max_size: 1048576)
2022-01-14 00:49:49,018 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1119215, max_size: 1048576)
2022-01-14 00:49:49,018 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1189709, max_size: 1048576)
2022-01-14 00:49:49,018 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1204078, max_size: 1048576)
# 100 more similar lines with growing pending_size were removed before pasting
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6436012, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6499805, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6516089, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6586225, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6666835, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6707166, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6724802, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6814196, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6911094, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6975466, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6982040, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7079261, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7124865, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7197400, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7251865, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7341005, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7344573, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7426482, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Flushing 7426482 bytes
2022-01-14 00:49:49,057 - nats.aio.client - DEBUG - Flushed 7426482 bytes
As we can see, the writer flushes 7426482 bytes in one go, ignoring the pending size limit, which is set to 1024*1024 bytes.
It seems that with many publish calls in a row, asyncio never gives the _flusher coroutine a chance to execute, and as a result more and more data is appended to the pending queue.
Is this behaviour known?
In our case, flushing that much data in one go is problematic. We do not want to flush between each publish either, because we want to benefit from the pending queue and avoid sending messages one at a time.
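To illustrate the scheduling behaviour described above with something independent of nats.py: awaiting Queue.put() on a queue that is not full does not suspend the calling coroutine, so a tight loop of such "kicks" never hands control back to the consumer task. A minimal sketch:

import asyncio


async def consumer(queue: asyncio.Queue, runs: list) -> None:
    # Stands in for the flusher task: it only runs when the event loop
    # gets a chance to schedule it.
    while True:
        await queue.get()
        runs.append(1)


async def producer(queue: asyncio.Queue, runs: list) -> None:
    # Stands in for a tight publish loop: put() on a non-full queue
    # returns without suspending, so this loop never yields.
    for i in range(1000):
        await queue.put(i)
    print(f"producer done, consumer ran {len(runs)} times")  # prints 0


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    runs: list = []
    task = asyncio.create_task(consumer(queue, runs))
    await producer(queue, runs)
    task.cancel()


asyncio.run(main())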
Resolution
I created a branch with a single commit which uses an asyncio.Future to wait until the flusher finishes writing when _flush_pending is called.
async def _flush_pending(
    self,
    block: bool = False,
    timeout: Optional[float] = None
) -> asyncio.Future:
    assert self._flush_queue, "Client.connect must be called first"
    future: asyncio.Future = asyncio.Future()
    try:
        if not self.is_connected:
            future.set_result(None)
            return future
        # Kick the flusher! It resolves the future once the pending
        # data has been written out.
        await self._flush_queue.put(future)
        # Optionally block until the flusher is done (or the timeout expires).
        if block:
            try:
                await asyncio.wait_for(future, timeout)
            except asyncio.TimeoutError:
                raise TimeoutError
    except asyncio.CancelledError:
        pass
    return future
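For this to work, the flusher loop has to resolve the future once the pending data has actually been written. The real change lives in the linked branch; a simplified sketch of that counterpart (the transport attribute name is illustrative and may differ from the actual client) could look like this:

async def _flusher(self) -> None:
    # Sketch of the flusher-side counterpart: pull a future from the queue,
    # write out the pending buffer, then resolve the future so that a
    # blocking _flush_pending(block=True) call can return.
    while True:
        future = await self._flush_queue.get()
        try:
            if self._pending:
                self._transport.writelines(self._pending[:])
                self._pending = []
                self._pending_data_size = 0
                await self._transport.drain()
        except Exception as e:
            if not future.done():
                future.set_exception(e)
            continue
        if not future.done():
            future.set_result(None)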
Then we can simply block when the pending size is greater than DEFAULT_PENDING_SIZE:
async def _send_command(self, cmd, priority: bool = False) -> None:
    if priority:
        self._pending.insert(0, cmd)
    else:
        self._pending.append(cmd)
    self._pending_data_size += len(cmd)
    if self._pending_data_size > DEFAULT_PENDING_SIZE:
        _logger.debug(
            f"Kicking the flusher (pending_size: {self._pending_data_size}, max_size: {DEFAULT_PENDING_SIZE})"
        )
        # FIXME: Should a timeout be used now that this can block?
        # Note: process_ping, send_publish, send_subscribe and send_unsubscribe
        # can now block for as long as a flush is in progress.
        await self._flush_pending(block=True)
Top GitHub Comments
I think it sounds good to add an option to publish to wait for the timeout if needed, maybe with a default flush timeout similar to the one used in flush? https://github.com/nats-io/nats.py/blob/main/nats/aio/client.py#L934

For your information, we’ve been using this fix for at least a month (sorry for not opening the issue sooner…) and we did not have any problem with it.
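A rough sketch of what such an option could look like (the flush_timeout parameter is hypothetical and not part of the actual API; only the other parameters match the existing publish signature):

async def publish(
    self,
    subject: str,
    payload: bytes = b'',
    reply: str = '',
    headers: Optional[Dict[str, str]] = None,
    flush_timeout: Optional[float] = None,  # hypothetical option, could default like flush()
) -> None:
    ...
    # The option would simply be forwarded to the blocking flush that is
    # triggered once the pending buffer exceeds DEFAULT_PENDING_SIZE:
    # await self._flush_pending(block=True, timeout=flush_timeout)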