
DEFAULT_PENDING_SIZE is not enforced


Context

I’m working on a project where a single application is responsible for communicating with both Bluetooth Low Energy devices and NATS. Put simply, it acts as a gateway.

The publish rate of our devices can sometimes be quite high, and we observed several times that NATS flushed much more data than expected (more than DEFAULT_PENDING_SIZE). At the same time, the coroutine function _flush_pending() of nats.aio.Client was called many times, but the flusher had not yet had a chance to run.

Reproduction

I created a script which reproduces the issue: https://github.com/charbonnierg/nats.py/blob/debug/publish_pending_queue_size/examples/many_publish.py

Here is an example of the (long) logs:

2022-01-14 00:49:49,017 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1070681, max_size: 1048576)
2022-01-14 00:49:49,018 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1119215, max_size: 1048576)
2022-01-14 00:49:49,018 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1189709, max_size: 1048576)
2022-01-14 00:49:49,018 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 1204078, max_size: 1048576)
# 100 more similar lines with growing pending_size were removed before pasting
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6436012, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6499805, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6516089, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6586225, max_size: 1048576)
2022-01-14 00:49:49,052 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6666835, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6707166, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6724802, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6814196, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6911094, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6975466, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 6982040, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7079261, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7124865, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7197400, max_size: 1048576)
2022-01-14 00:49:49,053 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7251865, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7341005, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7344573, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Kicking the flusher (pending_size: 7426482, max_size: 1048576)
2022-01-14 00:49:49,054 - nats.aio.client - DEBUG - Flushing 7426482 bytes
2022-01-14 00:49:49,057 - nats.aio.client - DEBUG - Flushed 7426482 bytes

As we can see, the writer flushes all 7426482 bytes in one go (about 7 times the limit), ignoring the maximum pending size, which is set to 1024*1024 = 1048576 bytes.

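It seems that, because of the many consecutive publish calls, asyncio never gives the _flusher coroutine a chance to run, so more and more data is appended to the pending queue. Awaiting put() on an asyncio.Queue that is not full returns without suspending, so kicking the flusher does not by itself yield control to the event loop.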
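The starvation can be reproduced without NATS. The sketch below is a toy model, not nats.py code: `flusher` is a hypothetical stand-in for the client's `_flusher` task, and the loop stands in for repeated publishes that kick it with `await queue.put(...)`, as the `_flush_pending` code in this issue does. Because `put()` on a queue that is not full never suspends, the flusher only runs once the producer awaits something that actually does.

```python
import asyncio


async def main() -> list[str]:
    events: list[str] = []
    # Plenty of room, so put() never has to wait for space
    queue: asyncio.Queue[int] = asyncio.Queue(maxsize=1024)

    async def flusher() -> None:
        # Hypothetical stand-in for the client's _flusher task
        await queue.get()
        events.append("flusher ran")

    task = asyncio.create_task(flusher())
    for i in range(3):
        # Like a publish that kicks the flusher: awaiting put() on a
        # non-full queue completes without suspending, so the event
        # loop never gets a chance to schedule `task` here.
        await queue.put(i)
        events.append(f"publish {i}")
    await task  # the producer finally suspends, and the flusher runs
    return events


if __name__ == "__main__":
    # ['publish 0', 'publish 1', 'publish 2', 'flusher ran']
    print(asyncio.run(main()))
```

All three "publishes" complete before the flusher runs even once, which matches the growing pending_size in the logs.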

Is this behaviour known?

In our case, flushing so much data in one go is problematic. However, we do not want to flush after every publish either, because we want to benefit from the pending queue and avoid sending messages one at a time.

Resolution

I created a branch with a single commit that uses an asyncio.Future to wait until the flusher finishes writing when _flush_pending is called.

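    # Method of nats.aio.Client (the module imports asyncio and typing.Optional)
    async def _flush_pending(
        self,
        block: bool = False,
        timeout: Optional[float] = None
    ) -> asyncio.Future:
        assert self._flush_queue, "Client.connect must be called first"
        # Resolved by the flusher once the pending data has been written
        future = asyncio.get_running_loop().create_future()
        if not self.is_connected:
            future.set_result(None)
            return future
        # kick the flusher!
        await self._flush_queue.put(future)
        # Optionally block until the flusher has written the pending data
        if block:
            try:
                # shield() stops a timeout from cancelling the future the
                # flusher still holds (set_result() on a cancelled future
                # raises InvalidStateError)
                await asyncio.wait_for(asyncio.shield(future), timeout)
            except asyncio.TimeoutError:
                raise TimeoutError
        return future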
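When `_flush_pending` blocks with a timeout, `asyncio.wait_for()` cancels whatever it awaits once the timeout expires. The sketch below uses a hypothetical helper, `wait_with_timeout`, and only simulates the flusher. It shows that awaiting the bare future lets a timeout cancel the future the flusher still holds, while wrapping it in `asyncio.shield()` leaves that future pending so the flusher can still resolve it.

```python
import asyncio


async def wait_with_timeout(future: asyncio.Future, timeout: float, shield: bool) -> bool:
    """Wait for `future`; return True if the timeout expired first."""
    awaitable = asyncio.shield(future) if shield else future
    try:
        await asyncio.wait_for(awaitable, timeout)
    except asyncio.TimeoutError:
        return True
    return False


async def main() -> list[tuple[bool, bool, bool]]:
    loop = asyncio.get_running_loop()
    results = []
    for shield in (False, True):
        # This future would have been handed to the flusher via the flush queue
        future = loop.create_future()
        timed_out = await wait_with_timeout(future, 0.01, shield)
        results.append((shield, timed_out, future.cancelled()))
        # Simulate the flusher finishing later; only legal if not cancelled
        if not future.done():
            future.set_result(None)
    return results


if __name__ == "__main__":
    # [(False, True, True), (True, True, False)]
    print(asyncio.run(main()))
```

In the unshielded case, a flusher that later called `set_result()` unconditionally would raise `asyncio.InvalidStateError`; checking `future.done()` in the flusher is an alternative safeguard.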

Then _send_command can simply block when the pending size is greater than DEFAULT_PENDING_SIZE:

    async def _send_command(self, cmd, priority: bool = False) -> None:
        if priority:
            self._pending.insert(0, cmd)
        else:
            self._pending.append(cmd)
        self._pending_data_size += len(cmd)
        if self._pending_data_size > DEFAULT_PENDING_SIZE:
            _logger.debug(
                f"Kicking the flusher (pending_size: {self._pending_data_size}, max_size: {DEFAULT_PENDING_SIZE})"
            )
            # FIXME: Should a timeout be used now that this can block?
            # process_ping, send_publish, send_subscribe and send_unsubscribe
            # can now block for as long as a flush is ongoing.
            await self._flush_pending(block=True)
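To check that blocking bounds the batch size while still batching, here is a self-contained toy model of the pending buffer, the flusher and the blocking `_send_command` above. It is not nats.py code: `ToyClient`, `publish_many` and the small `MAX_PENDING` limit are hypothetical, and the flusher only records how many bytes it would have written instead of writing to a socket.

```python
import asyncio
import contextlib
from typing import Optional

# Hypothetical small limit standing in for DEFAULT_PENDING_SIZE
MAX_PENDING = 1024


class ToyClient:
    """Toy model of a pending buffer plus flusher task; not nats.py code."""

    def __init__(self, block: bool) -> None:
        self.block = block
        self._pending: list[bytes] = []
        self._pending_data_size = 0
        self._flush_queue: "asyncio.Queue[asyncio.Future]" = asyncio.Queue(maxsize=1024)
        self.flush_sizes: list[int] = []  # bytes "written" by each flush

    async def _flusher(self) -> None:
        while True:
            future = await self._flush_queue.get()
            if self._pending_data_size > 0:
                # A real client would write self._pending to the transport here
                self.flush_sizes.append(self._pending_data_size)
                self._pending = []
                self._pending_data_size = 0
            await asyncio.sleep(0)  # stands in for awaiting the transport drain
            if not future.done():
                future.set_result(None)

    async def _flush_pending(self, block: bool = False,
                             timeout: Optional[float] = None) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        await self._flush_queue.put(future)  # kick the flusher
        if block:
            await asyncio.wait_for(asyncio.shield(future), timeout)
        return future

    async def _send_command(self, cmd: bytes) -> None:
        self._pending.append(cmd)
        self._pending_data_size += len(cmd)
        if self._pending_data_size > MAX_PENDING:
            await self._flush_pending(block=self.block, timeout=1.0)


async def publish_many(block: bool, count: int = 100, size: int = 100) -> list[int]:
    """Send `count` commands of `size` bytes and return the size of each flush."""
    client = ToyClient(block)
    flusher = asyncio.create_task(client._flusher())
    for _ in range(count):
        await client._send_command(b"x" * size)
    await client._flush_pending(block=True, timeout=1.0)  # flush the remainder
    flusher.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await flusher
    return client.flush_sizes


if __name__ == "__main__":
    print("non-blocking:", asyncio.run(publish_many(block=False)))
    print("blocking:    ", asyncio.run(publish_many(block=True)))
```

With `block=False` the producer never yields before the final flush, so all 10000 bytes go out in a single batch, mirroring the logs in this issue. With `block=True` each batch stays at 1100 bytes (at most `MAX_PENDING` plus one message), while the 100 messages still go out in only 10 flushes.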

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

wallyqs commented, Jan 20, 2022 (1 reaction)

I think it sounds good to add an option to publish to wait with a timeout if needed, maybe with a default timeout similar to the one in flush? https://github.com/nats-io/nats.py/blob/main/nats/aio/client.py#L934

charbonnierg commented, Jan 20, 2022 (1 reaction)

For your information, we’ve been using this fix for at least a month (sorry for not opening the issue sooner…) and have not had any problems with it.
