Cancelling the main starting coroutine first (before all other pending tasks)

See original GitHub issue

Hi, I found a case where I want to cancel the main starting coroutine (and await its cancellation) before cancelling all of the loop’s other pending tasks. Perhaps it’s not an issue of the aiorun package directly, but let me ask for advice on how to deal with such a case (if it can be solved using aiorun).

The case can be reproduced with the code below:

import argparse
import asyncio
import logging
import signal

from aiorun import run


class SomeThirdPartyRunner:
    def __init__(self):
        self._queue = list(range(5))
        # actually, I can't wrap the task below in `shutdown_waits_for` because it's created by a 3rd-party package (see the sketch after this script)
        self._work_task = asyncio.create_task(self._work())
        self._work_task_done = asyncio.Future()

    async def _work(self):
        try:
            while self._queue:
                logging.info(f'processing queue={self._queue}')
                self._queue.pop()
                await asyncio.sleep(1)
        except asyncio.CancelledError:
            logging.info(' *** we are here only when aiorun is used *** ')

        finally:
            self._work_task_done.set_result(None)

    async def wait_done(self):
        await self._work_task_done


async def corofn():
    runner = SomeThirdPartyRunner()
    try:
        await asyncio.sleep(2)

    except asyncio.CancelledError:
        pass

    finally:
        logging.info('stopping runner...')
        await runner.wait_done()
        logging.info('runner stopped')


def run_through_aiorun():
    run(_aiorun_main())


async def _aiorun_main():
    await corofn()


def run_through_asyncio():
    asyncio.run(_asyncio_main())


async def _asyncio_main():
    loop = asyncio.get_event_loop()

    task = loop.create_task(corofn())
    task.add_done_callback(lambda _: loop.stop())

    loop.add_signal_handler(signal.SIGINT, task.cancel)
    loop.add_signal_handler(signal.SIGTERM, task.cancel)

    await task


if __name__ == '__main__':
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s - [%(levelname)s] - [%(name)s] "
               "- %(filename)s:%(lineno)d - %(message)s",
    )

    parser = argparse.ArgumentParser()
    parser.add_argument('--aiorun',
                        action='store_true',
                        help='Run script using aiorun package')
    args = parser.parse_args()

    if args.aiorun:
        logging.info('Running using aiorun')
        run_through_aiorun()
    else:
        logging.info('Running using asyncio')
        run_through_asyncio()
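
For reference (this sketch is not part of the original report): when the task creation is under your own control, aiorun’s `shutdown_waits_for` is the hook intended for exactly this situation. A minimal sketch, with a hypothetical drain_queue worker standing in for the third-party work:

import asyncio

from aiorun import run, shutdown_waits_for


async def drain_queue(queue):
    # Hypothetical worker standing in for the third-party task.
    while queue:
        print(f'processing queue={queue}')
        queue.pop()
        await asyncio.sleep(1)


async def main():
    # shutdown_waits_for() keeps the wrapped coroutine out of the
    # cancel-all-tasks sweep, and run() waits for it to finish before
    # closing the loop.
    await shutdown_waits_for(drain_queue(list(range(5))))


if __name__ == '__main__':
    run(main())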

Starting the script with --aiorun and pressing Ctrl+C on the 2nd "processing queue" message:

(env_3_8) MacBook-Pro-2:~ fedir$ python script.py --aiorun
[INFO] - [root] - script.py:83 - Running using aiorun
[DEBUG] - [aiorun] - aiorun.py:155 - Entering run()
[DEBUG] - [asyncio] - selector_events.py:59 - Using selector: KqueueSelector
[DEBUG] - [aiorun] - aiorun.py:236 - Creating default executor
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2, 3, 4]
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2, 3]
^C[DEBUG] - [aiorun] - aiorun.py:304 - Entering shutdown handler
[CRITICAL] - [aiorun] - aiorun.py:317 - Stopping the loop
[INFO] - [aiorun] - aiorun.py:249 - Entering shutdown phase.
[INFO] - [aiorun] - aiorun.py:262 - Cancelling pending tasks.
[DEBUG] - [aiorun] - aiorun.py:264 - Cancelling task: <Task pending name='Task-1' coro=<run.<locals>.new_coro() running at /Users/fedir/env/env_3_8/lib/python3.8/site-packages/aiorun.py:206> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x10a00fdf0>()]>>
[DEBUG] - [aiorun] - aiorun.py:264 - Cancelling task: <Task pending name='Task-2' coro=<SomeThirdPartyRunner._work() running at script.py:20> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x10a00fe50>()]>>
[INFO] - [aiorun] - aiorun.py:276 - Running pending tasks till complete
[INFO] - [root] - script.py:40 - stopping runner...
[INFO] - [root] - script.py:22 -  *** we are here only when aiorun is used ***
[INFO] - [root] - script.py:42 - runner stopped
[INFO] - [aiorun] - aiorun.py:281 - Waiting for executor shutdown.
[INFO] - [aiorun] - aiorun.py:286 - Shutting down async generators
[INFO] - [aiorun] - aiorun.py:288 - Closing the loop.
[INFO] - [aiorun] - aiorun.py:290 - Leaving. Bye!

The desired result for me in this case is that corofn() is not cancelled together with SomeThirdPartyRunner._work(), i.e. _work() should be allowed to drain the queue.

Starting the script with plain asyncio and pressing Ctrl+C on the same log message:

(env_3_8) MacBook-Pro-2:~ fedir$ python script.py
[INFO] - [root] - script.py:86 - Running using asyncio
[DEBUG] - [asyncio] - selector_events.py:59 - Using selector: KqueueSelector
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2, 3, 4]
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2, 3]
^C[INFO] - [root] - script.py:40 - stopping runner...
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2]
[INFO] - [root] - script.py:18 - processing queue=[0, 1]
[INFO] - [root] - script.py:18 - processing queue=[0]
[INFO] - [root] - script.py:42 - runner stopped

I’ve made some changes to aiorun to get the expected result: https://github.com/cjrh/aiorun/blob/master/aiorun.py#L210

_origin_coro_task = loop.create_task(new_coro())

and then at the start of the shutdown phase ("Entering shutdown phase."): https://github.com/cjrh/aiorun/blob/master/aiorun.py#L249

if _origin_coro_task is not None:
    logger.debug("Cancelling origin coro task: %s", _origin_coro_task)
    _origin_coro_task.cancel()
    loop.run_until_complete(_origin_coro_task)

And got the expected result below:

(env_3_8) MacBook-Pro-2:~ fedir$ python script.py --aiorun
[INFO] - [root] - script.py:83 - Running using aiorun
[DEBUG] - [aiorun] - aiorun.py:155 - Entering run()
[DEBUG] - [asyncio] - selector_events.py:59 - Using selector: KqueueSelector
[DEBUG] - [aiorun] - aiorun.py:237 - Creating default executor
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2, 3, 4]
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2, 3]
^C[DEBUG] - [aiorun] - aiorun.py:309 - Entering shutdown handler
[CRITICAL] - [aiorun] - aiorun.py:322 - Stopping the loop
[INFO] - [aiorun] - aiorun.py:250 - Entering shutdown phase.
[DEBUG] - [aiorun] - aiorun.py:252 - Cancelling origin coro task: <Task pending name='Task-1' coro=<run.<locals>.new_coro() running at /Users/fedir/env/env_3_8/lib/python3.8/site-packages/aiorun.py:207> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x105197d90>()]>>
[INFO] - [root] - script.py:40 - stopping runner...
[INFO] - [root] - script.py:18 - processing queue=[0, 1, 2]
[INFO] - [root] - script.py:18 - processing queue=[0, 1]
[INFO] - [root] - script.py:18 - processing queue=[0]
[INFO] - [root] - script.py:42 - runner stopped
[INFO] - [aiorun] - aiorun.py:267 - Cancelling pending tasks.
[INFO] - [aiorun] - aiorun.py:281 - Running pending tasks till complete
[INFO] - [aiorun] - aiorun.py:286 - Waiting for executor shutdown.
[INFO] - [aiorun] - aiorun.py:291 - Shutting down async generators
[INFO] - [aiorun] - aiorun.py:293 - Closing the loop.
[INFO] - [aiorun] - aiorun.py:295 - Leaving. Bye!

It would be great to get the expected result using the current aiorun functionality. Thanks in advance for feedback. (Environment versions below.)

(env_3_8) MacBook-Pro-2:~ fedir$ python -V
Python 3.8.1
(env_3_8) MacBook-Pro-2:~ fedir$ pip freeze | grep aiorun
aiorun==2020.1.3

P.S. I can provide a more practical example of this behaviour if needed (using a combination of the aiorun and aiokafka packages).

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:11 (11 by maintainers)

Top GitHub Comments

1 reaction
cjrh commented, Feb 5, 2020

I have reproduced what you observe, using the docker-compose file and your example.

  • In your try-finally (after CancelledError has been raised inside all the running tasks), we call consumer.stop(), same as in the aiokafka docs (I read through their guide).
  • Inside consumer.stop, aiokafka calls self._coordinator.close():
    @asyncio.coroutine
    def stop(self):
        """ Close the consumer, while waiting for finilizers:

            * Commit last consumed message if autocommit enabled
            * Leave group if used Consumer Groups
        """
        if self._closed:
            return
        log.debug("Closing the KafkaConsumer.")
        self._closed = True
        if self._coordinator:
            yield from self._coordinator.close()
        if self._fetcher:
            yield from self._fetcher.close()
        yield from self._client.close()
        log.debug("The KafkaConsumer has closed.")
  • Inside the group coordinator close, aiokafka calls self._maybe_leave_group():
    @asyncio.coroutine
    def close(self):
        """Close the coordinator, leave the current group
        and reset local generation/memberId."""
        if self._closing.done():
            return

        self._closing.set_result(None)
        # We must let the coordination task properly finish all pending work
        if not self._coordination_task.done():
            yield from self._coordination_task
        yield from self._stop_heartbeat_task()
        yield from self._stop_commit_offsets_refresh_task()

        yield from self._maybe_leave_group()     # <---- HERE
  • Inside _maybe_leave_group, aiokafka calls self._send_req(request), to leave the group:
    @asyncio.coroutine
    def _maybe_leave_group(self):
        if self.generation > 0:
            # this is a minimal effort attempt to leave the group. we do not
            # attempt any resending if the request fails or times out.
            version = 0 if self._client.api_version < (0, 11, 0) else 1
            request = LeaveGroupRequest[version](self.group_id, self.member_id)
            try:
                yield from self._send_req(request)
            except Errors.KafkaError as err:
                log.error("LeaveGroup request failed: %s", err)
            else:
                log.info("LeaveGroup request succeeded")
        self.reset_generation()
  • It never gets a response, because the _read task has already been killed (as you already figured out)
  • The _read task just blindly absorbs cancellation in its add_done_callback():
    @staticmethod
    def _on_read_task_error(self_ref, read_task):
        # We don't want to react to cancelled errors
        if read_task.cancelled():
            print('read_task was cancelled')   
            return                             # <----- HERE

        try:
            read_task.result()
        except (OSError, EOFError, ConnectionError) as exc:
            self_ref().close(reason=CloseReason.CONNECTION_BROKEN, exc=exc)
        except Exception as exc:
            self = self_ref()
            self.log.exception("Unexpected exception in AIOKafkaConnection")
            self.close(reason=CloseReason.CONNECTION_BROKEN, exc=exc)
  • I added the print() in the snippet above; you can see it print out after you press CTRL-C

This problem is not specific to aiorun: the same problem happens with the standard library asyncio.run, unless you override the default cancel-all-tasks behaviour by implementing your own signal handler, as you’ve already shown.

So I think in the short-to-medium term your best option is to set up a custom signal handler, like you’ve already done, and control the shutdown sequence.
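
A minimal sketch of that approach using only the standard library (the names are illustrative, not taken from the issue): the signal handlers cancel only the main task, and its finally block runs the ordered shutdown before asyncio.run cancels whatever is left.

    import asyncio
    import signal


    async def main():
        # Hypothetical main coroutine: its finally block owns the shutdown
        # order, e.g. await consumer.stop() here before anything else is
        # cancelled.
        try:
            await asyncio.Event().wait()   # run until cancelled
        finally:
            print('running ordered shutdown...')


    async def runner():
        loop = asyncio.get_running_loop()
        task = asyncio.current_task()
        # Cancel only this task on SIGINT/SIGTERM instead of letting the
        # default behaviour cancel every task at once.
        for sig in (signal.SIGINT, signal.SIGTERM):
            loop.add_signal_handler(sig, task.cancel)
        try:
            await main()
        except asyncio.CancelledError:
            pass


    asyncio.run(runner())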

It is not good for libraries to require special considerations for shutdown: if many libraries did this, it would become very difficult to combine several of them in one asyncio service. This is why all libraries should be designed so that all of their tasks can receive CancelledError and shut down in an orderly way. Anyway, that’s not up to you and me I guess.
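
As a rough illustration of that design (a generic sketch, not code from aiokafka or any other specific library): a well-behaved worker catches CancelledError once, finishes the cleanup it is responsible for, and then re-raises so the cancellation stays visible to the caller.

    import asyncio


    async def worker(queue: asyncio.Queue):
        # Generic worker that cooperates with a cancel-all-tasks shutdown.
        try:
            while True:
                item = await queue.get()
                await asyncio.sleep(0.1)   # simulate processing `item`
                queue.task_done()
        except asyncio.CancelledError:
            # Finish the cleanup this task is responsible for...
            while not queue.empty():
                queue.get_nowait()
                queue.task_done()
            raise  # ...then re-raise: never swallow the cancellation silently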

aiokafka will have to change how their shutdown process works in order to be compatible with the default asyncio.run behaviour (and therefore also with aiorun). It’s telling that the aiokafka examples only use loop.run_until_complete() instead of asyncio.run. I had a look at the aiokafka source code. What they could do is absorb cancellation inside the AIOKafkaConnection._read method. This is the current _read method:

    @classmethod
    @asyncio.coroutine
    def _read(cls, self_ref):
        # XXX: I know that it become a bit more ugly once cyclic references
        # were removed, but it's needed to allow connections to properly
        # release resources if leaked.
        # NOTE: all errors will be handled by done callback

        reader = self_ref()._reader
        while True:
            resp = yield from reader.readexactly(4)
            size, = struct.unpack(">i", resp)

            resp = yield from reader.readexactly(size)
            self_ref()._handle_frame(resp)

Perhaps something like this might work:

    @classmethod
    @asyncio.coroutine
    def _read(cls, self_ref):
        # XXX: I know that it become a bit more ugly once cyclic references
        # were removed, but it's needed to allow connections to properly
        # release resources if leaked.
        # NOTE: all errors will be handled by done callback

        reader = self_ref()._reader
        while True:
            try:
                resp = yield from reader.readexactly(4)
            except asyncio.CancelledError:
                continue
            size, = struct.unpack(">i", resp)

            resp = yield from reader.readexactly(size)
            self_ref()._handle_frame(resp)

I did only very limited testing, but this appears to shut down cleanly. (It’s still quite messy though.)

1 reaction
cjrh commented, Jan 31, 2020

@FedirAlifirenko Hi, and thanks for the report. It may take me a few days to get to it but it looks interesting. I just wanted to let you know I’ve seen your report and I will definitely go through it as soon as I have time available.
