Migrate STDOUT/STDIN exchanges from asyncio pipes to queues

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/2900

Originally created by @aaronsteers on 2021-08-18 21:44:57


As discussed in #2743 and this comment (https://gitlab.com/meltano/meltano/-/issues/2743#note_569087851), asyncio queues can be used in place of pipes to send data between processes.

cc @pandemicsyn


From our code:

https://gitlab.com/meltano/meltano/-/blob/c58cb8e56be1b8bae0eefcb6ba906e3c2010852e/src/meltano/core/logging/output_logger.py#L201

    async def _read_from_fd(self, read_fd):
        # Since we're redirecting our own stdout and stderr output,
        # the line length limit can be arbitrarily large.
        line_length_limit = 1024 * 1024 * 1024  # 1 GiB

        reader = asyncio.StreamReader(limit=line_length_limit)
        read_protocol = asyncio.StreamReaderProtocol(reader)

        loop = asyncio.get_event_loop()
        read_transport, _ = await loop.connect_read_pipe(  # <<<<
            lambda: read_protocol, os.fdopen(read_fd)
        )

        await capture_subprocess_output(reader, self)

From https://docs.python.org/3/library/asyncio-platforms.html#windows:

SelectorEventLoop has the following limitations:

  • Pipes are not supported, so the loop.connect_read_pipe() and loop.connect_write_pipe() methods are not implemented.
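One way to avoid loop.connect_read_pipe() altogether is to read the file descriptor with a blocking reader in a worker thread and hand each line to the event loop through an asyncio.Queue. The following is only a minimal sketch of that idea, not Meltano's implementation: the names _pump_fd_to_queue, read_lines_from_fd, and demo_fd_queue are hypothetical, and None is used as an assumed end-of-stream sentinel.

```python
import asyncio
import os
import threading


def _pump_fd_to_queue(read_fd, queue, loop):
    """Blocking reader, run in a thread; forwards each line to the queue."""
    with os.fdopen(read_fd, "rb") as stream:
        for line in stream:
            # asyncio.Queue is not thread-safe, so schedule the put on the loop.
            loop.call_soon_threadsafe(queue.put_nowait, line)
    # Assumed convention: None marks end of stream.
    loop.call_soon_threadsafe(queue.put_nowait, None)


async def read_lines_from_fd(read_fd):
    """Collect decoded lines from read_fd without using connect_read_pipe()."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    threading.Thread(
        target=_pump_fd_to_queue, args=(read_fd, queue, loop), daemon=True
    ).start()

    lines = []
    while (line := await queue.get()) is not None:
        lines.append(line.decode())
    return lines


def demo_fd_queue():
    # Write two lines into an OS pipe, close the write end, then read them back.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"first\nsecond\n")
    os.close(write_fd)
    return asyncio.run(read_lines_from_fd(read_fd))
```

In a real logger the collected lines would be forwarded to a handler as they arrive rather than accumulated in a list. The thread-based reader trades a small amount of overhead for not depending on pipe support in the event loop.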

Alternative implementation using queues from here:

import asyncio
import logging


async def first_pipe_cmd(command, queue, cwd="."):
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        # stderr=asyncio.subprocess.PIPE,
        cwd=cwd,
    )
    # await asyncio.wait(_outstream_handler(proc.stderr, "stderr", "first_pipe_cmd"))  # Broken at the moment
    while True:
        data = await proc.stdout.readline()
        if not data:
            # readline() returns b"" at EOF, so the process has closed stdout.
            break
        line = data.decode()
        await queue.put(line)
        logging.info(f"Queue data for processes, data is {line}")
    logging.info("First piped process has completed")
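The snippet above only covers the producing side. A complete exchange also needs a consumer that drains the queue into the second process's STDIN. The sketch below shows one possible shape for that, under stated assumptions: produce, consume, run_pipeline, and demo_pipeline are hypothetical names, None is an assumed end-of-stream sentinel, and the two child commands are small Python one-liners chosen only for illustration. Each child is still attached through a subprocess pipe; the queue replaces the direct hand-off between the two.

```python
import asyncio
import sys


async def produce(cmd, queue):
    """Read the first process's stdout line by line and enqueue each line."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    while True:
        data = await proc.stdout.readline()
        if not data:  # b"" means EOF
            break
        await queue.put(data)
    await queue.put(None)  # assumed end-of-stream sentinel
    await proc.wait()


async def consume(cmd, queue):
    """Feed queued lines to the second process's stdin; return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdin=asyncio.subprocess.PIPE, stdout=asyncio.subprocess.PIPE
    )

    async def feed():
        while (data := await queue.get()) is not None:
            proc.stdin.write(data)
            await proc.stdin.drain()
        proc.stdin.close()

    # Read stdout while feeding stdin so neither side blocks on a full buffer.
    _, out = await asyncio.gather(feed(), proc.stdout.read())
    await proc.wait()
    return out.decode()


async def run_pipeline(first_cmd, second_cmd, maxsize=100):
    # A bounded queue applies backpressure to the producer.
    queue = asyncio.Queue(maxsize=maxsize)
    _, output = await asyncio.gather(
        produce(first_cmd, queue), consume(second_cmd, queue)
    )
    return output


def demo_pipeline():
    first = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
    second = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    return asyncio.run(run_pipeline(first, second))
```

The maxsize bound is the main design choice here: with an unbounded queue, a fast producer and a slow consumer would let buffered lines grow without limit in memory.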

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

aaronsteers commented, Oct 9, 2022 (2 reactions)

@tayloramurphy - thanks for the ping.

On reviewing this, I don’t think it is a high priority as of now, given that:

  1. BATCH messages mitigate the performance side of the issue. (Mitigated in part although admittedly not in full.)
  2. Windows support is working now natively for both meltano run and elt. (Interop was one of the drivers here, if I remember correctly.)

There are still potential benefits, but there are also risks in terms of compatibility and performance. There’s another discussion (I think in the SDK repo) about performance benchmarking, and I think we’d want that process to be in place before making a big change like this.

It’s also arguable that investing in adding BATCH support to stream maps (along with good docs) may be a better investment, since that would bring similar benefits of reducing memory pressure to legacy taps and targets as well as sdk-based ones.

@BuzzCutNorman, @visch - do you see strong value in this as of now, or would you agree it’s ok to deprioritize?

tayloramurphy commented, Oct 11, 2022 (1 reaction)

Windows does not work with elt

Thankfully we’ve documented it as such in https://docs.meltano.com/guide/installation-guide#windows 😅
