open() doesn't play nice with subprocess.run when used for STDIN

What are you trying to achieve?

I want to run a command line program using a file on S3 as STDIN.

What is the expected result?

The file should stream into the command line program as standard input.

What are you seeing instead?

UnsupportedOperation: fileno

Steps/code to reproduce the problem

from subprocess import run

import boto3
from smart_open import open as smart_open

s3 = boto3.Session(profile_name="development").client("s3")
with smart_open("s3://bucket-name/path/to/file.gz", transport_params={"client": s3}, buffering=0) as f:
    run(("cat",), stdin=f)  # raises UnsupportedOperation: fileno

Traceback

The following is the result of running the above in the IPython shell:

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
Cell In [21], line 2
      1 with smart_open("REDACTED", transport_params={"client": s3}, buffering=0) as f:
----> 2     run(("cat",), stdin=f)

File /usr/lib/python3.10/subprocess.py:501, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    498     kwargs['stdout'] = PIPE
    499     kwargs['stderr'] = PIPE
--> 501 with Popen(*popenargs, **kwargs) as process:
    502     try:
    503         stdout, stderr = process.communicate(input, timeout=timeout)

File /usr/lib/python3.10/subprocess.py:832, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask, pipesize)
    811     raise SubprocessError('Cannot disambiguate when both text '
    812                           'and universal_newlines are supplied but '
    813                           'different. Pass one or the other.')
    815 # Input and output objects. The general principle is like
    816 # this:
    817 #
   (...)
    827 # are -1 when not using PIPEs. The child objects are -1
    828 # when not redirecting.
    830 (p2cread, p2cwrite,
    831  c2pread, c2pwrite,
--> 832  errread, errwrite) = self._get_handles(stdin, stdout, stderr)
    834 # We wrap OS handles *before* launching the child, otherwise a
    835 # quickly terminating child could make our fds unwrappable
    836 # (see #8458).
    838 if _mswindows:

File /usr/lib/python3.10/subprocess.py:1603, in Popen._get_handles(self, stdin, stdout, stderr)
   1600     p2cread = stdin
   1601 else:
   1602     # Assuming file-like object
-> 1603     p2cread = stdin.fileno()
   1605 if stdout is None:
   1606     pass

File /usr/lib/python3.10/gzip.py:359, in GzipFile.fileno(self)
    353 def fileno(self):
    354     """Invoke the underlying file object's fileno() method.
    355 
    356     This will raise AttributeError if the underlying file object
    357     doesn't support fileno().
    358     """
--> 359     return self.fileobj.fileno()
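
As a side note on the last two frames: subprocess asks the stdin object for an OS-level file descriptor via fileno(), and GzipFile simply forwards that call to the object it wraps. The sketch below is not from the original issue. It reproduces the same failure without S3 by wrapping an in-memory buffer, on the assumption that the smart_open S3 reader behaves like any other file-like object that has no descriptor. The helper name has_real_fd is made up for illustration.

```python
import gzip
import io
import os


def has_real_fd(fileobj):
    """Return True if fileobj is backed by an OS-level file descriptor."""
    try:
        fileobj.fileno()
    except (AttributeError, OSError):
        # io.UnsupportedOperation is a subclass of OSError.
        return False
    return True


# A GzipFile over an in-memory buffer: fileno() is forwarded to BytesIO,
# which has no descriptor to offer.
in_memory = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(b"hello\n")))
print(has_real_fd(in_memory))

# A file opened from the local filesystem does have a descriptor.
with open(os.devnull, "rb") as local_file:
    print(has_real_fd(local_file))
```

Only objects for which this check passes can be handed directly to the stdin argument of run() or Popen().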

I’ve tried this without the buffering=0 argument as well with the same results. If this isn’t possible, then I suppose my next best option would be to just pull the entire file down and do everything locally. The problem in my case is that the file is Very Large, so I can’t just do something simple like:

with smart_open("s3://bucket-name/path/to/file.gz", "rb") as f:
    run(("cat",), input=f.read())  # reads the whole object into memory

because I assume that would load the whole file into RAM first.
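
One possible way to avoid loading everything into RAM, sketched here as an assumption rather than as the fix adopted in the issue, is to start the command with stdin=PIPE and copy the stream into the pipe in fixed-size chunks. The function name stream_to_command and the chunk size are hypothetical. The sketch assumes a POSIX-style system, where a command that exits early surfaces as BrokenPipeError, and it uses an in-memory buffer in place of the S3 object so that it runs anywhere.

```python
import contextlib
import io
import shutil
import subprocess
import sys


def stream_to_command(fileobj, args, chunk_size=1024 * 1024):
    """Feed a binary file-like object to a command's stdin chunk by chunk.

    Returns the command's exit code. Only chunk_size bytes are read from
    fileobj at a time (the pipe adds its own small buffer on top).
    """
    proc = subprocess.Popen(args, stdin=subprocess.PIPE)
    # The command may exit before consuming all input; treat that as normal.
    with contextlib.suppress(BrokenPipeError):
        shutil.copyfileobj(fileobj, proc.stdin, chunk_size)
    # Closing stdin flushes pending data and signals end-of-input.
    with contextlib.suppress(BrokenPipeError):
        proc.stdin.close()
    return proc.wait()


# Stand-in for the remote object: any binary file-like object works here.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()))"]
exit_code = stream_to_command(io.BytesIO(b"x" * 3_000_000), child, chunk_size=64 * 1024)
```

With smart_open, the object returned by opening the S3 URI in "rb" mode could presumably be passed as fileobj in the same way, although that has not been verified against the setup described in this issue.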

Versions

Please provide the output of:

import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)

Linux-5.15.0-48-generic-x86_64-with-glibc2.35
Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
smart_open 6.2.0

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5

Top GitHub Comments

mpenkov commented, Sep 21, 2022 (1 reaction)

There probably is. If the dirty way feels wrong, have a look on SO, e.g. here: https://stackoverflow.com/questions/4846891/python-piping-output-between-two-subprocesses

limedaniel commented, Sep 21, 2022 (0 reactions)

I tried to do exactly this with subprocess.run() but it ended up with the same error:

from subprocess import PIPE, run

run(("gunzip",), stdin=run(("aws", "s3", "cp", "s3://REDACTED.gz", "-"), stdout=PIPE).stdout)

AttributeError: 'bytes' object has no attribute 'fileno'

I guess I could run the string with shell=True, but that felt dirty so I was hoping there was a Better Way:

run("aws s3 cp s3://REDACTED.gz - | gunzip", shell=True)

☝🏻 This works, but I just assumed that there’s a Pythonic way to do it.
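
For reference, the nested run() call fails because run() waits for the inner command to finish and returns its output as a bytes object, not as an open pipe. A shell-free equivalent of the pipeline can be sketched with two Popen objects, following the pattern from the Stack Overflow link above and the subprocess documentation. The helper name pipe_commands is hypothetical, and the demo uses two small Python child processes instead of aws and gunzip so that it runs without credentials.

```python
import subprocess
import sys


def pipe_commands(producer_args, consumer_args):
    """Run the equivalent of `producer | consumer` without a shell.

    Returns a (producer_exit_code, consumer_exit_code) tuple.
    """
    producer = subprocess.Popen(producer_args, stdout=subprocess.PIPE)
    # producer.stdout is a real pipe with a file descriptor, so it is
    # accepted as stdin for the second process.
    consumer = subprocess.Popen(consumer_args, stdin=producer.stdout)
    # Drop the parent's copy of the pipe so the producer is notified
    # if the consumer exits early.
    producer.stdout.close()
    consumer_code = consumer.wait()
    producer_code = producer.wait()
    return producer_code, consumer_code


# Self-contained demo: one child prints a line, the other checks it.
codes = pipe_commands(
    [sys.executable, "-c", "print('hi')"],
    [sys.executable, "-c", "import sys; sys.exit(0 if sys.stdin.read().strip() == 'hi' else 1)"],
)
```

Applied to the commands in this comment, the call would look like pipe_commands(("aws", "s3", "cp", "s3://REDACTED.gz", "-"), ("gunzip",)), with the data flowing between the two processes without passing through Python.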
