Helper class to source any file-like object

I have a use case where I’m picking up CSV files over SFTP (using a library called pysftp) and running them through a pipeline. Initially I thought it would look something like this:

import petl as etl
import pysftp

with pysftp.Connection(hostname, username=username, password=password) as conn:
    with conn.open('path/to/file.csv') as f:
        # f is a file-like object
        (etl.fromcsv(f)
            ...
        )

But that raised an error, because f doesn’t implement an open() method, so petl can’t open it. I ended up writing a wrapper instead:

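from contextlib import contextmanager


class SftpSource(object):
    def __init__(self, conn, path):
        self.conn = conn
        self.path = path

    @contextmanager
    def open(self, mode='r'):
        # this source is read-only, so reject any write mode
        if not mode.startswith('r'):
            raise ValueError('source is read-only')
        f = self.conn.open(self.path)
        try:
            yield f
        finally:
            # always close the remote file, even if reading fails
            f.close()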
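To try the wrapper without an SFTP server, the sketch below swaps the pysftp connection for a hypothetical in-memory `FakeConn` and uses a copy of the wrapper that raises `ValueError` (Python has no built-in `ArgumentError`). The `read_rows` helper is also hypothetical: it only approximates how a petl CSV reader consumes a source, assuming the reader calls `source.open('rb')` inside a `with` block and decodes the bytes with `io.TextIOWrapper`.

```python
import csv
import io
from contextlib import contextmanager


class FakeConn(object):
    """Hypothetical stand-in for pysftp.Connection that serves files from memory."""

    def __init__(self, files):
        self.files = files  # maps remote path -> file contents as bytes

    def open(self, path, mode='r'):
        # Like an SFTP file handle, return a binary file-like object.
        return io.BytesIO(self.files[path])


class SftpSource(object):
    def __init__(self, conn, path):
        self.conn = conn
        self.path = path

    @contextmanager
    def open(self, mode='r'):
        if not mode.startswith('r'):
            raise ValueError('source is read-only')
        f = self.conn.open(self.path)
        try:
            yield f
        finally:
            f.close()


def read_rows(source, encoding='utf-8'):
    # Hypothetical helper approximating a petl CSV reader: open the source
    # in binary mode, add a text layer, and parse the rows.
    with source.open('rb') as buf:
        text = io.TextIOWrapper(buf, encoding=encoding, newline='')
        return list(csv.reader(text))


conn = FakeConn({'/path/to/data.csv': b'foo,bar\na,1\nb,2\n'})
rows = read_rows(SftpSource(conn, '/path/to/data.csv'))
print(rows)  # [['foo', 'bar'], ['a', '1'], ['b', '2']]
```

Because `SftpSource.open` is a generator-based context manager, the file is closed in the `finally` clause even if parsing raises.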

Writing that wrapper wasn’t much trouble, but I was trying to think of how you could generalize it (and build it into petl) to handle basically any file-like object. Something like:

from contextlib import contextmanager


class FloSource(object):
    def __init__(self, open_, *args, **kwargs):
        # store the opener and the arguments to call it with later
        self.open_ = open_
        self.args = args
        self.kwargs = kwargs

    @contextmanager
    def open(self, mode='r'):
        f = self.open_(*self.args, **self.kwargs)
        try:
            yield f
        finally:
            f.close()

It would be used like this:

conn = pysftp.Connection(...)
source = etl.FloSource(conn.open, '/path/to/data.csv')
(etl.fromcsv(source)
     ...
)
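Since `etl.FloSource` is only a proposal, the sketch below defines it locally and exercises it with `io.open` on a temporary file standing in for `conn.open`. The helper `read_rows` is hypothetical and only approximates how a petl CSV reader consumes a source. One design consequence shows up here: `open(mode)` ignores its `mode` argument, because the opener’s arguments are fixed when the source is built, so a binary mode such as `'rb'` has to be passed up front.

```python
import csv
import io
import os
import tempfile
from contextlib import contextmanager


class FloSource(object):
    """Wraps any callable that returns a file-like object (sketch of the proposal)."""

    def __init__(self, open_, *args, **kwargs):
        self.open_ = open_
        self.args = args
        self.kwargs = kwargs

    @contextmanager
    def open(self, mode='r'):
        # `mode` is ignored: the opener's arguments were fixed at construction.
        f = self.open_(*self.args, **self.kwargs)
        try:
            yield f
        finally:
            f.close()


def read_rows(source):
    # Hypothetical helper: consume the source roughly as a petl CSV reader would.
    with source.open('rb') as buf:
        text = io.TextIOWrapper(buf, encoding='utf-8', newline='')
        return list(csv.reader(text))


# Local stand-in for a remote file; io.open plays the role of conn.open.
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as out:
    out.write(b'foo,bar\na,1\n')

rows = read_rows(FloSource(io.open, path, 'rb'))
os.remove(path)
print(rows)  # [['foo', 'bar'], ['a', '1']]
```

Keeping the stored arguments (rather than forwarding `mode`) means the wrapper works with openers of any signature, at the cost of making the caller choose the mode.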

Just a thought, in case it’s helpful to others. I’d be happy to work on a PR.

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

1 reaction
alimanfoo commented, Jan 25, 2017

Thanks Robert. I’d be happy to include some generalisation to support this. We would just need to make sure it works properly with the context manager protocol, so that whatever is opened is always closed. With the pattern below, I think this means that whatever the opener returns would need to support the context manager protocol. Do you think that’s a reasonable expectation?

On Wednesday, January 25, 2017, Robert Martin notifications@github.com wrote:

Actually, this looks a lot like the existing FileSource class. I suppose you could also give FileSource a kwarg for an open function. This worked for my purposes:

import io

class FileSource(object):
    def __init__(self, path, opener=None, **kwargs):
        self.path = path
        self.opener = opener
        self.kwargs = kwargs

    def open(self, mode='r'):
        if self.opener:  # e.g. an SFTP connection's open method
            return self.opener(self.path, **self.kwargs)
        return io.open(self.path, mode, **self.kwargs)


– Alistair Miles, Head of Epidemiological Informatics, Centre for Genomics and Global Health (http://cggh.org), The Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, United Kingdom. Email: alimanfoo@googlemail.com · Web: http://purl.org/net/aliman · Twitter: https://twitter.com/alimanfoo · Tel: +44 (0)1865 287721
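The requirement that whatever the opener returns must support the context manager protocol can be relaxed with `contextlib.closing`. The sketch below is a hypothetical variant of Robert’s class, named `OpenerFileSource` here so it isn’t confused with petl’s own `FileSource`. It assumes only that the opener returns an object with a `close()` method, and wraps it in `closing()` when its type lacks `__enter__`/`__exit__`. `PlainReader` is a made-up stand-in for such an object.

```python
import io
from contextlib import closing


class PlainReader(object):
    """Hypothetical file-like object with read() and close() but no __enter__/__exit__."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.closed = False

    def read(self, size=-1):
        return self._buf.read(size)

    def close(self):
        self.closed = True


class OpenerFileSource(object):
    """Hypothetical variant of FileSource that accepts a custom opener."""

    def __init__(self, path, opener=None, **kwargs):
        self.path = path
        self.opener = opener
        self.kwargs = kwargs

    def open(self, mode='r'):
        if self.opener is None:
            return io.open(self.path, mode, **self.kwargs)
        f = self.opener(self.path, **self.kwargs)
        # Objects whose type already implements the context manager protocol
        # are returned as-is; anything else is wrapped so close() runs on exit.
        if hasattr(type(f), '__enter__') and hasattr(type(f), '__exit__'):
            return f
        return closing(f)


files = {'/remote/data.csv': PlainReader(b'foo,bar\n')}
source = OpenerFileSource('/remote/data.csv', opener=lambda path: files[path])
with source.open('rb') as f:
    data = f.read()
print(data, files['/remote/data.csv'].closed)  # b'foo,bar\n' True
```

With this check, standard file objects pass through unchanged, while minimal file-like objects still get closed when the `with` block exits.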

0 reactions
juarezr commented, Jul 3, 2020

@rbrtmrtn,

With release v1.5.0, there is a new function for registering custom sources that fills this role.

Release v1.6.0 also added support for reading from remote sources, including SFTP servers, via the fsspec package.

For this to work, you need to:

  1. Install petl: pip install petl
  2. Install fsspec: pip install fsspec
  3. Install paramiko: pip install paramiko
  4. Use a full URL pointing to the file on the remote server in the from...() and to...() functions.

For example:

import petl as etl

myurl = "sftp://myuser:mypassword@myserver/path/to/myfile.csv"
table2 = etl.fromcsv(myurl)
# ...

Do you think this closes the issue?

That said, #410 could still be useful for other custom cases.
