Parse Excel from in-memory file object

We came across a situation where we had a file object representing Excel data (it came from an HTTP POST, but I'm thinking it could also come from MongoDB, for example), and we would have liked to pass it directly to Pandas to parse, rather than saving it to disk and passing the path to Pandas.

Could this be possible?

I saw that xlrd had file_contents as a possible argument of open_workbook: https://github.com/python-excel/xlrd/blob/master/xlrd/__init__.py#L385

Maybe ExcelFile in Pandas could take path_or_buffer as argument, and pass along the correct one to xlrd. https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L1133

I don’t know whether that would also work for openpyxl.

Thoughts?

Thanks! Nicolas

Issue Analytics

  • State: closed
  • Created 11 years ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

3 reactions
nicolashery commented, Jun 28, 2012

Haha well done! I didn’t know we were racing 😉

Yeah, I didn’t get a chance to work on it as much as I wanted. But I did spot:

I also saw that xlrd was going to support Excel 2007 (.xlsx) in future versions. I don’t know what you want to do about that, i.e. keep using both libraries or switch to xlrd only.

I didn’t know about tempfile (nice trick!). I was going to just feed xlrd the bytes from file.read(), but that wouldn’t work for openpyxl. I had also put together a function to check what type of Excel file it is, inspired by how the master branch of xlrd does it:

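def _excel_type(filepath_or_buffer):
    """Return 'xlsx' or 'xls' by sniffing the first bytes of the file.

    Thanks to xlrd for this: .xlsx files are ZIP archives, so they start
    with the ZIP local-file-header signature.
    """
    peeksz = 4
    if isinstance(filepath_or_buffer, str):
        # A path: open in binary mode and read the first few bytes
        with open(filepath_or_buffer, "rb") as f:
            peek = f.read(peeksz)
    elif hasattr(filepath_or_buffer, "read") and hasattr(filepath_or_buffer, "seek"):
        # A file-like object: peek, then rewind so the parser can re-read it
        f = filepath_or_buffer
        peek = f.read(peeksz)
        f.seek(0)
    else:
        raise TypeError("You must provide the path to a file "
                        "or a file-like object")
    # Compare against a bytes literal, since files are read in binary mode
    if peek == b"PK\x03\x04":
        return "xlsx"
    return "xls"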
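Why the four-byte check works: an .xlsx workbook is a ZIP archive, and a ZIP file begins with the local-file-header signature `PK\x03\x04`, while legacy .xls files are OLE2 compound documents whose header starts with `D0 CF 11 E0`. The sketch below uses only the standard library to build a stand-in ZIP in memory (not a real workbook; the member name `xl/workbook.xml` is only illustrative) and confirms the signature. The helper `looks_like_zip` is hypothetical.

```python
import io
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # ZIP local-file-header magic bytes


def looks_like_zip(buf):
    """Return True if the buffer starts with the ZIP signature, then rewind it."""
    start = buf.tell()
    peek = buf.read(len(ZIP_SIGNATURE))
    buf.seek(start)  # restore the position so a parser can read from it
    return peek == ZIP_SIGNATURE


# Build a stand-in ZIP archive in memory (not a real workbook).
fake_xlsx = io.BytesIO()
with zipfile.ZipFile(fake_xlsx, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")
fake_xlsx.seek(0)

print(looks_like_zip(fake_xlsx))  # True
print(looks_like_zip(io.BytesIO(b"\xd0\xcf\x11\xe0")))  # False: OLE2 (.xls) header
print(fake_xlsx.tell())  # 0: position restored
```

Unlike the function above, this version seeks back to wherever the buffer was rather than to 0, which is slightly safer if the caller has already consumed part of the stream.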

Then I would have checked the type (and I like your way of checking for a file-like object, i.e. it needs a read method and possibly a seek method; that way it works with “file objects” coming from Flask HTTP uploads, MongoDB GridFS, and so on) and done:

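import xlrd

filename = "data.xls"  # hypothetical path, for illustration only

# Either let xlrd open the file by path...
wb = xlrd.open_workbook(filename=filename)
# ...or read the raw bytes yourself and hand them over in memory
with open(filename, "rb") as f:
    contents = f.read()  # avoid shadowing the built-in name `bytes`
wb = xlrd.open_workbook(file_contents=contents)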

I guess the only advantage there is that it avoids using a tempfile and a round trip to disk. But your solution has the advantage that it just works, and it is also compatible with openpyxl.
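For comparison, a minimal sketch of the tempfile approach: spill the in-memory bytes to a temporary file, hand its path to a path-only parser, and delete the file afterwards. `parse_via_tempfile` and its `parser` callback are hypothetical stand-ins (in practice the callback might be something like `lambda p: xlrd.open_workbook(filename=p)`); this is not the code that was actually merged into pandas.

```python
import io
import os
import tempfile


def parse_via_tempfile(buffer, parser, suffix=".xls"):
    """Hypothetical helper: write a file-like object to disk, parse it by path.

    ``parser`` is any callable that takes a filesystem path. The temporary
    file is removed even if parsing fails.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Close our handle before parsing so the parser can reopen the path
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(buffer.read())
        return parser(path)
    finally:
        os.remove(path)


# Usage: a trivial "parser" that just reads the file back.
def read_back(path):
    with open(path, "rb") as f:
        return f.read()


print(parse_via_tempfile(io.BytesIO(b"raw workbook bytes"), read_back))
# b'raw workbook bytes'
```

`mkstemp` is used instead of an open `NamedTemporaryFile` because on Windows a file that is still open that way generally cannot be reopened by name, which would break a parser that expects a path.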

Thanks for taking the time!

0 reactions
ludaavics commented, Aug 20, 2012

sweet
