Parse Excel from in-memory file object
We came across a situation where we had a file object representing Excel data (it came from an HTTP POST, but I’m thinking it could also come from MongoDB, for example), and would’ve liked to pass it directly to Pandas to parse (vs. saving it to disk and passing the path to Pandas).
Could this be possible?
I saw that xlrd had file_contents as a possible argument of open_workbook:
https://github.com/python-excel/xlrd/blob/master/xlrd/__init__.py#L385
Maybe ExcelFile in Pandas could take path_or_buffer as an argument, and pass along the correct one to xlrd.
https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L1133
Don’t know if that could work for openpyxl also.
Thoughts?
Thanks! Nicolas
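To make the proposal concrete, here is a minimal sketch of the dispatch it describes: given either a path or a file object, choose which keyword to hand to xlrd's open_workbook. The helper name workbook_kwargs is hypothetical and is not part of Pandas or xlrd; only the filename and file_contents keywords come from xlrd itself. xlrd is deliberately not imported, so the snippet runs on its own.

```python
import io


def workbook_kwargs(path_or_buffer):
    """Hypothetical helper: choose the xlrd.open_workbook keyword to use."""
    if hasattr(path_or_buffer, "read"):
        # File-like object: pass the raw bytes through file_contents.
        return {"file_contents": path_or_buffer.read()}
    # Anything else is treated as a path on disk.
    return {"filename": path_or_buffer}


# In-memory upload (stand-in bytes, not a real workbook).
upload = io.BytesIO(b"raw workbook bytes")
kwargs = workbook_kwargs(upload)

# With xlrd installed, the call would then look like:
#   book = xlrd.open_workbook(**kwargs)
```

This only covers the xlrd side; whether openpyxl can be fed the same way is the open question raised above.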
Issue Analytics
- State:
- Created 11 years ago
- Reactions:1
- Comments:11 (5 by maintainers)
Top GitHub Comments
Haha well done! I didn’t know we were racing 😉
Yeah, I didn’t get a chance to work on it as much as I wanted. But I did spot:
file object (not file-like, it has to be an instance of file, which is kind of limiting…) https://github.com/chronossc/openpyxl/blob/master/openpyxl/reader/excel.py#L43
I also saw that xlrd was going to support Excel 2007 in future versions. I don’t know what you want to do about that, i.e. keep using both, or switch to only xlrd.
I didn’t know about tempfile (nice trick!). I was going to just feed xlrd the bytes from file.read(), but that wouldn’t work for openpyxl. I also had put together a function to check what type of Excel file it is, inspired by how the master branch of xlrd does it:
Then I would’ve checked the type (and I like your way of checking for file-like, i.e. it needs a read method, possibly a seek too; this way it works with “file objects” coming from Flask HTTP uploads, or MongoDB GridFS…), and done:
I guess the only advantage there is that it saves having to use a tempfile and an I/O trip to the disk. But your solution has the advantage that it just works, and is also compatible with openpyxl.
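The snippets from this comment did not survive on this page, so the following is only a rough sketch of the two ideas being compared, not the commenter's code or the Pandas implementation. It assumes the usual file signatures (the OLE2 header for legacy .xls, the zip header for .xlsx) and a stream that supports seek and tell. The names is_file_like, sniff_excel_type and spool_to_tempfile are hypothetical.

```python
import io
import tempfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"  # .xlsx is a zip container


def is_file_like(obj):
    """Duck-typed check: anything with a read method counts as file-like."""
    return hasattr(obj, "read")


def sniff_excel_type(stream):
    """Return 'xls', 'xlsx' or None, restoring the stream position afterwards."""
    pos = stream.tell()
    head = stream.read(8)
    stream.seek(pos)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        # Any zip archive matches here; this is a hint, not proof of .xlsx.
        return "xlsx"
    return None


def spool_to_tempfile(stream, suffix):
    """Tempfile approach: copy the stream to disk and return the path.

    The caller is responsible for deleting the file when done.
    """
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(stream.read())
        return tmp.name
```

A reader could use the sniffed type to choose between feeding bytes straight to xlrd and spooling to a temporary path for a library that insists on a real file.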
Thanks for taking the time!
sweet