st.file_uploader produces undesired results for some pdfs.

Hi everyone!

Something undesired happens to some PDFs when they are held in memory by st.file_uploader.

I have a Streamlit app that allows users to upload documents (docx, pdf, txt) and automatically processes and cleans them.

The upload and processing work fine. However, when I run the exact same functions line by line with the PDF read from a local file path instead, the results are very different (with the Streamlit upload, some words are usually not processed correctly).

Again, the only difference is that in scenario A the PDF is uploaded via st.file_uploader and in scenario B the PDF is given by a local path. I therefore assume that st.file_uploader stores the PDF differently somehow, and I am not sure how to fix this. Please note that the correct output comes from the local file path; the Streamlit output is faulty.
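
One way to test the assumption that the uploader changes the file is to compare a digest of the uploaded bytes with a digest of the same file on disk. The following is only a minimal sketch: `sha256_of_bytes` and `same_content` are hypothetical helper names, and `uploaded` is assumed to be any object with a `getvalue()` method that returns bytes (the call used on the upload in the workaround posted in this thread).

```python
import hashlib
from pathlib import Path


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def same_content(uploaded, local_path) -> bool:
    """Compare the in-memory upload with the file stored at local_path.

    `uploaded` is assumed to expose getvalue() returning the raw bytes.
    """
    uploaded_digest = sha256_of_bytes(uploaded.getvalue())
    local_digest = sha256_of_bytes(Path(local_path).read_bytes())
    return uploaded_digest == local_digest
```

If the digests match, the bytes are identical, and the difference more likely comes from how the processing code reads an in-memory object compared with a file path.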

Expected behavior:

PDF read from a local file path, 2nd paragraph (words processed correctly): correct

Actual behavior:

PDF uploaded via st.file_uploader, 2nd paragraph (words not processed correctly): faulty

Additional information:

So far, this behaviour has only shown up for PDFs, not for docx or txt files.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11

Top GitHub Comments

willhuang1997 commented, Jun 27, 2022 (1 reaction)

@kajarenc, would you happen to know off the top of your head what is wrong here? I looked at some of the code but couldn’t find anything that would point me in the right direction.

jonas-nothnagel commented, Jul 15, 2022 (0 reactions)

Dear @kajarenc and @willhuang1997,

After some further digging, I found that the community has already reported this problem several times. Please see here: https://github.com/streamlit/streamlit/issues/904

I am closing this issue now. Thank you very much for your engagement, and please excuse the mistake on my side. I hope we will be able to find a solution for the issue above soon.

A temporary fix is to write the uploaded file to a temporary file, as outlined here: https://github.com/deepset-ai/haystack/issues/2824

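import tempfile

import streamlit as st

file = st.file_uploader("File upload", type=["pdf"])

if file is not None:  # nothing has been uploaded yet on the first run
    with tempfile.NamedTemporaryFile(mode="wb") as temp:
        temp.write(file.getvalue())  # copy the uploaded bytes to disk
        temp.flush()  # make sure the bytes are written before the path is used
        print(temp.name)  # path to hand to the PDF processing code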
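
Note that NamedTemporaryFile deletes the file on close by default, so the printed path stops existing once the with block exits. If the processing code needs a path that outlives the block, the workaround can be wrapped in a helper. This is only a sketch: `save_upload_to_temp` and `remove_temp` are hypothetical names, and `uploaded` is assumed to be any object with a `getvalue()` method that returns bytes.

```python
import os
import tempfile


def save_upload_to_temp(uploaded, suffix: str = ".pdf") -> str:
    """Write the uploaded bytes to a temporary file and return its path."""
    # delete=False keeps the file on disk after the with-block closes it,
    # so the caller is responsible for removing it afterwards.
    with tempfile.NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as temp:
        temp.write(uploaded.getvalue())
    return temp.name


def remove_temp(path: str) -> None:
    """Delete the temporary file if it still exists."""
    if os.path.exists(path):
        os.remove(path)
```

In the app, this could be called as `path = save_upload_to_temp(file)` once `file` is not None, followed by `remove_temp(path)` after processing.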