PDF Files cause `fromStream` to never finish
Using 16.5.3, I’m seeing an issue where an await’ed call to fromStream with a stream of a PDF file never actually resolves, e.g.:
const fileType = await FileType.fromStream(stream);
I haven’t yet been able to test the 17.x release series due to other blocking dependencies.
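Until the root cause is addressed, one way to keep a caller from hanging forever is to race the call against a timer. This is only a minimal sketch: withTimeout is a hypothetical helper, not part of file-type, and it does not cancel the underlying read; it only stops the caller from waiting on it.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the process alive on its own.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch (assumes `FileType` and `stream` as in the line above):
// const fileType = await withTimeout(FileType.fromStream(stream), 5000, 'fromStream');
```

On a timeout the caller would still need to destroy the stream itself, since the stalled read is left pending.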
It seems that removing the await from line 592 of node_modules\file-type\core.js causes the function to return properly, which suggests a connection to https://github.com/Borewit/strtok3/issues/551
i.e. this “works”:
if (checkString('%PDF')) {
await tokenizer.ignore(1350);
const maxBufferSize = 10 * 1024 * 1024;
const buffer = Buffer.alloc(Math.min(maxBufferSize, tokenizer.fileInfo.size));
tokenizer.readBuffer(buffer, {mayBeLess: true});
and this doesn’t:
if (checkString('%PDF')) {
await tokenizer.ignore(1350);
const maxBufferSize = 10 * 1024 * 1024;
const buffer = Buffer.alloc(Math.min(maxBufferSize, tokenizer.fileInfo.size));
await tokenizer.readBuffer(buffer, {mayBeLess: true});
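Note that dropping the await probably only masks the hang: the function returns because it no longer waits for the read, not because the read completes. The sketch below illustrates this with a stand-in read that never settles; stalledRead and detectWithoutAwait are hypothetical names and do not reproduce the strtok3 or file-type code.

```javascript
// Stand-in for a read that never completes (hypothetical, for illustration only).
function stalledRead(buffer) {
  return new Promise(() => {}); // never settles and never writes into `buffer`
}

async function detectWithoutAwait() {
  const buffer = Buffer.alloc(4);
  stalledRead(buffer); // the promise is dropped; execution continues immediately
  return buffer;       // still zero-filled, because nothing was read
}

detectWithoutAwait().then((buffer) => {
  console.log(buffer.every((byte) => byte === 0)); // the buffer was never filled
});
```

Any check that inspects the buffer after an un-awaited read would therefore run against data that may not have arrived yet.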
Issue Analytics
- State:
- Created 2 years ago
- Comments:9
@Borewit thanks for the tips.
I think we can close this issue
fileTypeStream() (called stream() in v16) is probably the closest thing to a workaround for your issue. But it comes at a price: it has a limited sample size, which prevents it from reading too much, at the cost of some file type determination accuracy.
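The idea of a limited sample size can be sketched in plain Node.js: read only the first few bytes of the stream and decide from those. This is an illustration under assumptions, not how file-type implements it. readSample and looksLikePdf are hypothetical helpers, the PDF check is reduced to the leading "%PDF" bytes, and unlike stream() / fileTypeStream() this sketch consumes and destroys the stream instead of handing it back for further reading.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: collect at most `sampleSize` bytes from the start of a stream.
async function readSample(stream, sampleSize) {
  const chunks = [];
  let length = 0;
  for await (const chunk of stream) {
    const buf = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
    chunks.push(buf);
    length += buf.length;
    if (length >= sampleSize) break; // leaving the loop early destroys the stream
  }
  return Buffer.concat(chunks).subarray(0, sampleSize);
}

// Simplified check: a PDF file starts with the ASCII bytes "%PDF".
function looksLikePdf(sample) {
  return sample.length >= 4 && sample.toString('latin1', 0, 4) === '%PDF';
}

// Usage with an in-memory stream standing in for a real file stream.
readSample(Readable.from([Buffer.from('%PDF-1.7\n')]), 4096)
  .then((sample) => console.log(looksLikePdf(sample)));
```

Because only the sample is read, the cost is bounded regardless of file size, but any format whose distinguishing bytes lie beyond the sample cannot be told apart.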