Incorrectly Parsed Object on Microsoft invoice PDF

Hi!

Thanks for a really welcome module.

I’m dealing with thousands of different kinds of PDFs generated by other people, and I ran into trouble with one specific file from Microsoft, which produces the following error:

Incorrectly parsed object contents

These are the PDFs I’m trying to combine. I think the offending one is the first, as it’s the only one not generated by Puppeteer: Din_Microsoft-fakturaoversikt.pdf, 3e63ebd0-8775-11e9-888e-1f95e38b402c.pdf

Presumably the PDF doesn’t follow the standards, though there’s little I can do about that.

My use case is to combine this PDF with a generated page that gives some info about it, for accounting purposes. As such, I don’t really need to parse it any more than what’s needed to append it to my PDF.

My code looks as follows:

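const fs = require("fs");
const { PDFDocumentFactory, PDFDocumentWriter } = require("pdf-lib");

// pdfsToMerge is an array of file paths; filePath is where the merged PDF is written
async function mergePdfs(pdfsToMerge, filePath) {
  const mergedPdf = PDFDocumentFactory.create();
  pdfsToMerge.forEach(pdfFilePath => {
    const pdf = fs.readFileSync(pdfFilePath);
    // Loading parses the source PDF before its pages are read
    const pagesToMerge = PDFDocumentFactory.load(pdf).getPages();
    pagesToMerge.forEach(page => {
      mergedPdf.addPage(page);
    });
  });
  const mergedPdfFile = await PDFDocumentWriter.saveToBytes(mergedPdf);
  // writeFileSync is synchronous, so it needs no await
  fs.writeFileSync(filePath, mergedPdfFile);
  // logger is the application's own logger, defined elsewhere
  logger.verbose("Merged PDFs", { mergedPdfs: pdfsToMerge, filePath });
}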
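
Since the post only guesses which file is at fault, one way to confirm it would be to try loading each file separately and record the ones that throw. The following is a minimal sketch, not part of the original post: findUnparseablePdfs is a hypothetical helper name, and the loader is passed in as a parameter so the sketch does not depend on any particular library call.

```javascript
// Hypothetical helper (not part of pdf-lib): tries every path with the
// supplied loader and collects the paths that fail, with the error message.
function findUnparseablePdfs(pdfPaths, loadPdf) {
  const failures = [];
  for (const pdfPath of pdfPaths) {
    try {
      loadPdf(pdfPath);
    } catch (err) {
      // Keep going so that every failing file is reported, not just the first
      failures.push({ pdfPath, message: err.message });
    }
  }
  return failures;
}
```

With the code from the post, the loader could be something like pdfPath => PDFDocumentFactory.load(fs.readFileSync(pdfPath)), so that the merge only proceeds once the returned list is empty.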

Top GitHub Comments

Hopding commented, Jun 9, 2019

@DanielJackson-Oslo The RC should be perfectly stable. The only change it includes is the fix for this issue. And of course, it passed all the unit and integration tests before I cut it. So if it’s working well for you, then there shouldn’t be anything to worry about. (I always cut RCs for every release, no matter how trivial the changes).

It would certainly be possible to get away with less object parsing (and therefore tolerate more invalid objects) if you just want to copy pages. However, in order to find and copy the page objects (and any other objects they reference) it is still necessary to parse some objects.

Implementing this sort of “lazy parsing” would take more than just writing a function, though. It would be necessary to modify some of pdf-lib’s parsing code. The parser currently scans input PDFs from start to finish, parsing each object it encounters along the way.
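
As a rough illustration of the difference between the two strategies, here is a toy sketch. It is not pdf-lib's parser and does not use real PDF syntax: each "object" is assumed to be a JSON string listing the ids of the objects it references, and the names parseEagerly and parseReachable are made up for this example.

```javascript
// Toy model, not real PDF syntax: each raw object is a JSON string whose
// "refs" array lists the ids of the objects it references.
const rawObjects = new Map([
  [1, '{"type":"Page","refs":[2]}'],
  [2, '{"type":"Contents","refs":[]}'],
  [3, '{"type":"Metadata", refs: oops'], // invalid, and not referenced by the page
]);

// Eager: parse every object in order, so one invalid object fails the whole load.
function parseEagerly(objects) {
  const parsed = new Map();
  for (const [id, raw] of objects) {
    parsed.set(id, JSON.parse(raw));
  }
  return parsed;
}

// Lazy: parse only the objects reachable from the given root ids.
// Objects that nothing reachable refers to are never parsed at all.
function parseReachable(objects, rootIds) {
  const parsed = new Map();
  const pending = [...rootIds];
  while (pending.length > 0) {
    const id = pending.pop();
    if (parsed.has(id)) continue; // already visited
    if (!objects.has(id)) throw new Error(`Missing object ${id}`);
    const obj = JSON.parse(objects.get(id));
    parsed.set(id, obj);
    pending.push(...obj.refs);
  }
  return parsed;
}
```

In this toy model, parseEagerly(rawObjects) throws because of object 3, while parseReachable(rawObjects, [1]) succeeds because it only ever touches objects 1 and 2. It still has to parse those two, which matches the point above that some parsing remains necessary.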

If this is something you’d be interested in working on, I’d be open to working with you on it. Just note that it would require learning about the structure of PDF files. Please open a new issue if you’d like to continue the discussion further!

DanielJackson-Oslo commented, Jun 9, 2019

@Hopding wrote: “@DanielJackson-Oslo I’d like to add the Din_Microsoft-fakturaoversikt.pdf file you shared to the pdf-lib GitHub repo to create a regression test for this issue. Do you mind? Does the file contain any sensitive information?”

@Hopding Feel free to use it! It’s a bill for my own Office 365, presumably the same one they generate for all customers.

Thanks for the quick follow up. Looking forward to 0.6.4 releasing. How stable is the rc?
