
Merging some pdfs results in ExternalDocument returning error

See original GitHub issue

Hi,

I am using your library to merge multiple PDFs into one after uploading various documents from a web application. This works without problems most of the time, but certain documents fail at the ExternalDocument stage without any specific error being returned. The process is as follows:

  • documents are uploaded from a web application as data URIs;
  • documents that are images are converted to PDF before being uploaded;
  • once uploaded, the PDFs are stored in a database as base64 blobs;
  • when all documents have been uploaded successfully, they are merged into a single PDF that is stored on disk.
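The first stages of this pipeline (a data URI arrives, a base64 payload is stored) can be sketched with Node's built-in Buffer API alone, without the data-uri-to-buffer package. The helper names `dataUriToBuffer` and `looksLikePdf` are hypothetical. A cheap header check like this can flag uploads that are not PDFs at all before they ever reach ExternalDocument, although it says nothing about whether pdfjs can parse the rest of the file.

```javascript
// Sketch (hypothetical helpers): decode a base64 data URI into a Buffer
// using only Node's built-in Buffer API.
function dataUriToBuffer(dataUri) {
  // Expected shape: data:<mime>[;param...];base64,<payload>
  const match = /^data:([^;,]*)(;[^,]*)?,(.*)$/s.exec(dataUri);
  if (!match) {
    throw new Error('Not a data URI');
  }
  const params = match[2] || '';
  if (!params.split(';').includes('base64')) {
    throw new Error('Only base64-encoded data URIs are handled in this sketch');
  }
  return Buffer.from(match[3], 'base64');
}

// A PDF should contain the "%PDF-" header near the start of the file.
// Many readers tolerate some leading bytes, so search the first 1024 bytes.
function looksLikePdf(buffer) {
  return buffer.subarray(0, 1024).includes('%PDF-');
}
```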

The merge process extracts each PDF from its database blob, turns it back into a data URI, and converts that into a Buffer. The Buffer is passed to ExternalDocument so that pdfjs can parse it, and the resulting document's pages are then added to the merged PDF.
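Turning a Buffer into a base64 data URI and decoding it again should give back exactly the same bytes, so, assuming the database blob holds the raw PDF bytes, that buffer could in principle be handed to ExternalDocument directly. (If the blob instead holds base64 text, it would need exactly one base64 decode rather than a re-encode.) The function name below is hypothetical; the sketch just checks the round trip with Node's Buffer API.

```javascript
// Sketch: Buffer -> base64 data URI -> Buffer, mirroring the merge steps above.
function roundTripThroughDataUri(buffer) {
  const dataUri = 'data:application/pdf;base64,' + buffer.toString('base64');
  // Everything after the first comma is the base64 payload.
  const payload = dataUri.slice(dataUri.indexOf(',') + 1);
  return Buffer.from(payload, 'base64');
}

const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x00, 0xff, 0x80]);
const restored = roundTripThroughDataUri(original);
// Base64 is lossless for arbitrary bytes, so this prints true.
console.log(restored.equals(original));
```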

const pdf = require('pdfjs');
const fs = require('fs');
const toBuffer = require('data-uri-to-buffer');

async function _mergeFiles(files, fileName) {
  const doc = new pdf.Document();
  const writeStream = fs.createWriteStream(fileName);

  // Listen for 'close' before ending the document so the event cannot be missed.
  const writeStreamClosed = new Promise((resolve, reject) => {
    writeStream.on('close', resolve);
    writeStream.on('error', reject);
  });
  doc.pipe(writeStream);

  for (const file of files) {
    try {
      // Rebuild the data URI from the blob stored in the database, then decode it.
      const dataUri = 'data:application/pdf;base64,' + file.dataUri.toBuffer().toString('base64');
      const src = toBuffer(dataUri);

      // This is the step that fails for some of the uploaded PDFs.
      const ext = new pdf.ExternalDocument(src);
      doc.setTemplate(ext);
      doc.addPagesOf(ext);
    } catch (e) {
      // Throwing inside an async function rejects the promise it returns.
      throw {file: file.name, sequence: file.sequence, reason: e};
    }
  }

  await doc.end();
  return writeStreamClosed;
}
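To find out which uploads fail without aborting the whole merge at the first bad file, each buffer can be parsed on its own and the failures collected. In this sketch the parser is passed in as a function (with pdfjs that would presumably be `buf => new pdf.ExternalDocument(buf)`); the name `partitionByParse` and the `{ name, buffer }` input shape are hypothetical.

```javascript
// Sketch: try to parse every file and separate successes from failures,
// instead of stopping at the first file that throws.
function partitionByParse(files, parse) {
  const parsed = [];
  const failed = [];
  for (const file of files) {
    try {
      parsed.push({ name: file.name, doc: parse(file.buffer) });
    } catch (err) {
      // Keep the error so it can be reported alongside the file name.
      failed.push({ name: file.name, reason: err });
    }
  }
  return { parsed, failed };
}
```

The `doc` values in `parsed` could then be added with `doc.addPagesOf(...)`, while `failed` lists exactly which uploads need a closer look.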

This process works fine most of the time, but some PDFs do not get past the ExternalDocument stage. The following files are examples of PDFs that fail at that step:

  • ACTU 04-20.pdf
  • CCF_000001.pdf
  • LONGY_JULIE_Complément dossier_LONGY Julie.pdf
  • LONGY_JULIE_DAMA_LONGY Julie.pdf
  • LONGY_JULIE_detail dossier_longy julie.pdf
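When only some files fail, comparing their structure with files that merge fine can help narrow things down. The hypothetical `describePdf` helper below reports the header version and whether a few features appear: encryption, compressed object streams and cross-reference streams (both introduced in PDF 1.5), and a trailing `%%EOF` marker. Whether any of these is behind the failures here is an assumption to test, not something this issue confirms.

```javascript
// Rough triage sketch: scan a PDF buffer for a few structural features.
// This is a plain byte search, not a PDF parser: keywords hidden inside
// compressed streams are invisible to it, so treat the result as a hint only.
function describePdf(buffer) {
  const text = buffer.toString('latin1');
  const header = /%PDF-(\d\.\d)/.exec(text.slice(0, 1024));
  return {
    version: header ? header[1] : null,
    encrypted: /\/Encrypt\b/.test(text),
    objectStreams: /\/Type\s*\/ObjStm\b/.test(text),
    xrefStream: /\/Type\s*\/XRef\b/.test(text),
    hasEofMarker: text.slice(-1024).includes('%%EOF'),
  };
}
```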

I am using the latest version of pdfjs (v2.3.7).

Thanks for your help!

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
otroboe commented, Jun 16, 2020

Hello!

I don’t know if it will help, but I needed to merge PDFs too, and this library saved me (so thanks a lot @rkusa, very nice job!).

Some of my PDF files are generated with Puppeteer, and others are stored in AWS. I work only with Buffers, and it seems that is what you need too, @rernens.

Here’s my method:

const {Document, ExternalDocument} = require('pdfjs');

/**
 * Merge multiple PDF buffers into one buffer
 *
 * @param {Buffer[]} bufferList
 * @return {Promise<Buffer>}
 */
const mergeBufferPdfs = (bufferList) => {
    if (bufferList.length === 0) {
        throw new Error('You must pass buffers to merge a PDF');
    }

    const mergeDocument = new Document();

    bufferList.forEach((buffer) => {
        // Parse each input PDF and append all of its pages.
        const externalDocument = new ExternalDocument(buffer);
        mergeDocument.addPagesOf(externalDocument);
    });

    return mergeDocument.asBuffer();
};

So far so good, I like when it’s simple. I hope it can help somehow.
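If the PDFs live on disk rather than in AWS or a database, they can be read into Buffers first and then handed to a buffer-based merge like the one above. A minimal sketch; `readPdfBuffers` is a hypothetical helper and assumes a Node version that provides `fs.promises`.

```javascript
const fs = require('fs');

// Sketch: read PDF files from disk into Buffers, preserving the input order,
// so they can be passed to a buffer-based merge function.
async function readPdfBuffers(paths) {
  return Promise.all(paths.map((p) => fs.promises.readFile(p)));
}

// Usage, assuming mergeBufferPdfs from the comment above:
//   const merged = await mergeBufferPdfs(await readPdfBuffers(['a.pdf', 'b.pdf']));
//   await fs.promises.writeFile('merged.pdf', merged);
```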

1 reaction
rernens commented, Jun 9, 2020

@rkusa Hi Markus. Even though this is not the main use case of pdfjs, so far it has proven to be the most lightweight and reliable library for merging PDFs. I have tried many libraries, and yours remains unmatched, even if some parsing issues remain. Thanks for that.
