[Feature Request]: Use PDF as a template, similar to mail merge in Word

I’m looking for a client-side library that will allow me to create a preview of a document for a client. The idea is to create a simple HTML form with a couple of inputs and a preview button that generates a PDF based on a template. I’m aware that I can load an existing PDF and add an overlay to it (as shown in the samples), but I’d like to replace text: for example, {{name}} with John and {{surname}} with Smith.

I’ve searched the issues and found https://github.com/Hopding/pdf-lib/issues/33 and https://github.com/Hopding/pdf-lib/issues/137. As I understand it, your library doesn’t support reading text, so please consider this a feature request. With this one feature, your library would be an ideal solution for client-side PDF manipulation.

Issue Analytics

State:
Created 4 years ago
Reactions: 18
Comments: 14 (5 by maintainers)

Top GitHub Comments

2 reactions

Hopding commented, Dec 31, 2019

Implementing this feature without using AcroForms presents three main challenges:

1. Locating the placeholders. This requires pdf-lib to sift through all the content streams in a document and locate all the text drawing operators. This wouldn’t be too difficult to do. The challenging part is mapping the glyph IDs to Unicode text. This would be a significant undertaking. The PDF specification defines a ridiculous number of ways to store fonts and encode text. Writing code to support all of them is entirely possible. It would just take a lot of time and effort. The final step in this process is to process all the Unicode text and produce a list of words/sentences/paragraphs in the document. You might think this last step would be simple, but it is not. PDF does not store text in a structured format like HTML. It just says to draw characters at X/Y coordinates. So you’d need to convert these spatial coordinates to structured text.
2. Encoding the replacement text. Presumably, you’d want this feature to automatically draw the replacement text in the same font as the placeholders. This is also much harder than you might expect. For example, the font the placeholders were drawn in might have been subsetted, meaning it might not support the replacement text. And even if it does, you’d need to extract all the font objects for the placeholder font and figure out how to encode the new text (because, again, the PDF spec allows all sorts of fonts and encodings).
3. Laying out the new text block. As @Misiu mentioned, it’s highly unlikely that the replacement text will have the same length as the placeholder text. This means that you’d need to handle laying out the text already present on the document, not just the placeholders. And not necessarily just the sentence or paragraph to which the placeholders belonged. If the replacement text is long enough, it might require other paragraphs to be laid out again. And what happens if you end up exceeding the page length? And this is assuming you’re dealing with simple paragraphs of text. Many PDF documents have all sorts of fancy images and complicated layouts that would be extremely difficult to identify and handle automatically.
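
To make the last step of (1) concrete, here is a minimal sketch of turning positioned characters back into lines of text. It assumes the glyphs have already been decoded to Unicode, which is the hard part described above. The PositionedGlyph type, the glyphsToLines function, and both tolerance values are hypothetical and are not part of pdf-lib.

```typescript
// Hypothetical shape for a decoded glyph: its Unicode text plus the position
// where the content stream drew it. This is not a pdf-lib type.
interface PositionedGlyph {
  text: string;
  x: number;
  y: number;
  width: number;
}

// Group glyphs into rows by baseline (y), then read each row left to right,
// inserting a space wherever the horizontal gap exceeds `gapTolerance`.
function glyphsToLines(
  glyphs: PositionedGlyph[],
  yTolerance: number = 2,
  gapTolerance: number = 1,
): string[] {
  const rows: { y: number; glyphs: PositionedGlyph[] }[] = [];
  for (const glyph of glyphs) {
    let placed = false;
    for (const row of rows) {
      if (Math.abs(row.y - glyph.y) <= yTolerance) {
        row.glyphs.push(glyph);
        placed = true;
        break;
      }
    }
    if (!placed) {
      rows.push({ y: glyph.y, glyphs: [glyph] });
    }
  }
  // By default the PDF origin is at the bottom-left, so a larger y is higher
  // on the page and should be read first.
  rows.sort((a, b) => b.y - a.y);
  return rows.map((row) => {
    const sorted = row.glyphs.slice().sort((a, b) => a.x - b.x);
    let line = "";
    let previousEnd: number | null = null;
    for (const glyph of sorted) {
      if (previousEnd !== null && glyph.x - previousEnd > gapTolerance) {
        line += " ";
      }
      line += glyph.text;
      previousEnd = glyph.x + glyph.width;
    }
    return line;
  });
}
```

Even this simplified version has to guess at two tolerances, and it ignores rotated text, multiple columns, and tables, which is a hint of why the real problem is hard.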

There are some shortcuts that could be taken if we placed some restrictions on the feature. For example, we could make (1) much easier if we required the placeholder text to be tagged with marked content operators (see section 14.6, Marked Content, of the PDF spec). But this would require the placeholders to be created in a special way, so it wouldn’t be able to identify arbitrary strings of text like {{foo}}.
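
As a rough illustration of the marked-content shortcut, the sketch below scans a content stream for spans wrapped in BMC/BDC ... EMC operators. It assumes the stream has already been decompressed into a string and that the template author tagged each placeholder with a hypothetical /Placeholder tag. The findMarkedSpans function and MarkedSpan type are illustrative names, not pdf-lib APIs. It only handles literal strings shown with Tj and an inline property dictionary, and it ignores nesting, hex strings, TJ arrays, and escape decoding.

```typescript
// Hypothetical result type for one marked-content sequence.
interface MarkedSpan {
  tag: string; // marked-content tag, without the leading slash
  text: string; // raw literal strings shown with Tj inside the span, concatenated
  start: number; // offset of the span within the content stream string
  end: number; // offset just past the closing EMC operator
}

// Find every "/Tag BMC ... EMC" or "/Tag <<...>> BDC ... EMC" sequence whose
// tag equals `tag`. This is a regex sketch, not a real content-stream parser.
function findMarkedSpans(content: string, tag: string): MarkedSpan[] {
  const spanPattern = /\/(\w+)\s*(?:<<[^>]*>>\s*BDC|BMC)\b([\s\S]*?)\bEMC\b/g;
  const spans: MarkedSpan[] = [];
  let spanMatch: RegExpExecArray | null;
  while ((spanMatch = spanPattern.exec(content)) !== null) {
    if (spanMatch[1] !== tag) {
      continue;
    }
    // Collect the literal strings passed to the Tj (show text) operator.
    const showTextPattern = /\(((?:[^()\\]|\\.)*)\)\s*Tj/g;
    let text = "";
    let textMatch: RegExpExecArray | null;
    while ((textMatch = showTextPattern.exec(spanMatch[2])) !== null) {
      text += textMatch[1];
    }
    spans.push({
      tag: spanMatch[1],
      text: text,
      start: spanMatch.index,
      end: spanMatch.index + spanMatch[0].length,
    });
  }
  return spans;
}
```

The start and end offsets mark the region a replacement step would have to rewrite. The strings found are still raw bytes in the font's encoding, so this only sidesteps locating the placeholder, not decoding it.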

We could make (2) much easier as well, if we didn’t try to automatically extract and reuse the font that the placeholders were drawn in. This step would be fairly straightforward if we required you to embed/provide your own font, just like you’d do for PDFPage.drawText.

But as for (3), I’m not too sure what could be done to simplify this. I’m open to ideas though! I’m sure other PDF libraries (such as iText or PDFBox) support text extraction and replacement in some form/fashion. So it’d be interesting to see how they handle this part.
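
One possible restriction for (3), offered only as a sketch, is to confine each placeholder to a fixed box and report overflow rather than reflowing the rest of the page. The example below does greedy word wrapping inside such a box. The LayoutBox and LayoutResult types and the layoutParagraph function are hypothetical. The measure callback is an assumption too; with pdf-lib it could plausibly be built on PDFFont.widthOfTextAtSize.

```typescript
// Hypothetical description of the area reserved for one placeholder.
interface LayoutBox {
  width: number;
  height: number;
  lineHeight: number;
}

interface LayoutResult {
  lines: string[]; // lines that fit inside the box
  overflow: string[]; // lines that did not fit vertically
}

// Greedy word wrap: keep adding words to the current line until the measured
// width would exceed the box width, then start a new line.
function layoutParagraph(
  text: string,
  box: LayoutBox,
  measure: (s: string) => number,
): LayoutResult {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (current !== "" && measure(candidate) > box.width) {
      lines.push(current);
      current = word;
    } else {
      // A single word wider than the box still gets its own line; this
      // sketch does not hyphenate or shrink text.
      current = candidate;
    }
  }
  if (current !== "") {
    lines.push(current);
  }
  const maxLines = Math.floor(box.height / box.lineHeight);
  return { lines: lines.slice(0, maxLines), overflow: lines.slice(maxLines) };
}
```

A caller could treat a non-empty overflow as an error or shrink the font and retry. Neither option answers the general question of re-laying out surrounding paragraphs.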

1 reaction

DaveLo commented, Dec 27, 2019

@Hopding, I’m interested in this functionality for use in variable data printing. At my company, we send a customized instruction booklet to customers.

Our current solution converts HTML to PDF, but we are reaching the point where development is constraining our design team, since every change means a lengthy rebuild.

Using this library, there are places where I can easily put dynamic objects in blank areas (images, barcodes, etc.), but there are other places where string interpolation would be hugely helpful (Hello, {{name}} welcome to {{service}}).
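
For reference, the string interpolation itself is the easy part once the text is available as a plain string. The sketch below shows it; the interpolate function is a hypothetical helper and says nothing about how the text would be read from or written back to a PDF.

```typescript
// Replace {{key}} placeholders with values from `values`. Placeholders with no
// matching key are left untouched so that missing data is easy to spot.
function interpolate(template: string, values: Record<string, string>): string {
  return template.replace(
    /\{\{\s*(\w+)\s*\}\}/g,
    (placeholder: string, key: string): string => {
      const known = Object.prototype.hasOwnProperty.call(values, key);
      return known ? String(values[key]) : placeholder;
    },
  );
}

const greeting = interpolate("Hello, {{name}} welcome to {{service}}", {
  name: "John",
  service: "Acme",
});
console.log(greeting); // prints "Hello, John welcome to Acme"
```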