[Feature Request]: Use PDF as a template, similar to mail merge in Word

I’m looking for a client-side library that will allow me to create a preview of a document for a client. The idea is to create a simple HTML form with a couple of inputs and a preview button that generates a PDF based on a template. I’m aware that I can load an existing PDF and add an overlay to it (as shown in the samples), but I’d like to replace text: for example, replace {{name}} with John and {{surname}} with Smith.

I’ve searched the issues and found https://github.com/Hopding/pdf-lib/issues/33 and https://github.com/Hopding/pdf-lib/issues/137. As I understand it, your library doesn’t support reading text, so please consider this a feature request. With this one feature, your library would be an ideal solution for client-side PDF manipulation.
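For reference, once the template text is available as a plain string, the substitution itself is simple; the difficulty discussed in this thread lies in getting the text out of the PDF and drawing it back. A minimal sketch of just the string-level step, where `fillTemplate` and `TemplateValues` are hypothetical names, not pdf-lib APIs:

```typescript
// Values to substitute; keys are placeholder names without the braces.
type TemplateValues = Record<string, string>;

// Replaces every {{key}} (optional inner whitespace allowed) with values[key].
// Unknown placeholders are left untouched so missing data stays visible.
function fillTemplate(template: string, values: TemplateValues): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

const preview = fillTemplate("Dear {{name}} {{surname}},", { name: "John", surname: "Smith" });
console.log(preview); // Dear John Smith,
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, is a design choice that makes a missing form input obvious in the preview.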

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 18
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

Hopding commented, Dec 31, 2019 (2 reactions)

Implementing this feature without using AcroForms presents three main challenges:

  1. Locating the placeholders. This requires pdf-lib to sift through all the content streams in a document and locate all the text drawing operators. This wouldn’t be too difficult to do. The challenging part is mapping the glyph IDs to Unicode text. This would be a significant undertaking. The PDF specification defines a ridiculous number of ways to store fonts and encode text. Writing code to support all of them is entirely possible; it would just take a lot of time and effort. The final step is to process all the Unicode text and produce a list of words/sentences/paragraphs in the document. You might think this last step would be simple, but it is not. PDF does not store text in a structured format like HTML. It just says to draw characters at X/Y coordinates. So you’d need to convert these spatial coordinates to structured text.
  2. Encoding the replacement text. Presumably, you’d want this feature to automatically draw the replacement text in the same font as the placeholders. This is also much harder than you might expect. For example, the font the placeholders were drawn in might have been subsetted, meaning it might not support the replacement text. And even if it does, you’d need to extract all the font objects for the placeholder font and figure out how to encode the new text (because, again, the PDF spec allows all sorts of fonts and encodings).
  3. Laying out the new text block. As @Misiu mentioned, it’s highly unlikely that the replacement text will have the same length as the placeholder text. This means that you’d need to handle laying out the text already present on the document, not just the placeholders, and not necessarily just the sentence or paragraph to which the placeholders belonged. If the replacement text is long enough, it might require other paragraphs to be relaid out. And what happens if you end up exceeding the page length? This is also assuming you’re dealing with simple paragraphs of text. Many PDF documents have all sorts of fancy images and complicated layouts that would be extremely difficult to identify and handle automatically.
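The last step of (1) can be sketched roughly as follows, assuming the glyph-to-Unicode decoding has already been done and produced positioned text runs. The `TextRun` shape and `runsToLines` are hypothetical, not part of pdf-lib. The example also shows why a placeholder such as {{name}} may only become findable after runs are reassembled, since a content stream is free to draw it in several pieces:

```typescript
// Hypothetical positioned text run: what you might have after decoding the
// text-showing operators (Tj/TJ) of a content stream to Unicode.
interface TextRun {
  text: string;
  x: number; // left edge in user-space units
  y: number; // baseline in user-space units (PDF y grows upward)
}

// Groups runs whose baselines lie within `tolerance` of the line's first run,
// orders lines top-to-bottom and runs left-to-right, then joins each line.
// Simplifications: no rotated text, no columns, no inferred spaces between runs.
function runsToLines(runs: TextRun[], tolerance = 2): string[] {
  const sorted = [...runs].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: TextRun[][] = [];
  for (const run of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - run.y) <= tolerance) {
      current.push(run);
    } else {
      lines.push([run]);
    }
  }
  return lines.map((line) =>
    line.sort((a, b) => a.x - b.x).map((r) => r.text).join(""),
  );
}

// The placeholder is split across two runs with slightly different baselines.
const runs: TextRun[] = [
  { text: "Hello, {{na", x: 72, y: 700 },
  { text: "me}}!", x: 130, y: 700.5 },
  { text: "Second line", x: 72, y: 686 },
];
const pageLines = runsToLines(runs); // ["Hello, {{name}}!", "Second line"]

const placeholders: string[] = [];
for (const line of pageLines) {
  const hits = line.match(/\{\{\w+\}\}/g);
  if (hits) placeholders.push(...hits);
}
console.log(placeholders); // the single placeholder "{{name}}"
```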

There are some shortcuts that could be taken if we placed some restrictions on the feature. For example, we could make (1) much easier if we required the placeholder text to be tagged with marked-content operators (see section 14.6, “Marked Content”, of the PDF spec). But this would require the placeholders to be created in a special way, so pdf-lib wouldn’t be able to identify arbitrary strings of text like {{foo}}.
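As a rough illustration of the marked-content shortcut, the sketch below assumes an already-decompressed content stream and an invented convention in which each placeholder is wrapped as `/Placeholder <</Field (name)>> BDC ... EMC`. The tag `/Placeholder`, the `/Field` key and `findMarkedPlaceholders` are hypothetical; only the `BDC`/`EMC` operators come from the PDF spec. Note that locating the placeholder requires no glyph decoding at all:

```typescript
// A placeholder located via a marked-content sequence.
interface MarkedPlaceholder {
  field: string; // value of the hypothetical /Field entry
  operators: string; // raw content-stream operators between BDC and EMC
}

// Finds "/Placeholder <</Field (name)>> BDC ... EMC" in a decoded content
// stream. "/Placeholder" and "/Field" are a made-up tagging convention.
// Nested marked content and escaped parentheses are not handled.
function findMarkedPlaceholders(content: string): MarkedPlaceholder[] {
  const pattern = /\/Placeholder\s*<<\s*\/Field\s*\(([^)]*)\)\s*>>\s*BDC([\s\S]*?)\bEMC\b/g;
  const found: MarkedPlaceholder[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(content)) !== null) {
    found.push({ field: m[1], operators: m[2].trim() });
  }
  return found;
}

const stream = [
  "BT /F1 12 Tf 72 720 Td (Dear ) Tj ET",
  "/Placeholder <</Field (name)>> BDC",
  "BT /F1 12 Tf 110 720 Td ({{name}}) Tj ET",
  "EMC",
].join("\n");

// One entry: field "name" with the text-drawing operators it wraps.
console.log(findMarkedPlaceholders(stream));
```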

We could make (2) much easier as well, if we didn’t try to automatically extract and reuse the font that the placeholders were drawn in. This step would be fairly straightforward if we required you to embed/provide your own font, just like you’d do for PDFPage.drawText.
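To make the trade-off in (2) concrete, here is the first check that reusing the placeholder’s font would require: whether the possibly subsetted font has glyphs for the replacement text at all. The sketch assumes the set of covered Unicode code points has already been recovered from the font (for example from its ToUnicode CMap), which is itself part of the hard work described above; `uncoveredChars` is a hypothetical helper:

```typescript
// Returns the distinct characters of `text` that the font cannot draw.
// ASSUMPTION: `coveredCodePoints` was already extracted from the original
// (possibly subsetted) font; this sketch does not do that extraction.
function uncoveredChars(text: string, coveredCodePoints: Set<number>): string[] {
  const missing: string[] = [];
  for (const ch of Array.from(text)) { // Array.from splits by code point
    const cp = ch.codePointAt(0)!;
    if (!coveredCodePoints.has(cp) && missing.indexOf(ch) === -1) {
      missing.push(ch);
    }
  }
  return missing;
}

// A subset containing only the glyphs needed to draw "{{name}}".
const subset = new Set(Array.from("{{name}}").map((c) => c.codePointAt(0)!));
console.log(uncoveredChars("John", subset)); // J, o and h are missing; n is covered
```

Supplying your own font, as proposed above, makes this check on the original font unnecessary.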

But as for (3), I’m not too sure what could be done to simplify this. I’m open to ideas though! I’m sure other PDF libraries (such as iText or PDFBox) support text extraction and replacement in some form/fashion. So it’d be interesting to see how they handle this part.
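One possible restriction for (3), offered here as an assumption rather than something proposed in the thread, is to treat each placeholder’s paragraph as a fixed-size box: re-wrap only that paragraph and report overflow instead of relaying out the rest of the page. A greedy word-wrap sketch follows; `measure` could be backed by pdf-lib’s `PDFFont.widthOfTextAtSize`, while `wrapText`, `fitsInBox` and `MeasureFn` are hypothetical names:

```typescript
// Width of a string at the chosen font size, in user-space units.
type MeasureFn = (text: string) => number;

// Greedy word wrap: packs words onto a line until the next word would exceed
// maxWidth. A single word wider than maxWidth still gets its own line.
function wrapText(text: string, maxWidth: number, measure: MeasureFn): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (current !== "" && measure(candidate) > maxWidth) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Hypothetical restriction: the paragraph must fit its original box height;
// the rest of the page is deliberately left alone.
function fitsInBox(lines: string[], lineHeight: number, boxHeight: number): boolean {
  return lines.length * lineHeight <= boxHeight;
}

// Toy measure: one unit per character.
const wrapped = wrapText("Hello John Smith, welcome to the service", 12, (s) => s.length);
console.log(wrapped); // ["Hello John", "Smith,", "welcome to", "the service"]
console.log(fitsInBox(wrapped, 14, 42)); // false, since 4 * 14 = 56 > 42
```

Reporting overflow instead of moving other content keeps the problem local, at the cost of rejecting replacements that would have needed a page relayout.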

DaveLo commented, Dec 27, 2019 (1 reaction)

@Hopding, I’m interested in this functionality for variable data printing: at my company, we send a customized instruction booklet to customers.

Our current solution converts HTML to PDF, but we are reaching the point where development is constraining our design team, since every change means a lengthy rebuild.

Using this library, there are places where I can easily put dynamic objects in blank areas (images, barcodes, etc.), but there are other places where string interpolation would be hugely helpful (e.g. “Hello, {{name}}, welcome to {{service}}”).
