[Feature Request]: Use PDF as a template, similar to mail merge in Word
I’m looking for a client-side library that will allow me to create a preview of a document for a client.
The idea is to create a simple HTML form with a couple of inputs and a preview button that will generate a PDF based on a template.
I’m aware that I can load an existing PDF and add an overlay to it (as shown in the samples), but I’d like to replace text. For example, I’d like to replace {{name}} with John and {{surname}} with Smith.
I’ve searched the issues and found https://github.com/Hopding/pdf-lib/issues/33 and https://github.com/Hopding/pdf-lib/issues/137. As I understand it, your library doesn’t support reading text, so please consider this a feature request. With this one feature, your library would be an ideal solution for client-side PDF manipulation.
Implementing this feature without using AcroForms presents three main challenges:
(1) would require pdf-lib to sift through all the content streams in a document and locate all the text drawing operators. This wouldn’t be too difficult to do. The challenging part is mapping the glyph IDs to Unicode text. This would be a significant undertaking. The PDF specification defines a ridiculous number of ways to store fonts and encode text. Writing code to support all of them is entirely possible; it would just take a lot of time and effort. The final step in this process is to take all the Unicode text and produce a list of words/sentences/paragraphs in the document. You might think this last step would be simple, but it is not. PDF does not store text in a structured format like HTML. It just says to draw characters at X/Y coordinates. So you’d need to convert these spatial coordinates to structured text.
There are some shortcuts that could be taken if we placed some restrictions on the feature. For example, we could make (1) much easier if we required the placeholder text to be tagged with marked content operators (see section 14.6, Marked Content, of the PDF spec). But this would require the placeholders to be created in a special way, so it wouldn’t be able to identify arbitrary strings of text like {{foo}}.
We could make (2) much easier as well if we didn’t try to automatically extract and reuse the font that the placeholders were drawn in. This step would be fairly straightforward if we required you to embed/provide your own font, just like you’d do for PDFPage.drawText.
But as for (3), I’m not too sure what could be done to simplify this. I’m open to ideas though! I’m sure other PDF libraries (such as iText or PDFBox) support text extraction and replacement in some form or fashion. So it’d be interesting to see how they handle this part.
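To make the "spatial coordinates to structured text" step a bit more concrete, here is a minimal TypeScript sketch. It assumes the hard parts are already done: the content streams have been parsed and the glyph IDs decoded to Unicode, leaving a flat list of positioned text runs. The `TextRun` shape, the `groupIntoLines` and `findPlaceholders` helpers, and the 2-unit baseline tolerance are all hypothetical names and values chosen for illustration; none of them are part of pdf-lib.

```typescript
// A positioned run of text, as a content-stream parser might emit it.
// ASSUMPTION: hypothetical shape, not a pdf-lib type; text is already Unicode.
interface TextRun {
  text: string;
  x: number; // left edge in PDF user-space units
  y: number; // baseline in PDF user-space units (origin at bottom-left)
}

interface PlaceholderHit {
  name: string; // "name" for {{name}}
  x: number; // left edge of the run in which the placeholder starts
  y: number; // baseline of that run
}

// Runs whose baselines lie within `tolerance` of the first run of a line are
// treated as the same line. Lines go top to bottom, runs left to right.
function groupIntoLines(runs: TextRun[], tolerance = 2): TextRun[][] {
  const sorted = runs.slice().sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: TextRun[][] = [];
  for (const run of sorted) {
    const last = lines[lines.length - 1];
    if (last && Math.abs(last[0].y - run.y) <= tolerance) {
      last.push(run);
    } else {
      lines.push([run]);
    }
  }
  for (const line of lines) {
    line.sort((a, b) => a.x - b.x);
  }
  return lines;
}

// Join each line's runs and search the joined text, so a placeholder that was
// drawn as several separate runs is still found.
function findPlaceholders(runs: TextRun[]): PlaceholderHit[] {
  const hits: PlaceholderHit[] = [];
  for (const line of groupIntoLines(runs)) {
    let joined = "";
    const starts: number[] = []; // offset of each run within `joined`
    for (const run of line) {
      starts.push(joined.length);
      joined += run.text;
    }
    const pattern = /\{\{(\w+)\}\}/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(joined)) !== null) {
      // The owning run is the last one starting at or before the match.
      let owner = 0;
      for (let i = 0; i < starts.length; i++) {
        if (starts[i] <= match.index) owner = i;
      }
      hits.push({ name: match[1], x: line[owner].x, y: line[owner].y });
    }
  }
  return hits;
}

// Example: the placeholder is split across two runs with slightly different baselines.
const exampleRuns: TextRun[] = [
  { text: "me}}", x: 110, y: 700.5 },
  { text: "Hello, {{na", x: 50, y: 700 },
  { text: "welcome", x: 50, y: 680 },
];
const exampleHits = findPlaceholders(exampleRuns);
// exampleHits holds one entry: name "name", x 50, y 700
```

Even this simplified version has to guess at a baseline tolerance, and it ignores rotated text, multi-column layouts, and the width of the gaps between runs, which is a fair hint at why the general case is a lot of work.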
@Hopding, I’m interested in this functionality for use in variable data printing. At my company we send a customized instruction booklet to customers.
Our current solution converts HTML to PDF, but we are reaching the point where development is constraining our design team, since every change means a lengthy rebuild.
With this library, there are places where I can easily put dynamic objects in blank areas (images, barcodes, etc.), but there are other places where string interpolation would be hugely helpful (for example: Hello, {{name}} welcome to {{service}}).
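For reference, the string interpolation itself is the easy part. Below is a minimal TypeScript sketch of it on a plain string, outside of any PDF. The `fillTemplate` helper is a hypothetical name and the `service` value is made up for the example; nothing here is a pdf-lib API. The filled-in string could then be drawn into a blank area with PDFPage.drawText, which is the overlay approach already possible today, but this does not replace text that is already drawn in the PDF.

```typescript
// Replace every {{key}} in `template` with the matching entry from `values`.
// Placeholders with no matching entry are left as-is, so they stay visible
// in a preview instead of silently disappearing.
// ASSUMPTION: `fillTemplate` is a hypothetical helper, not part of pdf-lib.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : whole
  );
}

const greeting = fillTemplate("Hello, {{name}} welcome to {{service}}", {
  name: "John",
  service: "Acme Cloud", // made-up value for illustration
});
// greeting === "Hello, John welcome to Acme Cloud"
```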