Embedded pages are uncompressed
Hi,
I’m migrating our PDF generation tool from HummusJS to pdf-lib.
I have to merge two PDF pages into a new PDF. To achieve that, I embed the pages into a new PDF document. The generated file is significantly larger than the original one. When I compress the generated PDF, the file size is much closer to what I expect.
I wrote a test case to show the issue:
import {readFileSync, writeFileSync} from 'fs';
import {PDFDocument} from 'pdf-lib';

(async () => {
  // Create the destination document and load the source PDF from disk.
  const pdfDoc = await PDFDocument.create();
  const pdfSource = await PDFDocument.load(readFileSync('./Lorem ipsum dolor sit amet.pdf'));

  // Embed the first source page and draw it onto a new page.
  const embeddedPage = await pdfDoc.embedPage(pdfSource.getPage(0));
  const page = pdfDoc.addPage();
  page.drawPage(embeddedPage);

  writeFileSync('embedded.pdf', await pdfDoc.save());
})();
The source file is a simple 200 KB PDF made with Word: Lorem ipsum dolor sit amet.pdf. The file generated by this script is 1.5 MB, which is 7.5x larger: embedded.pdf.
I think that the LZW stream from the source file is decompressed before being embedded into the new PDF file.
Am I correct?
EDIT: I inspected both PDFs and found a FlateDecode stream in the source PDF that is stored decoded (uncompressed) in the destination PDF.

Regards,
Julien
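
For readers who want a feel for the effect described in the EDIT above, here is a minimal sketch using only Node's built-in zlib module. It does not touch pdf-lib or the attached files: the content stream is a made-up stand-in (the `line` string and the `compressionRatio` helper are assumptions for illustration), and it is far more repetitive than a real page, so the ratio it prints will not match the 7.5x reported here. It only shows that the same bytes take much more space once a Flate-compressed stream is written out decoded.

```javascript
const zlib = require('node:zlib');

// Made-up stand-in for a page content stream: the same PDF text operators repeated.
const line = 'BT /F1 12 Tf 72 720 Td (Lorem ipsum dolor sit amet) Tj ET\n';
const decoded = Buffer.from(line.repeat(2000), 'latin1');

// FlateDecode streams hold zlib/deflate data, so deflateSync approximates
// what a PDF writer would store for a compressed stream.
const encoded = zlib.deflateSync(decoded);

// Hypothetical helper: how many times larger the decoded form is.
function compressionRatio(raw, packed) {
  return raw.length / packed.length;
}

console.log(`decoded stream: ${decoded.length} bytes`);
console.log(`encoded stream: ${encoded.length} bytes`);
console.log(`decoded is ${compressionRatio(decoded, encoded).toFixed(1)}x larger`);
```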

I also face a problem related to uncompressed embedded pages. In my scenario I want to apply a page background (letter paper) to every page of a content PDF. The only way I found to achieve this is to embed both the page background and all content pages into a new PDF and then draw them in the correct order on newly created pages.
So far, so good. But because the embedded pages are not compressed, the final PDF is approx. 750 KB instead of 85 KB. Those values are for a 5-page PDF. If I do the same with 50 pages, I end up with over 8 MB (with compression it’s down to 975 KB).
With the proposed change in the PR I end up with approx. 125 KB for the 5-page PDF, which is fine.
@Hopding Any chance this PR gets accepted and released? Is there anything I can do to get this done?
I’m also seeing huge file size increases with embedded pages. My use case is splitting an input PDF into sections and rearranging the pages for bookbinding. With a 3 MB, 296-page input file, for example, I’m getting outputs of 12 MB per 16-page section (each of which is made up of 32 pages of the original doc, arranged two to a sheet).
EDIT: After doing some more digging, it looks like part of my problem is embedded fonts, though the amount of space embedded fonts take up in the generated documents is still much higher than in the original (1.3 MB versus 7 MB+).
EDIT2: Even more digging: it turns out some of my issue was user error. Every time you call PDFDocument.embedPdf() it embeds a new copy of the fonts, and I was calling it for each page instead of embedding all the pages in one call and then reorganizing them afterwards. Rewriting my logic knocked down my file size significantly.
(Sorry for the extra notifs. I thought it was probably better to keep my findings here instead of just deleting my comment, in case someone else hits the same problem.)