Inconsistent results with simultaneous requests
I am using electron-pdf v1.3.0 on an Ubuntu image running in VirtualBox on a SLES11 Linux server to convert thousands of HTML pages to PDFs in succession. I cannot alter the installation of the SLES11 box to run electron, which is why electron is running in the VirtualBox.
The basic approach is to have a main electron process read through the data to determine which HTML pages should be converted to PDFs. These requests are then sent through child processes using the electron-workers@1.10.3 module (grouping the requests in pools reduces the load on the HTML server), and each child process eventually executes the following code:
const jobOptions = {
  inMemory: false
}
const options = {
  //customCss: "/opt/convo-viewer/public/css/style.css",
  pageSize: "A4",
  printBackground: true,
  disableCache: true,
  outputWait: 60
}
var exporter = new electronPDF({
  resilient: false
});
function createJob(exporter,srcreq,targetreq, options, jobOptions) {
  logger.LogWithConsole.info('creatingJob to convert '+srcreq);
  return exporter.createJob(srcreq,targetreq, options, jobOptions).then( job => {
    return new Promise.Promise(function(resolve,reject) {
      try {
        var t1 = moment(new Date());
        job.on('window.capture.end', (data, err) => {
          if (err) {
            logger.LogWithConsole.error(srcreq+' had window.capture.end received with err '+err);
            jobsFailed++;
            resolve(srcreq+' had window.capture.end error->'+err)
          } else {
            logger.LogWithConsole.info('window.capture.end received with no err');
          }
        })
        job.on('window.termination', (windowContext) => {
          logger.LogWithConsole.error('job '+path.basename(srcreq)+' ended early, try longer ELECTRONPDF_THRESHOLD')
          jobsFailed++;
          resolve(srcreq+'window.termination')
        })
        job.on('export-complete', (data) => {
          logger.LogWithConsole.info('export-complete received with data='+data);
        })
        job.on('job-complete', (r) => {
          var t2 = moment(new Date());
          var diff = t2.diff(t1, 'seconds');
          var fsize = fs.statSync(targetreq).size/1024;
          var kbsec = fsize/diff;
          logger.LogWithConsole.info('Completed job for '+path.basename(srcreq)+' in '+diff+' seconds for a '+fsize.toFixed(2)+'KB file or '+kbsec.toFixed(2)+' KB/Sec');
          // -- console.log(r)
          jobsCompleted++;
          resolve();
        })
        job.render()
      } catch (err) {
        logger.LogWithConsole.error('Exception during processing: '+err);
        jobsFailed++;
        resolve(err.message);
      }
    })
  })
}
where srcreq is the URL on the HTML server and targetreq is the path to the output PDF. The number of worker processes that make the requests is configurable; in these runs it was set to 4, one for each available processor.
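In the snippet above, failures call resolve rather than reject, and the promise never settles if none of the listened-for events arrives. A minimal sketch of one way to tighten this is shown here. It assumes the job object behaves like a standard Node EventEmitter and reuses only the event names from the posted code; `waitForJob` and `timeoutMs` are hypothetical names introduced for illustration, not part of electron-pdf.

```javascript
// Wrap a job-like event emitter in a promise that settles exactly once.
// Assumption: `job` emits 'job-complete' and 'window.termination' as in the
// snippet above and supports the standard EventEmitter once/removeListener API.
function waitForJob(job, timeoutMs) {
  return new Promise((resolve, reject) => {
    let timer = null;

    // Clear the timer and detach both listeners before settling,
    // so a late event cannot trigger a second settle attempt.
    const finish = (settle, value) => {
      clearTimeout(timer);
      job.removeListener('job-complete', onComplete);
      job.removeListener('window.termination', onTermination);
      settle(value);
    };

    const onComplete = (result) => finish(resolve, result);
    const onTermination = () =>
      finish(reject, new Error('window terminated before capture'));

    // Guard against a job that never emits either event.
    timer = setTimeout(
      () => finish(reject, new Error('timed out after ' + timeoutMs + ' ms')),
      timeoutMs
    );

    job.once('job-complete', onComplete);
    job.once('window.termination', onTermination);
  });
}
```

With a wrapper like this, the caller could count failures in a single `.catch()` handler instead of incrementing `jobsFailed` inside each listener, and a hung conversion would surface as a timeout error in the log.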
The run completes for more than 20,000 pages in a little under 4 hours. When I look at the logs I do see this error quite often:
“[ERROR] a window was left in the cache that was already destroyed, do proper cleanup”
but other than that no other logged errors. However, when I manually look at the PDFs I randomly see two major problems:
- Sometimes text inside the PDF is missing. General format is OK but some of the fields come up blank.
- The PDF is printed without any style from the .css. The data is all there but formatting is missing.
Of course, when I redo the run or try the specific URL manually, it works fine. So my question is: are there any known problems or concerns with running multiple PDF requests simultaneously from child electron processes?
Unfortunately I cannot provide any more concrete examples because of the randomness of the problem.
For problem #2, you may notice that in the code snippet I provided I commented out the “customCss” option, thinking that there might be contention for the custom file among simultaneous requests. I am assuming the custom file is read with each request. So far I have not seen that this helps or hurts the problem.
Any insight you can provide will be greatly appreciated.
Top GitHub Comments
I embedded a dispatchEvent(new Event('view-ready')) call in the HTML page and I do not see the problem any more. With my limited JavaScript knowledge, I am assuming there is a separate view-ready event listener for each document, so if two or more processes are concurrently requesting different pages, the view-ready event will only trigger the conversion of the correct window to PDF.
Thanks for all your help!!!
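For readers who want to try the same fix, here is a minimal sketch of the page-side half. It only shows dispatching the event once the page's own work has finished; electron-pdf also has to be told to wait for that event, which the README covers and this sketch does not. `signalWhenReady` and `loadData` are hypothetical names, and dispatching on `document.body` is an assumption to check against the README.

```javascript
// Dispatch `eventName` on `target` once every promise in `pending` has resolved.
// If any of them rejects, no event is dispatched and the returned promise rejects.
function signalWhenReady(target, pending, eventName = 'view-ready') {
  return Promise.all(pending).then(() => {
    target.dispatchEvent(new Event(eventName));
  });
}

// Usage inside the HTML page (browser only; loadData() is a hypothetical
// function that resolves once the page has filled in all of its fields):
//
//   const pageLoaded = new Promise((r) => window.addEventListener('load', r));
//   signalWhenReady(document.body, [pageLoaded, document.fonts.ready, loadData()]);
```

Waiting for the window `load` event covers stylesheets referenced by the page, and waiting for the page's own data loading covers fields that are filled in by script, which correspond to the two symptoms described in the question.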
The wait is a parameter that you pass in the options for each request, you can set it however you want.
There is no way for electron to magically know that the page is loaded. You could alternatively consider using Puppeteer, it has the ability to wait based on network idleness (https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagegotourl-options)
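As a rough sketch of the Puppeteer alternative mentioned above: the function below takes an already launched browser, navigates with `waitUntil: 'networkidle0'`, and writes an A4 PDF with backgrounds, mirroring the `pageSize` and `printBackground` options from the question. `renderPdf` is a hypothetical helper name, and network idleness is only a heuristic; it does not guarantee that script-driven rendering has finished.

```javascript
// Render one URL to a PDF file using a Puppeteer browser instance.
// Assumption: `browser` comes from something like
//   const browser = await require('puppeteer').launch();
async function renderPdf(browser, url, outPath) {
  const page = await browser.newPage();
  try {
    // 'networkidle0' considers navigation finished once there have been
    // no network connections for at least 500 ms.
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({ path: outPath, format: 'A4', printBackground: true });
  } finally {
    // Always release the page, even if navigation or printing fails.
    await page.close();
  }
}
```

Opening a fresh page per request keeps concurrent conversions isolated from each other, at the cost of some startup overhead per page.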
When electron loads the content from your webserver it is inside of an Electron BrowserWindow, and you can emit events on the Document that way. This is explained in the readme.
I would not say there is a timeout. If a PDF is not rendering as you expect, it means the page was captured before the browser was done doing its thing; this does not mean there was a timeout.
Electron-pdf has no way of knowing the state of the page, unless you code into your application an event saying it’s done. This is how I implement my export process, and it’s the only way to get the optimal performance.