How to use pool of tabs

This is more of a how-to question than an issue. Let's assume the scenario of generating screenshots of webpages concurrently. My thought was to create a pool of tabs like the one below and use the workers to take the screenshots.

//headless-page-pool
let CDP = require('chrome-remote-interface');
var genericPool:any = require('generic-pool');

const sandBoxFactory = {
    create:async function () {
        try {
            let tabMeta = await CDP.New({ remote : true });
            let client = await CDP({ tab : tabMeta})
            client._target = tabMeta;
            return client;
        } catch (e) {
            console.error(e);
        }
    },
    validate: function(client:any) {
        //TODO: Find a way to validate dev tools connection
    },
    destroy: async function (client:any){
        try {
            return await client.Target.closeTarget(client._target.id);
        } catch (e) {
            console.error(e);
        }
    }
}

var opts = {
    max: 10, // maximum size of the pool
    min: 3 // minimum size of the pool
}

let WorkerPool:any;

function getWorkerPool() {
    return WorkerPool;
}

function createWorkerPool() {
    console.log('creating pool');
    WorkerPool = genericPool.createPool(sandBoxFactory, opts);
    WorkerPool.on('factoryCreateError', function(err:any){
        console.error(err);
    });

    WorkerPool.on('factoryDestroyError', function(err:any){
        console.error(err);
    });
    return WorkerPool
}

function getWorkerFromPool() {
    return WorkerPool.acquire()
}

export { getWorkerPool, createWorkerPool, getWorkerFromPool };

The pool module above creates 3 tabs in Chrome (the pool's min). My request handler then looks like this:

import {createWorkerPool, getWorkerPool, getWorkerFromPool } from './headless-page-pool';
import fs = require('fs');
createWorkerPool();

export async function captureScreenshot(url:string, options:any, timeout=10000):Promise<any> {
    try {
        return new Promise(async function(resolve, reject) {
            let WorkerPool = getWorkerPool();
            let worker = await getWorkerFromPool();
            await worker.Page.enable();
            worker.Page.loadEventFired(async function() {
                console.log('page loaded');
                let result = await worker.Page.captureScreenshot(); //This line never resolves since the target is not active
                resolve(Buffer.from(result.data, 'base64'));
            });
            await worker.Page.navigate({ url });
            WorkerPool.release(worker);
        });
    } catch (error) {
        console.error(error);
    }
}
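
One thing worth noting in the handler above: the worker is released right after Page.navigate resolves, so it can go back to the pool before the load callback has taken the screenshot. Below is a minimal sketch of one way to tie the release to the end of the work. The helper name withWorker and its timeoutMs parameter are hypothetical, and the sketch assumes the pool exposes acquire, release, and destroy the way generic-pool does.

```javascript
// Hypothetical helper (not part of generic-pool): run `task(worker)` with a
// pooled worker and hand the worker back only once the task has settled.
async function withWorker(pool, task, timeoutMs = 10000) {
    const worker = await pool.acquire();
    let timer;
    // Rejects if the task takes longer than `timeoutMs`.
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(
            () => reject(new Error(`task timed out after ${timeoutMs} ms`)),
            timeoutMs
        );
    });
    try {
        const result = await Promise.race([task(worker), timeout]);
        // Success: the tab is assumed to be reusable.
        await pool.release(worker);
        return result;
    } catch (err) {
        // Failure or timeout: the tab is in an unknown state, so discard it
        // instead of returning it to the pool.
        await pool.destroy(worker);
        throw err;
    } finally {
        clearTimeout(timer);
    }
}
```

With a helper like this, the handler would pass a task that navigates, waits for the load event, and captures the screenshot, so the worker stays checked out for the whole sequence. Destroying the worker on failure is a design choice of this sketch, not something the original code does.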

The problem is that await worker.Page.captureScreenshot(); never resolves because the tab is not active. As soon as I click on the tab containing the URL in Chrome, it resolves.

This can be worked around by calling

worker.Target.activateTarget({ targetId : worker._target.id})

After doing this I just get a blank screen as the image, unless I add a pause, which gives the actual image. The bottom line is that whatever I do, there is no way to process multiple images simultaneously, because only the active tab can capture the image. So how do we use multiple tabs and process images in all of them concurrently? Is this a bug in Chrome, which does not allow capturing screenshots when the tab is not active, or am I missing something?

Let me know if I am not clear.
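
On the TODO in the factory's validate hook: generic-pool (v3) expects validate to return a promise that resolves to a boolean, and it only calls the hook when the pool is created with testOnBorrow: true. A minimal sketch of one possible check follows. It treats a client as healthy if a cheap round trip finishes in time. The names makeValidate, ping, and timeoutMs are hypothetical, and which protocol call to use as the ping is an assumption.

```javascript
// Hypothetical factory for a generic-pool `validate` hook. `ping(client)`
// should perform any cheap request against the client and return a promise.
function makeValidate(ping, timeoutMs = 2000) {
    return function validate(client) {
        let timer;
        // Resolves to false if the ping does not answer in time.
        const timeout = new Promise((resolve) => {
            timer = setTimeout(() => resolve(false), timeoutMs);
        });
        // Resolves to true if the ping succeeds, false if it throws or rejects.
        const probe = Promise.resolve()
            .then(() => ping(client))
            .then(() => true, () => false);
        return Promise.race([probe, timeout]).finally(() => clearTimeout(timer));
    };
}
```

In the factory above this could be wired up as validate: makeValidate((client) => client.Runtime.evaluate({expression: '1'})), with testOnBorrow: true added to opts. Runtime.evaluate is only one candidate for the ping; any inexpensive command would do.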

Issue Analytics

Created: 6 years ago
Reactions: 1
Comments: 14 (6 by maintainers)

Top GitHub Comments

4 reactions

cyrus-and commented, Apr 12, 2017

It seems that you’re right: the tab must be exposed during the screenshot phase. As you say, activating the target on page load won’t work, probably because another page load event steals the focus before the screenshot is completed, thus producing empty or partial images.

Luckily, though, it’s the page load phase that is usually time-consuming, and that part can be fully parallelized by spawning multiple tabs/targets. Try this: instead of calling Page.captureScreenshot as soon as the page is loaded, simply enqueue the task to a common array (or use promises, see below); then, when all the pages in the batch are finished, start the serial activate-screenshot-repeat phase.

Here’s what I mean:

const fs = require('fs');

const CDP = require('chrome-remote-interface');

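function loadForScrot(url) {
    return new Promise(async (fulfill, reject) => {
        try {
            const tab = await CDP.New();
            const client = await CDP({tab});
            const {Page} = client;
            // resolve once the page has finished loading
            Page.loadEventFired(() => {
                fulfill({client, tab});
            });
            await Page.enable();
            await Page.navigate({url});
        } catch (err) {
            // otherwise a failure here would leave the promise pending forever
            reject(err);
        }
    });
}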

async function process(urls) {
    try {
        const handlers = await Promise.all(urls.map(loadForScrot));
        for (const {client, tab} of handlers) {
            const {Page} = client;
            await CDP.Activate({id: tab.id});
            const filename = `/tmp/scrot_${tab.id}.png`;
            const result = await Page.captureScreenshot();
            const image = Buffer.from(result.data, 'base64');
            fs.writeFileSync(filename, image);
            console.log(filename);
            await client.close();
        }
    } catch (err) {
        console.error(err);
    }
}

process(['http://example.com',
         'http://example.com',
         'http://example.com',
         'http://example.com',
         'http://example.com',
         'http://example.com',
         'http://example.com',
         'http://example.com']);

Please let me know if this can work for you.
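
For a long-running service, where URLs arrive continuously rather than in one batch, the same idea (parallel loads, serial activate-and-capture) could be expressed with a small promise queue. This is only a sketch under assumptions: createSerialQueue and makeScreenshotter are hypothetical names, and activate and capture are injected functions standing in for calls such as CDP.Activate and Page.captureScreenshot.

```javascript
// Hypothetical helper: returns a function that runs async jobs strictly one
// at a time, in the order they were submitted.
function createSerialQueue() {
    let tail = Promise.resolve();
    return function enqueue(job) {
        const run = tail.then(() => job());
        // Keep the chain usable even if this job fails.
        tail = run.catch(() => {});
        return run;
    };
}

// Activate a tab and capture it while holding the queue, so no other tab
// can take the focus between the two steps.
function makeScreenshotter({activate, capture}) {
    const enqueue = createSerialQueue();
    return (handle) => enqueue(async () => {
        await activate(handle);
        return capture(handle);
    });
}
```

With the code above, activate might be ({tab}) => CDP.Activate({id: tab.id}) and capture might be ({client}) => client.Page.captureScreenshot(). Page loads can still run in parallel; only the short focus-sensitive step goes through the queue.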

1 reaction

anandanand84 commented, Apr 20, 2017

@pthieu I used client.Target.closeTarget(client._target.id); not sure if that is the right way but it does close the browser tab.