Roadmap for v1.0


I’m thinking about what functionality this library should provide before it is released as v1. I might edit the list in the future.

My goals:

(#25) Make sure it’s reliable and crawl more than 10 million pages with it (so far, the most I have crawled is ~800k pages)
(#9) Improve sameDomainDelay and skipDuplicateUrls. Domain detection should use TLD.js, for example. Documentation should be better. And there should be a way to provide the URL without using data or { url: … }. (Not a goal for 1.0 anymore.)
(#28) Optimize the code, fix code smells
More tests, get code coverage above 90%
More documentation on the concurrency types. Maybe make CONCURRENCY_BROWSER the default as it is more robust?
More code snippets in the documentation page (for Cluster.queue for example)
Provide a cluster.execute function which executes the job ~with higher priority (does not queue it at the end)~ and returns a Promise which is resolved when the job is finished. Might also solve this confusion: https://github.com/thomasdondorf/puppeteer-cluster/issues/10#issuecomment-419324832
Statistics API: How many jobs are in the queue, how many jobs have been processed, etc.
#41 Offer more functionality, maybe provide a way to use puppeteer-extra?
#36 ~~Sandbox~~ Offer a way to run code from users in a sandbox, maybe even Docker? => This can now be implemented via custom concurrency implementations (although there are no custom implementations right now)
#70 Improve types
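
To make the cluster.execute and Statistics API items above more concrete, here is a minimal, self-contained sketch of the idea. It is not the library's implementation: MiniCluster, its stats() method, and the counter names are hypothetical and exist only for illustration. The point is that execute hands the job's resolve/reject callbacks to the queue, so the caller gets a Promise that settles when the job finishes, and the same bookkeeping yields the queue statistics.

```javascript
// Hypothetical sketch, NOT puppeteer-cluster's actual code.
class MiniCluster {
  constructor(task, maxConcurrency = 2) {
    this.task = task;                 // async function ({ data }) => result
    this.maxConcurrency = maxConcurrency;
    this.pending = [];                // jobs waiting for a free worker slot
    this.running = 0;
    this.processed = 0;
    this.failed = 0;
  }

  // Fire-and-forget: add a job to the end of the queue.
  queue(data) {
    this.pending.push({ data, resolve: () => {}, reject: () => {} });
    this.schedule();
  }

  // Add a job and return a Promise that settles when the job is finished.
  execute(data) {
    return new Promise((resolve, reject) => {
      this.pending.push({ data, resolve, reject });
      this.schedule();
    });
  }

  // Snapshot of the counters (the "Statistics API" idea).
  stats() {
    return {
      queued: this.pending.length,
      running: this.running,
      processed: this.processed,
      failed: this.failed,
    };
  }

  // Start queued jobs until the concurrency limit is reached.
  schedule() {
    while (this.running < this.maxConcurrency && this.pending.length > 0) {
      const job = this.pending.shift();
      this.running += 1;
      Promise.resolve()
        .then(() => this.task({ data: job.data }))
        .then(
          (result) => { this.running -= 1; this.processed += 1; job.resolve(result); },
          (error) => { this.running -= 1; this.failed += 1; job.reject(error); },
        )
        .then(() => this.schedule());
    }
  }
}
```

With this shape, `const title = await cluster.execute(url)` would read like a normal async call, while `cluster.queue(url)` stays fire-and-forget.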

Maybe:

Provide a simple but robust data store with the library
Rename API: Some parts of API are rather unfortunate
- concurrency should be concurrencyType
- maxConcurrency maybe maxWorkers?
Provide queue function to the task function for a more functional syntax (so that you don’t need to access cluster from inside the task)
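
As a rough illustration of the last idea, the sketch below passes a queue function to the task next to data, so the task never touches the cluster object. Everything here is hypothetical: runTasks is a made-up, sequential stand-in for the cluster, and the { data, queue } parameter shape is an assumption, not an existing API.

```javascript
// Hypothetical sketch: the task receives `queue` alongside `data`.
// Jobs run one at a time to keep the example short.
async function runTasks(task, initialJobs) {
  const pending = [...initialJobs];
  const finished = [];
  // The function handed to the task; it appends follow-up jobs.
  const queue = (data) => { pending.push(data); };

  while (pending.length > 0) {
    const data = pending.shift();
    await task({ data, queue });
    finished.push(data);
  }
  return finished; // job data in the order the jobs were processed
}
```

A crawling task could then be written as `async ({ data: url, queue }) => { /* ...visit url, then call queue(nextUrl) for each link found... */ }` without closing over the cluster.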

Not planned (for now):

~~https://github.com/thomasdondorf/puppeteer-cluster/issues/8#issuecomment-421307994 Mixed concurrency models~~
- Reason: It does not work well with the idea of having a sandbox (which part of the browser/page/context should be sandboxed then?)

Issue Analytics

Created 5 years ago
Reactions: 12
Comments: 10 (2 by maintainers)

Top GitHub Comments

4 reactions

ermolaev1337 commented, Jul 23, 2019

Is there a way to connect the puppeteer-cluster to a remote instance of chromium? (“connect” instead of “launch”)

2 reactions

generic11 commented, Oct 15, 2020

Hello - just wanted to get a feel for how active this project is. I see puppeteer cluster as being useful for several projects I’d like to work on. However, I’m hesitant to use it if development will be abandoned. Is development still happening? Thanks!
