Roadmap for v1.0
I’m thinking about what functionality this library should provide before it is released as v1. I might edit the list in the future.
My goals:
- (#25) Make sure it’s reliable and crawl more than 10 million pages with it (so far the maximum I crawled was ~800k pages)
- (#9) Not a goal for 1.0 anymore: improve `sameDomainDelay` and `skipDuplicateUrls`. Detection of domains should use TLD.js, for example. Documentation should be better. And there should be a way to provide the URL without using data or `{ url: … }`
- (#28) Optimize the code, fix code smells
- More tests, get code coverage up to > 90%
- More documentation on the concurrency types. Maybe make `CONCURRENCY_BROWSER` the default as it is more robust?
- More code snippets in the documentation page (for `Cluster.queue`, for example)
- Provide a `cluster.execute` function which executes the job ~with higher priority (does not queue it at the end)~ and returns a Promise which is resolved when the job is finished. Might also solve this confusion: https://github.com/thomasdondorf/puppeteer-cluster/issues/10#issuecomment-419324832
- Statistics API: How many jobs are in the queue, how many jobs have been processed, etc.
- #41 Offer more functionality, maybe provide a way to use puppeteer-extra?
- #36 Sandbox: Offer a way to run code from users in a sandbox, maybe even Docker? => This can now be implemented via custom concurrency implementations (although there are no custom implementations right now)
- #70 Improve types
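
To make the `cluster.execute` and Statistics API items above more concrete, here is a minimal sketch in TypeScript. It is not how puppeteer-cluster implements anything: `TinyCluster`, `QueueStats` and `stats()` are hypothetical names made up for illustration, and the task is a plain async function rather than something that drives a browser page.

```typescript
// Hypothetical sketch only. None of these names are puppeteer-cluster's API.
type Task<T, R> = (data: T) => Promise<R>;

interface Job<T, R> {
  data: T;
  // Only set for jobs created via execute(); queue() jobs have no callbacks.
  resolve?: (value: R) => void;
  reject?: (reason: unknown) => void;
}

interface QueueStats {
  queued: number;    // jobs waiting for a free worker slot
  running: number;   // jobs currently being processed
  processed: number; // jobs that finished (successfully or not)
}

class TinyCluster<T, R> {
  private jobs: Job<T, R>[] = [];
  private running = 0;
  private processed = 0;

  constructor(private task: Task<T, R>, private maxConcurrency = 2) {}

  // Fire-and-forget: the caller gets no handle on the result,
  // and errors from the task are silently dropped in this sketch.
  queue(data: T): void {
    this.jobs.push({ data });
    this.work();
  }

  // Same queue, but the returned Promise settles when the job finishes.
  execute(data: T): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      this.jobs.push({ data, resolve, reject });
      this.work();
    });
  }

  stats(): QueueStats {
    return {
      queued: this.jobs.length,
      running: this.running,
      processed: this.processed,
    };
  }

  // Start jobs until the concurrency limit is reached or the queue is empty.
  private work(): void {
    while (this.running < this.maxConcurrency && this.jobs.length > 0) {
      const job = this.jobs.shift() as Job<T, R>;
      this.running += 1;
      const done = (): void => {
        this.running -= 1;
        this.processed += 1;
        this.work();
      };
      Promise.resolve()
        .then(() => this.task(job.data))
        .then(
          (result) => {
            if (job.resolve) job.resolve(result);
            done();
          },
          (err) => {
            if (job.reject) job.reject(err);
            done();
          },
        );
    }
  }
}
```

With something like this, `await cluster.execute(data)` gives the caller the task's return value (or its error), while `queue(data)` keeps its fire-and-forget behavior, and `stats()` can be polled at any time to see how far the run has progressed.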
Maybe:
- Provide a simple but robust data store with the library
- Rename API: Some parts of the API are rather unfortunate
  - `concurrency` should be `concurrencyType`
  - `maxConcurrency` maybe `maxWorkers`?
- Provide a queue function to the task function for a more functional syntax (so that you don’t need to access cluster from inside the task)
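
As a rough illustration of the last idea, the sketch below passes a `queue` callback into the task so the task never has to reference the cluster object. The names `runAll`, `QueueFn` and `TaskWithQueue` are hypothetical and not part of puppeteer-cluster, and jobs run one after another here to keep the example short.

```typescript
// Hypothetical sketch only. None of these names are puppeteer-cluster's API.
type QueueFn<T> = (data: T) => void;
type TaskWithQueue<T> = (data: T, queue: QueueFn<T>) => Promise<void>;

// Runs the task for every seed and for everything the task queues itself.
// Returns the number of processed jobs.
async function runAll<T>(seeds: T[], task: TaskWithQueue<T>): Promise<number> {
  const pending: T[] = seeds.slice();
  let processed = 0;

  // The only thing the task can do with the runner is add more work.
  const queue: QueueFn<T> = (data) => {
    pending.push(data);
  };

  while (pending.length > 0) {
    const next = pending.shift() as T;
    await task(next, queue);
    processed += 1;
  }
  return processed;
}
```

A crawl task written against this shape would look like `async (url, queue) => { /* visit url, then queue(childUrl) for each link */ }`, with no outer `cluster` variable captured in the closure.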
Not planned (for now):
- Mixed concurrency models: https://github.com/thomasdondorf/puppeteer-cluster/issues/8#issuecomment-421307994
  - Reason: It does not work well together with the idea of having a sandbox (which part of the browser/page/context stuff should be sandboxed then?)
Issue Analytics
- Created 5 years ago
- Reactions: 12
- Comments: 10 (2 by maintainers)
Top GitHub Comments
Is there a way to connect the puppeteer-cluster to a remote instance of chromium? (“connect” instead of “launch”)
Hello - just wanted to get a feel for how active this project is. I see puppeteer cluster as being useful for several projects I’d like to work on. However, I’m hesitant to use it if development will be abandoned. Is development still happening? Thanks!