I’m thinking about what functionality this library should provide before it is released as v1. I might edit this list in the future:

My goals:

  • (#25) Make sure it is reliable by crawling more than 10 million pages with it (so far, the maximum I have crawled is ~800k pages)
  • (#9) Improve sameDomainDelay and skipDuplicateUrls: domain detection should use a library such as TLD.js, the documentation should be better, and there should be a way to provide the URL without using data or { url: … }. (Not a goal for 1.0 anymore.)
  • (#28) Optimize the code, fix code smells
  • More tests: raise code coverage above 90%
  • More documentation on the concurrency types. Maybe make CONCURRENCY_BROWSER the default as it is more robust?
  • More code snippets in the documentation (for example, for Cluster.queue)
  • Provide a cluster.execute function which executes the job ~with higher priority (does not queue it at the end)~ and returns a Promise which is resolved when the job is finished. Might also solve this confusion: https://github.com/thomasdondorf/puppeteer-cluster/issues/10#issuecomment-419324832
  • Statistics API: how many jobs are queued, how many jobs have been processed, etc.
  • (#41) Offer more functionality; maybe provide a way to use puppeteer-extra?
  • (#36) Sandbox: offer a way to run user-supplied code in a sandbox, maybe even Docker? => This can now be implemented via custom concurrency implementations (although there are no custom implementations right now)
  • (#70) Improve types
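A rough sketch of the cluster.execute and statistics items above, assuming a simplified in-memory job runner instead of real browser workers. MiniCluster and its queue, execute, stats and idle methods are hypothetical names chosen for illustration; this is not the puppeteer-cluster API or its implementation, and the task only doubles a number where a real task would drive a page.

```typescript
// Hypothetical sketch, NOT the puppeteer-cluster API: a tiny in-memory job runner
// contrasting fire-and-forget queue() with a Promise-returning execute(),
// plus a simple statistics snapshot.
type Task<D, R> = (data: D) => Promise<R>;

interface Stats {
  queued: number;    // jobs waiting for a free slot
  running: number;   // jobs currently executing
  processed: number; // jobs finished, successfully or not
  errors: number;    // jobs whose task threw
}

class MiniCluster<D, R> {
  private readonly task: Task<D, R>;
  private readonly maxConcurrency: number;
  private pending: Array<() => Promise<void>> = [];
  private running = 0;
  private processed = 0;
  private errors = 0;

  constructor(task: Task<D, R>, maxConcurrency: number) {
    this.task = task;
    this.maxConcurrency = maxConcurrency;
  }

  // Fire-and-forget: the job goes to the end of the queue and its result is discarded.
  queue(data: D): void {
    this.enqueue(data).catch(() => {
      // failure is already counted in `errors`; swallow it to avoid an unhandled rejection
    });
  }

  // Same queueing, but the caller gets a Promise that settles when this job finishes.
  execute(data: D): Promise<R> {
    return this.enqueue(data);
  }

  stats(): Stats {
    return {
      queued: this.pending.length,
      running: this.running,
      processed: this.processed,
      errors: this.errors,
    };
  }

  // Resolves once nothing is queued or running (simple polling keeps the sketch short).
  async idle(): Promise<void> {
    while (this.running > 0 || this.pending.length > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, 5));
    }
  }

  private enqueue(data: D): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      // The wrapper never rejects: task errors are counted and forwarded to the caller.
      this.pending.push(async () => {
        try {
          resolve(await this.task(data));
        } catch (err) {
          this.errors++;
          reject(err);
        }
      });
      this.pump();
    });
  }

  // Start queued jobs while fewer than maxConcurrency are running.
  private pump(): void {
    while (this.running < this.maxConcurrency && this.pending.length > 0) {
      const job = this.pending.shift()!;
      this.running++;
      job().then(() => {
        this.running--;
        this.processed++;
        this.pump();
      });
    }
  }
}

async function main(): Promise<void> {
  const cluster = new MiniCluster<number, number>(async (n) => {
    await new Promise<void>((resolve) => setTimeout(resolve, 10)); // stands in for page work
    return n * 2;
  }, 2);

  cluster.queue(1); // result discarded
  cluster.queue(2);
  const doubled = await cluster.execute(21); // waits for this particular job
  console.log("execute result:", doubled);
  await cluster.idle();
  console.log(cluster.stats());
}

main();
```

In this sketch both calls share one FIFO queue; the only difference is whether the caller gets a Promise back. The struck-through "higher priority" behavior is deliberately not modeled.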

Maybe:

  • Provide a simple but robust data store with the library
  • Rename API: some parts of the API are named rather unfortunately
    • concurrency should be concurrencyType
    • maxConcurrency maybe maxWorkers?
  • Provide a queue function to the task function for a more functional syntax (so that you don’t need to access cluster from inside the task)
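A minimal sketch of the "queue function passed to the task" idea, assuming a single-worker runner and an in-memory link graph in place of real pages. runWithQueue, QueueFn, TaskFn and the links graph are hypothetical names for illustration only; this is not how puppeteer-cluster currently passes arguments to tasks.

```typescript
// Hypothetical sketch, NOT the puppeteer-cluster API: the task receives a `queue`
// callback next to its data, so it can schedule follow-up jobs without touching
// the cluster object.
type QueueFn<D> = (data: D) => void;
type TaskFn<D> = (args: { data: D; queue: QueueFn<D> }) => Promise<void>;

// Runs jobs one at a time (no concurrency, to keep the sketch short) until none are left.
async function runWithQueue<D>(initial: D[], task: TaskFn<D>): Promise<void> {
  const pending: D[] = [...initial];
  const queue: QueueFn<D> = (data) => {
    pending.push(data);
  };
  while (pending.length > 0) {
    const data = pending.shift()!;
    await task({ data, queue });
  }
}

// Usage: "crawl" an in-memory link graph instead of real pages.
const links: Record<string, string[]> = {
  "/": ["/a", "/b"],
  "/a": ["/b", "/c"],
  "/b": [],
  "/c": ["/"],
};

async function demo(): Promise<string[]> {
  const visited: string[] = [];
  const seen = new Set<string>(["/"]);
  await runWithQueue<string>(["/"], async ({ data: url, queue }) => {
    visited.push(url);
    for (const next of links[url] ?? []) {
      // skip URLs that were already queued, similar in spirit to skipDuplicateUrls
      if (!seen.has(next)) {
        seen.add(next);
        queue(next);
      }
    }
  });
  return visited;
}

demo().then((order) => console.log(order)); // visits "/", "/a", "/b", "/c" in that order
```

Because queue arrives as an argument, the task function has no closure over a cluster instance, which also makes it straightforward to exercise in isolation as demo does here.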

Not planned (for now):

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 12
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

4 reactions
ermolaev1337 commented, Jul 23, 2019

Is there a way to connect puppeteer-cluster to a remote Chromium instance (“connect” instead of “launch”)?

2 reactions
generic11 commented, Oct 15, 2020

Hello - just wanted to get a feel for how active this project is. I see puppeteer cluster as being useful for several projects I’d like to work on. However, I’m hesitant to use it if development will be abandoned. Is development still happening? Thanks!
