Proper crawler setup
If I want to crawl only *.example.com (both http and https) and exclude images, JS files, and CSS files, what should my crawler setup look like?
I tried many combinations, but I either get external sites crawled or only http and not https. It also looks like excluded URLs are overridden by included ones: I managed to keep the crawler more or less on the domain, but I can't make it ignore the unwanted files.
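The requirement above (stay on example.com and its subdomains over http or https, skip images, JS and CSS) can be expressed as one include regex and one exclude regex, checked so that an exclusion always wins. The following is a minimal sketch of a filter you would write yourself. The class name `UrlFilter`, both patterns and the extension list are hypothetical, and this does not describe how the crawler in this issue resolves include/exclude conflicts.

```java
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilter {
    // Hypothetical include rule: http or https, example.com or any subdomain,
    // optional port, optional path.
    static final Pattern INCLUDE =
            Pattern.compile("https?://([^/]+\\.)?example\\.com(:\\d+)?(/.*)?");

    // Hypothetical exclude rule: common image, JS and CSS extensions,
    // optionally followed by a query string. (?i) makes it case-insensitive.
    static final Pattern EXCLUDE =
            Pattern.compile("(?i).*\\.(jpe?g|png|gif|svg|ico|webp|js|css)(\\?.*)?");

    // The exclusion is checked first, so an excluded URL is rejected
    // even when it also matches the include rule.
    public static boolean accept(String url) {
        if (EXCLUDE.matcher(url).matches()) {
            return false;
        }
        return INCLUDE.matcher(url).matches();
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://www.example.com/page",
                "https://shop.example.com/",
                "https://www.example.com/logo.PNG",
                "https://www.example.com/app.js?v=3",
                "https://other.org/page");
        for (String url : urls) {
            System.out.println(accept(url) + "  " + url);
        }
    }
}
```

Checking exclusions before inclusions is the design choice that prevents the "excluded URLs are overridden by included ones" effect described above, at least in a filter you control.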
My setup looks like this:
Thank you!
Issue Analytics
- Created 5 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Each line is one regex, for example:
https?://.*\.example\.com/.*
It's a Java regex.
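To see what the include regex from the comment above accepts, it can be tried directly with `java.util.regex`. This sketch assumes the crawler applies each line as a full match, like `Matcher.matches()`; if it uses a substring search such as `find()`, results can differ. The class name `IncludeRegexCheck` and the sample URLs are hypothetical.

```java
import java.util.regex.Pattern;

public class IncludeRegexCheck {
    // The include regex from the comment, with backslashes escaped for a Java string literal.
    static final Pattern INCLUDE = Pattern.compile("https?://.*\\.example\\.com/.*");

    public static void main(String[] args) {
        String[] urls = {
                "http://www.example.com/about",
                "https://blog.example.com/post/1",
                "https://www.example.com",   // no slash after the host
                "https://example.com/",      // no subdomain
                "https://www.other.org/"
        };
        for (String url : urls) {
            // matches() must consume the whole URL, not just a substring of it.
            System.out.println(INCLUDE.matcher(url).matches() + "  " + url);
        }
    }
}
```

Under a full-match assumption, `s?` covers both http and https, while the literal `\.` before `example` means the bare domain `example.com` is not matched, and the required `/` means a URL with no slash after the host is not matched either.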