Proper crawler setup

I want to crawl only *.example.com (both http and https) and exclude images, JS files, and CSS files. What should my crawler setup look like?

I tried many combinations, but I either get external sites crawled or only http and not https. It also looks like excluded URLs are overridden by included ones: I managed to keep the crawler more or less on the domain, but I can't make it ignore the unwanted files.

My setup looks like this:

image

Thank you!

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

marevol commented, Feb 1, 2019 (2 reactions)

"Exclude the unwanted filetypes - these files still get indexed"

One line is one regex.

marevol commented, Feb 2, 2019 (0 reactions)

https?://.*\.example\.com/.*

It's a Java regex.
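
To see how the suggested pattern behaves, here is a minimal sketch in plain Java. The include pattern is the one from the comment above. Everything else is an assumption made for illustration: the class name CrawlFilter, the three exclude patterns (one regex per line, for images, JS, and CSS), the use of a full match rather than a partial one, and the rule that a URL is crawled only if it matches an include pattern and no exclude pattern. The thread does not say how the crawler itself combines the two lists.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any crawler's API.
final class CrawlFilter {
    // Include pattern quoted in the thread: http or https, any subdomain of example.com.
    static final List<Pattern> INCLUDE = List.of(
        Pattern.compile("https?://.*\\.example\\.com/.*"));

    // Assumed exclude patterns, one regex per entry; an optional query string is allowed.
    static final List<Pattern> EXCLUDE = List.of(
        Pattern.compile("(?i).*\\.(png|jpe?g|gif|svg|ico)(\\?.*)?"),
        Pattern.compile("(?i).*\\.js(\\?.*)?"),
        Pattern.compile("(?i).*\\.css(\\?.*)?"));

    // True if the whole URL matches at least one of the patterns.
    static boolean matchesAny(List<Pattern> patterns, String url) {
        for (Pattern p : patterns) {
            if (p.matcher(url).matches()) {
                return true;
            }
        }
        return false;
    }

    // Assumed combination rule: included and not excluded.
    static boolean shouldCrawl(String url) {
        return matchesAny(INCLUDE, url) && !matchesAny(EXCLUDE, url);
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://www.example.com/index.html",
            "http://shop.example.com/cart",
            "https://www.example.com/logo.png",
            "https://www.example.com/app.js?v=3",
            "https://other.org/page",
            "https://example.com/"
        };
        for (String url : urls) {
            System.out.println(shouldCrawl(url) + "  " + url);
        }
    }
}
```

Two things to note about the include pattern as written: it requires a literal dot before "example", so the bare host https://example.com/ does not match, and because `.*` can span slashes, a URL on another host that merely contains ".example.com/" later in its path or query would match as well.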
