
Robots parser to always handle absolute sitemap URL even without valid base URL

I get an "Invalid URL" error for a Sitemap directive whose sitemap URL is absolute rather than relative.

I noticed that the implementation only handles relative URLs. That is not always the case: it is now preferred to give the absolute URL of the sitemap in robots.txt.

Therefore, if an absolute URL is provided for the sitemap, we cannot access it.
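The behaviour requested in the title can be sketched as "try the value as an absolute URL first, and only fall back to the base URL for relative values". The following is a minimal sketch in Java using `java.net.URL`; the class `SitemapUrlResolver` and method `resolveSitemap` are hypothetical names, and this is not the project's actual parser code or the fix that was merged.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class SitemapUrlResolver {

    /**
     * Hypothetical helper: resolves the value of a robots.txt "Sitemap:" line.
     * Absolute URLs are returned as-is, so an invalid base URL does not matter.
     * Relative URLs are resolved against the robots.txt URL.
     * Returns null if the value can be resolved neither way.
     */
    public static String resolveSitemap(String robotsTxtUrl, String sitemapValue) {
        String value = sitemapValue.trim();

        // 1. Try the value as an absolute URL; no base URL is needed for this.
        try {
            return new URL(value).toExternalForm();
        } catch (MalformedURLException notAbsolute) {
            // No protocol (or otherwise unparsable): treat it as relative below.
        }

        // 2. Fall back to resolving the relative value against the robots.txt URL.
        try {
            URL base = new URL(robotsTxtUrl);
            return new URL(base, value).toExternalForm();
        } catch (MalformedURLException invalidBase) {
            // Relative sitemap value and an unusable base URL: give up.
            return null;
        }
    }

    public static void main(String[] args) {
        // Absolute sitemap URL with an invalid base URL still resolves.
        System.out.println(resolveSitemap("not a url", "https://example.com/sitemap.xml"));
        // Relative sitemap URL resolved against a valid base URL.
        System.out.println(resolveSitemap("https://example.com/robots.txt", "/sitemap.xml"));
        // Relative sitemap URL with an invalid base URL cannot be resolved.
        System.out.println(resolveSitemap("not a url", "/sitemap.xml"));
    }
}
```

Checking the absolute form first is what makes the base URL irrelevant for fully qualified sitemap URLs; the base URL is only consulted when the value really is relative.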

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

pr3mar commented, Apr 4, 2019 (1 reaction)

Happy to be of service; you helped me a lot in building my own parser! I'd be honoured if you took a look at it; it's here.

sebastian-nagel commented, Apr 9, 2019 (0 reactions)

Nice project! And another one in the long list of crawler/web-crawler/spider projects.

Top Results From Across the Web

How Google Interprets the robots.txt Specification
The [absoluteURL] line points to the location of a sitemap or sitemap index file. It must be a fully qualified URL, including the...
1.1 released! - Google Groups
[Robots] Robots parser to always handle absolute sitemap URL even without valid base URL (pr3mar, kkrugler, sebastian-nagel) #240
Robots.txt for SEO: The Ultimate Guide
Learn how to help search engines crawl your website more efficiently using the robots.txt file to achieve a better SEO performance.
Can a relative sitemap url be used in a robots.txt?
@Shams: The URLs listed in your sitemap have to use the same protocol and the same host as the sitemap file. If your...
Manage your sitemaps using the Sitemaps report - Google Help
If it is not, the test should show why Google can't reach or index the page (common reasons: a robots.txt rule; an incorrect...