Problems with "?" in robots.txt
In https://www.welt.de/robots.txt there are "?"-containing entries like Disallow: /*?config. Hence https://www.welt.de/test?config should be allowed, but it is not. Entries like Disallow: /*.xmli, by contrast, work properly and disallow https://www.welt.de/test.xmli. After my investigation I figured out that "?" is the problematic character.

I use RobotstxtServer#allow("https://www.welt.de/test?config") for testing.
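For reference, the reported check can be reproduced outside of crawler4j with crawler-commons' SimpleRobotRulesParser (the parser mentioned in the maintainer comment further down). The following is only a minimal sketch, not the reporter's original test: it assumes the two Disallow rules quoted above stand in for the live https://www.welt.de/robots.txt, assumes the parseContent(url, bytes, contentType, robotName) variant of the crawler-commons API, and uses an arbitrary robot name "mycrawler".

import java.nio.charset.StandardCharsets;

import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

public class RobotsQuestionMarkCheck {

    public static void main(String[] args) {
        // Two rules reproduced from the issue report for https://www.welt.de/robots.txt
        String robotsTxt = "User-agent: *\n"
                + "Disallow: /*?config\n"
                + "Disallow: /*.xmli\n";

        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
        BaseRobotRules rules = parser.parseContent(
                "https://www.welt.de/robots.txt",
                robotsTxt.getBytes(StandardCharsets.UTF_8),
                "text/plain",
                "mycrawler");

        // Print how the parser decides for the two URLs from the report,
        // so the "?" rule can be compared with the ".xmli" rule that is said to work.
        System.out.println("test?config allowed: "
                + rules.isAllowed("https://www.welt.de/test?config"));
        System.out.println("test.xmli allowed:   "
                + rules.isAllowed("https://www.welt.de/test.xmli"));
    }
}

Running this prints one allowed/blocked decision per test URL, which makes it easy to see whether the "?" entry is applied at all.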
Issue Analytics
- Created: 6 years ago
- Comments: 15 (2 by maintainers)
I’ll keep this on the radar, and will add a unit test to crawler-commons’ robots.txt parser, just to make sure that it continues to work. Thanks!
Thank you.
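Below is a minimal sketch of the kind of unit test mentioned above for crawler-commons' robots.txt parser. It is an assumption of this sketch, not something stated in the thread, that the parser follows Google's wildcard semantics, i.e. that "*" matches any character sequence and that the query string is part of the matched path, so a URL containing "?config" is blocked by Disallow: /*?config while a URL without it stays allowed. The test class name and the robot name "mycrawler" are made up for illustration.

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.nio.charset.StandardCharsets;

import org.junit.Test;

import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

public class QuestionMarkInDisallowTest {

    // Parse an in-memory robots.txt the same way a fetched one would be parsed.
    private BaseRobotRules parse(String robotsTxt) {
        return new SimpleRobotRulesParser().parseContent(
                "https://www.example.com/robots.txt",
                robotsTxt.getBytes(StandardCharsets.UTF_8),
                "text/plain",
                "mycrawler");
    }

    @Test
    public void questionMarkRuleIsAppliedLiterally() {
        BaseRobotRules rules = parse("User-agent: *\nDisallow: /*?config\n");
        // A URL whose query string matches the rule is expected to be blocked ...
        assertFalse(rules.isAllowed("https://www.example.com/test?config"));
        // ... while a URL without the matching query string stays allowed.
        assertTrue(rules.isAllowed("https://www.example.com/test"));
    }
}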