[Sitemaps] Sitemap index: stop URL at closing </loc>
(cf. the Nutch mailing list)
With #153, the sitemaps SAX parser handles sitemaps with missing or improperly closed <url> elements. The same should be done for sitemap indexes, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex>
<sitemap>
<loc>https://www.example.org/sitemap1.xml</loc>
<loc>https://www.example.org/sitemap2.xml</loc>
</sitemap>
</sitemapindex>
Issue Analytics
- Created 5 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Hi @sebastian-nagel - got it, there are two <loc> elements in a row. I thought that was valid, due to how SiteMapIndex has a list of SiteMap records, each of which has a list of SiteMapURL records. Seems like we're abusing the SiteMap class in this situation; we really want another concrete version of AbstractSiteMap that only has a single URL.

The sitemap index given above is invalid: it does not follow the spec. However, since the parser is designed to also parse invalid sitemaps, it should be able to parse invalid sitemap indexes as well, and silently try to fix their structure by assuming </sitemap><sitemap> between the two <loc> elements. The SiteMapHandler class also auto-closes <url> elements. I'll open a PR to address it.
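The idea of treating every closing </loc> as the end of one sitemap entry can be illustrated with a small standalone SAX handler. This is only a minimal sketch under stated assumptions, not the project's actual code: the class name LenientSitemapIndexSketch, the nested IndexHandler, and the parse helper are hypothetical names made up for this example, and only the JDK's built-in SAX parser is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of the library discussed above.
public class LenientSitemapIndexSketch {

    /** Collects one sitemap URL per closing </loc>, regardless of <sitemap> nesting. */
    static class IndexHandler extends DefaultHandler {
        final List<String> sitemapUrls = new ArrayList<>();
        private final StringBuilder loc = new StringBuilder();
        private boolean inLoc = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("loc".equals(qName)) {
                loc.setLength(0); // start a fresh URL buffer for every <loc>
                inLoc = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inLoc) {
                loc.append(ch, start, length); // text may arrive in several chunks
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("loc".equals(qName)) {
                // Stop the URL at </loc>: behave as if </sitemap><sitemap>
                // followed, so a second <loc> yields a second entry.
                inLoc = false;
                String url = loc.toString().trim();
                if (!url.isEmpty()) {
                    sitemapUrls.add(url);
                }
            }
        }
    }

    /** Parses a sitemap index given as a string and returns the URLs found. */
    static List<String> parse(String xml) throws Exception {
        IndexHandler handler = new IndexHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.sitemapUrls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sitemapindex><sitemap>"
                + "<loc>https://www.example.org/sitemap1.xml</loc>"
                + "<loc>https://www.example.org/sitemap2.xml</loc>"
                + "</sitemap></sitemapindex>";
        // prints [https://www.example.org/sitemap1.xml, https://www.example.org/sitemap2.xml]
        System.out.println(parse(xml));
    }
}
```

Because the URL is flushed at </loc> rather than at </sitemap>, the handler returns both URLs from the invalid index above instead of concatenating them or dropping one. Other per-sitemap fields such as <lastmod> are ignored in this sketch.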