[Sitemaps] Sitemap index: stop URL at closing </loc>


(cf. Nutch mailing list)

With #153 the sitemaps SAX parser handles sitemaps with missing or improperly closed <url> elements. The same should be done for sitemap indexes, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex>
<sitemap>
<loc>https://www.example.org/sitemap1.xml</loc>
<loc>https://www.example.org/sitemap2.xml</loc>
</sitemap>
</sitemapindex>

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
kkrugler commented, Dec 12, 2018

Hi @sebastian-nagel - got it, there are two <loc> elements in a row. I thought that was valid because SiteMapIndex has a list of SiteMap records, each of which has a list of SiteMapURL records. It seems we’re abusing the SiteMap class in this situation; what we really want is another concrete version of AbstractSiteMap that only has a single URL.

0 reactions
sebastian-nagel commented, Dec 13, 2018

The sitemap index given above is invalid: it does not follow the spec. However, since the parser is designed to also parse invalid sitemaps, it should be able to parse invalid sitemap indexes too, silently fixing their structure by assuming </sitemap><sitemap> between the two <loc> elements. The SiteMapHandler class likewise auto-closes <url> elements. I’ll open a PR to address it.
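
To illustrate the recovery strategy described in this comment, here is a minimal sketch in Python using the standard-library `xml.sax` module. It is not the crawler-commons code (which is Java), and the names `LenientIndexHandler` and `parse_index` are hypothetical. The idea it shows is only this: treat every closing </loc> as the end of one sitemap record, so two <loc> elements inside one <sitemap> yield two records, as if </sitemap><sitemap> stood between them.

```python
import xml.sax


class LenientIndexHandler(xml.sax.ContentHandler):
    """Collects one sitemap URL per <loc> in a sitemap index.

    Hypothetical illustration only; not the crawler-commons implementation.
    """

    def __init__(self):
        super().__init__()
        self.sitemap_urls = []  # one entry per recovered <sitemap> record
        self._in_loc = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "loc":
            # Each <loc> starts a fresh record, even if the enclosing
            # <sitemap> already contained one.
            self._in_loc = True
            self._buf = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        if self._in_loc:
            self._buf.append(content)

    def endElement(self, name):
        if name == "loc" and self._in_loc:
            # Stop the URL at the closing </loc> and emit the record now,
            # as if </sitemap><sitemap> followed immediately.
            url = "".join(self._buf).strip()
            if url:
                self.sitemap_urls.append(url)
            self._in_loc = False


def parse_index(xml_bytes):
    """Return the sitemap URLs found in a (possibly invalid) sitemap index."""
    handler = LenientIndexHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.sitemap_urls


# A sitemap index with two <loc> elements in a single <sitemap>.
INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex>
<sitemap>
<loc>https://www.example.org/sitemap1.xml</loc>
<loc>https://www.example.org/sitemap2.xml</loc>
</sitemap>
</sitemapindex>"""

print(parse_index(INDEX))
```

The sketch only covers input that is still well-formed XML, as the example in this issue is. Input with genuinely unclosed tags would make a strict SAX parser raise an error and would need separate handling.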


