Ampersand in title causes parser to fail

The parser will allow an ampersand in the title tag but fails in some instances. The following will cause it to treat all HTML that follows as plain text:

String htmlCode = "<html><head><title>&x</title></head</html>";

System.out.println(page.getTitleText()); prints &x</title></head</html> instead of &x.

The full code to reproduce:

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HTMLParser;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

public class HtmlUnitTest {
	public static void main(String[] args) {
		String htmlCode = "<html><head><title>&x</title></head</html>";
		try (WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
			StringWebResponse response = new StringWebResponse(htmlCode, new URL("http://htmlunit.sourceforge.net/test.html"));
			HtmlPage page = HTMLParser.parseHtml(response, webClient.getCurrentWindow());
			System.out.println(page.getTitleText());
			// work with the html page
		} catch (MalformedURLException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}
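
Until a fixed parser is available, one possible workaround is to escape bare ampersands before handing the markup to the parser. The following is only a minimal sketch under stated assumptions, not the maintainers' fix: the class name AmpersandEscaper and the method escapeBareAmpersands are hypothetical, and the regex treats any & that does not begin a well-formed named or numeric character reference as bare. It rewrites every such & in the input, including any inside script blocks, so it is only suitable for simple markup like the example above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit or Neko.
final class AmpersandEscaper {

    // Matches '&' unless it starts a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) character reference ending in ';'.
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)");

    private AmpersandEscaper() {
    }

    // Returns the input with every bare '&' replaced by "&amp;".
    static String escapeBareAmpersands(String html) {
        return BARE_AMPERSAND.matcher(html).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String htmlCode = "<html><head><title>&x</title></head</html>";
        // The escaped string is what would be passed to StringWebResponse.
        System.out.println(escapeBareAmpersands(htmlCode));
    }
}
```

In the reproduction above, the escaped string would be passed to StringWebResponse in place of htmlCode. Whether the parser then returns the expected title has not been verified here.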

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

rbri commented, Jan 11, 2020

Hi Jeff, thanks for the reminder. It looks like I will not find the time to make the big move and change the parser. Instead, I will fix it in Neko.

rbri commented, Sep 17, 2019

I hope you can be a bit patient about this. I would like to switch away from Neko to https://about.validator.nu/htmlparser/, and I hope that step will also solve this problem. There is still some work to do, and I also have some business tasks during the next few weeks, so please be a bit patient…


