Ampersand in title causes parser to fail
The parser usually accepts an ampersand in the title tag, but it fails in some instances. The following input causes it to treat all of the HTML that follows as plain text:
String htmlCode = "<html><head><title>&x</title></head</html>";
System.out.println(page.getTitleText());
prints &x</title></head</html> instead of &x.
The full code to reproduce:
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HTMLParser;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

public class HtmlUnitTest {
    public static void main(String[] args) {
        String htmlCode = "<html><head><title>&x</title></head</html>";
        try (WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
            StringWebResponse response = new StringWebResponse(htmlCode, new URL("http://http://htmlunit.sourceforge.net//test.html"));
            HtmlPage page = HTMLParser.parseHtml(response, webClient.getCurrentWindow());
            System.out.println(page.getTitleText());
            // work with the html page
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
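Until the parser itself is fixed, one possible stopgap is to escape bare ampersands in the markup before handing it to the parser. The sketch below is only an illustration under stated assumptions: the class name BareAmpersandEscaper and the method escapeBareAmpersands are hypothetical and not part of HtmlUnit or Neko, and it has not been verified that this input avoids the failure described above. It uses only java.util.regex and rewrites every & that does not begin something shaped like a character reference (&name;, &#123; or &#x1F;) to &amp;.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit: escapes ampersands that do not
// start something shaped like a character reference.
class BareAmpersandEscaper {

    // '&' NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String html) {
        // '&' has no special meaning in a replacement string, so "&amp;" is literal.
        return BARE_AMPERSAND.matcher(html).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String htmlCode = "<html><head><title>&x</title></head</html>";
        // Prints the markup with the bare ampersand written as &amp;
        System.out.println(escapeBareAmpersands(htmlCode));
    }
}
```

The escaped string could then be passed to StringWebResponse in place of htmlCode. Note that this is a blunt text-level rewrite: it would also change an & inside script or style content (for example a JavaScript &&), so it is only suitable for simple markup like the title example here.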
Issue Analytics
- Created 4 years ago
- Comments:6 (5 by maintainers)
Hi Jeff, thanks for the reminder. It looks like I will not find the time to make the big move and change the parser. Instead, I will fix it in Neko.
I hope you can be a bit patient about this. I would like to switch away from Neko to https://about.validator.nu/htmlparser/, and I hope that step will also solve this problem. There is still some work to do, and I also have some business tasks during the next weeks, so please be a bit patient…