Misinformation of getByXPath method

The JavaDoc states: “Evaluates the specified XPath expression from this node, returning the matching elements.”

That implies that, given the HTML

<html>
<body>
    <h1>Wrong</h1>
    <div>
        <h1>Right</h1>
    </div>
</body>
</html>

and the div node selected, I would expect to select its child h1 by passing the XPath //h1. But that’s not the case: the expression has to start from the current node with the dot selector, so the correct XPath is .//h1. While that is proper XPath, I’d argue that the JavaDoc implies the node is already selected as the starting point. It’s especially confusing if you print the node as XML and try to validate your XPath with third-party tools.

It’s a bit against common sense that a selected node does not traverse from its own location. I don’t expect a change in the code, but a more specific JavaDoc would definitely be helpful.
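The behaviour described above is standard XPath 1.0: an expression that begins with / or // is absolute and is resolved from the root of the document containing the context node, while a leading . anchors it at the context node itself. As a minimal sketch, the snippet below reproduces the issue’s markup with the JDK’s built-in javax.xml.xpath API rather than HtmlUnit (the class XPathContextDemo and its helpers are hypothetical), assuming getByXPath follows the same context-node rules, which is consistent with the behaviour reported in this issue.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathContextDemo {

    // Well-formed version of the markup from the issue
    static final String MARKUP = "<html><body>"
            + "<h1>Wrong</h1>"
            + "<div><h1>Right</h1></div>"
            + "</body></html>";

    // Parses the markup with the JDK's default (non-namespace-aware) DOM parser
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Evaluates expr with `context` as the XPath context node and collects each match's text
    static List<String> texts(Node context, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, context, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(MARKUP);
        Node div = doc.getElementsByTagName("div").item(0);

        // Absolute path: resolved from the document root, so both <h1> elements match
        System.out.println(texts(div, "//h1"));
        // Relative path: resolved from the div, so only the nested <h1> ("Right") matches
        System.out.println(texts(div, ".//h1"));
        // Same nodes as evaluating "//h1" with the div as context
        System.out.println(texts(doc, "//h1"));
    }
}
```

Note that texts(div, "//h1") and texts(doc, "//h1") return the same nodes: with a leading slash, the context node only determines which document is searched.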

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
rbri commented, Jul 9, 2019

Please have a look at the commit; I hope the updated documentation is a bit clearer. Many thanks for the report and the discussion of all the details. Enjoy using HtmlUnit.

0 reactions
TomasTokaMrazek commented, Jul 7, 2019

Apologies for the delay, I was on vacation.

@DSantiagoBC

// `log` is assumed to be the test class's logger (SLF4J-style {} placeholders)
public void getByXPathSelectedNode() throws Exception {
    WebClient client = new WebClient();

    final String htmlContent = "<html>\n"
            + "  <head>\n"
            + "    <title>my title</title>\n"
            + "  </head>"
            + "  <body>\n"
            + "    <h1>Heading!</h1>\n"
            + "    <div id='d1'>\n"
            + "      <h1 id='h1'>HtmlUnit</h1>\n"
            + "    </div>\n"
            + "  </body>\n"
            + "</html>";

    // Parse the markup into a page attached to the client's current window
    StringWebResponse response = new StringWebResponse(htmlContent,
            new URL("http://htmlunit.sourceforge.net//test.html"));
    HtmlPage page = HTMLParser.parseHtml(response, client.getCurrentWindow());

    final HtmlDivision divNode = (HtmlDivision) page.getElementById("d1");

    // "//h1" is absolute: it searches the whole page, so the first match is <h1>Heading!</h1>
    log.debug("Xpath: {}", divNode.getByXPath("//h1").get(0));
    // ".//h1" is relative to divNode: only the <h1 id='h1'> inside the div matches
    log.debug("Xpath: {}", divNode.getByXPath(".//h1").get(0));

    client.close();
}

Result:

19:16:12.149 [main] [DEBUG] cz.jaktoviditoka.investmentportfolio.model.HtmlUnitTest - Xpath: HtmlHeading1[<h1>]
19:16:12.149 [main] [DEBUG] cz.jaktoviditoka.investmentportfolio.model.HtmlUnitTest - Xpath: HtmlHeading1[<h1 id="h1">]

@mguillem I don’t see a single use case where you would select a child element of an HTML page and then call an XPath traversal method that searches the whole page, including its ancestors. Why exactly would I select some div node from an HtmlPage and then call getByXPath on that div in order to search the whole HtmlPage? From a Java OOP standpoint, I should call the method on the original HtmlPage object. Java is not a command line.

I personally think the dot in XPath should be implicit, not explicit. But, as I said, that would bring compatibility issues, so I’ll settle for better docs. I literally spent hours trying to figure out why getByXPath on a child node searches the whole tree, though that might be due to my WSO2 EI background, where the dot is basically never used.
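If you want the “implicit dot” behaviour on the caller’s side without an API change, one option is to rewrite an expression before evaluating it against a node. The sketch below uses a hypothetical helper, withImplicitDot, which only rewrites a leading / and evaluates the result with the JDK’s javax.xml.xpath API. It is not part of HtmlUnit, and it does not handle harder cases such as union expressions (//a | //b), where only the first branch would be rewritten.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ImplicitDotDemo {

    static final String MARKUP =
            "<html><body><h1>Wrong</h1><div><h1>Right</h1></div></body></html>";

    // Hypothetical helper: makes a leading "/" or "//" relative to the context node.
    // Only the start of the expression is inspected; branches after "|" are left alone.
    static String withImplicitDot(String expr) {
        String trimmed = expr.trim();
        if (trimmed.equals("/")) {
            return "."; // the bare root step becomes the context node itself
        }
        return trimmed.startsWith("/") ? "." + trimmed : trimmed;
    }

    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Counts the nodes matched by the rewritten expression, evaluated from `context`
    static int countMatches(Node context, String expr) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(withImplicitDot(expr), context, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(MARKUP);
        Node div = doc.getElementsByTagName("div").item(0);

        System.out.println(withImplicitDot("//h1"));   // prints ".//h1"
        System.out.println(countMatches(div, "//h1")); // only the <h1> inside the div
        System.out.println(countMatches(doc, "//h1")); // both <h1> elements
    }
}
```

Applied to the document itself rather than a child node, the rewrite is harmless, because .//h1 evaluated from the document node matches the same elements as //h1.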
