Navigating to javascript form using HtmlUnit
Note: This information is duplicated on https://stackoverflow.com/questions/69849910/navigating-to-javascript-form-using-htmlunit. Feel free to request any additional information there.
Overview
I have successfully been using HtmlUnit to navigate BoardGameGeek and execute tasks (e.g. send GeekMail). Recently they changed their login from a normal webpage to a javascript-generated form, and now I just can’t seem to access the login form using HtmlUnit no matter what I try, including:
- Adding a WebWindowListener.
- Waiting for JavaScript to complete with webClient.waitForBackgroundJavaScript(JAVASCRIPT_PAUSE) or webClient.waitForBackgroundJavaScriptStartingBefore(JAVASCRIPT_PAUSE).
- Listing all windows (0) and forms (1).
- Printing XML for the current HtmlPage and all HtmlForms.
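As an aside on the waiting step above: a single fixed pause can be complemented by polling for a concrete condition until a timeout expires. The following is only a minimal, library-independent sketch of that pattern. The names Poller and waitUntil are hypothetical and are not part of HtmlUnit; with HtmlUnit, the condition would presumably be a DOM lookup on the current page (for example a querySelector call that looks for a password input), which is an assumption here and not something verified against BoardGameGeek.

```java
import java.util.function.BooleanSupplier;

final class Poller {
    private Poller() {}

    /**
     * Re-checks the condition every pollMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition held in time,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after roughly 200 ms.
        // In a scraper this would be a check on the rendered page instead.
        boolean met = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("condition met: " + met);
    }
}
```

Polling for a specific element makes a timeout easier to interpret: if the condition never holds, the element was not rendered at all, as opposed to merely being rendered late.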
Of course I want a general understanding beyond this specific webpage / javascript code, but I give this specific concrete example since the other StackOverflow questions I looked at didn’t help in this particular situation (so maybe there is something unique here?).
Steps to Reproduce
- Use your browser to navigate to https://boardgamegeek.com/geekmail/compose?touser=FakeUserName (any name will suffice).
- As long as you are not logged in to BoardGameGeek, you will then see a pop-up window titled “Sign in” with text inputs “Username” and “Password”.
- If you View Page Source, you will see that the form is generated by javascript (https://cf.geekdo-static.com/frontend/main-es2015.aeff0e4f13bcecc7eb55.js or https://cf.geekdo-static.com/frontend/main-es5.aeff0e4f13bcecc7eb55.js). As far as I can tell, the created form has no name / id that I can use to access it. Even if it did, I do not seem to be able to view it within HtmlUnit (i.e. “Sign up” never appears when I print XML, whether for HtmlPage or HtmlForm).
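The "View Page Source" check from the steps above can also be done programmatically on the raw HTML string before any script runs. The sketch below is a rough heuristic only: it uses a regular expression, not an HTML parser, and the class name StaticMarkupCheck, the method name, and the sample markup (including the app-root element) are all made up for illustration.

```java
import java.util.regex.Pattern;

final class StaticMarkupCheck {
    // Matches an <input> tag with type=password (quotes optional, any letter case).
    // Heuristic only: a regex is not a full HTML parser.
    private static final Pattern PASSWORD_INPUT = Pattern.compile(
            "<input\\b[^>]*\\btype\\s*=\\s*[\"']?password\\b",
            Pattern.CASE_INSENSITIVE);

    private StaticMarkupCheck() {}

    /** Returns true if the raw, unrendered HTML already contains a password field. */
    static boolean hasStaticPasswordInput(String rawHtml) {
        return PASSWORD_INPUT.matcher(rawHtml).find();
    }

    public static void main(String[] args) {
        // Hypothetical samples: a classic server-rendered login page and a
        // shell page whose content is generated later by a script.
        String classicLoginPage =
                "<form action=\"/login\"><input type=\"password\" name=\"pw\"></form>";
        String scriptRenderedShell =
                "<body><app-root></app-root><script src=\"main-es2015.js\"></script></body>";

        System.out.println("classic page has password input: "
                + hasStaticPasswordInput(classicLoginPage));
        System.out.println("script-rendered shell has password input: "
                + hasStaticPasswordInput(scriptRenderedShell));
    }
}
```

If the check fails on the raw source but the field is visible in a browser, the login form only exists after the page's scripts have run, so whether HtmlUnit can see it depends on those scripts executing successfully.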
Existing code
Here is my current version of the code with multiple attempts made to diagnose the problem / extract some useful information:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.WebWindowEvent;
import com.gargoylesoftware.htmlunit.WebWindowListener;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
...
import java.util.LinkedList;

public class GeekMailSender {
    ...
    // Static variable to track all website windows.
    private static final LinkedList<WebWindow> websiteWindows = new LinkedList<>();

    // Inner class to listen for new (i.e. pop-up) windows.
    static class GeekMailWindowListener implements WebWindowListener {
        @Override
        public void webWindowClosed(WebWindowEvent event) {}

        @Override
        public void webWindowContentChanged(WebWindowEvent event) {}

        @Override
        public void webWindowOpened(WebWindowEvent event) {
            GeekMailSender.websiteWindows.add(event.getWebWindow());
        }
    }

    // Method to actually send GeekMail by navigating the BGG website.
    public static void sendGeekMail(...) {
        ...
        try (final WebClient webClient = new WebClient()) {
            // Track creation of new (i.e. pop-up) windows.
            websiteWindows.clear();
            webClient.addWebWindowListener(new GeekMailWindowListener());

            // Try to access the GeekMail page.
            HtmlPage currentPage = webClient.getPage("https://boardgamegeek.com/geekmail/compose?touser=FakeUserName");
            String pageTitle = currentPage.getTitleText();
            System.out.println(pageTitle); // BoardGameGeek

            // We may need to login first.
            if (!pageTitle.contains("GeekMail")) {
                // Need to wait for JavaScript to complete, otherwise no forms are available.
                webClient.waitForBackgroundJavaScriptStartingBefore(JAVASCRIPT_PAUSE);
                // No difference if use webClient.waitForBackgroundJavaScript(JAVASCRIPT_PAUSE);

                // Unfortunately the only form found is the top-right Search form on BoardGameGeek.
                if (currentPage.getForms().isEmpty()) {
                    // This does NOT happen.
                    System.out.println("WARNING! No form found, even after waiting for javascript!");
                    return;
                }

                // We don't find any windows at all... this confuses me.
                if (websiteWindows.isEmpty()) {
                    // This does happen :(
                    System.out.println("WARNING! No windows found even after waiting for javascript!");
                }

                // Additional printing does not reveal where the form is.
                // For instance, searching the XML for "Sign up" yields no results.
                System.out.println(currentPage.asXml());

                // And printing the one form we can access reveals it is just the Search form.
                System.out.println(currentPage.getForms().size()); // 1
                final HtmlForm loginForm = currentPage.getForms().get(0);
                System.out.println(loginForm.asXml());
                ...
            }
            ...
        }
        ...
    }
}
References
In trying to solve this, I have checked the following references (among many others):
- https://htmlunit.sourceforge.io/gettingStarted.html
- https://stackoverflow.com/questions/41117026/htmlunit-cant-find-forms-on-website
- https://stackoverflow.com/questions/54528410/locating-a-pop-up-window-with-htmlunit
- https://sourceforge.net/p/htmlunit/mailman/message/20356348/
- https://htmlunit.sourceforge.io/apidocs/com/gargoylesoftware/htmlunit/WebWindowListener.html
Nevertheless, I seem unable to locate the desired form. Any help would be much appreciated!
Issue Analytics
- Created 2 years ago
- Comments: 6 (2 by maintainers)
Top GitHub Comments
Thanks! Became a sponsor to help support your work. Much appreciated 😃
Hi there,
Unfortunately, I have the same problem. For me, the polyfills load because an ES5 version of the polyfill also exists, which is not a module, but main-es5.js won't load and fails with the following error:
2022-06-17 14:11:40,051 [,] ERROR c.g.h.WebConsole:262 - ERROR {"fileName":"https://pxe.cz:443/en/main-es5.js","lineNumber":3}
The file contains 3 lines, and most of the code is on line 3. Maybe the line is too long for HtmlUnit. The full URL I'm trying to access is https://pxe.cz/en/derivatives-market/electricity. I need the 2 tables that Angular loads via Ajax inside the <pxe-widget-futures> HTML element, but they show up only in browsers; HtmlUnit won't load them. The code I used:
I am using the latest HtmlUnit version (2.62) with Java 11. Investigating further: it wasn't the line length. I've managed to narrow it down to the second of the following 3 lines:
What could be the problem? Thanks.
Tried with version 2.63 too, same error.
Anybody out there who can help me?
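For anyone trying to narrow down a similar error, the record quoted in the comment above carries a file name and a line number, and those can be used to inspect the offending script line. This is a minimal sketch under a few assumptions: the record is logged as JSON-like text with straight quotes, the script text has already been downloaded into a string, and the names ConsoleErrorLocator, Location, parse and lineLength are hypothetical helpers, not part of HtmlUnit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConsoleErrorLocator {
    /** File name and 1-based line number taken from an error record. */
    static final class Location {
        final String fileName;
        final int lineNumber;

        Location(String fileName, int lineNumber) {
            this.fileName = fileName;
            this.lineNumber = lineNumber;
        }
    }

    // Assumed record shape: {"fileName":"<url>","lineNumber":<n>}
    private static final Pattern RECORD = Pattern.compile(
            "\"fileName\"\\s*:\\s*\"([^\"]+)\"\\s*,\\s*\"lineNumber\"\\s*:\\s*(\\d+)");

    private ConsoleErrorLocator() {}

    /** Extracts the location from a log line, or returns null if none is found. */
    static Location parse(String logLine) {
        Matcher m = RECORD.matcher(logLine);
        return m.find() ? new Location(m.group(1), Integer.parseInt(m.group(2))) : null;
    }

    /** Length of the given 1-based line of the script, or -1 if out of range. */
    static int lineLength(String script, int lineNumber) {
        String[] lines = script.split("\\R", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return -1;
        }
        return lines[lineNumber - 1].length();
    }

    public static void main(String[] args) {
        String logLine = "ERROR c.g.h.WebConsole:262 - ERROR "
                + "{\"fileName\":\"https://pxe.cz:443/en/main-es5.js\",\"lineNumber\":3}";
        // Stand-in for the real script text, which is not reproduced here.
        String script = "// line one\n// line two\nvar a=1;var b=2;";

        Location loc = parse(logLine);
        if (loc != null) {
            System.out.println(loc.fileName + " line " + loc.lineNumber
                    + " has " + lineLength(script, loc.lineNumber) + " characters");
        }
    }
}
```

Because minified bundles often put nearly all code on one line, the line number alone says little; splitting that line into smaller pieces and loading them separately, as the commenter did, is usually the next step.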