Navigating to javascript form using HtmlUnit

Note: This information is duplicated on https://stackoverflow.com/questions/69849910/navigating-to-javascript-form-using-htmlunit. Feel free to request any additional information there.

Overview

I have successfully been using HtmlUnit to navigate BoardGameGeek and execute tasks (e.g. send GeekMail). Recently they changed their login from a normal webpage to a javascript-generated form, and now I just can’t seem to access the login form using HtmlUnit no matter what I try, including:

  • Adding a WebWindowListener.
  • Waiting for javascript to complete with webClient.waitForBackgroundJavaScript(JAVASCRIPT_PAUSE) or webClient.waitForBackgroundJavaScriptStartingBefore(JAVASCRIPT_PAUSE).
  • Listing all windows (0) and forms (1).
  • Printing XML for the current HtmlPage and all HtmlForms.

Of course I want a general understanding beyond this specific webpage / javascript code, but I give this specific concrete example since the other StackOverflow questions I looked at didn’t help in this particular situation (so maybe there is something unique here?).

Steps to Reproduce

  1. Use your browser to navigate to https://boardgamegeek.com/geekmail/compose?touser=FakeUserName (any name will suffice).
  2. As long as you are not logged in to BoardGameGeek, you will then see a pop-up window titled “Sign in” with text inputs “Username” and “Password”.
  3. If you View Page Source, you will see that the form is generated by javascript (https://cf.geekdo-static.com/frontend/main-es2015.aeff0e4f13bcecc7eb55.js or https://cf.geekdo-static.com/frontend/main-es5.aeff0e4f13bcecc7eb55.js). As far as I can tell, the created form has no name / id that I can use to access it. Even if it did, I do not seem to be able to view it within HtmlUnit (i.e. “Sign up” never appears when I print XML, whether for HtmlPage or HtmlForm).

Existing code

Here is my current version of the code with multiple attempts made to diagnose the problem / extract some useful information:

import com.gargoylesoftware.htmlunit.WebClient;
...
import java.util.LinkedList;

public class GeekMailSender {
    ...
    
    // Static variable to track all website windows.
    private final static LinkedList<WebWindow> websiteWindows = new LinkedList<WebWindow>();
    
    // Inner-class to listen for new (i.e. pop-up) windows.
    static class GeekMailWindowListener implements WebWindowListener {
        public void webWindowClosed(WebWindowEvent event) {}
        public void webWindowContentChanged(WebWindowEvent event) {}
        public void webWindowOpened(WebWindowEvent event) {
            GeekMailSender.websiteWindows.add(event.getWebWindow());
        }
    }

    // Method to actually send GeekMail by navigating the BGG website.
    public static void sendGeekMail(...) {
        ...
        try (final WebClient webClient = new WebClient()) {
            // Track creation of new (i.e. pop-up) windows.
            websiteWindows.clear();
            webClient.addWebWindowListener(new GeekMailWindowListener());
            	
            // Try to access the GeekMail page.
            HtmlPage currentPage = webClient.getPage("https://boardgamegeek.com/geekmail/compose?touser=FakeUserName");
            String pageTitle = currentPage.getTitleText();
            System.out.println(pageTitle);  // BoardGameGeek

            // We may need to login first.
            if (!pageTitle.contains("GeekMail")) {
                // Need to wait for javascript to complete, otherwise no forms are available.
                webClient.waitForBackgroundJavaScriptStartingBefore(JAVASCRIPT_PAUSE);
                // No difference if use webClient.waitForBackgroundJavaScript(JAVASCRIPT_PAUSE);

                // Unfortunately the only form found is the top-right Search form on BoardGameGeek.
                if (currentPage.getForms().isEmpty()) {
                    // This does NOT happen.
                    System.out.println("WARNING! No form found, even after waiting for javascript!");
                    return;
                }

                // We don't find any windows at all... this confuses me.
                if (websiteWindows.isEmpty()) {
                    // This does happen :(
                    System.out.println("WARNING! No windows found even after waiting for javascript!");
                }

                // Additional printing does not reveal where the form is.
                // For instance, searching the XML for "Sign up" yields no results.
                System.out.println(currentPage.asXml());

                // And printing the one form we can access reveals it is just the Search form.
                System.out.println(currentPage.getForms().size());  // 1
                final HtmlForm loginForm = currentPage.getForms().get(0);
                System.out.println(loginForm.asXml());
                ...
            }
            ...
        }
        ...
    }
}
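
The code above waits once for a fixed interval and then inspects the page. One alternative that may be worth trying is to poll for the element repeatedly until it appears or a timeout expires. The following is only a minimal, library-independent sketch of that idea: `ElementPoller` and `pollUntilPresent` are hypothetical names that are not part of HtmlUnit, and the lookup in `main` is a stand-in for a real DOM query. It does not address whether HtmlUnit can execute the site's scripts at all; it only changes how the waiting is done.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper (not part of HtmlUnit): retries a lookup until it
// yields a value or the timeout elapses.
public class ElementPoller {

    public static <T> Optional<T> pollUntilPresent(Supplier<Optional<T>> lookup,
                                                   long timeoutMillis,
                                                   long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            final Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (System.nanoTime() >= deadline) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a DOM lookup: the "element" appears on the third attempt.
        final int[] attempts = {0};
        final Supplier<Optional<String>> lookup = () -> {
            attempts[0]++;
            if (attempts[0] >= 3) {
                return Optional.of("login-input");
            }
            return Optional.empty();
        };

        final Optional<String> found = pollUntilPresent(lookup, 1_000, 10);
        System.out.println(found.isPresent() + " after " + attempts[0] + " attempts");
    }
}
```

With HtmlUnit, the lookup could presumably call `webClient.waitForBackgroundJavaScript(...)` and then wrap something like `currentPage.querySelector("input[type='password']")` in `Optional.ofNullable(...)`. That selector is an assumption about the page's markup, not something confirmed by the issue.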

References

In trying to solve this, I have checked the following references (among many others):

Nevertheless, I seem unable to locate the desired form. Any help would be much appreciated!

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

MatroidX commented, Nov 6, 2021

Thanks! Became a sponsor to help support your work. Much appreciated 😃

zakimatyi commented, Jul 11, 2022

Hi there,

Unfortunately, I have the same problem. In my case the polyfills load, because an ES5 version of the polyfill also exists that is not a module, but main-es5.js fails to load with the following error:

2022-06-17 14:11:40,051 [,] ERROR c.g.h.WebConsole:262 - ERROR {"fileName":"https://pxe.cz:443/en/main-es5.js","lineNumber":3}

The file contains 3 lines, and most of the code is on line 3, so maybe the line is too long for HtmlUnit. The full URL I’m trying to access is https://pxe.cz/en/derivatives-market/electricity. I need the 2 tables that Angular loads via AJAX inside the <pxe-widget-futures> HTML element, but they only show up in browsers; HtmlUnit won’t load them. The code I used:

final WebClient webClient = new WebClient();
List<String> collectedAlerts = new ArrayList<String>();
webClient.setAlertHandler(new CollectingAlertHandler(collectedAlerts));
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setCssEnabled(true);
webClient.getOptions().setRedirectEnabled(false);
webClient.getOptions().setAppletEnabled(false);
webClient.getOptions().setFetchPolyfillEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setPopupBlockerEnabled(true);
webClient.getOptions().setTimeout(9000);
webClient.getOptions().setActiveXNative(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
webClient.getOptions().setThrowExceptionOnScriptError(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(true);
HtmlPage page = webClient.getPage(pxeUrl);
webClient.waitForBackgroundJavaScript(9000);

I am using the latest HtmlUnit version (2.62) with Java 11. After investigating further: it wasn’t the line length. I’ve managed to narrow the error down to the second of the following 3 lines:

var t=o.document.createElement("script");
t.async=!0,
t.src="https://www.googletagmanager.com/gtm.js?id="+a.N.gtmId,

What could be the problem? Thanks.

I tried version 2.63 too and got the same error. Is there anybody out there who can help me?
