java.util.concurrent.TimeoutException thrown at random netty read timeouts with RemoteWebDriver

🐛 Bug Report

Netty hits a read timeout at random times. It happens during different Selenium commands (for example: WebDriver.switchTo().defaultContent, WebElement.click, WebDriver.switchTo().window, WebElement.sendKeys, WebDriver.get, Alert.accept) and only in a small percentage of test cases (<1%).

To Reproduce

I don’t have specific steps to reproduce. When our CI runs our test suite of thousands of tests, about 10 fail at random because of this timeout. I could not reproduce it on my development workstation with a simple long loop of a few commands.

Timeout details

This timeout always occurs at:

Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)

I could confirm that the call blocked there for 3 minutes, which matches the default 3-minute read timeout that Selenium configures Netty with. But the commands that time out normally run very fast, in much less than one second.
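To make the failure mode concrete, here is a minimal sketch using only the JDK, not Selenium or AsyncHttpClient code. It assumes that the missing response behaves like a future that is never completed, which is what the `CompletableFuture.timedGet` frame in the trace suggests. The class and method names are made up for illustration, and the 200 ms timeout stands in for the 3-minute read timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: TimedGetDemo and timesOut are hypothetical names.
final class TimedGetDemo {

    /** Returns true if waiting on the future for the given time ends in a TimeoutException. */
    static boolean timesOut(CompletableFuture<?> response, long timeoutMillis) {
        try {
            response.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false; // a response arrived in time
        } catch (TimeoutException e) {
            return true;  // same exception type as the "Caused by" in the report
        } catch (ExecutionException e) {
            return false; // the future failed for a different reason
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag set
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a request whose response never arrives.
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        // Stand-in for a request that was answered immediately.
        CompletableFuture<String> answered = CompletableFuture.completedFuture("200 OK");

        System.out.println("never answered -> timed out: " + timesOut(neverAnswered, 200));
        System.out.println("answered       -> timed out: " + timesOut(answered, 200));
    }
}
```

The point of the sketch is that the exception says nothing about how slow the command was. It only says that no response was delivered to the waiting future before the deadline, which fits the observation that the affected commands are normally fast.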

I added the code below to a method that my test suite calls probably thousands of times. The first call failed and entered the catch block, but the second call to driver.switchTo().defaultContent() at the end of the catch block worked. So it seems that although the read timeout happens in Netty, the session keeps working normally afterwards.

try
{
    driver.switchTo().defaultContent();
}
catch (TimeoutException e)
{
    // this should never happen, but started happening at random after updating to selenium 4
    // output information to help troubleshoot
    System.err.println("TimeoutException thrown while trying to go to defaultContent (stack below). Trying again...");
    e.printStackTrace();

    try
    {
        // give the remote end a moment before retrying
        Thread.sleep(5000);
    }
    catch (InterruptedException e1)
    {
        // restore the interrupt flag instead of swallowing the interruption
        Thread.currentThread().interrupt();
    }

    // second attempt; in the reported run this call succeeded
    driver.switchTo().defaultContent();
}

In this case, the stack trace printed by e.printStackTrace() above was:

org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.0.0-beta-3', revision: '5d108f9a67'
System info: host: '51e5404d333b', ip: '172.18.0.7', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1127.19.1.el7.x86_64', java.version: '11.0.1'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [a5e3bf25-ba72-4023-b219-76406cf58660, switchToFrame {id=null}]
Capabilities {acceptInsecureCerts: true, browserName: firefox, browserVersion: 88.0, javascriptEnabled: true, moz:accessibilityChecks: false, moz:buildID: 20210415204500, moz:debuggerAddress: localhost:46562, moz:geckodriverVersion: 0.29.0, moz:headless: false, moz:processID: 9286, moz:profile: /tmp/rust_mozprofileQJRwQP, moz:shutdownTimeout: 60000, moz:useNonSpecCompliantPointerOrigin: false, moz:webdriverClick: true, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, platformVersion: 3.10.0-1127.19.1.el7.x86_64, rotatable: false, se:cdp: ws://172.18.0.3:4444/sessio..., se:cdpVersion: 85, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify}
Session ID: a5e3bf25-ba72-4023-b219-76406cf58660
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:71)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:103)
	at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:181)
	at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:39)
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:619)
	at org.openqa.selenium.remote.RemoteWebDriver$RemoteTargetLocator.defaultContent(RemoteWebDriver.java:1097)
	(...)
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
	at org.asynchttpclient.netty.NettyResponseFuture.get(NettyResponseFuture.java:206)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:65)
	... 38 more
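The try/catch workaround shown earlier can be generalized into a small helper. This is a sketch under assumptions, not part of Selenium: `RetryOnTimeout` and `withRetry` are hypothetical names, and the helper only retries unchecked exceptions of a type the caller passes in. That fits `org.openqa.selenium.TimeoutException`, which is a `RuntimeException`. The demo in `main` uses `IllegalStateException` as a stand-in so the block runs without Selenium on the classpath.

```java
import java.util.function.Supplier;

// Illustrative only: a generic retry wrapper, not a Selenium API.
final class RetryOnTimeout {

    /**
     * Runs the command and retries it when it throws an exception of type retryOn.
     * Other exceptions propagate immediately. After maxAttempts failures the last
     * exception is rethrown.
     */
    static <T> T withRetry(Supplier<T> command,
                           Class<? extends RuntimeException> retryOn,
                           int maxAttempts,
                           long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return command.get();
            } catch (RuntimeException e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not the failure we are working around
                }
                last = e;
                System.err.println("Attempt " + attempt + " of " + maxAttempts + " failed: " + e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // pause before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw last; // stop retrying when interrupted
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails on the first call and succeeds on the second, like the reported behavior.
        String result = withRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("simulated timeout");
            }
            return "ok";
        }, IllegalStateException.class, 2, 100);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

With Selenium on the classpath, a call might look like `withRetry(() -> driver.switchTo().defaultContent(), org.openqa.selenium.TimeoutException.class, 2, 5000)`. Retrying is only safe for commands that can be repeated without side effects. A click or sendKeys that actually reached the browser before the timeout would be executed twice.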

Environment

OS: Docker containers on a CentOS host
Browser: RemoteWebDriver using Firefox in the selenium/standalone-firefox:4.0.0-beta-3-20210426 docker image. I also tried the selenium/standalone-firefox:4.0.0-beta-4-prerelease-20210527 docker image, but the same thing happened.
Browser Driver version: RemoteWebDriver from selenium-java 4.0.0-beta-3
Language Bindings version: Java 4.0.0-beta-3

The RemoteWebDriver runs in a container on the same Docker host as the browser container, so all network traffic between them is purely logical and stays on one machine. Previously we used Selenium 2.52 on the same Docker host and never saw anything similar to this timeout.

Do you have any tips on what I can try in order to fix this or investigate it further?

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 18
  • Comments: 164 (50 by maintainers)

Top GitHub Comments

diemol commented, Jan 22, 2022 (10 reactions)

Thank you @Cybermaxke for the workaround. This has been included already by @pujagani through #10220.

We are planning a 4.1.2 release for next week (which includes this fix).

diemol commented, May 9, 2022 (9 reactions)

@JulienBreton I was checking the attached log and I am confused about how the test is sending those commands. I followed the commands for session f24ae2d1ba0f0d398cdaa093ef912389:

  • At 12:48:03.912, request in the Hub POST '/session/f24ae2d1ba0f0d398cdaa093ef912389/timeouts' which does not seem to get a response.
  • At 12:48:03.912, same request in the Node POST /session/f24ae2d1ba0f0d398cdaa093ef912389/timeouts HTTP/1.1 which does not seem to get a response.
  • At 12:48:03.931, request in the Hub DELETE /session/f24ae2d1ba0f0d398cdaa093ef912389 HTTP/1.1 which gets completed at 12:48:04.062
  • At 12:48:03.939, same request in the Node DELETE /session/f24ae2d1ba0f0d398cdaa093ef912389 HTTP/1.1 which gets completed at 12:48:03.995.
  • At 12:48:04.060 there is confirmation in the Hub logs that the session has been deleted

But then…

  • At 12:48:03.980, in the Hub there is a 3-minute timeout after a request to http://172.18.0.4:5555/session/f24ae2d1ba0f0d398cdaa093ef912389/timeouts.
  • And later a few more requests to POST /session/f24ae2d1ba0f0d398cdaa093ef912389/timeouts HTTP/1.1

It seems our HTTP client is doing some extra retries we were not aware of (NettyRequestSender.retry) which I will deactivate in a moment. In this case, those automatic retries are happening after the session has been deleted. And yes, we need to switch to another HTTP client soon.

Deactivating those automatic retries should reduce or eliminate the timeouts, and then let us rely on the configured session-timeout plus the retries that only happen between Hub and Node. I think we will do a new release in less than 1 week.
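The effect of client-side retries that nobody asked for can be illustrated with a toy model. This is not Selenium, Grid, or AsyncHttpClient code. `HiddenRetryModel` and all of its methods are hypothetical, and the model assumes that a request for a deleted session simply never gets a response, which is how the log above reads. The read timeout is shortened to milliseconds so the demo finishes quickly.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model: a "node" that only answers requests for sessions it still knows.
final class HiddenRetryModel {
    private final Set<String> liveSessions = ConcurrentHashMap.newKeySet();
    private int requestsSeen = 0;

    void openSession(String id) { liveSessions.add(id); }
    void deleteSession(String id) { liveSessions.remove(id); }
    int requestsSeen() { return requestsSeen; }

    /** Model assumption: requests for an unknown session never get a response. */
    private CompletableFuture<String> send(String sessionId) {
        requestsSeen++;
        if (liveSessions.contains(sessionId)) {
            return CompletableFuture.completedFuture("200 OK");
        }
        return new CompletableFuture<>(); // never completed
    }

    /** Sends the request, retrying up to maxRetries times after a read timeout. Returns null if all attempts time out. */
    String execute(String sessionId, int maxRetries, long readTimeoutMillis) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return send(sessionId).get(readTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // read timeout: fall through and retry if attempts remain
            } catch (ExecutionException e) {
                return null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HiddenRetryModel node = new HiddenRetryModel();
        node.openSession("s1");
        System.out.println("live session: " + node.execute("s1", 2, 100));

        node.deleteSession("s1");
        long start = System.nanoTime();
        String response = node.execute("s1", 2, 100); // retries hit a session that is gone
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("deleted session: " + response + " after " + elapsedMs
                + " ms, total requests seen: " + node.requestsSeen());
    }
}
```

In this model every retry against a deleted session costs one more full read timeout and can never succeed, so setting `maxRetries` to 0 bounds the wait to a single timeout. Whether the real client behaves this way in every case is an assumption of the sketch. The comment above only establishes that retries were being sent after the session had been deleted.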
