Connection leak
What kind of an issue is this?
- Bug report.
Issue description
I have a long-running job on Spark that makes quite a lot of read requests to ES.
There is definitely a connection leak related to the HTTP client, because
lsof -p %sparkExecutorPID% | grep TCP | grep 9200 | sort -t '>' -k2 | wc -l
shows over 1000 connections after one day of work, even though only one esRDD is in use at any given moment.
Eventually this leads to a "too many open files" error.
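For context, a minimal sketch of this access pattern might look like the following. The index name, query, and loop interval are illustrative placeholders, not taken from the original report; the point is simply that a long-running driver repeatedly builds an esRDD while only one is in use at a time.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

// Illustrative sketch only: a long-running driver that repeatedly reads
// from ES, with only one esRDD in use at any given moment.
object LongRunningReader {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-reader")
      .set("es.nodes", "localhost:9200")
    val sc = new SparkContext(conf)

    while (true) {
      val rdd = sc.esRDD("my-index/my-type", "?q=*") // one batch of read requests
      println(s"read ${rdd.count()} documents")
      Thread.sleep(60 * 1000L) // next batch a minute later
    }
  }
}
```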
Log (for es-hadoop 6.2.1):
2018-10-29 03:28:13 INFO HttpMethodDirector: I/O exception (java.net.SocketException) caught when processing request: Too many open files
2018-10-29 03:28:13 INFO HttpMethodDirector: Retrying request
Version Info
- OS: Debian 8
- JVM: Oracle Java 8
- Spark: 2.3.1
- ES-Hadoop: tried elasticsearch-spark-20 versions 5.6.10 and 6.2.1
- ES: 5.6.10
Top GitHub Comments
@ov7a thanks for the code to reproduce this – that was really helpful. I believe I have tracked down the problem. There is a connection leak as you suspected. The connections are not getting explicitly cleaned up, but are instead cleaned up when the garbage collector reclaims the Socket objects. If you're running with a large heap and a small limit on open files this could definitely be a problem (I never saw the number of connections get above 500, but I didn't have a huge heap).

I believe that the problem is at https://github.com/elastic/elasticsearch-hadoop/blob/master/spark/core/src/main/scala/org/elasticsearch/spark/rdd/AbstractEsRDDIterator.scala#L54, where we create a RestRepository, perform a scroll, and then never close the RestRepository. The problem seems to go away when I fix that. I'll get a PR up sometime soon.
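To make the described fix concrete, here is a minimal, self-contained sketch of the pattern. The Repository class below is only a stand-in for es-hadoop's RestRepository and this is not the actual patch: the leaky variant drops the repository and leaves its sockets for the garbage collector, while the fixed variant closes it as soon as the scroll is exhausted.

```scala
import java.io.Closeable

object ConnectionLeakSketch {

  // Stand-in for es-hadoop's RestRepository: it owns a network resource
  // that must be released explicitly.
  final class Repository extends Closeable {
    def scroll(): Iterator[String] = Iterator("doc-1", "doc-2", "doc-3")
    override def close(): Unit = println("connections released")
  }

  // Leaky variant: the repository is created, scrolled, and then dropped.
  // Its sockets stay open until the GC happens to reclaim it.
  def leakyRead(): Iterator[String] = {
    val repo = new Repository()
    repo.scroll()
  }

  // Fixed variant: close the repository deterministically once the scroll
  // is exhausted, mirroring the change described in the comment above.
  def closingRead(): Iterator[String] = {
    val repo = new Repository()
    val docs = repo.scroll()
    new Iterator[String] {
      private var closed = false
      def hasNext: Boolean = {
        val more = docs.hasNext
        if (!more && !closed) { repo.close(); closed = true }
        more
      }
      def next(): String = docs.next()
    }
  }

  def main(args: Array[String]): Unit = {
    closingRead().foreach(println) // prints the docs, then "connections released"
  }
}
```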
This will be fixed in 7.16.0 and 8.0.0