Workers not releasing idle S3 connections to the pool when reading Avro files

Environment: Presto 343 with 1 coordinator and 4 workers, a Hive metastore backed by PostgreSQL, and Avro files on S3. We’ve modified the following settings in hive.properties:

hive.s3.max-connections=10000
hive.s3select-pushdown.max-connections=10000
hive.s3.connect-timeout=3m
hive.s3.socket-timeout=3m

Over several days, the connection pool progressively uses up all of its connections and never releases them. Eventually the pool is depleted and no further work with S3 files is possible until the node service is restarted.
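
To make the failure mode concrete, here is a minimal sketch of how a bounded pool gets depleted when a reader takes a connection and never returns it. This is not Presto or Hive code: BoundedPool, read_file_leaky and read_file_closed are hypothetical names invented for illustration, and the pool size of 3 stands in for the 10000 configured above.

```python
import threading
from contextlib import contextmanager


class BoundedPool:
    """Toy connection pool (hypothetical): at most max_connections leases at once."""

    def __init__(self, max_connections):
        self._slots = threading.Semaphore(max_connections)
        self._lock = threading.Lock()
        self.leased = 0  # number of connections currently checked out

    def acquire(self):
        # Non-blocking, so the demo fails fast instead of hanging forever.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("pool depleted")
        with self._lock:
            self.leased += 1

    def release(self):
        with self._lock:
            self.leased -= 1
        self._slots.release()

    @contextmanager
    def lease(self):
        # Guarantees the connection goes back to the pool, even on error.
        self.acquire()
        try:
            yield self
        finally:
            self.release()


def read_file_leaky(pool):
    """Opens a connection to read a file and never closes it."""
    pool.acquire()
    return "metadata"


def read_file_closed(pool):
    """Same read, but the connection is always released."""
    with pool.lease():
        return "metadata"


def run_demo(max_connections=3, reads=5):
    leaky = BoundedPool(max_connections)
    failed_at = None
    for i in range(reads):
        try:
            read_file_leaky(leaky)
        except RuntimeError:
            failed_at = i  # index of the first read that found the pool empty
            break

    closed = BoundedPool(max_connections)
    for _ in range(reads):
        read_file_closed(closed)

    return failed_at, leaky.leased, closed.leased


if __name__ == "__main__":
    print(run_demo())
```

In the leaky variant every read permanently consumes one slot, so raising the connection limit only delays the point at which the pool runs dry. That would be consistent with the symptom described above, where depletion takes days with a limit of 10000.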

Top GitHub Comments

losipiuk commented, Nov 17, 2020

Btw I validated that cherry-picking HIVE-22981 fixes the issue. So big thanks @rdsr for pointing towards that issue.

losipiuk commented, Nov 17, 2020

In 341 we bumped the Hive library from 3.0.6 to 3.1.2. I expect that is the reason for the regression. I will see if we can backport the HIVE-22981 fix.
