Workers not releasing idle S3 connections to the pool when reading Avro files

Setup: Presto 343 with 1 coordinator and 4 workers, Hive metastore + PostgreSQL, and Avro files on S3. We've modified the following hive.properties:

hive.s3.max-connections = 10000
hive.s3select-pushdown.max-connections = 10000
hive.s3.connect-timeout=3m
hive.s3.socket-timeout=3m
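The entries above are plain key/value pairs. The following minimal Java sketch reads them with java.util.Properties to show how they are interpreted: a pool of 10000 connections and timeouts of 3 minutes (180 seconds). The parseDuration helper is hypothetical and handles only the s, m and h suffixes. It is not Presto's own configuration parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.Duration;
import java.util.Properties;

public class HivePropertiesCheck {
    // Hypothetical helper: parses values such as "3m" or "45s".
    // Only s, m and h suffixes are handled; this is not Presto's parser.
    static Duration parseDuration(String value) {
        String v = value.trim();
        long amount = Long.parseLong(v.substring(0, v.length() - 1));
        switch (v.charAt(v.length() - 1)) {
            case 's': return Duration.ofSeconds(amount);
            case 'm': return Duration.ofMinutes(amount);
            case 'h': return Duration.ofHours(amount);
            default: throw new IllegalArgumentException("Unsupported unit: " + value);
        }
    }

    // Loads "key = value" lines exactly as they appear in hive.properties.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
                "hive.s3.max-connections = 10000",
                "hive.s3select-pushdown.max-connections = 10000",
                "hive.s3.connect-timeout=3m",
                "hive.s3.socket-timeout=3m");
        Properties props = load(text);
        int maxConnections = Integer.parseInt(props.getProperty("hive.s3.max-connections"));
        Duration socketTimeout = parseDuration(props.getProperty("hive.s3.socket-timeout"));
        // Prints the pool size and the socket timeout in seconds.
        System.out.println(maxConnections + " " + socketTimeout.getSeconds());
    }
}
```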

Over several days, the connection pool progressively uses up all of its connections and never releases them. Eventually the pool is depleted, and no further work with S3 files is possible until the node service is restarted.
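The following minimal Java sketch shows why a larger max-connections value only postpones this kind of depletion. It assumes that each file read leases one connection and never returns it. ConnectionPool, S3ObjectReader and readsUntilExhausted are hypothetical stand-ins, not Presto, Hive or AWS SDK classes.

```java
import java.util.concurrent.Semaphore;

public class PoolDepletionSketch {
    // Stand-in for an HTTP connection pool with a fixed number of connections.
    static final class ConnectionPool {
        private final Semaphore permits;
        ConnectionPool(int maxConnections) { permits = new Semaphore(maxConnections); }
        boolean tryLease() { return permits.tryAcquire(); }
        void release() { permits.release(); }
    }

    // Hypothetical reader: leases a connection when opened and returns it
    // to the pool only when close() is called.
    static final class S3ObjectReader implements AutoCloseable {
        private final ConnectionPool pool;
        private boolean closed;
        S3ObjectReader(ConnectionPool pool) {
            if (!pool.tryLease()) {
                throw new IllegalStateException("Connection pool exhausted");
            }
            this.pool = pool;
        }
        @Override
        public void close() {
            if (!closed) {
                closed = true;
                pool.release();
            }
        }
    }

    // Opens one reader per file and never closes it, as a leaky code path would.
    // Returns how many reads succeed before the pool runs dry.
    static int readsUntilExhausted(int maxConnections) {
        ConnectionPool pool = new ConnectionPool(maxConnections);
        int reads = 0;
        try {
            while (true) {
                new S3ObjectReader(pool); // leaked: close() is never called
                reads++;
            }
        } catch (IllegalStateException e) {
            return reads;
        }
    }

    public static void main(String[] args) {
        System.out.println(readsUntilExhausted(10));
        System.out.println(readsUntilExhausted(10_000));
    }
}
```

With a pool of 10, the tenth read is the last one that succeeds. With 10,000, the failure simply arrives later, which matches the "several days" pattern described above.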

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (14 by maintainers)

Top GitHub Comments

2 reactions
losipiuk commented, Nov 17, 2020

Btw, I validated that cherry-picking HIVE-22981 fixes the issue. Big thanks to @rdsr for pointing to that issue.

2 reactions
losipiuk commented, Nov 17, 2020

In 341 we bumped the Hive library from 3.0.6 to 3.1.2. I expect that is the reason for the regression. I will see if we can backport the HIVE-22981 fix.
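The thread does not quote the HIVE-22981 patch. The following is therefore only a sketch of the usual remedy for this kind of leak, and it assumes that a reader opened on the Avro file is never closed. The remedy is to wrap the reader in try-with-resources so that its connection goes back to the pool on both success and failure. ConnectionPool, S3ObjectReader and readAll are hypothetical stand-ins, not Presto, Hive or AWS SDK classes.

```java
import java.util.concurrent.Semaphore;

public class PoolReleaseSketch {
    // Stand-in for an HTTP connection pool with a fixed number of connections.
    static final class ConnectionPool {
        private final Semaphore permits;
        ConnectionPool(int maxConnections) { permits = new Semaphore(maxConnections); }
        boolean tryLease() { return permits.tryAcquire(); }
        void release() { permits.release(); }
        int available() { return permits.availablePermits(); }
    }

    // Hypothetical reader: leases a connection on open, returns it on close().
    static final class S3ObjectReader implements AutoCloseable {
        private final ConnectionPool pool;
        private boolean closed;
        S3ObjectReader(ConnectionPool pool) {
            if (!pool.tryLease()) {
                throw new IllegalStateException("Connection pool exhausted");
            }
            this.pool = pool;
        }
        // Simulates reading file metadata; may fail partway through.
        String readMetadata(boolean fail) {
            if (fail) {
                throw new RuntimeException("simulated read failure");
            }
            return "avro.schema";
        }
        @Override
        public void close() {
            if (!closed) {
                closed = true;
                pool.release();
            }
        }
    }

    // try-with-resources closes each reader on both normal and exceptional exit.
    // Returns the number of failed reads (including any pool-exhaustion error).
    static int readAll(ConnectionPool pool, int files, boolean failEveryOther) {
        int failures = 0;
        for (int i = 0; i < files; i++) {
            try (S3ObjectReader reader = new S3ObjectReader(pool)) {
                reader.readMetadata(failEveryOther && i % 2 == 0);
            } catch (RuntimeException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        ConnectionPool pool = new ConnectionPool(10);
        int failures = readAll(pool, 1_000, true);
        System.out.println(failures + " " + pool.available());
    }
}
```

Even with every other read failing, all 10 connections are back in the pool after 1,000 reads. The reader is closed in both cases.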


Top Results From Across the Web

Reading in-memory Avro file from S3: 'AttributeError:'
I'm trying to read Avro files stored in S3 by a vendor and write to a DW. See code below. (Was roughly working...

Using Avro Data Files | CDP Private Cloud
If you load new data into an Avro table through Hive, either through a Hive LOAD DATA or INSERT statement, or by manually...

Kafka Broker Configurations for Confluent Platform
If not explicitly configured, the default value will be null and there will be no dedicated endpoints for controller connections. If explicitly configured ...

Configuration reference - Apache Druid
A recommended way of organizing Druid configuration files can be seen in the conf ... The timeout for idle connections in connection pool....

SageMaker — Boto3 Docs 1.26.26 documentation
SageMaker does not split the files any further for model training. ... Read input data from an S3 bucket; Write model artifacts to...
