[BUG] Memory leak in a long-running Spark Thrift Server with Delta Lake workloads
Describe the problem
I’m facing a memory leak in a long-running Spark Thrift Server (STS) on Kubernetes when Delta Lake tables are used.
RSS (process) memory grows continuously while STS is used to run workloads against Delta Lake tables, and the growth rate correlates with how frequently SQL queries are executed. After a few hours, RSS reaches the pod memory limit, and kubelet kills the pod with exit code 137 and sets the pod status to OOMKilled.
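For reference, the OOM kill can be confirmed from the pod's last terminated state; the pod name below is a placeholder:

```bash
# Prints the last termination reason (OOMKilled) and exit code (137)
# of the executor container; replace <executor-pod> with the real pod name.
kubectl get pod <executor-pod> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
```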
This default GCP graph demonstrates the issue:
In VisualVM I see a growing trend in “used memory” for the heap, but it does not reach the heap maximum.
Here is a heap graph snapshot taken a few minutes before kubelet killed the executor pod:
There is also a growing trend for metaspace:
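Without VisualVM, the same heap and metaspace trends can be sampled from inside the executor pod with standard JDK tools (assuming pidof/jstat/jcmd are available in the image; the native-memory command additionally requires NMT to be enabled):

```bash
# One-shot samples of the executor JVM memory areas.
PID=$(pidof java)
jstat -gcutil "$PID"    # O = old gen utilization %, M = metaspace utilization %
jstat -gc "$PID"        # absolute heap/metaspace sizes in KB

# Native memory breakdown; only works if the JVM was started with
# -XX:NativeMemoryTracking=summary (small runtime overhead).
jcmd "$PID" VM.native_memory summary
```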
STS is launched on Kubernetes by executing sbin/start-thriftserver.sh on the pod with spark.master=k8s://https://kubernetes.default.svc. It runs the specified number of executors with the following default memory configuration:
-Xmx=$SPARK_EXECUTOR_MEMORY
pod memory limit = spark.executor.memory + (spark.executor.memory * spark.executor.memoryOverheadFactor)
I have tried increasing memoryOverheadFactor from the default 10% to 20%, 30% and 50%, but it did not solve the issue: RSS just has more room to grow, so kubelet kills the pod slightly later.
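For context, the launch looks roughly like this; the resource values, image name and Delta version below are illustrative placeholders rather than the exact production settings:

```bash
# Sketch of the STS launch on Kubernetes (values are examples only).
export SPARK_EXECUTOR_MEMORY=4g

sbin/start-thriftserver.sh \
  --master k8s://https://kubernetes.default.svc \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=$SPARK_EXECUTOR_MEMORY \
  --conf spark.executor.memoryOverheadFactor=0.1 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --packages io.delta:delta-core_2.12:2.1.0 \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```

(On Spark 3.2.x the equivalent setting is spark.kubernetes.memoryOverheadFactor; spark.executor.memoryOverheadFactor was added in 3.3.0.)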
If I switch the tables from Delta Lake to Parquet, STS works fine for many hours without a visible memory leak:
The issue has existed for more than a year, so it affects at least the following Spark + Delta bundles:
- Spark 3.3.0 + Delta Lake 2.1.0rc1
- Spark 3.2.X + Delta Lake 2.0.0
- Spark 3.2.X + Delta Lake 1.2.1
- Spark 3.2.X + Delta Lake 1.1.0
Steps to reproduce
- Start Spark Thrift Server in a Kubernetes cluster by executing sbin/start-thriftserver.sh
- Continuously execute read/write/merge queries against Delta Lake tables
- Observe growing RSS and pod memory for the executors, e.g. with:
  ps o pid,rss -p $(pidof java)
  kubectl top pod
  or any other tools you have (a minimal monitoring loop is sketched below)
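A minimal monitoring loop for the last step, assuming kubectl exec works against the executor pod and ps/pidof exist in the container (<executor-pod> is a placeholder):

```bash
# Sample the executor JVM RSS and pod-level memory usage once a minute.
while true; do
  date
  kubectl exec <executor-pod> -- sh -c 'ps o pid,rss -p $(pidof java)'
  kubectl top pod <executor-pod>
  sleep 60
done
```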
Expected results
RSS memory should not continuously grow.
Further details
I have two STS instances running different queries, but both have the issue:
STS_1
CREATE TEMPORARY VIEW ${source}
USING JDBC
...
--
CREATE OR REPLACE TABLE ${table}
USING DELTA
...
--
CREATE TABLE IF NOT EXISTS ${table}
USING DELTA
...
--
MERGE INTO ${table}
...
STS_2
SELECT ... FROM ${table}
--
CREATE OR REPLACE TABLE ${table}
USING DELTA
...
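Both workloads are submitted to STS continuously over JDBC; a driver loop along these lines (host, port and the SQL file are placeholders) produces the kind of sustained load described above:

```bash
# Replay the queries against the Thrift Server in a loop via beeline.
while true; do
  beeline -u "jdbc:hive2://<sts-host>:10000/default" -f workload.sql
  sleep 10
done
```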
Environment information
- Delta Lake version: 2.1.0rc1, 2.0.0, 1.2.1, 1.1.0
- Spark version: 3.3.0, 3.2.2, 3.2.1, 3.2.0
- Scala version: 2.12
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?
- Yes. I can contribute a fix for this bug independently.
- Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
- No. I cannot contribute a bug fix at this time.
Comments (31, all by maintainers):
Yeah, the zstd thing is specific to how Parquet uses it, and only really applies to reading Parquet files.
The issue still exists on Spark 3.3.1 + Delta Lake 2.2.0.