
[BUG] Memory leak in a long-running Spark Thrift Server with Delta Lake workloads


Bug

Describe the problem

I’m facing a memory leak in a long-running Spark Thrift Server (STS) on Kubernetes when Delta Lake tables are used.

RSS (process) memory grows continuously while STS is used to run workloads against Delta Lake tables. The growth correlates with the frequency of executed SQL queries. After a few hours, RSS reaches the pod memory limit, and kubelet kills the pod with exit code 137 and sets the pod status to OOMKilled.

This default GCP graph demonstrates the issue: [screenshot: GCP memory usage graph, 2022-08-23]. In VisualVM I see a growing trend in used heap memory, but it does not reach the heap maximum. Here is a graph snapshot taken a few minutes before kubelet killed the executor pod: [screenshot: VisualVM heap graph, 2022-08-23]. There is also a growing trend for metaspace: [screenshot: VisualVM metaspace graph, 2022-08-23].
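For reference, the same numbers can be read from inside an executor container with standard JDK tools; a minimal sketch, assuming the executor JVM is PID 1 in the container (otherwise substitute the PID reported by pidof java) and that -XX:NativeMemoryTracking=summary has been added to spark.executor.extraJavaOptions for the native summary:

# Hypothetical executor pod name
POD=sts-exec-1

# Heap, metaspace and GC counters (the same values VisualVM charts)
kubectl exec "$POD" -- jstat -gc 1

# JVM-tracked native memory, to see which categories grow besides the heap
kubectl exec "$POD" -- jcmd 1 VM.native_memory summary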

STS is launched on Kubernetes by executing sbin/start-thriftserver.sh on the pod with spark.master=k8s://https://kubernetes.default.svc. It runs the specified number of executors with the following default memory configuration:

  • -Xmx=$SPARK_EXECUTOR_MEMORY
  • pod memory limit = spark.executor.memory + (spark.executor.memory * spark.executor.memoryOverheadFactor)

I have tried increasing memoryOverheadFactor from the default 10% to 20%, 30% and 50%, but it did not solve the issue: RSS simply has more room to grow, so kubelet kills the pod a bit later.
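For illustration, a launch command with these settings might look like the sketch below; the container image, namespace, service account and executor sizing are placeholders, and the two Delta configs are the standard session extension and catalog settings:

# Sketch only: image, namespace, service account and executor sizing are placeholders.
# spark.executor.memoryOverheadFactor is the Spark 3.3 name; Spark 3.2 uses spark.kubernetes.memoryOverheadFactor.
sbin/start-thriftserver.sh \
    --conf spark.master=k8s://https://kubernetes.default.svc \
    --conf spark.kubernetes.container.image=registry.example.com/spark-delta:3.3.0 \
    --conf spark.kubernetes.namespace=spark \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.executor.instances=4 \
    --conf spark.executor.memory=8g \
    --conf spark.executor.memoryOverheadFactor=0.1 \
    --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog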

If I switch the tables from Delta Lake to Parquet, STS works fine for many hours without a visible memory leak: [screenshot: GCP memory usage graph, 2022-08-21].

The issue has existed for more than a year, so it affects at least the following Spark + Delta bundles:

  • Spark 3.3.0 + Delta Lake 2.1.0rc1
  • Spark 3.2.X + Delta Lake 2.0.0
  • Spark 3.2.X + Delta Lake 1.2.1
  • Spark 3.2.X + Delta Lake 1.1.0

Steps to reproduce

  1. Start Spark Thrift Server in a Kubernetes cluster by executing sbin/start-thriftserver.sh
  2. Continuously execute read/write/merge queries against Delta Lake tables
  3. Observe growing RSS and pod memory for the executors (see the sketch after this list):
    • ps o pid,rss -p $(pidof java)
    • kubectl top pod
    • any other tools you have
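For step 3, a small watch loop can track both views over time; a sketch, with the executor pod name as a placeholder and an arbitrary polling interval:

# Hypothetical executor pod name
POD=sts-exec-1

while true; do
    # RSS of the executor JVM as seen inside the container
    kubectl exec "$POD" -- sh -c 'ps o pid,rss -p "$(pidof java)"'
    # Pod-level usage as reported by the metrics server
    kubectl top pod "$POD"
    sleep 60
done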

Expected results

RSS memory should not continuously grow.

Further details

I have two STS instances running different queries, but both have the issue:

STS_1

CREATE TEMPORARY VIEW ${source}
    USING JDBC
    ...
--
CREATE OR REPLACE TABLE ${table}
    USING DELTA
    ...
--
CREATE TABLE IF NOT EXISTS ${table}
    USING DELTA
    ...
--
MERGE INTO ${table}
    ...

STS_2

SELECT ... FROM ${table}
--
CREATE OR REPLACE TABLE ${table}
    USING DELTA
    ...
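
Both instances receive their statements through the Thrift/JDBC endpoint; a minimal way to drive such a workload continuously might look like the sketch below (the endpoint host, user and script file are placeholders; port 10000 is the STS default):

# Placeholder JDBC endpoint for the Thrift server
STS_URL="jdbc:hive2://spark-thrift-server.spark.svc:10000/default"

while true; do
    # sts_queries.sql is a hypothetical file containing statements like the ones above
    beeline -u "$STS_URL" -n user -f sts_queries.sql
    sleep 30
done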

Environment information

  • Delta Lake version: 2.1.0rc1, 2.0.0, 1.2.1, 1.1.0
  • Spark version: 3.3.0, 3.2.2, 3.2.1, 3.2.0
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 31 (31 by maintainers)

Top GitHub Comments

1 reaction
Kimahriman commented, Sep 7, 2022

Yeah, the zstd thing is specifically within how Parquet uses it, and only really applies to reading Parquet files.

0 reactions
dnskr commented, Dec 6, 2022

The issue still exists on Spark 3.3.1 + Delta Lake 2.2.0
