
etcdClient.watch leads to memory usage increasing all the time

See original GitHub issue

Env

etcd server: v3.4.0

Below is one of the etcd cluster nodes, run as a Docker container:

b24afab33c18        quay.io/coreos/etcd:v3.4.0                            "/usr/local/bin/etcd…"   15 hours ago        Up 15 hours        0.0.0.0:2381->2381/tcp, 0.0.0.0:2482->2482/tcp, 2379-2380/tcp   etcd2

etcd client: jetcd 0.4.1

compile ("io.etcd:jetcd-core:0.4.1")

Problem phenomenon

The watch or close API seems to have a memory leak, so I used the Scala code below to demonstrate it. (I deployed an etcd cluster of three nodes; the third node's memory kept increasing the whole time the code below was running, while the other nodes' memory stayed stable.)

import java.net.URI
import java.nio.charset.StandardCharsets

import io.etcd.jetcd.{ByteSequence, Client, Watch}
import io.etcd.jetcd.options.WatchOption
import io.etcd.jetcd.watch.WatchResponse

import scala.collection.JavaConverters._
import scala.collection.immutable.Queue

object WatchLeakRepro {

  // Application callback; its body is irrelevant to the reproduction.
  def onKeyChange(res: WatchResponse): Unit = ()

  def main(args: Array[String]): Unit = {
    val hostAndPorts = "xxx.xxx.xxx.xxx:2379,xxx.xxx.xxx.xxx:2380,xxx.xxx.xxx.xxx:2381"
    val addresses: List[URI] = hostAndPorts
      .split(",")
      .toList
      .map(hp => {
        val host :: port :: Nil = hp.split(":").toList
        URI.create(s"http://$host:$port")
      })
    val client = Client.builder().endpoints(addresses.asJava).build()
    val watchClient = client.getWatchClient

    for (count1 <- 1 to 100) {
      var watcherQueue = Queue.empty[Watch.Watcher]
      for (count2 <- 1 to 5000) {
        // jetcd takes keys as ByteSequence, not String
        val key = ByteSequence.from(s"namespace/$count1/$count2", StandardCharsets.UTF_8)
        val option = WatchOption
          .newBuilder()
          .withPrevKV(true)
          .withPrefix(key)
          .build()

        val watcher = watchClient.watch(key, option, Watch.listener((res: WatchResponse) => onKeyChange(res)))
        watcherQueue = watcherQueue.enqueue(watcher)
      }

      Thread.sleep(1000 * 10)

      // Close all watchers; the server-side watch streams should be released here
      for (watcher <- watcherQueue) {
        watcher.close()
      }
    }
  }
}

The code above creates 5000 watchers per loop iteration, sleeps for 10 seconds, then closes those 5000 watchers, for a total of 100 iterations.

During testing, even though the watchers are closed in the code above, the memory is not released: the memory usage reported by docker stats keeps increasing until docker ps and docker stats can no longer be executed (at which point etcd2 appears to have crashed), and free -m shows the host memory is exhausted.

During testing I also used pprof to inspect the server's heap (this requires the etcd server to be started with --enable-pprof so that the /debug/pprof endpoints are exposed):

go tool pprof "http://xxx.xxx.xxx.xxx:2381/debug/pprof/heap?debug=1&seconds=10"

It showed that the memory attributed to go.etcd.io/etcd/mvcc.(*watchableStore).NewWatchStream kept increasing.

So a preliminary conclusion: the watch API leads to the memory leak; watcher.close() apparently does not release the memory on the server.
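
One way to cross-check this conclusion (not part of the original issue) is to watch the server's own watcher gauge rather than raw memory. Assuming etcd v3.4 still exports the etcd_debugging_mvcc_watcher_total gauge on its /metrics endpoint, a small Scala helper could poll it between the create and close phases of the loop:

import scala.io.Source

// Hypothetical helper, not from the original issue: polls the etcd /metrics endpoint
// and prints the watcher gauge, so it is visible whether Watcher.close() actually
// removes watchers on the server side. Assumes the metric exported by etcd v3.4 is
// named etcd_debugging_mvcc_watcher_total and that /metrics is served on the client URL.
object WatcherGauge {
  def report(metricsUrl: String): Unit = {
    val src = Source.fromURL(metricsUrl)
    try {
      src.getLines()
        .filter(line => line.startsWith("etcd_debugging_mvcc_watcher_total"))
        .foreach(line => println(s"[$metricsUrl] $line"))
    } finally {
      src.close()
    }
  }
}

// Example usage between the create and close phases of the reproduction:
// WatcherGauge.report("http://xxx.xxx.xxx.xxx:2381/metrics")

If the gauge drops back after watcher.close() but memory does not, the leak is elsewhere; if the gauge itself keeps growing, the watchers are never cancelled on the server.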

PS: I also ran other tests to look at the problem from another angle:

  1. I removed all etcdClient.watch/close logic from our own application and tested it again while monitoring etcd memory usage; the memory usage stayed stable.
  2. I used the Go client API to run the same test (code below); all three etcd nodes' memory usage stayed stable.

package main

import (
	"context"
	"fmt"
	"github.com/coreos/etcd/clientv3"
	"time"
)

func main() {
	cli, _ := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})


	defer cli.Close()

	for j := 1; j <= 1000; j++ {
		var watchers []clientv3.Watcher
		for i := 1; i <= 5000; i++ {
			println("starting watcher: ", i)
			watcher := clientv3.NewWatcher(cli)
			key := fmt.Sprintf("foo-%d-%d", j, i)
			_ = watcher.Watch(context.Background(), key, clientv3.WithPrefix())

			watchers = append(watchers, watcher)

		}
		time.Sleep(10 * time.Second)

		for _, watcher := range watchers {
			println("closing watcher: ", watcher)
			watcher.Close()
		}
		println("done: ", j)
	}
}

etcd0's memory usage stayed stable (< 200 MB), etcd1's memory usage stayed stable (< 200 MB), and etcd2's memory usage stayed stable (around 700 MB).

So clearly the etcd server itself and the Go client API are not the problem. The problem is in jetcd.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
lburgazzoli commented, Dec 13, 2019

this should eventually increase the memory on the client but not on the server side, right?
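
A rough way to check which side is actually growing (this helper is hypothetical, not something from the issue) is to log the client JVM's own heap next to the docker stats numbers collected for the server:

// Hypothetical helper, not part of the issue: prints the client JVM's heap in use,
// so client-side growth can be compared with the server-side growth seen in docker stats.
object ClientHeap {
  def log(label: String): Unit = {
    val rt = Runtime.getRuntime
    val usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024L * 1024L)
    println(s"[$label] client JVM heap in use: $usedMb MB")
  }
}

// e.g. call ClientHeap.log(s"loop $count1") before and after closing the watchers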

0 reactions
johnyannj commented, Dec 24, 2019

@lburgazzoli Can you pay attention to this issue https://github.com/etcd-io/jetcd/issues/659

Thank you~~

Read more comments on GitHub >

Top Results From Across the Web

Tuning - etcd
By default, snapshots will be made after every 10,000 changes. If etcd's memory usage and disk usage are too high, try lowering the...
Read more >
Operating etcd clusters for Kubernetes
etcd is a consistent and highly-available key value store used as Kubernetes' backing store for all cluster data. If your Kubernetes cluster ...
Read more >
High memory usage by etcd in OpenShift 3.x
Issue. High memory usage by etcd; After increasing the memory on etcd nodes from 16GB -> 32GB, memory still being consumed quickly ...
Read more >
etcd is Now a CNCF Incubating Project | AWS Open Source Blog
The etcd v3.2 default configuration shows higher memory usage, while giving more time for slow followers to catch up. This is a tradeoff ......
Read more >
Scaling Kubernetes to Thousands of CRDs - Upbound Blog
After earlier reports of increased memory usage following CRD creation the API server maintainers had identified and begun work on a fix that ......
Read more >
