Lease expiration managed by the PersistentLease?


Hello! I’m seeing spurious expiration of leases managed by PersistentLease, even when there is no network partition or hardware overload.

The problem is that, sometimes, all leases managed by PersistentLease instances expire on the etcd server side and never become active again (or get re-granted) until the etcd server restarts. When the issue hits, a persistent lease instance is not notified of the LeaseState.EXPIRED state even though the lease has actually expired on the etcd server. Interestingly, once the client reconnects to the restarted etcd server, an EXPIRED event is fired, immediately followed by an ACTIVE event.

I believe (from some observation and code inspection) that the persistent lease monitors the lease state, re-creates an expired (not closed) lease, and exposes its id through PersistentLease.getLeaseId() once the lease id is renewed. So I send a TTL request to make sure the lease is OK to be associated with an entity (the TTL > 0 check). Here’s roughly what I’m doing to create/refresh a PersistentLease-tied entity.

long getValidLease(PersistentLease lease) {
    long validLease = -1;
    // Lightly spin until I get a valid TTL response and id.
    // Normally the body gets executed exactly once.
    do {
        // omitted: throw if the lease is CLOSED
        // Since lease.getLeaseId() does not guarantee that the lease id is valid,
        // I chose to query its state with a direct TTL request.
        ttlResp = etcdLease.ttl(lease.getLeaseId()); // lease id is updated by the event loop
        if (ttlResp.getTTL() > 0)
            validLease = ttlResp.getID();
    } while (lease.getCurrentTtlSecs() < 1); // also gets updated by the event loop
    return validLease;
}

long count(ByteString key) {
    return etcdKV.get(key).countOnly().async()
        .get(1, TimeUnit.SECONDS).getCount(); // 1 second timed wait-and-get
}

// operation PUT
long validLease = getValidLease(persistentLease);
etcdKV.put(key, data, validLease);

// operation REFRESH
if (count(key) == 0) {
    PUT_OPERATION(key, data); // put operation right above
}

All the entities (not many, < 20) get refreshed every 5 seconds. But after the spurious lease expiration, all operations either hang in the do-while loop in getValidLease or get expired lease ids back from getValidLease, in which case the following operations fail because the given lease id has already expired.
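
As an aside, one way to keep a refresh path from spinning without limit is to bound the validity check and back off between attempts. The sketch below is not from the issue and does not call etcd-java: `BoundedLeaseCheck`, `awaitValidLease`, and both functional parameters are hypothetical stand-ins, where `currentLeaseId` plays the role of `PersistentLease.getLeaseId()` and `ttlLookup` plays the role of a blocking TTL request. It does not address the underlying expiration, it only turns a hang into a failure the caller can detect.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;
import java.util.function.LongUnaryOperator;

// Hypothetical helper: bounded replacement for an unbounded "spin until valid" loop.
final class BoundedLeaseCheck {
    private BoundedLeaseCheck() {}

    /**
     * Polls for a lease id whose server-reported TTL is positive.
     *
     * @param currentLeaseId supplies the lease id the client currently exposes (assumed stand-in)
     * @param ttlLookup      maps a lease id to its remaining TTL in seconds, <= 0 if expired (assumed stand-in)
     * @param maxAttempts    upper bound on the number of checks
     * @param backoffMillis  pause between failed checks
     * @return the valid lease id, or empty if none was seen within the bound
     */
    static OptionalLong awaitValidLease(LongSupplier currentLeaseId,
                                        LongUnaryOperator ttlLookup,
                                        int maxAttempts,
                                        long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long id = currentLeaseId.getAsLong();
            // Lease id 0 means "no lease" in etcd, so skip the lookup in that case.
            if (id != 0 && ttlLookup.applyAsLong(id) > 0) {
                return OptionalLong.of(id);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and give up early.
                    Thread.currentThread().interrupt();
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }
}
```

A caller would treat an empty result as "lease not usable right now" and log or skip the refresh instead of issuing a put with a stale id. The pause between attempts also avoids the very high TTL request rate that an unbounded spin produces.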

The etcd server looks OK: at that moment the etcd debug log shows that TTL requests from the do-while loop arrive and get answered at a very high rate (due to the do-while loop), and further requests from other clients (like the etcdctl provided with the server distribution) get handled properly. Even granting a new lease from the same etcd-java client and making it persistent succeeds! It seems that the internal gRPC client and the event loop assigned to a persistent lease fail to handle responses from the server for some reason.

The issue appears randomly, regardless of the server load. As mentioned earlier, one simple workaround is to restart the etcd server. After etcd-java reconnects to the restarted server, all the operations work as expected again.

The etcd server (single instance configuration) is deployed in a small testbed, and a Spring Boot application using etcd-java runs on the same host, which means the client connects to the etcd server using localhost as the address.

Is there a recommended way of dealing with the validity of a persistent lease, or am I missing something crucial?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 21 (11 by maintainers)

Top GitHub Comments

2 reactions
njhill commented, Apr 5, 2019

Thanks @hsyhsw, no need to include a jar, just (preferably minimal) source code, e.g. just a class with main method would be great.

1 reaction
njhill commented, Sep 6, 2019

Great, thanks @hsyhsw! (though I know it took a long time to show up last time you tried so maybe it’s not definite yet…)
