When checking liveness, TLC can stall at 0 s/min when the behavior graph gets huge

Hi, I have a weird issue and can’t find a similar existing one. I have a model that runs fine in smaller configurations, but in a large configuration it stalls after around 20 hours. I am currently trying to figure out whether this is reproducible, but wanted to check whether anyone has an idea before I spend days hunting this down 😃.

$ tlc MCraft.tla -coverage 99999 -lncheck final  -checkpoint 0 > 06-22_term3_2msg_allchecks.txt
TLC2 Version 2.16 of 31 December 2020 (rev: cdddf55)
Running breadth-first search Model-Checking with fp 72 and seed -2309152779752331898 with 120 workers on 120 cores with 421973MB heap and 64MB offheap memory [pid: 132520] (Linux 5.8.0-1033-azure amd64, Ubuntu 11.0.11 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /mnt/tla/MCraft.tla
Parsing file /mnt/tla/ccfraft.tla
Parsing file /tmp/TLC.tla
Parsing file /tmp/Naturals.tla
Parsing file /tmp/FiniteSets.tla
Parsing file /tmp/Sequences.tla
Semantic processing of module Naturals
Semantic processing of module Sequences
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module ccfraft
Semantic processing of module MCraft
Starting... (2021-06-22 10:59:22)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2021-06-22 10:59:22.
Progress(18) at 2021-06-22 10:59:25: 573,895 states generated (573,895 s/min), 146,981 distinct states found (146,981 ds/min), 77,550 states left on queue.
Progress(23) at 2021-06-22 11:00:31: 17,922,320 states generated (17,348,425 s/min), 3,432,850 distinct states found (3,285,869 ds/min), 1,445,487 states left on queue.
Progress(24) at 2021-06-22 11:01:34: 36,621,166 states generated (18,698,846 s/min), 6,665,301 distinct states found (3,232,451 ds/min), 2,637,133 states left on queue.
Progress(25) at 2021-06-22 11:02:34: 58,618,771 states generated (21,997,605 s/min), 10,348,249 distinct states found (3,682,948 ds/min), 3,909,671 states left on queue.
<< run for around 20 hours >>
Progress(53) at 2021-06-23 07:15:18: 32,406,652,959 states generated (25,823,222 s/min), 3,535,734,968 distinct states found (1,436,529 ds/min), 11,120,988 states left on queue.
Progress(54) at 2021-06-23 07:16:18: 32,431,882,837 states generated (25,229,878 s/min), 3,537,027,607 distinct states found (1,292,639 ds/min), 9,366,288 states left on queue.
Progress(54) at 2021-06-23 07:17:18: 32,458,152,177 states generated (26,269,340 s/min), 3,538,450,442 distinct states found (1,422,835 ds/min), 7,839,412 states left on queue.
Progress(54) at 2021-06-23 07:18:18: 32,483,661,631 states generated (25,509,454 s/min), 3,539,748,560 distinct states found (1,298,118 ds/min), 6,098,679 states left on queue.
Progress(55) at 2021-06-23 07:19:18: 32,509,352,248 states generated (25,690,617 s/min), 3,540,979,705 distinct states found (1,231,145 ds/min), 4,259,825 states left on queue.
Progress(55) at 2021-06-23 07:20:18: 32,535,021,749 states generated (25,669,501 s/min), 3,542,198,441 distinct states found (1,218,736 ds/min), 2,411,459 states left on queue.
Progress(56) at 2021-06-23 07:21:18: 32,560,155,957 states generated (25,134,208 s/min), 3,543,225,982 distinct states found (1,027,541 ds/min), 286,379 states left on queue.
Progress(58) at 2021-06-23 07:22:18: 32,562,582,230 states generated (2,426,273 s/min), 3,543,297,600 distinct states found (71,618 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:23:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:24:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:25:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:26:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:27:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:28:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:29:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
<<stay like this for another hour before I killed it>>

In this stalled state, the model checker used about 200 GB of memory (of ~400 GB available), none of the 120 available cores was busy, and the run used about 40 GB on disk with no I/O in progress. My first intuition was that the deferred check from -lncheck final was the reason, but I would expect that to put some load on the machine, and there was none. I’m currently rerunning without that option and without coverage, but would appreciate any ideas while waiting for the next run to get there 😃.
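For anyone watching a similar long run, the stall shown in the log can be detected automatically instead of by eye. The following is a minimal sketch, not part of TLC: it assumes the progress-line format printed by this TLC 2.16 run, and the helper names (`parse_progress`, `find_stall`) and the `min_reports` threshold are hypothetical choices for illustration.

```python
import re

# Matches TLC progress lines in the format shown in the log above.
# Assumption: other TLC versions may format these lines differently.
PROGRESS_RE = re.compile(
    r"Progress\((?P<depth>\d+)\) at (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<generated>[\d,]+) states generated \((?P<rate>[\d,]+) s/min\), "
    r"(?P<distinct>[\d,]+) distinct states found \((?P<drate>[\d,]+) ds/min\), "
    r"(?P<queue>[\d,]+) states left on queue\."
)


def _to_int(text):
    """Convert a number with thousands separators, e.g. '22,454', to int."""
    return int(text.replace(",", ""))


def parse_progress(line):
    """Return the fields of one progress line as a dict, or None if it is not one."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "depth": int(m["depth"]),
        "timestamp": m["ts"],
        "generated": _to_int(m["generated"]),
        "rate": _to_int(m["rate"]),
        "queue": _to_int(m["queue"]),
    }


def find_stall(lines, min_reports=3):
    """Return the timestamp of the first report in a run of at least
    `min_reports` consecutive reports with 0 s/min and a non-empty queue,
    or None if no such run exists."""
    run_start = None
    run_len = 0
    for line in lines:
        p = parse_progress(line)
        if p is None:
            continue  # skip banner lines, parser output, and annotations
        if p["rate"] == 0 and p["queue"] > 0:
            if run_len == 0:
                run_start = p["timestamp"]
            run_len += 1
            if run_len >= min_reports:
                return run_start
        else:
            run_len = 0
    return None
```

Passing the lines of the redirected output file to `find_stall` returns the timestamp of the first zero-rate report once enough of them have accumulated. Requiring a non-empty queue separates a stall from a run that has simply finished exploring.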

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

1 reaction
fritzalder commented, Jun 30, 2021

Sounds good, I will do that 👍

1 reaction
lemmy commented, Jun 23, 2021

If TLC stalls again for extended periods of time, please don’t kill it. I’d like to investigate if possible.
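The reply above does not say which diagnostics would be useful. One common way to inspect a JVM that looks idle without killing it is a thread dump, so the sketch below is an assumption about what might help, not the maintainer's stated procedure. It reads the process ID from the `[pid: ...]` field of the TLC startup banner and calls the JDK's `jstack -l`, which prints thread stacks with lock information. The helper names and the output path are hypothetical, and `jstack` must be on the PATH and run as the same user as the TLC process.

```python
import re
import subprocess

# The TLC startup banner in the log above contains "[pid: 132520]".
PID_RE = re.compile(r"\[pid: (\d+)\]")


def extract_pid(banner_line):
    """Return the JVM process ID from a TLC banner line, or None if absent."""
    m = PID_RE.search(banner_line)
    return int(m.group(1)) if m else None


def jstack_command(pid):
    """Build the jstack invocation; -l adds information about held locks."""
    return ["jstack", "-l", str(pid)]


def save_thread_dump(pid, out_path):
    """Write a thread dump of the running JVM to out_path (hypothetical helper).

    The target process keeps running; jstack only attaches and reads stacks.
    """
    with open(out_path, "w") as out:
        subprocess.run(jstack_command(pid), stdout=out, check=True)
```

Taking two or three dumps a minute apart makes it easier to tell whether the worker threads are all parked on the same lock or waiting on something else.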
