When checking liveness, TLC can stall at 0 s/min when the behavior graph gets huge
Hi, I have a weird issue that I can’t find a similar report for. I have a model that runs fine in smaller configurations, but in a large configuration it stalls on me after around 20 hours. I am currently trying to figure out whether this is reproducible, but wanted to check in first in case anyone has an idea before I spend days hunting this 😃.
$ tlc MCraft.tla -coverage 99999 -lncheck final -checkpoint 0 > 06-22_term3_2msg_allchecks.txt
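For reference, and assuming tlc here is just a thin wrapper around tla2tools.jar (whose default main class is TLC), the equivalent direct invocation would look roughly like this, with an illustrative heap size rather than the one from the actual run:

$ java -Xmx400g -jar tla2tools.jar -coverage 99999 -lncheck final -checkpoint 0 MCraft.tla > 06-22_term3_2msg_allchecks.txt

-coverage 99999 reports coverage statistics every 99999 minutes (effectively only once, at the end), -lncheck final defers liveness checking until the complete behavior graph has been generated, and -checkpoint 0 disables periodic checkpoints.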
TLC2 Version 2.16 of 31 December 2020 (rev: cdddf55)
Running breadth-first search Model-Checking with fp 72 and seed -2309152779752331898 with 120 workers on 120 cores with 421973MB heap and 64MB offheap memory [pid: 132520] (Linux 5.8.0-1033-azure amd64, Ubuntu 11.0.11 x86_64, MSBDiskFPSet, DiskStateQueue).
Parsing file /mnt/tla/MCraft.tla
Parsing file /mnt/tla/ccfraft.tla
Parsing file /tmp/TLC.tla
Parsing file /tmp/Naturals.tla
Parsing file /tmp/FiniteSets.tla
Parsing file /tmp/Sequences.tla
Semantic processing of module Naturals
Semantic processing of module Sequences
Semantic processing of module FiniteSets
Semantic processing of module TLC
Semantic processing of module ccfraft
Semantic processing of module MCraft
Starting... (2021-06-22 10:59:22)
Computing initial states...
Finished computing initial states: 1 distinct state generated at 2021-06-22 10:59:22.
Progress(18) at 2021-06-22 10:59:25: 573,895 states generated (573,895 s/min), 146,981 distinct states found (146,981 ds/min), 77,550 states left on queue.
Progress(23) at 2021-06-22 11:00:31: 17,922,320 states generated (17,348,425 s/min), 3,432,850 distinct states found (3,285,869 ds/min), 1,445,487 states left on queue.
Progress(24) at 2021-06-22 11:01:34: 36,621,166 states generated (18,698,846 s/min), 6,665,301 distinct states found (3,232,451 ds/min), 2,637,133 states left on queue.
Progress(25) at 2021-06-22 11:02:34: 58,618,771 states generated (21,997,605 s/min), 10,348,249 distinct states found (3,682,948 ds/min), 3,909,671 states left on queue.
<< run for around 20 hours >>
Progress(53) at 2021-06-23 07:15:18: 32,406,652,959 states generated (25,823,222 s/min), 3,535,734,968 distinct states found (1,436,529 ds/min), 11,120,988 states left on queue.
Progress(54) at 2021-06-23 07:16:18: 32,431,882,837 states generated (25,229,878 s/min), 3,537,027,607 distinct states found (1,292,639 ds/min), 9,366,288 states left on queue.
Progress(54) at 2021-06-23 07:17:18: 32,458,152,177 states generated (26,269,340 s/min), 3,538,450,442 distinct states found (1,422,835 ds/min), 7,839,412 states left on queue.
Progress(54) at 2021-06-23 07:18:18: 32,483,661,631 states generated (25,509,454 s/min), 3,539,748,560 distinct states found (1,298,118 ds/min), 6,098,679 states left on queue.
Progress(55) at 2021-06-23 07:19:18: 32,509,352,248 states generated (25,690,617 s/min), 3,540,979,705 distinct states found (1,231,145 ds/min), 4,259,825 states left on queue.
Progress(55) at 2021-06-23 07:20:18: 32,535,021,749 states generated (25,669,501 s/min), 3,542,198,441 distinct states found (1,218,736 ds/min), 2,411,459 states left on queue.
Progress(56) at 2021-06-23 07:21:18: 32,560,155,957 states generated (25,134,208 s/min), 3,543,225,982 distinct states found (1,027,541 ds/min), 286,379 states left on queue.
Progress(58) at 2021-06-23 07:22:18: 32,562,582,230 states generated (2,426,273 s/min), 3,543,297,600 distinct states found (71,618 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:23:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:24:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:25:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:26:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:27:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:28:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
Progress(58) at 2021-06-23 07:29:18: 32,562,582,230 states generated (0 s/min), 3,543,297,600 distinct states found (0 ds/min), 22,454 states left on queue.
<< stayed like this for another hour before I killed it >>
In this stalled state, the model used about 200GB of memory (of the ~400GB available), none of the 120 cores was busy, and TLC used about 40GB on disk with no I/O in flight. My first intuition was that the deferred -lncheck final check was the reason for this, but I would have expected that to put some load on the machine, and there was none. I’m currently rerunning without that flag and without coverage, but would appreciate any ideas while waiting for the next run to get there 😃.
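If it stalls like this again, one way to see what the process is doing without killing it would be to pull a few thread dumps from the JVM. A sketch, assuming a stock JDK with jcmd available on the host (132520 is the pid TLC printed at startup):

# Take a few thread dumps a minute apart; frames that repeat across dumps
# point at where TLC is stuck.
$ for i in 1 2 3; do jcmd 132520 Thread.print > tlc-threads-$i.txt; sleep 60; done
# Heap and GC summary, to rule out the JVM spinning in garbage collection.
$ jcmd 132520 GC.heap_info

Identical dumps with all worker threads parked and no GC activity would suggest a coordination problem rather than slow work.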
If TLC stalls again for extended periods of time, please don’t kill it. I’d like to investigate if possible.
Sounds good, I will do that 👍