Incomplete node list
On LUMI, the command sacct -S <start_time> -P -j <jobid> -o jobid,state,exitcode,end,nodelist
doesn’t always report the complete node list right from the start of the job. For instance, it may first print
JobID|State|ExitCode|End|NodeList
1305969|PENDING|0:0|Unknown|nid005828
1305969.batch|RUNNING|0:0|Unknown|nid005828
and a bit later it starts giving the whole information
JobID|State|ExitCode|End|NodeList
1305969|RUNNING|0:0|Unknown|nid[005828-005831]
1305969.batch|RUNNING|0:0|Unknown|nid005828
1305969.0|RUNNING|0:0|Unknown|nid[005829-005831]
This is a problem for ReFrame because it updates the node list only once: https://github.com/reframe-hpc/reframe/blob/a5b66c7c41d7cc884893642fd4d9331b146a3c16/reframe/core/schedulers/slurm.py#L383-L392
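To make the difference between the two outputs concrete, here is a minimal Python sketch that parses the pipe-delimited `sacct -P` output and picks out the node list of the job-level record. The helper names `parse_sacct` and `job_nodelist` are hypothetical and are not part of ReFrame; the sample text is copied from the two outputs above.

```python
def parse_sacct(output):
    """Parse `sacct -P` (pipe-delimited) output into a list of dicts.

    Hypothetical helper for illustration; keys come from the header line.
    """
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = lines[0].split('|')
    return [dict(zip(header, ln.split('|'))) for ln in lines[1:]]


def job_nodelist(records, jobid):
    """Return the NodeList of the job-level record, ignoring .batch and step records."""
    for rec in records:
        if rec['JobID'] == jobid:
            return rec['NodeList']
    return None


# The two outputs quoted in the issue
early = """JobID|State|ExitCode|End|NodeList
1305969|PENDING|0:0|Unknown|nid005828
1305969.batch|RUNNING|0:0|Unknown|nid005828
"""

late = """JobID|State|ExitCode|End|NodeList
1305969|RUNNING|0:0|Unknown|nid[005828-005831]
1305969.batch|RUNNING|0:0|Unknown|nid005828
1305969.0|RUNNING|0:0|Unknown|nid[005829-005831]
"""

early_nodes = job_nodelist(parse_sacct(early), '1305969')
late_nodes = job_nodelist(parse_sacct(late), '1305969')
```

A caller that stores `early_nodes` the first time it is non-empty and never looks again ends up with a single node instead of the four that the job actually uses.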
Issue Analytics
- State:
- Created a year ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Getting the node list through job.nodelist does not work consistently, regardless of the stage in which I try it. I do not expect it to work before or during the compile stage. But for the performance or sanity stage (or any time after the run stage), it is incredibly useful to know which nodes I have and how many. At the moment, I do not know how to reliably get this information through the framework without hacks.
The problem is that we peek into the node list only once, the first time we get back a non-empty node list from sacct or squeue, as @rsarm has pointed out. Apparently, Slurm does not fill it in all at once, which is why we sometimes miss it and sometimes don’t. I think the best solution is to retrieve the value every time, but not issue scontrol to unwrap the node list until the job has finished, so that we issue scontrol only once.
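The strategy proposed in this comment can be sketched roughly as follows. This is not ReFrame's implementation: the `TrackedJob` class, the `FINISHED_STATES` set (a non-exhaustive selection of Slurm terminal states) and the pure-Python `expand_nodelist` are assumptions made for illustration. In particular, `expand_nodelist` is a simplified stand-in for the scontrol call and only handles a single prefix with one bracket group.

```python
import re

# Assumption: a non-exhaustive set of Slurm terminal job states
FINISHED_STATES = {'COMPLETED', 'FAILED', 'CANCELLED', 'TIMEOUT', 'NODE_FAIL'}


def expand_nodelist(nodelist):
    """Expand a compressed node list such as 'nid[005828-005831]'.

    Simplified stand-in for the scontrol call: supports plain names and
    a single prefix followed by one bracket group of ranges.
    """
    match = re.fullmatch(r'([^\[\],]+)\[([^\]]+)\]', nodelist)
    if match is None:
        if '[' in nodelist:
            raise ValueError(f'unsupported node list: {nodelist!r}')
        return [name for name in nodelist.split(',') if name]

    prefix, body = match.groups()
    nodes = []
    for part in body.split(','):
        low, _, high = part.partition('-')
        high = high or low
        width = len(low)  # preserve zero padding
        nodes.extend(f'{prefix}{i:0{width}d}'
                     for i in range(int(low), int(high) + 1))
    return nodes


class TrackedJob:
    """Hypothetical job record that refreshes its node list on every poll."""

    def __init__(self, jobid):
        self.jobid = jobid
        self.raw_nodelist = None   # compressed form, refreshed on each poll
        self.nodes = None          # expanded form, computed once at the end

    def update(self, state, raw_nodelist):
        # Always keep the latest non-empty value instead of only the first one
        if raw_nodelist:
            self.raw_nodelist = raw_nodelist

        # Expand only once, after the job has reached a terminal state
        if state in FINISHED_STATES and self.nodes is None:
            self.nodes = expand_nodelist(self.raw_nodelist or '')


job = TrackedJob('1305969')
job.update('PENDING', 'nid005828')
job.update('RUNNING', 'nid[005828-005831]')
nodes_while_running = job.nodes
job.update('COMPLETED', 'nid[005828-005831]')
```

Keeping the compressed string up to date costs nothing beyond the sacct or squeue poll that already happens, while deferring the expansion keeps it to a single call per job.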