Unexpected exit status 1 while running "logging"
I’m running a task array of 22 jobs (one per autosome) and I’ve noticed a message I have not seen before. Seemingly, the logging component of the job failed(?). (That may be inaccurate; it’s just my interpretation of the message.)
The message that I get when I run dstat on the job using -f is:
script-name: dsub-command.sh
start-time: '2019-05-22 16:35:13.783803'
status: RUNNING
status-detail: |-
  logging:
  Unexpected exit status 1 while running "logging"
status-message: Unexpected exit status 1 while running "logging"
task-attempt: 1
task-id: '19'
user-id: jamesp
The other tasks seem to be continuing without issue. I am not yet sure whether this task will complete. (It says it is RUNNING, as you can see above.)
This is dsub version: 0.3.1
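For context, the record above is the kind of per-task entry dstat prints when asked for full status. A minimal sketch of such an invocation (the provider, project, and job/task identifiers here are hypothetical placeholders, not the reporter's actual values) might look like:

```bash
# Sketch only: query full status for one task of the array job.
# Project and job identifiers below are hypothetical placeholders.
dstat \
  --provider google-v2 \
  --project my-gcp-project \
  --jobs 'my-job--jamesp--190522-163513' \
  --tasks 19 \
  --status '*' \
  --full
```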
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks for following up with those details. It is good that the logging failure did ultimately fail the workflow. That is expected. It did take longer to fail the workflow than I would have expected. I’ll follow up with Cloud Health to see if a failure in a background action should trigger failure more quickly.
I want to correct one thing I indicated yesterday - we actually do retry gsutil cp 3 times due to the occasional arbitrary failure (see https://github.com/DataBiosphere/dsub/blob/master/dsub/providers/google_v2.py#L103). However, we put no delay in between failures, which prevents us from recovering from this particular auth issue. We will add a modest delay here. The refresh of credentials should not take long and we don't want to overly delay reporting a genuine failure.
Specific follow-ups:
- Add a delay between attempts in the gsutil cp retry logic (see the sketch below)
- Surface the underlying error in dstat --full ("ServiceException: 401…" was there in the underlying operation)
- See whether the logging failure can be made to fail the operation more quickly
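As a rough illustration of the first follow-up item (a sketch only, not dsub's actual retry code; the attempt count, file paths, bucket name, and sleep interval are all assumptions), a gsutil cp retry with a modest pause between attempts could look like:

```bash
# Sketch only: retry gsutil cp a few times, pausing between attempts so that
# a transient credential refresh has a chance to complete. The attempt count,
# paths, and sleep interval are illustrative, not dsub's actual values.
for attempt in 1 2 3; do
  if gsutil cp /tmp/task.log "gs://my-bucket/logs/task-19.log"; then
    break
  fi
  if [[ "${attempt}" -eq 3 ]]; then
    echo "gsutil cp failed after ${attempt} attempts" >&2
    exit 1
  fi
  sleep 10  # modest delay before the next attempt
done
```

The point of the pause is simply to give a stale service-account token time to refresh before the next attempt, while still bounding how long a genuine failure is delayed.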
Various environments, including GCE and Cloud Shell, episodically have problems where the credentials used become unavailable. So in this particular case, the GCE service account token has become unavailable or stale and gsutil is unable to successfully make the necessary calls.
In various places, dsub has explicit retries for such conditions; see https://github.com/DataBiosphere/dsub/blob/master/dsub/providers/google_base.py#L69. The case observed here (logging) does not explicitly include any retries. Unfortunately, catching gsutil errors is a little clunky since you have to capture STDERR and do string parsing (a rough sketch follows below).
It will be interesting to see if this problem transiently caused the logging action to fail and if the credentials will have been refreshed by the time delocalization and final_logging occur.
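To make the point about catching gsutil errors concrete, here is a hedged sketch (the file paths, bucket name, and matched string are assumptions; dsub's real handling may differ) of capturing STDERR and string-matching for the 401 auth failure seen in this issue:

```bash
# Sketch only: capture gsutil's STDERR and look for the 401 ServiceException
# that indicates a stale or unavailable service-account token.
stderr_file="$(mktemp)"
if ! gsutil cp /tmp/task.log "gs://my-bucket/logs/task-19.log" 2> "${stderr_file}"; then
  if grep -q "ServiceException: 401" "${stderr_file}"; then
    echo "gsutil hit an auth error; credentials may need time to refresh" >&2
    # a caller could sleep and retry here rather than failing immediately
  else
    cat "${stderr_file}" >&2
  fi
fi
rm -f "${stderr_file}"
```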