All nodes in pool in state `starttaskfailed`: "Docker root dir $rootdir not within $USER_MOUNTPOINT"

Problem Description

I have been using Azure Batch Shipyard with VMs of type STANDARD_NC6 successfully for a while. Usually, I create a pool, submit some jobs (with several tasks) and kill the pool again, all over the course of at most a couple of days.

As of today, when creating the pool and submitting a job, all nodes enter the “starttaskfailed” state. I have deleted and recreated the pool and job several times. Using the Azure Batch Explorer, I checked the node startup logs and found the following text at the bottom of stdout.txt:

Client: Docker Engine - Community
 Version:           19.03.0
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.12.5
 Git commit:        aeac949
 Built:             Wed Jul 17 18:16:07 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
2019-07-23T11:11:50,787210380+00:00 - ERROR - Docker root dir Dir: not within /mnt

This seems to originate from shipyard_nodeprep.sh, lines 730-737:

    local rootdir
    rootdir=$(docker info | grep "Docker Root Dir" | cut -d' ' -f 4)
    if echo "$rootdir" | grep "$USER_MOUNTPOINT" > /dev/null; then
        log DEBUG "Docker root dir: $rootdir"
    else
        log ERROR "Docker root dir $rootdir not within $USER_MOUNTPOINT"
        exit 1
    fi

It looks like the cut command does not properly extract the “Docker Root Dir” value from the output of docker info (note that $rootdir ends up as "Dir:" instead of a path).
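A minimal sketch of how the extraction can go wrong, assuming the newer Docker client prints the "Docker Root Dir" line with a leading space. The sample lines, the path /mnt/docker, and the helper names extract_cut and extract_last_field are hypothetical and only serve to illustrate the field shift; they are not taken from a real node.

```shell
# Hypothetical sample lines; the leading space in new_line is an assumption
# about how the newer client indents its `docker info` output.
old_line='Docker Root Dir: /mnt/docker'
new_line=' Docker Root Dir: /mnt/docker'

# Same extraction as in shipyard_nodeprep.sh: a fixed field position.
# A leading space creates an empty first field and shifts everything by one.
extract_cut() {
    printf '%s\n' "$1" | cut -d' ' -f 4
}

# Whitespace-tolerant alternative: awk ignores leading blanks and
# $NF always refers to the last field on the line.
extract_last_field() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

echo "cut, unindented line: $(extract_cut "$old_line")"
echo "cut, indented line:   $(extract_cut "$new_line")"
echo "awk, unindented line: $(extract_last_field "$old_line")"
echo "awk, indented line:   $(extract_last_field "$new_line")"
```

With the indented line, cut returns "Dir:", which matches the value seen in the error message. Asking Docker for the value directly with docker info --format '{{.DockerRootDir}}' may be a sturdier option, although that was not verified against the versions in this report.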

Batch Shipyard Version

3.7.0

Steps to Reproduce

Create pool, then create job.

Expected Results

The pool gets created and the job + tasks start properly.

Actual Results

All nodes in the pool get stuck in “starttaskfailed”.

Redacted Configuration

pool:

pool_specification:
  id: my-pool
  vm_configuration:
    platform_image:
      offer: UbuntuServer
      publisher: Canonical
      sku: 16.04-LTS
  vm_count:
    dedicated: 0
    low_priority: 10
  vm_size: STANDARD_NC6

Additional Logs

Header part from stdout.txt:

Configuration:
--------------
Custom image: 0
Native mode: 0
OS Distribution: ubuntu 16.04
Batch Shipyard version: 3.7.0
Blobxfer version: 1.7.0
Singularity version: 
User mountpoint: /mnt
Mount path: /mnt/batch/tasks/mounts
Batch Insights: 0
Prometheus: NE=, CA=,
Network optimization: 1
Encryption cert thumbprint: 
Install Kata Containers: 0
Default container runtime: runc
Install BeeGFS BeeOND: 0
Storage cluster mount: 
Custom mount: 
Install LIS: 
GPU: False:nvidia-driver_cc37.run
Azure Blob: 1
Azure File: 0
GlusterFS on compute: 0
HPN-SSH: 0
Enable Azure Batch group for Docker access: 
Fallback registry: 
Docker image preload delay: 0
Cascade via container: 1
P2P: 0
Block on images: REDACTED#

Additional Comments

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 8

Top GitHub Comments

canoas commented, Jul 23, 2019 (2 reactions)

Thank you @alfpark! The workaround is working:

rootdir=$(awk -F' ' '{print $NF}' <<< $(docker info | grep "Docker Root Dir"))

However, we had to recreate our pool from a clean state using a recompiled version of Shipyard.
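For context, here is a rough sketch of the check with a last-field extraction in place, in the spirit of the workaround above. It is not the actual patch: the docker function is a stand-in so the sketch runs without Docker installed, its output lines are assumed, check_docker_root_dir is a hypothetical wrapper, and echo replaces the script's log helper.

```shell
# Stand-in for the real CLI so the sketch runs anywhere; the output lines
# and their indentation are assumed, not captured from a real node.
docker() {
    printf ' Storage Driver: overlay2\n Docker Root Dir: /mnt/docker\n'
}

USER_MOUNTPOINT=/mnt

# Hypothetical wrapper around the check quoted from shipyard_nodeprep.sh.
check_docker_root_dir() {
    local rootdir
    # Take the last field of the matching line instead of field 4,
    # so leading indentation no longer shifts the result.
    rootdir=$(docker info | grep "Docker Root Dir" | awk '{print $NF}')
    if echo "$rootdir" | grep "$USER_MOUNTPOINT" > /dev/null; then
        echo "DEBUG Docker root dir: $rootdir"
    else
        echo "ERROR Docker root dir $rootdir not within $USER_MOUNTPOINT"
        return 1
    fi
}

check_docker_root_dir
```

With the stubbed output, the check takes the DEBUG branch because the extracted value is the path rather than "Dir:".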

elemakil commented, Jul 24, 2019 (1 reaction)

@alfpark Thanks a ton for providing such a quick workaround! Using native: true has indeed resolved this issue for me.
