All nodes in pool in state `starttaskfailed`: "Docker root dir $rootdir not within $USER_MOUNTPOINT"

Problem Description

I have been using Azure Batch Shipyard with VMs of type STANDARD_NC6 successfully for a while. Usually, I create a pool, submit some jobs (with several tasks) and kill the pool again, all over the course of at most a couple of days.

As of today, when creating the pool and submitting a job, all nodes enter the “starttaskfailed” state. I have deleted and recreated the pool and job several times. Using the Azure Batch Explorer I have checked the node startup logs and find the following text at the bottom of stdout.txt:

Client: Docker Engine - Community
 Version:           19.03.0
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.12.5
 Git commit:        aeac949
 Built:             Wed Jul 17 18:16:07 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false
2019-07-23T11:11:50,787210380+00:00 - ERROR - Docker root dir Dir: not within /mnt

This seems to originate from shipyard_nodeprep.sh, lines 730-737:

    local rootdir
    rootdir=$(docker info | grep "Docker Root Dir" | cut -d' ' -f 4)
    if echo "$rootdir" | grep "$USER_MOUNTPOINT" > /dev/null; then
        log DEBUG "Docker root dir: $rootdir"
    else
        log ERROR "Docker root dir $rootdir not within $USER_MOUNTPOINT"
        exit 1
    fi

It looks like the cut command does not properly extract the “Docker Root Dir” from the output of docker info (note that $rootdir = "Dir:" !).
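One explanation consistent with the log is that the newer Docker client indents each line of `docker info`, which shifts every space-delimited field by one. The sketch below reproduces that effect on two sample lines. The sample lines and the `field4` helper are illustrative assumptions, not copied from the node logs or from the nodeprep script.

```shell
#!/usr/bin/env bash
# Two sample lines in the style of `docker info` output. The indented form is
# an assumption about what the newer client prints.
old_style='Docker Root Dir: /mnt/docker'
new_style=' Docker Root Dir: /mnt/docker'

# Same extraction as the nodeprep snippet: split on single spaces, take field 4.
# With cut, a leading space produces an empty first field.
field4() {
    printf '%s\n' "$1" | cut -d' ' -f 4
}

field4 "$old_style"   # prints /mnt/docker
field4 "$new_style"   # prints Dir:
```

With the indented line, field 4 is `Dir:`, which matches the value of `$rootdir` in the error message.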

Batch Shipyard Version

3.7.0

Steps to Reproduce

Create pool, then create job.

Expected Results

The pool gets created and the job + tasks start properly.

Actual Results

All nodes in the pool get stuck in “starttaskfailed”.

Redacted Configuration

pool:

pool_specification:
  id: my-pool
  vm_configuration:
    platform_image:
      offer: UbuntuServer
      publisher: Canonical
      sku: 16.04-LTS
  vm_count:
    dedicated: 0
    low_priority: 10
  vm_size: STANDARD_NC6

Additional Logs

Header part from stdout.txt:

Configuration:
--------------
Custom image: 0
Native mode: 0
OS Distribution: ubuntu 16.04
Batch Shipyard version: 3.7.0
Blobxfer version: 1.7.0
Singularity version: 
User mountpoint: /mnt
Mount path: /mnt/batch/tasks/mounts
Batch Insights: 0
Prometheus: NE=, CA=,
Network optimization: 1
Encryption cert thumbprint: 
Install Kata Containers: 0
Default container runtime: runc
Install BeeGFS BeeOND: 0
Storage cluster mount: 
Custom mount: 
Install LIS: 
GPU: False:nvidia-driver_cc37.run
Azure Blob: 1
Azure File: 0
GlusterFS on compute: 0
HPN-SSH: 0
Enable Azure Batch group for Docker access: 
Fallback registry: 
Docker image preload delay: 0
Cascade via container: 1
P2P: 0
Block on images: REDACTED#

Additional Comments

Issue Analytics

State:
Created 4 years ago
Comments: 8

Top GitHub Comments

2 reactions

canoas commented, Jul 23, 2019

Thank you @alfpark! The workaround is working:

    rootdir=$(awk -F' ' '{print $NF}' <<< $(docker info | grep "Docker Root Dir"))

However, we had to recreate our pool from a clean state using a recompiled version of shipyard.
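The workaround takes the last whitespace-separated field instead of a fixed field number, so it does not depend on indentation. A minimal sketch of the same idea, paired with a prefix check in place of the substring `grep`, might look like this. The helper names `docker_root_dir` and `is_within` are hypothetical and are not part of Batch Shipyard; the sketch also assumes the root dir path contains no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker info` text on stdin and print the root dir.
# awk's default field splitting ignores leading whitespace, so $NF is the path
# whether or not the line is indented.
docker_root_dir() {
    awk '/Docker Root Dir/ {print $NF; exit}'
}

# Hypothetical helper: succeed only if path $1 is the mountpoint $2 itself
# or lies beneath it.
is_within() {
    case "$1" in
        "$2"|"$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage with a sample line (an assumption, not real node output).
rootdir=$(printf ' Docker Root Dir: /mnt/docker\n' | docker_root_dir)
if is_within "$rootdir" /mnt; then
    echo "ok: $rootdir"
else
    echo "not within /mnt: $rootdir"
fi
```

Where the installed Docker CLI supports it, `docker info --format '{{.DockerRootDir}}'` avoids parsing the human-readable output altogether.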

1 reaction

elemakil commented, Jul 24, 2019

@alfpark Thanks a ton for providing such a quick workaround! Using native: true has indeed resolved this issue for me.
