[BUG] Agent breaks when deploying template with VM and runCommand

When I use a Bicep template to deploy an Ubuntu VM together with a runCommand resource that runs once the VM has been deployed, the agent breaks completely and doesn’t seem able to repair itself. It’s as if the agent never finishes setting up in the first place. If I remove the runCommand resource from the template, the agent installs properly, and deploying the same template, runCommand included, after the VM has already been provisioned works as it should. Restarting the walinuxagent service or rebooting the VM doesn’t help either.

waagent --version
WALinuxAgent-2.2.45 running on ubuntu 18.04
Python: 3.6.9
Goal state agent: 2.2.45

The log file shows the agent retrying the connection, repeating the same sequence over and over.

2022/02/28 21:14:02.183944 INFO Daemon WireServer endpoint is not found. Rerun dhcp handler
2022/02/28 21:14:02.184409 INFO Daemon Test for route to 168.63.129.16
2022/02/28 21:14:02.187106 INFO Daemon Route to 168.63.129.16 exists
2022/02/28 21:14:02.187485 INFO Daemon Wire server endpoint:168.63.129.16
2022/02/28 21:14:02.200153 INFO Daemon Fabric preferred wire protocol version:2015-04-05
2022/02/28 21:14:02.200873 INFO Daemon Wire protocol version:2012-11-30
2022/02/28 21:14:02.201970 INFO Daemon Server preferred version:2015-04-05
2022/02/28 21:14:02.362426 INFO Daemon Found private key matching thumbprint 93D0A3DD1C418BEC2457ACF4412128541581F3E5
2022/02/28 21:14:02.445277 INFO Daemon Found private key matching thumbprint 93D0A3DD1C418BEC2457ACF4412128541581F3E5
2022/02/28 21:14:02.535893 INFO Daemon Found private key matching thumbprint 93D0A3DD1C418BEC2457ACF4412128541581F3E5
2022/02/28 21:14:02.549331 ERROR Daemon Exception processing goal state, giving up: [the JSON object must be str, bytes or bytearray, not 'NoneType']
2022/02/28 21:14:02.556208 INFO Daemon WireServer is not responding. Reset endpoint
2022/02/28 21:14:02.560111 INFO Daemon Protocol endpoint not found: WireProtocol, [ProtocolError] Exceeded max retry updating goal state
2022/02/28 21:14:02.569141 INFO Daemon Protocol endpoint not found: MetadataProtocol, [ProtocolError] 404 - GET: http://169.254.169.254/Microsoft.Compute/identity?api-version=2015-05-01-preview
2022/02/28 21:14:02.577300 INFO Daemon Retry detect protocols: retry=62

Steps to reproduce.

  1. Deploy VM and runCommand in the same template.
  2. The deployment hangs indefinitely, runCommand is never executed, and the agent breaks.

Template used.

param location string = resourceGroup().location
param vmName string = 'VmTestRunCmd2'
param adminUsername string = 'hadmin'
@secure()
param adminPassword string
param vnetRGName string = 'a-natfw-rg'
param vnetName string = 'a-natfw-vnet'
param subnetBackend string = 'snet-pls'

resource vnet 'Microsoft.Network/virtualNetworks@2021-05-01' existing = {
  name: vnetName
  scope: resourceGroup(vnetRGName)
}

resource networkInterface 'Microsoft.Network/networkInterfaces@2020-11-01' = {
  name: '${vmName}-nic'
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig1'
        properties: {
          privateIPAllocationMethod: 'Dynamic'
          subnet: {
            id: '${vnet.id}/subnets/${subnetBackend}'
          }
        }
      }
    ]
  }
}

resource virtualMachine 'Microsoft.Compute/virtualMachines@2020-12-01' = {
  name: vmName
  location: location
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_A2_v2'
    }
    osProfile: {
      computerName: vmName
      adminUsername: adminUsername
      adminPassword: adminPassword
    }
    storageProfile: {
      imageReference: {
        publisher: 'Canonical'
        offer: 'UbuntuServer'
        sku: '18.04-LTS'
        version: 'latest'
      }
      osDisk: {
        name: '${vmName}-OSDisk'
        caching: 'ReadWrite'
        createOption: 'FromImage'
      }
    }
    networkProfile: {
      networkInterfaces: [
        {
          id: networkInterface.id
        }
      ]
    }
    diagnosticsProfile: {
      bootDiagnostics: {
        enabled: true
      }
    }
  }
}

resource runCommand 'Microsoft.Compute/virtualMachines/runCommands@2021-07-01' = {
  name: '${virtualMachine.name}/runCommandnow'
  location: location
  properties: {
    asyncExecution: false
    errorBlobUri: 'https://anatfwlogs.blob.core.windows.net/error/error.txt'
    outputBlobUri: 'https://anatfwlogs.blob.core.windows.net/error/output.txt'
    source: {
      script: 'touch /home/hadmin/test.txt'
    }
    timeoutInSeconds: 120
  }
}
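
For comparison, the working case mentioned earlier (running the command only once the VM is already provisioned) could also be expressed as a second, run-command-only template. The following is a minimal sketch of that idea. It is not taken from the issue thread and has not been verified against this bug. It assumes the VM from the template above already exists in the target resource group, references it with the existing keyword, and omits the blob URIs for brevity.

```bicep
// Hypothetical second-stage template: deploy only after the VM deployment has finished.
param location string = resourceGroup().location
param vmName string = 'VmTestRunCmd2'

// Reference the already-provisioned VM instead of creating it again.
resource virtualMachine 'Microsoft.Compute/virtualMachines@2020-12-01' existing = {
  name: vmName
}

// Same run command as in the original template, attached through the parent property.
resource runCommand 'Microsoft.Compute/virtualMachines/runCommands@2021-07-01' = {
  parent: virtualMachine
  name: 'runCommandnow'
  location: location
  properties: {
    asyncExecution: false
    source: {
      script: 'touch /home/hadmin/test.txt'
    }
    timeoutInSeconds: 120
  }
}
```

Deployed as a separate step, this keeps the run command out of the initial provisioning. Whether that avoids the agent failure on a fresh VM is an assumption based on the behavior described in the report.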

[Screenshot: vmagentnotready]

[Screenshot: vmproperties]

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 39 (23 by maintainers)

Top GitHub Comments

bpkroth commented, Oct 27, 2022 (1 reaction)

It was a fresh VM setup this morning. I tried reinstalling via apt, but it didn’t seem to help. Wonder if the package got corrupt somehow. Anyways, working on creating a new one.

narrieta commented, Oct 27, 2022 (1 reaction)

@bpkroth you would check whether RunCommand shows up in the instance view, but your issue is unrelated; see my previous post.
