[BUG] Agent spins for hours before provisioning with "ERROR Daemon /proc/net/route contains no routes"
We make agents by spinning up gallery VMs, applying packages and other artifacts to them, deprovisioning them, and publishing the result to a Shared Image Gallery. This approach works fine on most Linux distros, but on Ubuntu 18.04 it has become extremely slow: code inside waagent keeps trying to read /proc/net/route, and until it gives up our custom script extensions don’t even run. Once it does give up, things tend to succeed.
Note: Please add some context which would help us understand the problem better
- Section of the log where the error occurs.
2020/10/13 05:09:59.891580 INFO Daemon Send dhcp request
2020/10/13 05:09:59.895887 INFO Daemon Examine /proc/net/route for primary interface
2020/10/13 05:09:59.901499 ERROR Daemon /proc/net/route contains no routes
2020/10/13 05:09:59.910029 WARNING Daemon Could not determine primary interface, please ensure /proc/net/route is correct
2020/10/13 05:09:59.920703 WARNING Daemon Contents of /proc/net/route:
Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
2020/10/13 05:09:59.944223 WARNING Daemon Primary interface examination will retry silently
2020/10/13 05:10:02.081780 ERROR Daemon /proc/net/route contains no routes
2020/10/13 05:10:04.094002 ERROR Daemon /proc/net/route contains no routes
... over an hour of retries ensues ...
2020/10/13 06:32:11.984748 ERROR Daemon /proc/net/route contains no routes
2020/10/13 06:32:13.998737 ERROR Daemon /proc/net/route contains no routes
2020/10/13 06:32:41.714765 INFO Daemon Azure Linux Agent Version:2.2.45
2020/10/13 06:32:41.732735 INFO Daemon OS: ubuntu 18.04
2020/10/13 06:32:41.736017 INFO Daemon Python: 3.7.5
... the VM and its agent work fine after this ...
- Serial console output
See above (attached full agent log)
- Steps to reproduce the behavior.
Distro and WALinuxAgent details (please complete the following information):
- Distro and Version: Ubuntu 18.04
- WALinuxAgent version: 2.2.45
Additional context
This very well may be something we’re doing “wrong”, but I’m filing this issue in the hopes that this retry logic could be shorter, or a suggestion as to what we might be doing wrong could be made. I suspect our changes to the image may be introducing a race condition where the VM starts up differently and the network stack just isn’t “ready” yet. Possibly related to https://github.com/Azure/WALinuxAgent/issues/1938 ?
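To check the "network stack isn’t ready yet" theory by hand, here is a minimal sketch of the kind of bounded wait I have in mind. It is not the agent’s actual code: the function names (`default_route_iface`, `wait_for_default_route`, `read_proc_net_route`) and the 60-second cap are made up for illustration, and the 2-second interval is only a guess based on the spacing of the retries in the log above. It treats a route with destination `00000000` and the up and gateway flags set as the default route.

```python
import time

RTF_UP = 0x1       # route is usable
RTF_GATEWAY = 0x2  # destination is reached through a gateway


def default_route_iface(route_text):
    """Return the interface of the first default route, or None if there is none."""
    for line in route_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, dest = fields[0], fields[1]
        flags = int(fields[3], 16)  # Flags column is hexadecimal
        if dest == "00000000" and flags & RTF_UP and flags & RTF_GATEWAY:
            return iface
    return None


def read_proc_net_route():
    """Read the kernel routing table as text."""
    with open("/proc/net/route") as f:
        return f.read()


def wait_for_default_route(read_routes=read_proc_net_route,
                           timeout_s=60.0, interval_s=2.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll until a default route appears or timeout_s elapses (hypothetical cap)."""
    deadline = clock() + timeout_s
    while True:
        iface = default_route_iface(read_routes())
        if iface is not None:
            return iface
        if clock() >= deadline:
            return None  # give up instead of retrying for over an hour
        sleep(interval_s)
```

Running `wait_for_default_route()` early in boot on an affected image would show whether a default route ever appears within the cap, or whether the table stays at just the header line as in the log.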
Log file attached
If possible, please provide the full /var/log/waagent.log file to help us understand the problem better and get the context of the issue.
Sample Agent log: waagent.log
Issue Analytics
- State:
- Created 3 years ago
- Comments:9 (3 by maintainers)
Top GitHub Comments
@trstringer could you take a look? thanks!
Agreed. We moved all servers that need Python 3.8 to 20.04. From a Python and pip perspective, it works much better, and you don’t need to manage pip per Python version.
It’s more work to manage 2 versions… but the plan is to upgrade (redeploy) all servers to 20.04.