
Agents fail to provision after restart


After Jenkins has been restarted, agents fail to provision with the following messages in the logs:

May 22, 2020 6:38:55 AM FINE com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher
ECS: Launching agent
May 22, 2020 6:38:55 AM FINE com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher
[ecs-cloud-ecs-main-fmcpc]: Creating Task in cluster null
May 22, 2020 6:38:55 AM WARNING com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher launch
[ecs-cloud-ecs-main-fmcpc]: Error in provisioning; agent=com.cloudbees.jenkins.plugins.amazonecs.ECSSlave[ecs-cloud-ecs-main-fmcpc]
java.lang.NullPointerException
	at com.cloudbees.jenkins.plugins.amazonecs.ECSService.registerTemplate(ECSService.java:150)
	at com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher.getTaskDefinition(ECSLauncher.java:205)
	at com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher.launch(ECSLauncher.java:107)
	at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:292)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

May 22, 2020 6:38:55 AM FINER com.cloudbees.jenkins.plugins.amazonecs.ECSLauncher
[ecs-cloud-ecs-main-fmcpc]: Removing Jenkins node

All builds using ECS agents fail the same way. For context, we use Fargate agents in declarative pipelines, some with overrides for memory, CPU, or image.

Modifying and saving the agent config resolves the issue temporarily, but it returns as soon as Jenkins is restarted.

  • Jenkins v2.222.3
  • amazon-ecs-plugin v1.34

~The bit that caught my attention in the logs was "Creating Task in cluster null"; presumably that's not a good sign? Any ideas why the cluster would be null after a restart?~ (This appears to be unrelated: even successful provisioning logs this.)

This only seems to have begun occurring after we upgraded from v1.26 of the plugin.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 11
  • Comments: 26 (5 by maintainers)

Top GitHub Comments

6 reactions
albanf commented, May 26, 2020

Fix that worked for me: go to https://<jenkins>/configureClouds/ and click Save, then delete the nodes which were being created.

2 reactions
serpentblade commented, Sep 21, 2020

I think I have worked out the fix for at least one variation of this (the NPE on registerTemplate).

It was difficult to debug: I could never get the debugger to pause on the ECSCloud constructor, so I concluded that the class is probably being deserialized rather than constructed after a restart. From my debugging, ECSService would always end up with its Supplier set to null after a restart. Thankfully, ECSService is already lazy-loaded via a call to ECSCloud.getEcsService(), so it doesn't actually need to be preserved with ECSCloud. I switched that field to transient and have had several successful restarts without encountering this error.

I've created PR #216 if anybody would like to test it and verify that they see the same result.
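The mechanism described in the comment above can be illustrated with a minimal, self-contained Java sketch. This is not the plugin's code: `Service`, `BrokenCloud` and `FixedCloud` are hypothetical stand-ins for ECSService and ECSCloud, and plain `java.io` serialization is assumed as a stand-in for however Jenkins actually persists and reloads the cloud configuration. A serialization round trip plays the role of a restart; like the behaviour reported in the comment, it restores the object without running the class's own constructor.

```java
import java.io.*;
import java.util.function.Supplier;

// Hypothetical sketch: Service ~ ECSService, BrokenCloud/FixedCloud ~ ECSCloud.
// java.io serialization is only a stand-in for Jenkins' real persistence.
public class TransientServiceDemo {

    // Stand-in for ECSService. Its client Supplier is not persisted,
    // so it comes back null after a round trip.
    static class Service implements Serializable {
        private static final long serialVersionUID = 1L;
        private final transient Supplier<String> clientSupplier;

        Service(Supplier<String> clientSupplier) {
            this.clientSupplier = clientSupplier;
        }

        String registerTemplate() {
            // Throws NullPointerException if clientSupplier was lost on restore.
            return "registered via " + clientSupplier.get();
        }
    }

    // Before the fix: the service is persisted together with the cloud,
    // so the lazy check sees a non-null (but half-restored) service.
    static class BrokenCloud implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String region;
        private Service service;

        BrokenCloud(String region) { this.region = region; }

        Service getService() {
            if (service == null) {
                service = new Service(() -> "client-" + region);
            }
            return service;
        }
    }

    // After the fix: the service field is transient, so it is null after
    // restore and gets rebuilt lazily with a fresh Supplier.
    static class FixedCloud implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String region;
        private transient Service service;

        FixedCloud(String region) { this.region = region; }

        Service getService() {
            if (service == null) {
                service = new Service(() -> "client-" + region);
            }
            return service;
        }
    }

    // Serialize and deserialize an object to simulate a Jenkins restart.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        BrokenCloud broken = new BrokenCloud("eu-west-1");
        broken.getService(); // service is created before the "restart"
        BrokenCloud brokenAfterRestart = roundTrip(broken);
        try {
            brokenAfterRestart.getService().registerTemplate();
        } catch (NullPointerException e) {
            System.out.println("broken: NullPointerException in registerTemplate");
        }

        FixedCloud fixed = new FixedCloud("eu-west-1");
        fixed.getService(); // service is created before the "restart"
        FixedCloud fixedAfterRestart = roundTrip(fixed);
        System.out.println("fixed: " + fixedAfterRestart.getService().registerTemplate());
    }
}
```

Running `main` prints `broken: NullPointerException in registerTemplate` followed by `fixed: registered via client-eu-west-1`. The design point is that a lazily built helper only stays safe if it is never persisted: once it is persisted, the `null` check in the getter passes on a half-restored object and the failure surfaces later, in `registerTemplate`.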

