Fargate agent stuck "Waiting for agent to connect"

I’m working through setting this up, but I am unable to run the suggested sample pipeline from cloudbees/jnlp-slave-with-java-build-tools:

node("jenkins-ecs-worker") {
    git "https://github.com/cloudbees-community/game-of-life"
    withMaven(mavenSettingsConfig: 'my-maven-settings') {
        sh "mvn clean deploy"
    }
}

The ECS Fargate worker appears to start fine but never connects:

2019-06-11 14:03:19.003+0000 [id=42]	INFO	c.c.j.plugins.amazonecs.ECSCloud#provision: Asked to provision 1 agent(s) for: jenkins-ecs-worker
2019-06-11 14:03:19.003+0000 [id=42]	INFO	c.c.j.plugins.amazonecs.ECSCloud#provision: In provisioning : []
2019-06-11 14:03:19.003+0000 [id=42]	INFO	c.c.j.plugins.amazonecs.ECSCloud#provision: Excess workload after pending ECS agents: 1
2019-06-11 14:03:19.003+0000 [id=42]	INFO	c.c.j.plugins.amazonecs.ECSCloud#provision: Will provision ECS Agent jenkins-ecs-worker, for label: jenkins-ecs-worker
2019-06-11 14:03:19.003+0000 [id=42]	INFO	h.s.NodeProvisioner$StandardStrategyImpl#apply: Started provisioning ECS Agent jenkins-ecs-worker from jenkins-build-cluster with 1 executors. Remaining excess workload: 0
2019-06-11 14:03:29.020+0000 [id=40]	INFO	hudson.slaves.NodeProvisioner$2#run: ECS Agent jenkins-ecs-worker provisioning successfully completed. We have now 2 computer(s)
2019-06-11 14:03:29.191+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSService#registerTemplate: Match on container definition: true
2019-06-11 14:03:29.191+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSService#registerTemplate: Match on volumes: true
2019-06-11 14:03:29.191+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSService#registerTemplate: Match on task role: true
2019-06-11 14:03:29.191+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSService#registerTemplate: Match on execution role: true
2019-06-11 14:03:29.191+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSService#registerTemplate: Match on network mode: true
2019-06-11 14:03:29.191+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#runECSTask: [jenkins-build-cluster-pbxr1]: Starting agent with task definition arn:aws:ecs:us-east-1:435433059373:task-definition/jenkins-build-cluster-jenkins-ecs-worker:4}
2019-06-11 14:03:29.885+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#runECSTask: [jenkins-build-cluster-pbxr1]: Agent started with task arn : arn:aws:ecs:us-east-1:435433059373:task/jenkins-build-cluster/ba7c88b826114baa8b7e864a0fe3faa3
2019-06-11 14:03:29.885+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: TaskArn: arn:aws:ecs:us-east-1:435433059373:task/jenkins-build-cluster/ba7c88b826114baa8b7e864a0fe3faa3
2019-06-11 14:03:29.885+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: TaskDefinitionArn: arn:aws:ecs:us-east-1:435433059373:task-definition/jenkins-build-cluster-jenkins-ecs-worker:4
2019-06-11 14:03:29.885+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: ClusterArn: arn:aws:ecs:us-east-1:435433059373:cluster/jenkins-build-cluster
2019-06-11 14:03:29.885+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: ContainerInstanceArn: null
2019-06-11 14:03:30.008+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to start
2019-06-11 14:03:31.121+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to start
...
2019-06-11 14:05:01.481+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to start
2019-06-11 14:05:02.598+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to start
2019-06-11 14:05:03.711+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to start
2019-06-11 14:05:04.816+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to start
2019-06-11 14:05:06.087+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Task started, waiting for agent to become online
2019-06-11 14:05:06.087+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to connect
2019-06-11 14:05:07.087+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to connect
2019-06-11 14:06:07.107+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to connect

... (~50 lines deleted)

2019-06-11 14:06:08.108+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to connect
2019-06-11 14:06:08.165+0000 [id=32324]	WARNING	h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #39 failed
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at java.io.DataInputStream.readUTF(DataInputStream.java:609)
	at java.io.DataInputStream.readUTF(DataInputStream.java:564)
	at hudson.TcpSlaveAgentListener$ConnectionHandler.run(TcpSlaveAgentListener.java:264)
2019-06-11 14:06:09.108+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to connect
2019-06-11 14:06:10.108+0000 [id=32033]	INFO	c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: Waiting for agent to connect

... repeats until timeout
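
To see how long the launcher spent in each waiting phase, the log can be summarized with a short script. This is a minimal sketch, not part of the original report: the function name `phase_spans` and the regular expression are assumptions for illustration, and the sample lines are copied from the log above with tabs replaced by spaces.

```python
import re
from datetime import datetime

# Matches the timestamp and one of the two "Waiting for agent ..." messages.
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\+0000 .*?"
    r"(Waiting for agent to start|Waiting for agent to connect)"
)

def phase_spans(lines):
    """Map each waiting phase to its (first_seen, last_seen) timestamps."""
    spans = {}
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # stack traces, elisions, and other messages are skipped
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        first, _ = spans.get(match.group(2), (ts, ts))
        spans[match.group(2)] = (first, ts)
    return spans

prefix = "[id=32033] INFO c.c.j.p.amazonecs.ECSLauncher#launch: [jenkins-build-cluster-pbxr1]: "
sample = [
    "2019-06-11 14:03:30.008+0000 " + prefix + "Waiting for agent to start",
    "2019-06-11 14:05:04.816+0000 " + prefix + "Waiting for agent to start",
    "2019-06-11 14:05:06.087+0000 " + prefix + "Task started, waiting for agent to become online",
    "2019-06-11 14:05:06.087+0000 " + prefix + "Waiting for agent to connect",
    "java.io.EOFException",
    "2019-06-11 14:06:10.108+0000 " + prefix + "Waiting for agent to connect",
]

for phase, (first, last) in phase_spans(sample).items():
    print(f"{phase}: {(last - first).total_seconds():.3f} s")
```

On these sample lines, the task takes about a minute and a half to start, and the agent then waits to connect without ever succeeding.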

Details from the running Fargate task look like this:

Command: ["-url","https://10.0.3.51:50000/","4c2e4a0152369e721e6dba9854c039506c558a380c012e370fe433f83d524bf7","jenkins-build-cluster-m3wrm"]
Privileged: false
Network bindings: not configured

Network mode is awsvpc and the subnet matches the Jenkins server.

I don’t see any errors or anything else that’s obviously wrong, but maybe experts here will. Thanks for any help on this.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

bwolin commented, Jun 12, 2019

I believe the issue is that when Jenkins launches the agent, it registers the alias (e.g. jenkins-build-cluster-92xch) internally and validates that.

To save time, I had been testing by manually launching the Docker container with an old alias, which Jenkins kept rejecting. Once I corrected the connection URL (port 8080 instead of 50000, http instead of https) and launched the agent from Jenkins again, it finally worked.
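
The URL mismatch described in this comment can be checked mechanically against the task's Command array. The following is a minimal sketch, assuming the expected values are the ones this comment reports as working (http on port 8080); they may differ in other setups. The function name `agent_url_problems` is hypothetical, and the agent secret is replaced with a placeholder.

```python
from urllib.parse import urlsplit

def agent_url_problems(command, expected_scheme="http", expected_port=8080):
    """Return a list of problems found in the agent's -url argument."""
    if "-url" not in command:
        return ["no -url argument found"]
    idx = command.index("-url")
    if idx + 1 >= len(command):
        return ["-url has no value"]
    parts = urlsplit(command[idx + 1])
    problems = []
    if parts.scheme != expected_scheme:
        problems.append(f"scheme is {parts.scheme!r}, expected {expected_scheme!r}")
    if parts.port != expected_port:
        problems.append(f"port is {parts.port}, expected {expected_port}")
    return problems

# Command array from the Fargate task details, with the secret replaced.
command = ["-url", "https://10.0.3.51:50000/", "<secret>", "jenkins-build-cluster-m3wrm"]
for problem in agent_url_problems(command):
    print(problem)
```

For the Command array in the original report, this flags both the scheme and the port. That matches the fix described here: the `-url` argument should point at the Jenkins HTTP endpoint, not at the port used for the agent's TCP connection.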

sathish97625 commented, Feb 19, 2020

Also, enable the public IP option in the Jenkins cloud configuration. That resolved the issue for us.
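
For reference, the ECS RunTask API expresses public IP assignment for awsvpc tasks as an `assignPublicIp` value of `ENABLED` or `DISABLED` inside `networkConfiguration`. The sketch below only builds that structure. That the plugin option mentioned above maps onto this field is an assumption, the helper name is hypothetical, and the subnet and security group IDs are placeholders.

```python
import json

def awsvpc_network_configuration(subnets, security_groups, assign_public_ip):
    """Build the networkConfiguration structure used by ECS RunTask."""
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            # The API expects the strings ENABLED or DISABLED, not a boolean.
            "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
        }
    }

# Placeholder IDs; substitute the subnet and security group of the agent task.
config = awsvpc_network_configuration(
    subnets=["subnet-EXAMPLE"],
    security_groups=["sg-EXAMPLE"],
    assign_public_ip=True,
)
print(json.dumps(config, indent=2))
```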
