"Opbeans" stage of release pipeline fails

Recently in https://github.com/elastic/apm-agent-nodejs/issues/2625 we automated releases: when a version tag (“vN.N.N”) is pushed, a Jenkins “Release” stage builds and publishes the Lambda layer, creates a GitHub release, runs npm publish, and attempts to update opbeans-node.git to use the new APM agent release.

That “Opbeans” stage is flaky (or perhaps fails every time), as discussed here: https://github.com/elastic/apm-agent-nodejs/issues/2625#issuecomment-1137881801. This issue is about making the release process reliable by doing something about this stage.

The Opbeans stage effectively does this: https://github.com/elastic/apm-agent-nodejs/pull/2723#issuecomment-1137922102

Options

Option 1: npm publish early and hope

Do the ‘npm publish’ step earlier in the pipeline and hope that the Lambda layer publishing steps take long enough that the new version is installable by the time the Opbeans stage runs.

I don’t love this idea because relying on “hope” means the stage may still fail, just less frequently, which makes for a more subtle bug. Also see the “timeout” discussion below.

Option 2: wait for npm install to work

Add a retry loop at the start of the Opbeans stage: if npm install fails with ETARGET, retry it until a timeout expires, to account for the stage running soon after a publish.

“ETARGET” refers to the specific error that npm install reports when this issue happens:

[2022-05-25T21:23:57.820Z] + CI=true npm install --ignore-scripts elastic-apm-node@3.34.0
[2022-05-25T21:23:59.440Z] npm ERR! code ETARGET
[2022-05-25T21:23:59.440Z] npm ERR! notarget No matching version found for elastic-apm-node@3.34.0.
[2022-05-25T21:23:59.440Z] npm ERR! notarget In most cases you or one of your dependencies are requesting
[2022-05-25T21:23:59.440Z] npm ERR! notarget a package version that doesn't exist.

Theoretically this option would be straightforward to implement, but what should that timeout be? Granted, the issue is old (from 2018), but user reports in https://github.com/npm/npm/issues/20574 suggest that it can take an hour or more for all npm servers to update. That’s too long a timeout for a release process.
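For illustration only, the retry loop described in Option 2 might look roughly like the sketch below. The function name retry_on_etarget and its arguments are hypothetical, not part of the existing pipeline. It retries only when the command output contains ETARGET, and it still needs a timeout value, which is the open question above.

```shell
# Sketch: retry a command while it fails with npm's ETARGET error,
# giving up once a deadline passes. All names here are hypothetical.
#
# Usage: retry_on_etarget TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
retry_on_etarget() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  local output status
  while true; do
    # Capture stdout and stderr so the error code can be inspected.
    if output=$("$@" 2>&1); then
      status=0
    else
      status=$?
    fi
    printf '%s\n' "$output"
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    # Only the "version not on the registry yet" case is worth retrying.
    case "$output" in
      *ETARGET*) ;;
      *) return "$status" ;;
    esac
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "gave up after ${timeout}s: package version still not installable" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example call (the 300s timeout and 10s interval are arbitrary):
#   retry_on_etarget 300 10 env CI=true npm install --ignore-scripts elastic-apm-node@3.34.0
```

Any other npm failure is returned immediately with its original exit code, so the loop does not mask unrelated errors.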

Option 3: use dependabot to update opbeans

Configure dependabot to look for an agent update daily.

Some issues with this:

  • The current “bump-version.sh” script also updates a label in the repo’s Dockerfile, which dependabot will not update. So either we drop that label, or we add a separate lint GitHub check that fails the dependabot PR until someone manually tweaks the Dockerfile as well. This is pretty indirect and laborious.
  • There is no way to have this process create a git tag on the opbeans repo, which the current process does. I am not sure those git tags are being used. They do result in tagged builds of the opbeans Docker image (see https://hub.docker.com/r/opbeans/opbeans-node/tags). However, I’m not sure if anyone uses anything but the “latest” of those Docker images.

Option 4: use a Jenkins pipeline in the opbeans repo

Add a stage to the Jenkinsfile in the opbeans repo(s) that runs on a daily cron (@daily) to look for a new agent version and, if there is one, do the update, commit, and tag.

I don’t see any issues with this approach other than:

  • It means that a new opbeans update (and Docker image build) will take up to a day after an agent release.
  • It will take some dev effort to make this work.

This is my current preferred option.

@elastic/observablt-robots @astorm Thoughts?

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

trentm commented, Jun 8, 2022 (1 reaction)

@cachedout Thanks and understood. I’ll take a stab at it and get review from y’all.

As a sanity check, my plan is to add an optional stage('Update Agent Dep') { to opbeansPipeline here: https://github.com/elastic/apm-pipeline-library/blob/main/vars/opbeansPipeline.groovy#L193 that will handle updating the APM agent dep if there is a new one available. It will be off by default so the opbeans-FOO.git repos that are using opbeansPipeline() can opt into it. It will expect a new .ci/avail-agent-update-ver.sh script (beside the existing .ci/bump-version.sh script) in each opbeans repo that will use it. Please let me know if this sounds crazy. 😃
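As a rough sketch of what such an avail-agent-update-ver.sh script could do: the comment does not specify the script’s contract, so the behavior here is an assumption, namely that it prints the newer version when one is available and prints nothing otherwise. The function names are hypothetical, and the comparison only handles plain N.N.N versions (no pre-release suffixes).

```shell
# Sketch under assumptions: print the latest agent version only when it
# is newer than the current one. Function names are hypothetical.

# version_gt A B: succeed if plain N.N.N version A is greater than B.
version_gt() {
  local a=$1 b=$2 i x y
  for i in 1 2 3; do
    x=$(printf '%s' "$a" | cut -d. -f"$i")
    y=$(printf '%s' "$b" | cut -d. -f"$i")
    if [ "$x" -gt "$y" ]; then return 0; fi
    if [ "$x" -lt "$y" ]; then return 1; fi
  done
  return 1  # equal versions are not "greater"
}

# avail_agent_update_ver CURRENT LATEST: print LATEST if it is newer.
avail_agent_update_ver() {
  local current=$1 latest=$2
  if version_gt "$latest" "$current"; then
    printf '%s\n' "$latest"
  fi
}

# In a real script the two inputs might come from commands such as
# (assumed, and requiring node_modules to be installed):
#   current=$(node -p "require('./node_modules/elastic-apm-node/package.json').version")
#   latest=$(npm view elastic-apm-node version)
#   avail_agent_update_ver "$current" "$latest"
```

A pipeline stage could then treat empty output as “nothing to do” and pass any printed version on to the existing .ci/bump-version.sh script.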

Mpdreamz commented, Jun 2, 2022 (1 reaction)

I’d personally prefer Option 4 as well.

Opbeans is not a public artifact that is tied to this repository. It should not influence our ability to execute the release of the agent IMO. Moving the opbeans update completely out of band seems appropriate.
