
Metrics dependency on deployment patterns


The statistics work as they should if each merge to the main branch is deployed to production separately. If there are multiple merges to the main branch before the deploy to production, either the lead time or the deployment frequency will be incorrect.

Option 1: Create a deployment event for each merge to the main branch. This includes all commits in the lead time, but generates a higher number of deployment events.

Option 2: Create one deployment event for the last merge to the main branch. This includes only the commits of the last merge in the lead time calculation, but generates the correct number of deployment events.
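
As a rough sketch, Option 1 could be driven by the same GitHub API call shown further down, one per merge commit on the mainline (master in the example below). This is a hypothetical backfill over git log; in practice you would fire the call from CI on each merge. The repository name and the $GITHUB_USER/$GITHUB_TOKEN/$HOSTNAME variables match the later examples:

# Sketch: one deployment event per merge commit on the mainline
for sha in $(git log --merges --first-parent --reverse --pretty=%H master); do
  curl -u $GITHUB_USER:$GITHUB_TOKEN \
    -X POST \
    -H "Accept: application/vnd.github.v3+json" \
    https://$HOSTNAME/api/v3/repos/org-name/test-four-keys/deployments \
    -d "{\"ref\":\"$sha\"}"
done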

An example to hopefully make it clearer (using Option 2): create branch a, make 2 commits, merge it into master; create branch b, make 2 commits, merge it into master.
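
A minimal local reproduction of that sequence (sketch only; --allow-empty keeps it short, and in the real issue the merges were pull requests, which is where the "Merge pull request" messages in the log below come from):

# branch a: two commits, then merge to master
git checkout -b branch-a master
git commit --allow-empty -m "Branch a, commit 1"
git commit --allow-empty -m "Branch a, commit 2"
git checkout master
git merge --no-ff branch-a

# branch b: two commits, then merge to master
git checkout -b branch-b master
git commit --allow-empty -m "B - 1"
git commit --allow-empty -m "B - 2"
git checkout master
git merge --no-ff branch-b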

Git log afterwards:

git log --decorate=no --date-order --reverse --pretty=oneline

6bd52a46ab919384b043a40750e0aed8e0e0d43b Branch a, commit 1
5a93561896c7c04758df9fe05eaaa4f7154e53f6 Branch a, commit 2
807b8acfc1007e544941944df182d10e6f9f52fd Merge pull request #3 from org-name/branch-a
964198b26178b4203f14a77c77080af70a445750 B - 1
bb13c56e5e9b214682c28647adc787ff061183e2 B - 2
21e706a11d41e467fe055dbdb5fe21a609427a20 Merge pull request #4 from org-name/branch-b

Create a deployment with the GitHub API for the last merge:

curl -u $GITHUB_USER:$GITHUB_TOKEN \
  -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  https://$HOSTNAME/api/v3/repos/org-name/test-four-keys/deployments \
  -d '{"ref":"21e706a11d41e467fe055dbdb5fe21a609427a20"}'

curl -u $GITHUB_USER:$GITHUB_TOKEN \
  -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  https://$HOSTNAME/api/v3/repos/org-name/test-four-keys/deployments/8/statuses \
  -d '{"state":"success"}'

Resulting contents of the BigQuery deployments table:

  {
    "source": "github",
    "deploy_id": "8",
    "time_created": "2020-12-09 11:44:48 UTC",
    "repository": "org-name/test-four-keys",
    "changes": [
      "21e706a11d41e467fe055dbdb5fe21a609427a20",
      "964198b26178b4203f14a77c77080af70a445750",
      "bb13c56e5e9b214682c28647adc787ff061183e2"
    ]
  }

Note that the commits related to branch a are not included.
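
One way to verify this locally is a commit range between the two merges, which lists exactly the commits reachable from the second merge but not the first (a sketch; not necessarily how the pipeline itself derives the change set):

git log --pretty=oneline 807b8acfc1007e544941944df182d10e6f9f52fd..21e706a11d41e467fe055dbdb5fe21a609427a20

Its output is the branch-b commits plus their merge commit, matching the "changes" array above.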

In this case only the changes on branch b will be included in the lead time dashboard, while deployment frequency will show 1 deployment. If another deployment event had been created for the merge commit of branch a, the frequency would be too high.
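
Concretely, the lead time for a single change here boils down to the deployment timestamp minus the commit timestamp. A rough shell sketch using the example data above (GNU date assumed for parsing the timestamp):

# time_created from the deployments row above
DEPLOY_TIME=$(date -u -d "2020-12-09 11:44:48" +%s)
# committer timestamp of the first commit on branch b
COMMIT_TIME=$(git show -s --format=%ct 964198b26178b4203f14a77c77080af70a445750)
echo "lead time: $(( (DEPLOY_TIME - COMMIT_TIME) / 60 )) minutes"

Under Option 2, branch a's commits never enter this calculation, which is exactly the gap described above.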

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

3 reactions
dinagraves commented, Mar 29, 2021

Thank you Steve! Please feel free to join our fourkeys channel on the GCP slack: https://join.slack.com/t/googlecloud-community/shared_invite/zt-m973j990-IMij2Xh8qKPu7SaHfOcCFg. I’m also doing a panel on Wednesday if you’d like to join! https://www.crowdcast.io/e/356fkadm/register

Whenever we can, we do like to track the first commit, because we find that the longer a commit stays in the deployment pipeline, the more likely we are to see incidents and bugs show up. Also, lead time from first commit to prod is a predictive measure of higher job satisfaction, lower burnout, etc.

But you’re absolutely right that deployment frequency will capture this batching behavior! This is part of the reason why we have 4 metrics!

“If we capture that time, there will be a lot of variance and the trend will not match capabilities as much as ways of working.” I agree, and this is why we use medians, to better handle the variance! This is also why I like to use every commit to deploy, not just the first commit of a PR. In the example below, if we use first commits only, the lead time appears higher than if we look at all commits. Measuring lead time this way is more resistant to anomalies and outliers.
[Image: example comparing lead time measured from first commits only vs. from all commits]
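
As a toy illustration of how the median resists such outliers (hypothetical lead times in days, shell sketch):

# median of 2, 3, 3, 4, 40: sort, then take the middle element
printf '%s\n' 2 3 3 4 40 | sort -n | awk '{ a[NR] = $1 } END { if (NR % 2) print a[(NR+1)/2]; else print (a[NR/2] + a[NR/2+1]) / 2 }'

This prints 3: the 40-day outlier barely moves the median, while it would pull the mean up to 10.4.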

However, not everyone agrees with this method. I know many teams that prefer first commit and first commit only. One PM expressed that he wanted to see the first commit method b/c it better captured how the developers were working, which was important to him. Why did the developer in B make one change and then do no more work on it for 25 days? How can we improve our prioritization so that this doesn’t happen again? Or maybe it’s a design or architecture problem? If we consider that our goal is to better serve our customers, then having this insight into the way our developers are working is very useful.

All that being said, if you are using these metrics to improve your own team’s performance, the most important thing is that you consistently use the same metrics definitions and methodology. I like to compare this to using the bathroom scale vs the scale at the gym! It doesn’t really matter if my bathroom scale is 5 lbs off, as long as it’s the scale I use to measure my improvements – it’ll still be relatively correct. But if I go to the gym and compare that to my bathroom scale, then we have a problem!

I would be a little careful about the pre-prod environment argument. Obviously Dr. Forsgren is correct, and it’s important to acknowledge that one cannot always be deploying to certain environments (e.g. app stores), in which case it is completely acceptable to use a pre-prod environment. However, we want to be very mindful that we don’t use this one example to give ourselves the leeway to count our dev or QA environments as “prod.”

I think the important thing is to remember that the goal is to improve performance, not to improve metrics. The metrics are like a diagnostic tool – they help us understand where we can improve and how far we’ve come. When we focus just on improving metrics, it can be very tempting to redefine the “prod” environment and “lead time” in ways that artificially inflate our numbers. If we hide our flaws in this way, we miss the opportunity to improve.

But again! If you define your “deployments”, “changes”, and “incidents” in a way that feels best aligned to your business and your operations, and use that as your scale consistently, then you’re 99% of the way there, and these little details are just academic.

2 reactions
StevePJ-Sainsburys commented, Mar 25, 2021

Interesting! We have been talking about this recently as we do not do continuous delivery, and we’re still not at all sure if we should be including each merge in the lead time/deployment metrics. At the moment we think the existing behaviour of only including branch b works best, but we’re very happy to be corrected!

Our thoughts at the moment are that this is batching, which is what the deployment frequency is capturing, i.e. lots of changes but one prod deployment means deployment frequency is lower. So only including the batch of a + b as 1 deployment works well.

In terms of lead time, we think it’s more about the interpretation of ‘time since first commit’ and ‘time to prod’. From Accelerate, we interpret it as measuring the predictable delivery process, where we can milk efficiency with better automation. For us, if we measure from merging b to the time merge_b is in prod, we are measuring our capability to deliver changes to customers. It’s the length of a pipeline, or the time it takes for manual testing if stuck in a UAT environment. We may lose the time taken for merge_a to reach prod, but that is variable: it could be because a bug became apparent in E2E tests requiring a patch, or because a feature isn’t complete but a pair want their changes on the trunk to avoid merge conflicts. If we capture that time, there will be a lot of variance and the trend will not match capabilities as much as ways of working. If we find ourselves batching a lot, then this will come up in the deployment frequency metric. If changes get blocked a lot (and don’t require a patch or aren’t part of a batch), then it will flag in our lead time metric.

There is a talk with Nicole Forsgren where this is touched on, and she even argues lead time could be the time from commit to pre-prod, when batching is desired.

It’s very interesting how much variance there is online about when to count something as a deployment or what to measure for lead time; it adds a whole other layer of complexity when trying to generalise it.
