Canary SMI strategy generates invalid TrafficSplit resource


When the task is used for a canary deployment, it relies on SMI (Service Mesh Interface) TrafficSplit resources to control how much traffic goes to the stable vs. canary version of the service. To generate its manifest, it first looks up the version of the SMI custom resource currently deployed in the cluster.

The manifest it generates writes the weights in XXXXm format. For example, you might get a manifest like this:

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: my-service-azure-pipelines-rollout
  namespace: my-ns
spec:
  backends:
  - service: my-service-stable
    weight: 1000m
  - service: my-service-baseline
    weight: 0m
  - service: my-service-canary
    weight: 0m
  service: my-service

Unfortunately, while the XXXXm format is noted in the very first version of the TrafficSplit spec, it was removed as of version 2 of the spec. The official “SMI SDK for Go” has a detailed custom resource definition that validates the weight as a number. Further, the current SMI adapter for Istio uses that SDK, so it fails entirely to read and validate the TrafficSplit resources generated during a canary deployment.

The simplest solution is to stop appending the m suffix to the weights. Weights written as whole relative numbers or percentages are compatible with all versions of the spec. A correct TrafficSplit should look like this:

apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: my-service-azure-pipelines-rollout
  namespace: my-ns
spec:
  backends:
  - service: my-service-stable
    weight: 1000
  - service: my-service-baseline
    weight: 0
  - service: my-service-canary
    weight: 0
  service: my-service
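As a rough illustration of the fix described above, the following Python sketch strips the trailing m from each backend weight of an already-parsed TrafficSplit manifest. The helper names `normalize_weight` and `normalize_traffic_split` are hypothetical and are not part of the task or of any SMI tooling; the sketch also assumes the manifest has been loaded into plain dictionaries.

```python
import copy
import re

# Accepts "1000m", "0m", or a plain digit string such as "1000".
_WEIGHT_PATTERN = re.compile(r"(\d+)m?", re.ASCII)


def normalize_weight(weight):
    """Return the weight as a plain non-negative integer (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(weight, bool):
        raise ValueError(f"unsupported weight: {weight!r}")
    if isinstance(weight, int):
        if weight < 0:
            raise ValueError(f"negative weight: {weight!r}")
        return weight
    if isinstance(weight, str):
        match = _WEIGHT_PATTERN.fullmatch(weight.strip())
        if match:
            return int(match.group(1))
    raise ValueError(f"unsupported weight: {weight!r}")


def normalize_traffic_split(manifest):
    """Return a copy of the manifest whose backend weights are plain integers."""
    fixed = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    for backend in fixed.get("spec", {}).get("backends", []):
        backend["weight"] = normalize_weight(backend["weight"])
    return fixed


if __name__ == "__main__":
    generated = {
        "apiVersion": "split.smi-spec.io/v1alpha2",
        "kind": "TrafficSplit",
        "spec": {
            "service": "my-service",
            "backends": [
                {"service": "my-service-stable", "weight": "1000m"},
                {"service": "my-service-baseline", "weight": "0m"},
                {"service": "my-service-canary", "weight": "0m"},
            ],
        },
    }
    print(normalize_traffic_split(generated)["spec"]["backends"])
```

Keeping the numeric value unchanged (1000 stays 1000) preserves the relative split between the backends, which is all the weights express.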

The SMI Adapter for Istio logs errors like these when it hits the issue:

E0902 14:45:23.035319       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha2.TrafficSplit: v1alpha2.TrafficSplitList.Items: []v1alpha2.TrafficSplit: v1alpha2.TrafficSplit.Spec: v1alpha2.TrafficSplitSpec.Backends: []v1alpha2.TrafficSplitBackend: v1alpha2.TrafficSplitBackend.Weight: readUint64: unexpected character: �, error found in #10 byte of ...|"weight":"1000m"},{"|..., bigger context ...|":[{"service":"accounts-service-stable","weight":"1000m"},{"service":"accounts-service-baseline","we|...
E0902 14:45:24.038301       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha2.TrafficSplit: v1alpha2.TrafficSplitList.Items: []v1alpha2.TrafficSplit: v1alpha2.TrafficSplit.Spec: v1alpha2.TrafficSplitSpec.Backends: []v1alpha2.TrafficSplitBackend: v1alpha2.TrafficSplitBackend.Weight: readUint64: unexpected character: �, error found in #10 byte of ...|"weight":"1000m"},{"|..., bigger context ...|":[{"service":"products-service-stable","weight":"1000m"},{"service":"products-service-baseline","we|...
E0902 14:45:25.042071       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1alpha2.TrafficSplit: v1alpha2.TrafficSplitList.Items: []v1alpha2.TrafficSplit: v1alpha2.TrafficSplit.Spec: v1alpha2.TrafficSplitSpec.Backends: []v1alpha2.TrafficSplitBackend: v1alpha2.TrafficSplitBackend.Weight: readUint64: unexpected character: �, error found in #10 byte of ...|"weight":"1000m"},{"|..., bigger context ...|":[{"service":"accounts-service-stable","weight":"1000m"},{"service":"accounts-service-baseline","we|...
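The errors above come from decoding the weight as an unsigned integer. A pre-deployment check along the following lines could flag such weights before the adapter ever sees them. This is only a sketch that approximates the strict numeric validation in Python; it is not the SDK's own code, and `find_invalid_weights` is a hypothetical name.

```python
def find_invalid_weights(manifest):
    """List (service, weight) pairs whose weight is not a non-negative integer."""
    problems = []
    for backend in manifest.get("spec", {}).get("backends", []):
        weight = backend.get("weight")
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_plain_int = isinstance(weight, int) and not isinstance(weight, bool)
        if not is_plain_int or weight < 0:
            problems.append((backend.get("service"), weight))
    return problems


if __name__ == "__main__":
    generated = {
        "kind": "TrafficSplit",
        "spec": {
            "service": "my-service",
            "backends": [
                {"service": "my-service-stable", "weight": "1000m"},
                {"service": "my-service-canary", "weight": 0},
            ],
        },
    }
    for service, weight in find_invalid_weights(generated):
        print(f"{service}: weight {weight!r} is not a plain integer")
```

An empty result means every backend weight would pass this approximation of the numeric check.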

I’m guessing this logic came from the original KubernetesManifest@V0 Azure DevOps task, which is where I first discovered it. I’ve filed a corresponding issue there. We’re on AzDO right now but will be moving to GitHub Actions shortly (in a few months?), and it’d be cool to see it fixed in both places.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 6

Top GitHub Comments

1 reaction
OliverMKing commented, Dec 1, 2021

Created a new release v1.5 fixing this.

0 reactions
github-actions[bot] commented, Oct 23, 2021

This issue is idle because it has been open for 14 days with no activity.
