Add Istio to our nightly tests
This is mostly related to #239, but I wanted to create a separate issue for istio in general due to the severity of the problems I ran into, and to discuss whether there are other issues to create out of this.
In general I think istio is an excellent example of a non-trivial k8s application that pulumi should adopt as part of their test suite. Why? Istio uses helm, deploys more than 100 resources, and makes use of advanced resource types like CRDs.
Currently the experience when attempting to deploy istio via pulumi is very bad. This is the code that can be used for testing:
- add the repo: `helm repo add istio https://istio.io/charts`
- deploy:
```typescript
import { helm, core } from '@pulumi/kubernetes';

const appName = 'istio';
const namespaceName = `${appName}-system`;

const namespace = new core.v1.Namespace(namespaceName, {
    metadata: { name: namespaceName }
});

const chart = new helm.v2.Chart(
    appName,
    {
        repo: 'istio',
        chart: appName,
        namespace: namespaceName,
        version: '1.0.1',
        // for all options check https://github.com/istio/istio/tree/master/install/kubernetes/helm/istio
        values: {
            kiali: { enabled: true }
        }
    },
    { dependsOn: [namespace] }
);
```
This is the list of issues I ran into - in order:
1. Getting the chart involves manual steps

I found no way to install a chart from a local path. I then found that istio actually has an undocumented custom helm repo endpoint, but using it in pulumi requires a manual `helm repo add` step. Both of these issues are already addressed in #229 and #238.
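For context, a hedged sketch of what the non-manual paths look like in later releases of @pulumi/kubernetes (i.e. after this issue was filed); the option names below (`path` on the local-chart variant, `fetchOpts.repo` on the remote variant) are taken from the published API and are not the exact fix tracked in #229/#238:

```typescript
// Hedged sketch, not the fix tracked in #229/#238: later @pulumi/kubernetes
// releases let a chart be rendered either from a directory on disk or by
// passing the repo URL directly, avoiding the manual `helm repo add` step.
import { helm } from '@pulumi/kubernetes';

// Variant A: chart fetched locally beforehand (e.g. `helm fetch --untar istio/istio`).
const localChart = new helm.v2.Chart('istio-local', {
    path: './istio',            // render templates from this local directory
    namespace: 'istio-system',
});

// Variant B: skip the named repo entirely and pass the repo URL via fetchOpts.
const remoteChart = new helm.v2.Chart('istio-remote', {
    chart: 'istio',
    version: '1.0.1',
    fetchOpts: { repo: 'https://istio.io/charts' },
    namespace: 'istio-system',
});
```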
2. Pulumi crashes because of a duplicate yaml key in the kiali definitions with:
```
Previewing changes:

     Type                          Name                                 Plan       Info
 *   pulumi:pulumi:Stack           api-cluster-dev-b-api-cluster-dev-b  no change  2 errors
 ~   └─ kubernetes:helm.sh:Chart   istio                                update     changes: ~ values

Diagnostics:
  pulumi:pulumi:Stack: api-cluster-dev-b-api-cluster-dev-b
    error: YAMLException: duplicated mapping key at line 2450, column 5:
            name: http-kiali
            ^
    error: an unhandled error occurred: Program exited with non-zero exit code: 1

  error: an error occurred while advancing the preview
```
This is because kiali has duplicated the `name` key in its service definition here: https://github.com/istio/istio/blob/369bf50f45d9e2748b59726a466b840312633f2b/install/kubernetes/helm/istio/charts/kiali/templates/service.yaml#L10-L13. This mistake was already fixed on istio's master branch, but the latest official chart still contains the duplicate key. The reason I'm mentioning it is that helm itself can deal with these duplicate keys and simply takes the value of the last key (similar to JSON). A quick look at the documentation of the `js-yaml` library used by pulumi reveals that this behaviour can be enabled via the `json` param of the `safeLoad` function: https://github.com/nodeca/js-yaml#safeload-string---options-. It is certainly debatable whether pulumi should allow duplicate keys in yaml, but since helm itself allows it I wanted to at least bring it up for discussion. There also seem to be a bunch of other projects which intentionally allow duplicate keys in yaml for compatibility with earlier yaml specs.
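For illustration, a minimal sketch (not pulumi's actual code path) of what the `json` option does on js-yaml's `safeLoad`, assuming js-yaml 3.x where `safeLoad` is still available:

```typescript
// Minimal sketch, assuming js-yaml 3.x: by default safeLoad throws
// "duplicated mapping key", but with `json: true` the last occurrence
// of a key wins, which mirrors helm's behaviour.
import * as yaml from 'js-yaml';

const snippet = `
ports:
  name: http-kiali
  name: http-kiali
`;

// yaml.safeLoad(snippet);                              // throws YAMLException: duplicated mapping key
const parsed = yaml.safeLoad(snippet, { json: true });  // succeeds; last `name` wins
console.log(parsed);                                    // { ports: { name: 'http-kiali' } }
```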
3. Resources are not deployed in the right order and pulumi fails early when one resource fails to deploy
This is basically #239, however a few additional adverse behaviors surfaced while attempting to work around it. In general, for the istio deployment to succeed, the CRDs and the Deployments have to be deployed first. When running `pulumi up`, pulumi currently starts 10 concurrent resource deployments by default, and once one of them fails it stops spawning further resource deployments. For istio that means it takes a lot of `pulumi up` attempts until all of the CRDs and Deployments randomly end up in the resource deployment queue. A workaround which actually helped to reduce the number of necessary `pulumi up` attempts was to increase the concurrency with `pulumi up -p 50`, which makes pulumi start 50 concurrent deployments right from the start and increases the likelihood of the CRD and Deployment resources getting picked up. Based on this workaround I was wondering if pulumi could change, or introduce a param to control, the point at which pulumi stops an update and considers it a failure. In the case of istio I would like pulumi to continue deploying resources even if the ones that were picked up first have failed. I'm not 100% sure whether my suggestion is clear from the description above, so please let me know if this needs further clarification.
Issue analytics: created 5 years ago · 2 reactions · 11 comments (11 by maintainers)

Maintainer comments:
I have updated the title. We should not only fix the underlying set of issues, but also aim to close out M19 with Istio actually harnessed and running in our nightly tests, so that we lock in the goodness.
Thanks for opening the issue, 100% agree, we should fix this stuff. Istio has been on our radar for a while anyway, and (1) and (3) are problems we've encountered with another big, important, complex chart as well (kube-prometheus), so we are actively discussing fixes for those. (2) is one I've not heard before; if Helm does it we should probably support it too, even though I think this is not a great decision individually. 😃
Even though this issue is really a meta-issue, I’m going to keep it open for now, because I suspect that once we get through these issues, we’ll find even more. My experience is that large Helm charts usually have subtle and unnoticed bugs that need to be fixed, and our pattern has been to contribute fixes upstream, and I expect we’ll likely do that here, too.