helm upgrade fails when trying to resize PVC
Chart version: 6.8.13
Kubernetes version: v1.16.13-gke.401
Kubernetes provider: GKE (Google Kubernetes Engine)
Helm version: v2.16.1
Output of `helm get release` (e.g. `helm get elasticsearch`, replacing `elasticsearch` with the name of your Helm release). Be careful to obfuscate any secrets (credentials, tokens, public IPs, …) that could be visible in the output before copy-pasting. If you find secrets in plain text in the `helm get release` output, you should use Kubernetes Secrets to manage them in a secure way (see the Security Example).

Output of `helm get release`:
```
REVISION: 20
RELEASED: Wed Nov 11 11:30:44 2020
CHART: elasticsearch-6.8.13
USER-SUPPLIED VALUES:
esJavaOpts: -Xmx5g -Xms5g
minimumMasterNodes: 5
replicas: 8
resources:
  limits:
    cpu: 1600m
    memory: 9Gi
  requests:
    cpu: 1200m
    memory: 8Gi
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi

COMPUTED VALUES:
antiAffinity: hard
antiAffinityTopologyKey: kubernetes.io/hostname
clusterHealthCheckParams: wait_for_status=green&timeout=1s
clusterName: elasticsearch
enableServiceLinks: true
envFrom: []
esConfig: {}
esJavaOpts: -Xmx5g -Xms5g
esMajorVersion: ""
extraContainers: []
extraEnvs: []
extraInitContainers: []
extraVolumeMounts: []
extraVolumes: []
fsGroup: ""
fullnameOverride: ""
httpPort: 9200
image: docker.elastic.co/elasticsearch/elasticsearch
imagePullPolicy: IfNotPresent
imagePullSecrets: []
imageTag: 6.8.13
ingress:
  annotations: {}
  enabled: false
  hosts:
  - chart-example.local
  path: /
  tls: []
initResources: {}
keystore: []
labels: {}
lifecycle: {}
masterService: ""
masterTerminationFix: false
maxUnavailable: 1
minimumMasterNodes: 5
nameOverride: ""
networkHost: 0.0.0.0
nodeAffinity: {}
nodeGroup: master
nodeSelector: {}
persistence:
  annotations: {}
  enabled: true
  labels:
    enabled: false
podAnnotations: {}
podManagementPolicy: Parallel
podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000
podSecurityPolicy:
  create: false
  name: ""
  spec:
    fsGroup:
      rule: RunAsAny
    privileged: true
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
    - secret
    - configMap
    - persistentVolumeClaim
priorityClassName: ""
protocol: http
rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
readinessProbe:
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 3
  timeoutSeconds: 5
replicas: 8
resources:
  limits:
    cpu: 1600m
    memory: 9Gi
  requests:
    cpu: 1200m
    memory: 8Gi
roles:
  data: "true"
  ingest: "true"
  master: "true"
schedulerName: ""
secretMounts: []
securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 1000
service:
  annotations: {}
  externalTrafficPolicy: ""
  httpPortName: http
  labels: {}
  labelsHeadless: {}
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  nodePort: ""
  transportPortName: transport
  type: ClusterIP
sidecarResources: {}
sysctlInitContainer:
  enabled: true
sysctlVmMaxMapCount: 262144
terminationGracePeriod: 120
tolerations: []
transportPort: 9300
updateStrategy: RollingUpdate
volumeClaimTemplate:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
HOOKS:
---
# elasticsearch-prod-syael-test
apiVersion: v1
kind: Pod
metadata:
  name: "elasticsearch-prod-syael-test"
  annotations:
    "helm.sh/hook": test-success
spec:
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
  containers:
  - name: "elasticsearch-prod-guqjb-test"
    image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.13"
    imagePullPolicy: "IfNotPresent"
    command:
      - "sh"
      - "-c"
      - |
        #!/usr/bin/env bash -e
        curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
  restartPolicy: Never
MANIFEST:
---
# Source: elasticsearch/templates/poddisruptionbudget.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: "elasticsearch-master-pdb"
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: "elasticsearch-master"
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master-headless
  labels:
    heritage: "Tiller"
    release: "elasticsearch-prod"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None # This is needed for statefulset hostnames like elasticsearch-0 to resolve
  # Create endpoints also if the related pod isn't ready
  publishNotReadyAddresses: true
  selector:
    app: "elasticsearch-master"
  ports:
  - name: http
    port: 9200
  - name: transport
    port: 9300
---
# Source: elasticsearch/templates/service.yaml
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Tiller"
    release: "elasticsearch-prod"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    {}
spec:
  type: ClusterIP
  selector:
    release: "elasticsearch-prod"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  ports:
  - name: http
    protocol: TCP
    port: 9200
  - name: transport
    protocol: TCP
    port: 9300
---
# Source: elasticsearch/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-master
  labels:
    heritage: "Tiller"
    release: "elasticsearch-prod"
    chart: "elasticsearch"
    app: "elasticsearch-master"
  annotations:
    esMajorVersion: "6"
spec:
  serviceName: elasticsearch-master-headless
  selector:
    matchLabels:
      app: "elasticsearch-master"
  replicas: 8
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-master
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 50Gi
  template:
    metadata:
      name: "elasticsearch-master"
      labels:
        heritage: "Tiller"
        release: "elasticsearch-prod"
        chart: "elasticsearch"
        app: "elasticsearch-master"
      annotations:
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - "elasticsearch-master"
            topologyKey: kubernetes.io/hostname
      terminationGracePeriodSeconds: 120
      volumes:
      enableServiceLinks: true
      initContainers:
      - name: configure-sysctl
        securityContext:
          runAsUser: 0
          privileged: true
        image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.13"
        imagePullPolicy: "IfNotPresent"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        resources:
          {}
      containers:
      - name: "elasticsearch"
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          runAsUser: 1000
        image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.13"
        imagePullPolicy: "IfNotPresent"
        readinessProbe:
          exec:
            command:
              - sh
              - -c
              - |
                #!/usr/bin/env bash -e
                # If the node is starting up wait for the cluster to be ready (request params: "wait_for_status=green&timeout=1s" )
                # Once it has started only check that the node itself is responding
                START_FILE=/tmp/.es_start_file

                # Disable nss cache to avoid filling dentry cache when calling curl
                # This is required with Elasticsearch Docker using nss < 3.52
                export NSS_SDB_USE_CACHE=no

                http () {
                  local path="${1}"
                  local args="${2}"
                  set -- -XGET -s

                  if [ "$args" != "" ]; then
                    set -- "$@" $args
                  fi

                  if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
                    set -- "$@" -u "${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
                  fi

                  curl --output /dev/null -k "$@" "http://127.0.0.1:9200${path}"
                }

                if [ -f "${START_FILE}" ]; then
                  echo 'Elasticsearch is already running, lets check the node is healthy'
                  HTTP_CODE=$(http "/" "-w %{http_code}")
                  RC=$?
                  if [[ ${RC} -ne 0 ]]; then
                    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with RC ${RC}"
                    exit ${RC}
                  fi
                  # ready if HTTP code 200, 503 is tolerable if ES version is 6.x
                  if [[ ${HTTP_CODE} == "200" ]]; then
                    exit 0
                  elif [[ ${HTTP_CODE} == "503" && "6" == "6" ]]; then
                    exit 0
                  else
                    echo "curl --output /dev/null -k -XGET -s -w '%{http_code}' \${BASIC_AUTH} http://127.0.0.1:9200/ failed with HTTP code ${HTTP_CODE}"
                    exit 1
                  fi
                else
                  echo 'Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=green&timeout=1s" )'
                  if http "/_cluster/health?wait_for_status=green&timeout=1s" "--fail" ; then
                    touch ${START_FILE}
                    exit 0
                  else
                    echo 'Cluster is not yet ready (request params: "wait_for_status=green&timeout=1s" )'
                    exit 1
                  fi
                fi
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 3
          timeoutSeconds: 5
        ports:
        - name: http
          containerPort: 9200
        - name: transport
          containerPort: 9300
        resources:
          limits:
            cpu: 1600m
            memory: 9Gi
          requests:
            cpu: 1200m
            memory: 8Gi
        env:
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.zen.minimum_master_nodes
            value: "5"
          - name: discovery.zen.ping.unicast.hosts
            value: "elasticsearch-master-headless"
          - name: cluster.name
            value: "elasticsearch"
          - name: network.host
            value: "0.0.0.0"
          - name: ES_JAVA_OPTS
            value: "-Xmx5g -Xms5g"
          - name: node.data
            value: "true"
          - name: node.ingest
            value: "true"
          - name: node.master
            value: "true"
        volumeMounts:
          - name: "elasticsearch-master"
            mountPath: /usr/share/elasticsearch/data
```
Describe the bug: `Error: UPGRADE FAILED: StatefulSet.apps "elasticsearch-master" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden`
Steps to reproduce (a workaround sketch follows the steps):
- Freshly deployed Helm release
- Increase the PV size from 30Gi to e.g. 50Gi in values.yaml (when working with helmfile):
  ```yaml
  volumeClaimTemplate:
    accessModes: [ "ReadWriteOnce" ]
    resources:
      requests:
        storage: 50Gi
  ```
- Run `helmfile apply`
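For reference, the usual way around this restriction is to resize the PVCs directly and recreate the StatefulSet object without touching the pods. A minimal sketch, assuming the StorageClass allows expansion and that the PVC names follow the chart's `<claim>-<statefulset>-<ordinal>` pattern (verify the actual names in your cluster before running this):

```bash
# 1. Grow each PVC in place (only works if the StorageClass has allowVolumeExpansion: true)
for i in $(seq 0 7); do
  kubectl patch pvc "elasticsearch-master-elasticsearch-master-${i}" \
    -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'
done

# 2. Delete only the StatefulSet object, leaving the pods (and their volumes) running
kubectl delete statefulset elasticsearch-master --cascade=false

# 3. Re-run the upgrade so the StatefulSet is recreated with the new volumeClaimTemplates
helmfile apply
```

This works because `volumeClaimTemplates` is only consulted when a claim is first created: existing claims are resized through the PVC API, and the orphaned pods are re-adopted by the recreated StatefulSet.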
Expected behavior: The upgrade succeeds and the PV size is automatically increased.
Any additional context: There have been some other issues where it is mentioned that the label in the StatefulSet is the problem. I have read that for automatic PV resizing to work with a StatefulSet, Elasticsearch must be split into multiple Helm releases. Questions (see the demonstration after the list):
- If that is the problem, what exactly is changed in the StatefulSet that cannot be changed? Is it the template?
- What is the reason behind this? Why would multiple releases solve it?
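To the first question: the field being changed is `spec.volumeClaimTemplates`, which is not one of the three fields (`replicas`, `template`, `updateStrategy`) that Kubernetes allows to be updated on an existing StatefulSet, hence the `Forbidden` error. Whether the underlying disks can grow at all is a property of the StorageClass, not the chart. A quick check, assuming the GKE default StorageClass name `standard` (adjust to your cluster):

```bash
# Storage request recorded in the live StatefulSet, i.e. the immutable value
kubectl get statefulset elasticsearch-master \
  -o jsonpath='{.spec.volumeClaimTemplates[0].spec.resources.requests.storage}'

# PVC expansion only works when the StorageClass opts in
kubectl get storageclass standard -o jsonpath='{.allowVolumeExpansion}'
```

Splitting nodes into multiple releases does not remove this immutability; it only makes it easier to stand up a new node group with larger disks and migrate data onto it, instead of mutating an existing StatefulSet in place.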
Top GitHub Comments
Hi @jmlrt, ah OK, thanks for clearing that up! I will do some further reading on that, and thanks for providing the workaround. I will note it down in case we hit such an issue in production systems.
So I guess we can close this ticket then.
You are right. Didn’t remember the operator…
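For context, the operator referred to here is presumably Elastic Cloud on Kubernetes (ECK), whose recent versions automate this dance: when the StorageClass supports expansion, it resizes the PVCs and recreates the StatefulSets itself. A hypothetical sketch of requesting larger volumes through an ECK `Elasticsearch` resource (the resource name and sizes are illustrative, not from this issue):

```bash
# Apply an updated nodeSet storage request; ECK handles the PVC resize and
# StatefulSet recreation when the StorageClass allows volume expansion.
kubectl apply -f - <<'EOF'
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-prod
spec:
  version: 6.8.13
  nodeSets:
  - name: master
    count: 8
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data   # default claim name used by ECK
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
EOF
```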