Kubernetes config dags_volume_subpath breaks PVC in helm chart
See original GitHub issueApache Airflow version: 1.10.12, master
**Kubernetes version: v1.17.9-eks-4c6976 (server)/ v.1.18.6 (client)
Environment:
- Cloud provider or hardware configuration: EKS
- OS (e.g. from /etc/os-release):
- Kernel (e.g.
uname -a
): - Install tools:
- Others:
What happened:
Current logic of setting dags_volume_subpath
is broken for the following use case:
Dag loaded from PVC but gitSync disabled.
I am using the chart from apache/airflow master thusly:
helm install airflow chart --namespace airflow-dev \
--set dags.persistence.enabled=true \
--set dags.persistence.existingClaim=airflow-dag-pvc \
--set dags.gitSync.enabled=false
For the longest time, even with a vanilla install, the workers kept dying. Tailing the logs clued me in that workers were not able to find the dag. I verified from the scheduler that dags were present (it showed up in the UI etc)
Further debugging (looking at the worker pod config) was the main clue … here is the volumemount
- mountPath: /opt/airflow/dags
name: airflow-dags
readOnly: true
subPath: repo/tests/dags
Why/who would add repo/tests/dags
as a subpath?? 🤦♂️
Finally found the problem logic here: https://github.com/apache/airflow/blob/9b2efc6dcc298e3df4d1365fe809ea1dc0697b3b/chart/values.yaml#L556
Note the implied connection between dags.persistence.enabled
and dags.gitSync
! This looks like some leftover code from when gitsync and external dag went hand in hand.
Ideally, an user should be able to use a PVC without using git sync
What you expected to happen: I should be able to use a PVC without gitsync logic messing up my mount path
See above.
How to reproduce it:
I am kinda surprised that this has not bitten anyone yet. I’d like to think my example is essentially a hello world
of helm chart with an dag from a PVC.
Anything else we need to know:
This is the patch that worked for me. Seems fairly reasonable – only muck with dags_volume_subpath if gitSync is enabled.
Even better would be to audit other code and clearly separate out the different competing use case
- use PVC but no gitsync
- use PVC with gitsync
- use gitsync without PVC
diff --git a/chart/values.yaml b/chart/values.yaml
index 00832b435..8f6506cd4 100644
--- a/chart/values.yaml
+++ b/chart/values.yaml
@@ -550,10 +550,10 @@ config:
delete_worker_pods: 'True'
run_as_user: '{{ .Values.uid }}'
fs_group: '{{ .Values.gid }}'
dags_volume_claim: '{{- if .Values.dags.persistence.enabled }}{{ include "airflow_dags_volume_claim" . }}{{ end }}'
- dags_volume_subpath: '{{- if .Values.dags.persistence.enabled }}{{.Values.dags.gitSync.dest }}/{{ .Values.dags.gitSync.subPath }}{{ end }}'
+ dags_volume_subpath: '{{- if .Values.dags.gitSync.enabled }}{{.Values.dags.gitSync.dest }}/{{ .Values.dags.gitSync.subPath }}{{ end }}'
git_repo: '{{- if and .Values.dags.gitSync.enabled (not .Values.dags.persistence.enabled) }}{{ .Values.dags.gitSync.repo }}{{ end }}'
git_branch: '{{ .Values.dags.gitSync.branch }}'
git_sync_rev: '{{ .Values.dags.gitSync.rev }}'
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (4 by maintainers)
facing the same issue
@dimberman on master in files/pod-template-file.yaml this problematic piece of code still exists:
I’m facing the issue with using the default repo configurations and just turning on the git-sync as mentioned here.