Unreachable Secrets Backend Causes Web Server Crash
Apache Airflow version: 1.10.12
Kubernetes version (if you are using kubernetes) (use kubectl version): n/a
Environment:
- Cloud provider or hardware configuration: Amazon MWAA
- OS (e.g. from /etc/os-release): Amazon Linux (latest)
- Kernel (e.g. uname -a): n/a
- Install tools: n/a
What happened:
If an unreachable secrets.backend is specified in airflow.cfg, the web server crashes.
What you expected to happen:
An invalid secrets backend should be ignored with a warning, and the system should fall back to the secrets stored in the metadata database.
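To make the expected behavior concrete, here is a minimal sketch of a lookup chain that skips a failing backend with a warning and falls through to the next one. It does not use Airflow's actual classes: UnreachableBackend, MetastoreBackend and get_variable_with_fallback are hypothetical names made up for illustration, and the metadata database is simulated with a dict.

```python
import logging
from typing import Dict, Optional, Sequence

log = logging.getLogger(__name__)


class UnreachableBackend:
    """Hypothetical stand-in for a secrets backend whose service cannot be reached."""

    def get_variable(self, key: str) -> Optional[str]:
        raise ConnectionError("secrets service unreachable")


class MetastoreBackend:
    """Hypothetical stand-in for the metadata database lookup (a dict here)."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)

    def get_variable(self, key: str) -> Optional[str]:
        return self._rows.get(key)


def get_variable_with_fallback(key: str, backends: Sequence) -> Optional[str]:
    """Return the first value found, skipping backends that raise."""
    for backend in backends:
        try:
            value = backend.get_variable(key)
        except Exception as exc:  # assumption: any error means "backend unusable"
            log.warning(
                "Secrets backend %s failed for %r: %s; trying the next one",
                type(backend).__name__, key, exc,
            )
            continue
        if value is not None:
            return value
    return None


backends = [UnreachableBackend(), MetastoreBackend({"foo": "bar"})]
print(get_variable_with_fallback("foo", backends))
```

With this ordering the unreachable backend only produces a warning in the log, and the value comes from the simulated metadata database.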
How to reproduce it:
In an environment without access to AWS Secrets Manager, add the following to your airflow.cfg:
[secrets]
backend = airflow.contrib.secrets.aws_secrets_manager.SecretsManagerBackend
or, in an environment without access to SSM, specify:
[secrets]
backend = airflow.contrib.secrets.aws_systems_manager.SystemsManagerParameterStoreBackend
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:17 (17 by maintainers)
Top GitHub Comments
As of now it seems the expected behavior is not what is happening and is inconsistent across different secret backends.
I have tried to reproduce this issue with Airflow 2.0 (main branch) and am not able to do so for any AWS secrets backends. I was only able to reproduce a crashing webserver for GCP Secret Manager and not any other secrets backend.
The GCP Secret Manager error seems to have more to do with the function that gets the credentials than with the actual connection.
I used the airflow.providers.* secrets packages for each. I noticed that the original post on the issue uses the contrib package and Airflow 1.10.12.
Here are my findings:
I believe we should evaluate what the expected behavior should be as compared to what is actually happening.
Also, after discussing with @kaxil, there may be a middle ground for a failover implementation that could make sense here.
Agreed, we have a consistency issue here. Interestingly, the AWS secret manager crashed originally for @subashcanapathy and @john-jac but did not crash for you, @fhoda. Not sure what the reason for that is (maybe the 1.10 vs 2.* behavioral difference)?
I really like the idea of different behavior for different types of access. I think it answers my concerns perfectly, and what it really boils down to is “who” the “client” is: whether it is “airflow” or the “DAG/task writer”.
I think the main difference between configuration and variables/connections is that Airflow has default values for most configuration options, and when a value is not found it falls back to the default, which might alter the behavior of Airflow. So a missing secrets backend, when one is configured and configuration is retrieved from it, is very dangerous. And since configuration is accessed under the hood by Airflow, without the “dag” or “task” using it, it’s airflow that is the “client” and it’s airflow that should handle it (and crashing is the only reasonable behavior IMHO). The “dag writer” is simply not in a position to make any decision here.
This is (as you rightly noticed) far less of a concern for connections and variables: the “clients” for those are “dag writers”. Whoever uses them should be prepared for what happens when the secret backend is missing. Either the “writers” will prepare fallback values for those in the DB, or they will have to handle the “missing” value somehow (and it is up to the ‘user’ what to do in this case). But they are in full control, so there is no need to crash Airflow (not until configuration is accessed by Airflow itself).
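A rough sketch of what such a split could look like, purely as an illustration of the idea and not as Airflow's implementation: SecretKind, SecretsBackendUnavailable and lookup are hypothetical names, and the backend and fallback are passed in as plain callables.

```python
import logging
from enum import Enum
from typing import Callable, Optional

log = logging.getLogger(__name__)


class SecretKind(Enum):
    CONFIG = "config"          # read by Airflow itself
    CONNECTION = "connection"  # read on behalf of DAG/task code
    VARIABLE = "variable"      # read on behalf of DAG/task code


class SecretsBackendUnavailable(RuntimeError):
    """Hypothetical error raised when a config lookup cannot reach the backend."""


Getter = Callable[[str], Optional[str]]


def lookup(kind: SecretKind, key: str, backend_get: Getter, fallback_get: Getter) -> Optional[str]:
    """Fail hard for config, degrade gracefully for connections and variables."""
    try:
        value = backend_get(key)
    except Exception as exc:  # assumption: any error means the backend is unreachable
        if kind is SecretKind.CONFIG:
            # Silently using a default here could change Airflow's behavior.
            raise SecretsBackendUnavailable(
                f"cannot read config {key!r} from the secrets backend"
            ) from exc
        log.warning("Secrets backend failed for %s %r: %s", kind.value, key, exc)
        value = None
    if value is not None:
        return value
    return fallback_get(key)
```

Here a failing backend aborts a CONFIG lookup, while a CONNECTION or VARIABLE lookup logs a warning and consults the fallback (for example, values the DAG writer stored in the DB).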
Reopening it as it might actually be an actionable item to do 😃
@subashcanapathy, @john-jac, would that be a reasonable approach for you as well?