Unreachable Secrets Backend Causes Web Server Crash

Apache Airflow version:

1.10.12

Kubernetes version (if you are using kubernetes) (use kubectl version):

n/a

Environment:

  • Cloud provider or hardware configuration: Amazon MWAA

  • OS (e.g. from /etc/os-release): Amazon Linux (latest)

  • Kernel (e.g. uname -a): n/a

  • Install tools: n/a

What happened:

If an unreachable secrets backend is specified in airflow.cfg, the web server crashes.

What you expected to happen:

An invalid or unreachable secrets backend should be ignored with a warning, and the system should fall back to the secrets stored in the metadata database.
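The expected behavior described above can be illustrated with a minimal sketch. It is not Airflow's implementation: `MetadataDbBackend`, `UnreachableBackend`, and `build_search_path` are hypothetical names invented for this example, and the in-memory dictionary stands in for the metadata database.

```python
import logging
from typing import Callable, List

log = logging.getLogger("secrets_fallback_sketch")


class MetadataDbBackend:
    """Hypothetical stand-in for the metadata database secrets store."""

    def get_variable(self, key: str):
        # A plain dict plays the role of the database in this sketch.
        return {"greeting": "hello"}.get(key)


class UnreachableBackend:
    """Hypothetical custom backend whose constructor fails (e.g. no network access)."""

    def __init__(self):
        raise ConnectionError("secrets service unreachable")


def build_search_path(custom_factory: Callable[[], object]) -> List[object]:
    """Return the backends to consult, skipping a custom backend that fails to start."""
    backends: List[object] = []
    try:
        backends.append(custom_factory())
    except Exception as exc:  # deliberately broad: any startup failure is only logged
        log.warning("Ignoring secrets backend %r: %s", custom_factory, exc)
    # The metadata database is always consulted last.
    backends.append(MetadataDbBackend())
    return backends


search_path = build_search_path(UnreachableBackend)
```

With the failing backend, `search_path` contains only the metadata database stand-in, so lookups still succeed instead of crashing the process.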

How to reproduce it:

In an environment without access to AWS Secrets Manager, add the following to your airflow.cfg:

[secrets]
backend = airflow.contrib.secrets.aws_secrets_manager.SecretsManagerBackend

Or, in an environment without access to SSM, specify:

[secrets]
backend = airflow.contrib.secrets.aws_systems_manager.SystemsManagerParameterStoreBackend

Reference: https://airflow.apache.org/docs/apache-airflow/1.10.12/howto/use-alternative-secrets-backend.html#aws-ssm-parameter-store-secrets-backend

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 17 (17 by maintainers)

Top GitHub Comments

fhoda commented, Jun 7, 2021 (2 reactions)

As of now, it seems the actual behavior does not match the expected behavior, and it is inconsistent across secrets backends.

I have tried to reproduce this issue with Airflow 2.0 (main branch) and am not able to do so for any AWS secrets backends. I was only able to reproduce a crashing webserver for GCP Secret Manager and not any other secrets backend.

The GCP Secret Manager error seems to have more to do with the function that gets the credentials than with the actual connection.

google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

I used the airflow.providers.* secrets packages for each. I noticed that the original post on the issue uses the contrib package and Airflow 1.10.12.

#export AIRFLOW__SECRETS__BACKEND=airflow.providers.hashicorp.secrets.vault.VaultBackend
#export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_path": "airflow/connections", "variables_path": "airflow/variables", "config_path": "airflow/config", "url": "http://127.0.0.1:8200", "token": "$VAULT_TOKEN"}'


export AIRFLOW__SECRETS__BACKEND=airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "gcp_keyfile_dict": $GCP_SECRET_MANAGER_SA_KEY}'


#export AIRFLOW__SECRETS__BACKEND=airflow.providers.microsoft.azure.secrets.azure_key_vault.AzureKeyVaultBackend
#export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "airflow-connections", "variables_prefix": null, "vault_url": "https://example-akv-resource-name.vault.azure.net/"}'


#export AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend
#export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "airflow/connections", "variables_prefix": "airflow/variables", "profile_name": "default"}'



#export AIRFLOW__SECRETS__BACKEND=airflow.providers.amazon.aws.secrets.systems_manager.SystemsManagerParameterStoreBackend
#export AIRFLOW__SECRETS__BACKEND_KWARGS='{"connections_prefix": "/airflow/connections", "variables_prefix": "/airflow/variables", "profile_name": "default"}'
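The `AIRFLOW__SECRETS__BACKEND_KWARGS` values above are written as JSON objects wrapped in single quotes. In POSIX shells, single quotes suppress variable expansion, so `$VAULT_TOKEN` and `$GCP_SECRET_MANAGER_SA_KEY` would reach the process literally unless they were substituted some other way. The thread does not say whether this played a role in the results below. The sketch that follows is a standalone check, assuming the value must parse as a JSON object; `parse_backend_kwargs` is a hypothetical helper, not an Airflow function.

```python
import json


def parse_backend_kwargs(raw: str) -> dict:
    """Parse a backend_kwargs string, raising ValueError if it is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"backend_kwargs is not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("backend_kwargs must be a JSON object")
    return parsed


# A fully literal value, as used for the AWS Secrets Manager export above.
aws_kwargs = parse_backend_kwargs(
    '{"connections_prefix": "airflow/connections", '
    '"variables_prefix": "airflow/variables", "profile_name": "default"}'
)

# A bare, unexpanded shell variable is not a JSON value, so parsing fails.
try:
    parse_backend_kwargs('{"gcp_keyfile_dict": $GCP_SECRET_MANAGER_SA_KEY}')
    bare_variable_rejected = False
except ValueError:
    bare_variable_rejected = True
```

Running a check like this against the exact string exported in the shell makes it easier to tell a configuration problem apart from a backend that is genuinely unreachable.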

Here are my findings:

  • AWS Secrets Manager - No Crash
  • AWS SSM - No Crash
  • Vault - No Crash
  • Azure Key Vault - No Crash
  • GCP Secret Manager - Crash

I believe we should evaluate what the expected behavior should be as compared to what is actually happening.

Also, after discussing with @kaxil, there may be a middle ground for a failover implementation that could make sense here.

  1. If configs are being retrieved through the secrets backend, then a failure makes sense.
  2. If connections and/or variables cannot be retrieved, then failover could be a strategy used by users to ensure DAG/task success and predictable execution.
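The two cases above can be sketched as a single lookup function that fails hard for configuration but falls back for connections and variables. This is only an illustration of the proposal, not code from Airflow: `lookup`, `STRICT_KINDS`, and the callables passed in are hypothetical, and the sketch only covers a backend that raises, not one that returns no value.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("secrets_policy_sketch")

# Kinds of lookup where silently falling back could change system behavior.
STRICT_KINDS = {"config"}


def lookup(
    kind: str,
    key: str,
    backend_get: Callable[[str], Optional[str]],
    fallback_get: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Read `key` from the secrets backend, applying a per-kind failure policy."""
    try:
        return backend_get(key)
    except Exception as exc:
        if kind in STRICT_KINDS:
            # Case 1: configuration lookups propagate the failure.
            raise
        # Case 2: connections and variables fail over to the fallback store.
        log.warning("Backend failed for %s %r (%s); using fallback", kind, key, exc)
        return fallback_get(key)


def unreachable_backend(key: str) -> Optional[str]:
    """Hypothetical backend call that always fails."""
    raise ConnectionError("secrets backend unreachable")


metadata_db = {"my_var": "from-db"}  # stand-in for the metadata database

variable_value = lookup("variable", "my_var", unreachable_backend, metadata_db.get)

try:
    lookup("config", "sql_alchemy_conn", unreachable_backend, metadata_db.get)
    config_failed = False
except ConnectionError:
    config_failed = True
```

Keeping the policy in one place makes the distinction explicit: the caller states what kind of value it is reading, and the failure mode follows from that.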
potiuk commented, Jun 7, 2021 (1 reaction)

Agreed, we have a consistency issue here. Interestingly, AWS Secrets Manager crashed originally for @subashcanapathy and @john-jac but did not crash for you, @fhoda. Not sure what the reason is for that (maybe the 1.10 vs 2.* behavioral difference?).

I really like the idea of different behavior for different types of access. I think it answers my concerns perfectly, and what it really boils down to is “who” the “client” is - whether it is “airflow” or the “DAG/task writer”.

I think the main difference between configuration and variables/connections is that Airflow has default values for most configuration options, and when a value is not found, it falls back to the default - which might alter the behavior of Airflow. So a missing secrets backend, when one is configured and configuration is retrieved from it, is very dangerous. And since configuration is accessed under the hood by Airflow, without the “dag” or “task” using it, it’s Airflow that is the “client” and it’s Airflow that should handle it (and crashing is the only reasonable behavior IMHO). Simply put, the “dag writer” is not in a position to make any decision here.

This is (as you rightly noticed) far less of a concern for connections and variables - the “clients” for those are “dag writers”. Whoever uses them should be prepared for what happens when the secrets backend is missing. Either the “writers” will prepare fallback values for those in the DB or they will have to handle a “missing” value somehow (and it is up to the “user” what to do in this case). But they are in full control, so there is no need to crash Airflow (yet! - as long as configuration is not accessed by Airflow itself).

Reopening it as it might actually be an actionable item to do 😃

@subashcanapathy, @john-jac - would that be a reasonable approach for you as well?

