SparkSubmitOperator only masks one "form" of password arguments
Hello there, everyone. 😃
Apache Airflow version: 1.10.9, 1.10.10, trunk
- OS (e.g. from /etc/os-release): Linux
- Others: Bash/sh
What happened:
Password masking was added to SparkSubmitOperator (SparkSubmitHook, to be precise) in December 2019 (under AIRFLOW-6350; PR: #6917), but it only masks passwords when they are in the --foo.password='value' form; i.e. the value must be in single quotes and joined to the argument's name with an equal sign.
What you expected to happen:
I would expect this mechanism to also cover the forms (a) with double quotes or with no quotes at all, and (b) with whitespace instead of an equal sign, e.g.:
--foo.password=value
--foo.password="value"
--foo.password 'value'
--foo.password value
--foo.password "value"
But I may be missing something. Is there any reason the initial version only covers the single-quoted-with-equal-sign form? The regular expression used in the masking code (1.10.9 version, trunk version) looks pretty intentional:
    def _mask_cmd(self, connection_cmd):
        # Mask any password related fields in application args with key value pair
        # where key contains password (case insensitive), e.g. HivePassword='abc'
        connection_cmd_masked = re.sub(
            r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')",
            r'\1******', ' '.join(connection_cmd), flags=re.I)
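To make the gap concrete, here is a minimal standalone sketch that applies the same regular expression to each of the forms listed above. The `mask_cmd` function and the `FORMS` list are names made up for this illustration; only the pattern and the `re.sub` call are taken from the snippet quoted above.

```python
import re

# Pattern copied verbatim from the snippet quoted in this issue.
PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def mask_cmd(connection_cmd):
    """Standalone copy of the masking step (illustrative name, not Airflow's API)."""
    return re.sub(PATTERN, r'\1******', ' '.join(connection_cmd), flags=re.I)


# Each entry is a list of arguments, joined with spaces before masking.
FORMS = [
    ["--foo.password='value'"],
    ["--foo.password=value"],
    ['--foo.password="value"'],
    ["--foo.password", "'value'"],
    ["--foo.password", "value"],
    ["--foo.password", '"value"'],
]

if __name__ == "__main__":
    for args in FORMS:
        original = ' '.join(args)
        masked = mask_cmd(args)
        status = "masked" if masked != original else "NOT masked"
        print(f"{original!r:32} -> {masked!r:32} ({status})")
```

Only the first form changes, because the pattern requires both an equal sign and an opening single quote right after the argument name.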
How to reproduce it:
    from airflow import DAG
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator  # Airflow 1.10.9

    dag = DAG(...)
    SparkSubmitOperator(
        ...,
        conf={"spark.foo.password": "this_should_get_masked_but_it_doesnt"},
        dag=dag,
    )
Running such a task will leak the password into Airflow logs.
Anything else we need to know:
Again, I may be missing something, e.g. something OS-specific. I’d be happy to learn something here. 😃
In case all/part of the other forms I mentioned should also get the masking treatment, I have a change ready for opening a PR.
(Note there’s no JIRA issue referenced in the commit messages: I cannot create issues in Airflow’s Jira for some reason)
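For illustration only, one way the additional forms could be covered is sketched below. This is an assumption about what a broader pattern might look like, not the change mentioned above and not Airflow's actual implementation; `mask_cmd_broad`, `_SENSITIVE_ARG` and `_redact` are hypothetical names.

```python
import re

# Hypothetical broader pattern: key, then "=" or whitespace, then a value.
_SENSITIVE_ARG = re.compile(
    r"(?P<key>\S*?(?:secret|password)[^\s=]*)"  # argument name containing the keyword
    r"(?P<sep>\s*=\s*|\s+)"                     # equal sign or plain whitespace
    r"(?P<val>'[^']*'|\"[^\"]*\"|\S+)",         # single-quoted, double-quoted, or bare value
    flags=re.I,
)


def _redact(match):
    """Replace the value with asterisks, keeping a matching pair of quotes if present."""
    value = match.group("val")
    quoted = len(value) >= 2 and value[0] in "'\"" and value[-1] == value[0]
    quote = value[0] if quoted else ""
    return match.group("key") + match.group("sep") + quote + "******" + quote


def mask_cmd_broad(connection_cmd):
    """Mask values of password/secret arguments in a list of command arguments."""
    return _SENSITIVE_ARG.sub(_redact, ' '.join(connection_cmd))


if __name__ == "__main__":
    cmd = ["spark-submit", "--conf", "spark.foo.password=hunter2",
           "--foo.secret", '"two words"', "--class", "Foo"]
    print(mask_cmd_broad(cmd))
```

One trade-off of accepting whitespace as a separator: a flag whose name contains "password" or "secret" but takes no value would cause the next argument to be masked. That errs on the side of hiding too much rather than too little, but it is worth keeping in mind when reading masked logs.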
FYI @Unit03. You can put "Closes #ISSUE" in the commit message and it will close the related issue at merge 😃.
Looks like!