SparkSubmitOperator only masks one "form" of password arguments

Hello there, everyone. 😃

Apache Airflow version: 1.10.9, 1.10.10, trunk

  • OS (e.g. from /etc/os-release): Linux
  • Others: Bash/sh

What happened:

Password masking was added to SparkSubmitOperator (SparkSubmitHook, to be precise) in December 2019 (under AIRFLOW-6350; PR: #6917), but it only masks passwords given in the --foo.password='value' form; that is, the value must be wrapped in single quotes and joined to the argument’s name with an equal sign.

What you expected to happen:

I would expect the mechanism to also cover forms that a) use double quotes or no quotes at all, or b) use whitespace instead of an equal sign, e.g.

  • --foo.password=value
  • --foo.password="value"
  • --foo.password 'value'
  • --foo.password value
  • --foo.password "value"

But I may be missing something. Is there any reason the initial version only covers the single-quoted-with-equal-sign form? The regular expression used in the masking code (1.10.9 version, trunk version) looks pretty intentional:

    def _mask_cmd(self, connection_cmd):
        # Mask any password related fields in application args with key value pair
        # where key contains password (case insensitive), e.g. HivePassword='abc'

        connection_cmd_masked = re.sub(
            r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')",
            r'\1******', ' '.join(connection_cmd), flags=re.I)

        return connection_cmd_masked
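
To make the gap concrete, here is a minimal standalone sketch that applies the same pattern to each of the forms listed earlier. `mask_current` and `forms` are hypothetical names used only for this illustration; they are not part of Airflow.

```python
import re

# Pattern copied verbatim from the _mask_cmd snippet above.
CURRENT_PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"


def mask_current(cmd_parts):
    """Join the command parts and mask them the way the snippet above does."""
    return re.sub(CURRENT_PATTERN, r"\1******", " ".join(cmd_parts), flags=re.I)


# Hypothetical sample arguments, one per form discussed in this issue.
forms = [
    ["--foo.password='value'"],     # single quotes + equal sign
    ["--foo.password=value"],       # no quotes + equal sign
    ['--foo.password="value"'],     # double quotes + equal sign
    ["--foo.password", "'value'"],  # single quotes + whitespace
    ["--foo.password", "value"],    # no quotes + whitespace
    ["--foo.password", '"value"'],  # double quotes + whitespace
]

for parts in forms:
    print(mask_current(parts))
```

Only the first form comes back as `--foo.password='******'`; the other five are printed unchanged, because the pattern requires both an equal sign and an opening single quote.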

How to reproduce it:

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator  # Airflow 1.10.9

dag = DAG(...)
SparkSubmitOperator(
    ...,
    conf={"spark.foo.password": "this_should_get_masked_but_it_doesnt"},
    dag=dag,
)

Running such a task will leak the password into Airflow logs.
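
The leak can be approximated without a running Airflow instance. The sketch below assumes that each conf entry ends up on the command line as `--conf key=value` with no quotes around the value; that rendering is an assumption made for illustration, as are the names `cmd` and `logged`.

```python
import re

# Pattern copied from the masking code quoted in this issue.
PATTERN = r"(\S*?(?:secret|password)\S*?\s*=\s*')[^']*(?=')"

conf = {"spark.foo.password": "this_should_get_masked_but_it_doesnt"}

# Assumption: every conf entry is rendered as "--conf key=value", unquoted.
cmd = ["spark-submit"]
for key, value in conf.items():
    cmd += ["--conf", "{}={}".format(key, value)]

# Apply the masking step to the joined command, as it would be before logging.
logged = re.sub(PATTERN, r"\1******", " ".join(cmd), flags=re.I)
print(logged)
```

Under that assumption the printed line still contains the password in clear text, since there is no single quote after the equal sign for the pattern to anchor on.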

Anything else we need to know:

Again, I may be missing something, e.g. something OS-specific. I’d be happy to learn something here. 😃

If all or some of the other forms I mentioned should also be masked, I have a change ready to open as a PR.
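
For discussion, one possible shape of a broader pattern is sketched below. It is a hypothetical illustration, not code taken from any PR: `BROAD_PATTERN` and `mask_broad` are made-up names, and the pattern accepts either an equal sign or whitespace as the separator, followed by an optional single or double quote.

```python
import re

# Hypothetical broader pattern (an assumption, not the merged Airflow code).
BROAD_PATTERN = re.compile(
    r"("
    r"\S*?(?:secret|password)\S*?"  # key containing "secret" or "password"
    r"(?:=|\s+)"                    # separator: equal sign or whitespace
    r"(['\"]?)"                     # optional opening quote (group 2)
    r")"
    r"(?:(?!\2\s).)*"               # value: stops before whitespace if unquoted,
                                    # or before a closing quote followed by whitespace
    r"(\2)",                        # matching closing quote, if any (group 3)
    flags=re.I,
)


def mask_broad(cmd_parts):
    """Join the command parts and replace every matched value with asterisks."""
    return BROAD_PATTERN.sub(r"\1******\3", " ".join(cmd_parts))


print(mask_broad(["--foo.password=value"]))
print(mask_broad(["--foo.password", '"value"']))
print(mask_broad(["--conf", "spark.foo.password=x", "--class", "Foo"]))
```

One trade-off of accepting whitespace as a separator: any argument whose name contains "password" or "secret" will have the following token masked, even when that token is not a credential (a file path, for example). Whether that over-masking is acceptable is a design decision for the maintainers.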

(Note there’s no JIRA issue referenced in the commit messages: I cannot create issues in Airflow’s Jira for some reason)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

potiuk commented, Jul 11, 2020 (2 reactions)

FYI @Unit03. You can put Closes #ISSUE in the commit message and it will close related issue at merge 😃.

potiuk commented, Jul 11, 2020 (1 reaction)

Looks like!
