Add `is_airflow_caller` or similar to snowplow environment context

Per https://github.com/meltano/internal-general/issues/391, and related to https://github.com/meltano/files-airflow/issues/19, we should bump the environment_context schema version and add a new field called is_airflow_caller. Its value would be derived from whether the env var from https://github.com/meltano/files-airflow/issues/19 is present in the environment.

The presence of this env var would signal that meltano is being invoked from Airflow.
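
As a rough illustration of the proposal, the check could look something like the sketch below. The variable name `EXAMPLE_AIRFLOW_CALLER_FLAG` is a placeholder (the real name is whatever meltano/files-airflow#19 settles on), and `build_environment_context` is a hypothetical helper, not Meltano's actual telemetry code.

```python
import os
from typing import Mapping, Optional

# Placeholder name, assumed for illustration only. The real variable is
# whatever meltano/files-airflow#19 defines.
AIRFLOW_CALLER_ENV_VAR = "EXAMPLE_AIRFLOW_CALLER_FLAG"


def is_airflow_caller(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the marker env var is present, whatever its value."""
    env = os.environ if env is None else env
    return AIRFLOW_CALLER_ENV_VAR in env


def build_environment_context(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: only the new field of the context is shown."""
    return {"is_airflow_caller": is_airflow_caller(env)}
```

Checking for presence rather than a truthy value matches the wording of the issue ("is present in the environment"); a stricter check on the value would be an equally reasonable choice.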

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
pnadolny13 commented, Jun 15, 2022

@pnadolny13 where do you place the urgency on this?

@tayloramurphy I’m not quite sure, since I haven’t been able to dive into the actual process data we’re getting. If we can’t get it through the process ID, then it will be higher priority, since we’ll only be able to count Airflow when it’s started up or added/installed.

1 reaction
pandemicsyn commented, Jun 13, 2022

@aaronsteers @pnadolny13 I did a quick test and just injected a ps axf at the head of all the meltano calls in our DAG generator:

AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=meltano_g-to-p_g-to-p-job
AIRFLOW_CTX_TASK_ID=meltano_g-to-p_g-to-p-job_task0
AIRFLOW_CTX_EXECUTION_DATE=2022-06-09T14:43:27.375054+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2022-06-09T14:43:27.375054+00:00
[2022-06-09 14:43:31,228] {subprocess.py:52} INFO - Tmp dir root location: 
 /tmp
[2022-06-09 14:43:31,229] {subprocess.py:63} INFO - Running command: ['bash', '-c', 'ps axf; cd /home/syn/speedrun/meltano-projects/my-meltano-project; .meltano/run/bin run tap-gitlab target-postgres']
[2022-06-09 14:43:31,233] {subprocess.py:74} INFO - Output:
[2022-06-09 14:43:31,242] {subprocess.py:78} INFO -     PID TTY      STAT   TIME COMMAND
<--- output trimmed -->
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -     863 ?        Ss     1:19 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -  848114 ?        Ss     0:00  \_ sshd: syn [priv]
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -  848235 ?        S      0:00  |   \_ sshd: syn@pts/0
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -  848236 pts/0    Ss     0:00  |       \_ -zsh
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -  848852 pts/0    Sl+    0:01  |           \_ /home/syn/.local/pipx/venvs/meltano/bin/python /home/syn/.local/bin/meltano invoke airflow scheduler
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -  848912 pts/0    S+     0:05  |               \_ /home/syn/speedrun/meltano-projects/my-meltano-project/.meltano/orchestrators/airflow/venv/bin/python /home/syn/speedrun/meltano-projects/my-meltano-project/.meltano/orchestrators/airflow/venv/bin/airflow scheduler
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -  848915 pts/0    S+     0:00  |                   \_ airflow serve-logs
[2022-06-09 14:43:31,248] {subprocess.py:78} INFO -  848916 pts/0    S      0:00  |                   \_ airflow scheduler -- DagFileProcessorManager
[2022-06-09 14:43:31,249] {subprocess.py:78} INFO -  851087 pts/0    S+     0:01  |                   \_ /home/syn/speedrun/meltano-projects/my-meltano-project/.meltano/orchestrators/airflow/venv/bin/python /home/syn/speedrun/meltano-projects/my-meltano-project/.meltano/orchestrators/airflow/venv/bin/airflow tasks run meltano_g-to-p_g-to-p-job meltano_g-to-p_g-to-p-job_task0 2022-06-09T14:43:27.375054+00:00 --local --pool default_pool --subdir /home/syn/speedrun/meltano-projects/my-meltano-project/orchestrate/dags/meltano.py
[2022-06-09 14:43:31,249] {subprocess.py:78} INFO -  851093 pts/0    S      0:00  |                       \_ airflow task runner: meltano_g-to-p_g-to-p-job meltano_g-to-p_g-to-p-job_task0 2022-06-09T14:43:27.375054+00:00 7
[2022-06-09 14:43:31,249] {subprocess.py:78} INFO -  851094 ?        Ss     0:00  |                           \_ bash -c ps axf; cd /home/syn/speedrun/meltano-projects/my-meltano-project; .meltano/run/bin run tap-gitlab target-postgres
[2022-06-09 14:43:31,249] {subprocess.py:78} INFO -  851095 ?        R      0:00  |                               \_ ps axf

So a couple of things.

  1. It does look like at least the airflow task runner is in the process chain when using meltano invoke airflow scheduler, but I have no clue what that looks like if you have k8s in the mix.
  2. Airflow already sets some Airflow-specific env vars that we could just use.
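
Both observations can be sketched roughly as follows. The `AIRFLOW_CTX_` prefix and the command strings come from the log output above; the function names are made up for illustration, and the process-chain check assumes the caller has already collected the ancestor command lines by some other means (which, as noted, may look different under k8s).

```python
import os
from typing import Iterable, Mapping, Optional

# Prefix shared by the AIRFLOW_CTX_* variables shown in the log above.
AIRFLOW_CTX_PREFIX = "AIRFLOW_CTX_"

# Substrings seen in the ps output above; assumed to be good enough markers.
AIRFLOW_COMMAND_MARKERS = ("airflow task runner", "airflow tasks run")


def airflow_context_vars(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect every AIRFLOW_CTX_* variable from the given environment."""
    env = os.environ if env is None else env
    return {k: v for k, v in env.items() if k.startswith(AIRFLOW_CTX_PREFIX)}


def launched_by_airflow_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when at least one AIRFLOW_CTX_* variable is set."""
    return bool(airflow_context_vars(env))


def airflow_in_process_chain(ancestor_commands: Iterable[str]) -> bool:
    """True when any ancestor command line contains an Airflow marker."""
    return any(
        marker in command
        for command in ancestor_commands
        for marker in AIRFLOW_COMMAND_MARKERS
    )
```

The env var check is the cheaper of the two, since it needs no process inspection and does not depend on how the scheduler was started.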