Utilize tags for metrics sent to SafeDogStatsdLogger
Description
A recent PR enabled DogStatsD support for Airflow metrics: https://github.com/apache/airflow/pull/7376. While this enables the use of DogStatsD, the code sending metrics to SafeDogStatsdLogger doesn’t use tagging. Instead, it sends a unique, monolithic metric name that can’t be aggregated across identifiers such as <dag_id>. This doesn’t scale when someone wants to monitor metrics across multiple DAGs, because each DAG sends its own uniquely named metric and the number of monitors grows with the number of DAGs.
An example is the set of timer metrics sent by a DagRun, such as dagrun.duration.failed.<dag_id>. When sent by the DagRun object, <dag_id> isn’t a tag but part of the metric name itself: https://github.com/apache/airflow/blob/master/airflow/models/dagrun.py#L412-L420
What is the problem here?
When metrics are sent to DataDog without tags, it becomes impossible to aggregate them across <dag_id>, because each dagrun.duration.failed.<dag_id> sent by a DAG is unique to that <dag_id>.
If I have 20 DAGs in production and want to monitor dagrun.duration.failed.<dag_id>, I’ll need 20 separate monitors!
But if <dag_id> is sent as a tag, a single monitor can be used and DataDog can group the metric by <dag_id>.
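To make the difference concrete, here is a small self-contained sketch in plain Python (no DataDog client involved). The DAG names, durations, and the group_by_tag helper are made up for illustration; the point is only that the tagged shape can be grouped under one metric name, while the untagged shape yields one metric name per DAG.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, tags, duration in seconds).
untagged = [
    ("dagrun.duration.failed.etl_daily", [], 12.0),
    ("dagrun.duration.failed.reports", [], 30.0),
]
tagged = [
    ("dagrun.duration.failed", ["dag_id:etl_daily"], 12.0),
    ("dagrun.duration.failed", ["dag_id:reports"], 30.0),
]


def group_by_tag(samples, metric, tag_key):
    """Sum the values of one metric, grouped by the value of tag_key."""
    groups = defaultdict(float)
    prefix = tag_key + ":"
    for name, tags, value in samples:
        if name != metric:
            continue
        for tag in tags:
            if tag.startswith(prefix):
                groups[tag[len(prefix):]] += value
    return dict(groups)


# One metric name covers every DAG when dag_id is a tag.
by_dag = group_by_tag(tagged, "dagrun.duration.failed", "dag_id")

# With the ID embedded in the name, there are as many metric names as DAGs.
distinct_untagged_names = {name for name, _, _ in untagged}
```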
Use case / motivation
The current way metrics are sent to DataDog isn’t scalable, because it prevents users from aggregating common metrics across tag values.
Following the DagRun example given above, the information needed to send <dag_id> as a tag is already available. Given this line of code: https://github.com/apache/airflow/blob/master/airflow/models/dagrun.py#L418 and the accompanying function definition: https://github.com/apache/airflow/blob/master/airflow/stats.py#L172 we can modify the function call to send <dag_id> as a tag:
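Toy example:
    duration = (self.end_date - self.start_date)
    if self.state == State.SUCCESS:
        if isinstance(Stats, SafeDogStatsdLogger):
            # Tag-capable logger: keep the metric name fixed and send dag_id as a tag.
            Stats.timing('dagrun.duration.success', duration, tags=[self.dag_id])
        else:
            # Plain StatsD: fall back to embedding dag_id in the metric name.
            Stats.timing('dagrun.duration.success.{}'.format(self.dag_id), duration)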
The preference here is probably not to do type checking before submitting the metric. I’m willing to discuss other solutions here or as part of a PR, and to implement the agreed-upon solution.
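One possible way to avoid type checks at every call site is to hide the decision behind a single helper. The sketch below is only an illustration under stated assumptions: PlainStatsdLogger, DogStatsdLikeLogger, the supports_tags attribute, and timing_with_dag_id are all hypothetical names invented here and are not part of Airflow or the DataDog client. The "dag_id:<value>" tag format follows the usual key:value convention for DogStatsD tags.

```python
class PlainStatsdLogger:
    """Hypothetical stand-in for a StatsD logger without tag support."""

    def __init__(self):
        self.sent = []

    def timing(self, stat, dt):
        self.sent.append((stat, dt, None))


class DogStatsdLikeLogger:
    """Hypothetical stand-in for a tag-capable logger."""

    supports_tags = True  # assumed capability flag, not a real Airflow attribute

    def __init__(self):
        self.sent = []

    def timing(self, stat, dt, tags=None):
        self.sent.append((stat, dt, tags))


def timing_with_dag_id(logger, stat, dt, dag_id):
    """Send dag_id as a tag when supported, else append it to the metric name."""
    if getattr(logger, "supports_tags", False):
        logger.timing(stat, dt, tags=["dag_id:{}".format(dag_id)])
    else:
        logger.timing("{}.{}".format(stat, dag_id), dt)


# Usage: the call site looks the same for both backends.
dog = DogStatsdLikeLogger()
plain = PlainStatsdLogger()
timing_with_dag_id(dog, "dagrun.duration.success", 1.5, "etl_daily")
timing_with_dag_id(plain, "dagrun.duration.success", 1.5, "etl_daily")
```

The capability flag keeps existing StatsD users on the current metric names, so a change along these lines would not break their dashboards.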
Related Issues
This is the PR that created the SafeDogStatsdLogger class: https://github.com/apache/airflow/pull/7376
Top GitHub Comments
cc: @howardyoo
@williamBartos I stumbled upon this today, which could be a good workaround: https://docs.datadoghq.com/developers/dogstatsd/dogstatsd_mapper/. It also seems this use case is common enough that Airflow metrics are the actual example DataDog uses.
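For readers unfamiliar with the idea, the following is a rough Python sketch of what such a mapping does conceptually: it rewrites a dotted metric name into a fixed name plus tags. This is not DataDog's implementation or its configuration syntax. make_mapper, the wildcard pattern, and the "$1" placeholder handling are assumptions made for illustration, and the wildcard in this sketch only matches a single dot-free segment.

```python
import re


def make_mapper(pattern, new_name, tag_templates):
    """Build a function that maps a dotted metric name to (name, tags).

    `*` in `pattern` matches one dot-free segment; `$1`, `$2`, ... in the
    tag templates refer to the captured segments, in order.
    """
    regex = re.compile(
        "^" + re.escape(pattern).replace(r"\*", "([^.]+)") + "$"
    )

    def mapper(metric):
        match = regex.match(metric)
        if match is None:
            return metric, []  # leave unmatched metrics untouched
        tags = []
        for key, template in tag_templates.items():
            # Substitute each $N placeholder with the N-th captured segment.
            value = re.sub(
                r"\$(\d+)", lambda m: match.group(int(m.group(1))), template
            )
            tags.append("{}:{}".format(key, value))
        return new_name, tags

    return mapper


# Illustrative rule: move the trailing DAG ID out of the name and into a tag.
map_failed = make_mapper(
    "airflow.dagrun.duration.failed.*",
    "airflow.dagrun.duration.failed",
    {"dag_id": "$1"},
)
mapped = map_failed("airflow.dagrun.duration.failed.etl_daily")
unmatched = map_failed("airflow.scheduler_heartbeat")
```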