Airflow ElasticSearch provider issue

Apache Airflow version

2.3.3 (latest released)

What happened

While I was using Airflow v2.1.3 in my project, this issue appeared and was solved by adding Offset_Key to the Fluent Bit configuration. Offset_Key appends an offset field to each log record, so the logs can be retrieved in the correct order. We set AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset", and the logs were retrieved correctly based on custom_offset and then displayed in the Airflow UI.
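
To illustrate why the offset field matters for ordering, here is a minimal sketch that sorts log documents by a configurable offset field. It is not the provider's actual code: the function name, the `message` key, and the sample documents are assumptions made up for this example.

```python
from typing import Any, Dict, List


def order_log_lines(hits: List[Dict[str, Any]], offset_field: str) -> List[str]:
    """Return log messages sorted by the numeric value stored under offset_field."""
    # Documents without the offset field cannot be ordered, so this sketch skips them.
    with_offset = [hit for hit in hits if offset_field in hit]
    ordered = sorted(with_offset, key=lambda hit: int(hit[offset_field]))
    return [hit["message"] for hit in ordered]


# Hypothetical documents, as they might come back from ElasticSearch out of order.
hits = [
    {"message": "task finished", "custom_offset": 240},
    {"message": "task started", "custom_offset": 0},
    {"message": "running step 1", "custom_offset": 120},
]
ordered_lines = order_log_lines(hits, "custom_offset")
```

If the reader is pointed at a field name that the documents do not carry, nothing can be ordered, which in this sketch shows up as an empty result.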

Now I have upgraded to v2.3.3 and this setup no longer works. I tested some combinations:

  • AIRFLOW__ELASTICSEARCH__OFFSET_FIELD and Offset_Key have the same value: no offset key is created in the logs, and the logs cannot be obtained from ElasticSearch.
  • AIRFLOW__ELASTICSEARCH__OFFSET_FIELD and Offset_Key have different values: both offset keys are added to the logs and I can see the logs in the UI (the logs are obtained based on AIRFLOW__ELASTICSEARCH__OFFSET_FIELD, not the custom one). For backward compatibility, I need a configuration in which custom_offset takes precedence over the offset that Airflow inserts.

As suggested here, I tried downgrading the elasticsearch provider to see which version works for this scenario.

It turned out that the version we used with Airflow v2.1.3, apache-airflow-providers-elasticsearch==2.0.2, works. I think that this change breaks our use case, as version 2.0.3 is the first one that does not work for us (see the changelog). With version 2.0.2 I can see that both custom_offset and Airflow’s offset are added to the logs, but thanks to AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset" the logs are displayed in the correct order.

What you think should happen instead

The offset from Airflow should not conflict with the offset added by a third-party tool, since Airflow does not support sending logs to ElasticSearch; it only supports reading from it.

Most probably, the problem lies in the flow of the logs. Right now it looks like this:

Airflow -> LogFile <- Fluent Bit -> ElasticSearch <- Airflow

so Airflow does not know about the Fluent Bit config (in this specific case) or its offset name.
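
The read side of this flow (the last arrow) can be pictured as a search sorted by the configured offset field. The sketch below only builds a query body in standard ElasticSearch query DSL. The function name, the `log_id` field, the `match_phrase` query, and the default size are assumptions for illustration, not a statement of what the provider sends.

```python
from typing import Any, Dict


def build_log_query(log_id: str, offset_field: str, size: int = 1000) -> Dict[str, Any]:
    """Build a search body that fetches one task's log lines ordered by offset_field."""
    return {
        # Assumed: every log document carries a log_id identifying the task run.
        "query": {"match_phrase": {"log_id": log_id}},
        # Ordering relies entirely on the field named here existing in the documents.
        "sort": [{offset_field: {"order": "asc"}}],
        "size": size,
    }


# "example-log-id" is a placeholder, not a real Airflow log id.
query = build_log_query("example-log-id", "custom_offset")
```

Whatever writes the documents (Fluent Bit here) has to populate the field named in the sort clause, which is why the two sides need to agree on the offset name.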

It would be nice to make the change in version 2.0.3 that I linked above optional, so that we can tell Airflow whether it should create an offset under the given AIRFLOW__ELASTICSEARCH__OFFSET_FIELD name or only use that name to retrieve logs (I do not know the whole logic behind Airflow’s log retrieval, so I am not sure this is a good idea). A boolean flag such as AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD could control whether Airflow creates its own offset field, while AIRFLOW__ELASTICSEARCH__OFFSET_FIELD would determine the name used either to create the field and retrieve logs, or only to retrieve logs.
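
A rough sketch of the proposed behavior, assuming the flag existed: AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD is only a proposal in this issue, and `emit_record`, its parameters, and the offset value are hypothetical names invented for the example.

```python
from typing import Any, Dict


def emit_record(
    message: str,
    offset_field: str,
    add_offset_field: bool,
    airflow_offset: int,
) -> Dict[str, Any]:
    """Model the write side: optionally attach Airflow's own offset to a log record."""
    record: Dict[str, Any] = {"message": message}
    if add_offset_field:
        # Proposed flag enabled: Airflow writes its own offset under the configured name.
        record[offset_field] = airflow_offset
    # Proposed flag disabled: the name is left free for a third-party tool to fill in,
    # and Airflow would use it only when reading logs back.
    return record


with_flag = emit_record("hello", "custom_offset", True, 999)
without_flag = emit_record("hello", "custom_offset", False, 999)
```

With the flag disabled, the record leaves Airflow without the offset key, so a tool such as Fluent Bit could add its own value under the same name without a collision.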

How to reproduce

  • Use Airflow v2.3.3.
  • Use Fluent Bit v1.9.6 and add Offset_Key to its INPUT config.
  • Use ElasticSearch to store the logs, and read the logs from ElasticSearch in the Airflow UI.

Operating System

AKS

Versions of Apache Airflow Providers

Working case (Airflow 2.1.3):

  • apache-airflow-providers-amazon==2.1.0
  • apache-airflow-providers-celery==2.0.0
  • apache-airflow-providers-cncf-kubernetes==2.0.2
  • apache-airflow-providers-docker==2.1.0
  • apache-airflow-providers-elasticsearch==2.0.2
  • apache-airflow-providers-ftp==2.0.0
  • apache-airflow-providers-google==5.0.0
  • apache-airflow-providers-grpc==2.0.0
  • apache-airflow-providers-hashicorp==2.0.0
  • apache-airflow-providers-http==2.0.0
  • apache-airflow-providers-imap==2.0.0
  • apache-airflow-providers-microsoft-azure==3.1.0
  • apache-airflow-providers-mysql==2.1.0
  • apache-airflow-providers-odbc==2.0.0
  • apache-airflow-providers-postgres==2.0.0
  • apache-airflow-providers-redis==2.0.0
  • apache-airflow-providers-sendgrid==2.0.0
  • apache-airflow-providers-sftp==2.1.0
  • apache-airflow-providers-slack==4.0.0
  • apache-airflow-providers-sqlite==2.0.0
  • apache-airflow-providers-ssh==2.1.0

Not working case (Airflow v2.3.3):

  • apache-airflow-providers-amazon==4.0.0
  • apache-airflow-providers-celery==3.0.0
  • apache-airflow-providers-cncf-kubernetes==4.1.0
  • apache-airflow-providers-docker==3.0.0
  • apache-airflow-providers-elasticsearch==4.0.0
  • apache-airflow-providers-ftp==3.0.0
  • apache-airflow-providers-google==8.1.0
  • apache-airflow-providers-grpc==3.0.0
  • apache-airflow-providers-hashicorp==3.0.0
  • apache-airflow-providers-http==3.0.0
  • apache-airflow-providers-imap==3.0.0
  • apache-airflow-providers-microsoft-azure==4.0.0
  • apache-airflow-providers-mysql==3.0.0
  • apache-airflow-providers-odbc==3.0.0
  • apache-airflow-providers-postgres==5.0.0
  • apache-airflow-providers-redis==3.0.0
  • apache-airflow-providers-sendgrid==3.0.0
  • apache-airflow-providers-sftp==3.0.0
  • apache-airflow-providers-slack==5.0.0
  • apache-airflow-providers-sqlite==3.0.0
  • apache-airflow-providers-ssh==3.0.0

Airflow v2.3.3 is working with apache-airflow-providers-elasticsearch==2.0.2

Deployment

Other 3rd-party Helm chart

Deployment details

We are using Airflow Community Helm chart + Azure Kubernetes Service

Anything else

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR! (If the fix would otherwise only arrive in the far future, I can work on the PR to get it sooner.)

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

1 reaction
potiuk commented, Jul 22, 2022

Cool. I will add the flag - it used to be there in old breeze (and will just turn this error into a warning) - it’s not necessary for it to run, it’s more to make sure we have the latest version of tags 😃.

https://github.com/apache/airflow/pull/25236 to skip the fetch error and turn it into warning @PatrykKlimowicz

1 reaction
potiuk commented, Jul 22, 2022

COOOL. I am merging it now then 😃.

We release providers ~monthly; the last release was last week, so expect this one in ~3 weeks or so.
