Airflow ElasticSearch provider issue
Apache Airflow version
2.3.3 (latest released)
What happened
While using Airflow v2.1.3 in my project this issue appeared, and it was solved by adding the Offset_Key to the Fluent Bit configuration. The Offset_Key appends an offset field to the logs, so we can retrieve the logs in the correct order. We specified AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset", and logs were retrieved correctly based on custom_offset and then displayed in the Airflow UI.
Now I have updated to v2.3.3 and this setup no longer works. I tested some combinations:
- AIRFLOW__ELASTICSEARCH__OFFSET_FIELD and Offset_Key have the same value: no offset key is created in the logs and logs cannot be obtained from ElasticSearch.
- AIRFLOW__ELASTICSEARCH__OFFSET_FIELD and Offset_Key have different values: both offset keys are added to the logs and I can see the logs in the UI (logs are obtained based on AIRFLOW__ELASTICSEARCH__OFFSET_FIELD and not the custom one).
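To illustrate why the offset field matters for ordering, here is a minimal Python sketch. It is not Airflow's implementation: the function name order_by_offset and the sample documents are hypothetical, and dropping documents that lack the field is an assumption made only to show what a missing offset key could look like from the reader's side.

```python
from typing import Any, Dict, List


def order_by_offset(hits: List[Dict[str, Any]], offset_field: str) -> List[str]:
    """Return log messages ordered by the given offset field (illustrative only)."""
    # Assumption: documents without the configured field cannot be ordered,
    # so this sketch leaves them out.
    usable = [hit for hit in hits if offset_field in hit]
    # Sort numerically so that an offset of 10 comes after 9.
    usable.sort(key=lambda hit: int(hit[offset_field]))
    return [hit["message"] for hit in usable]


# Hypothetical documents as they might come back from ElasticSearch, unordered.
hits = [
    {"message": "task finished", "custom_offset": 2048},
    {"message": "task started", "custom_offset": 0},
    {"message": "running step 1", "custom_offset": 512},
]

ordered = order_by_offset(hits, "custom_offset")  # field present in every document
missing = order_by_offset(hits, "offset")         # field absent from every document
```

With the field present, the messages come back in write order; with a field name that no document carries, nothing can be ordered at all.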
For backward compatibility I need a configuration in which custom_offset takes precedence over the offset that Airflow inserts.
As suggested here, I tried lowering the elasticsearch provider version to see which one works for this scenario.
It turned out that the version we used with Airflow v2.1.3, apache-airflow-providers-elasticsearch==2.0.2, was OK.
I think that this change breaks our use case, as version 2.0.3 is the first that does not work for us (see the changelog). With version 2.0.2 I can see that both custom_offset and Airflow's offset are added to the logs, but thanks to AIRFLOW__ELASTICSEARCH__OFFSET_FIELD="custom_offset" the logs are displayed in the correct order.
What you think should happen instead
The offset from Airflow should not conflict with the offset added by a third-party tool, since Airflow does not support sending logs to ElasticSearch, only reading from it.
Most probably the issue lies in the flow of the logs. Right now it looks like this:
Airflow -> LogFile <- Fluent Bit -> ElasticSearch <- Airflow
so Airflow does not know about the (in this specific case) Fluent Bit config and its offset name.
It would be nice to make the change in version 2.0.3 that I linked above optional, so we can tell Airflow whether it should create an offset field named by AIRFLOW__ELASTICSEARCH__OFFSET_FIELD or just use that name to obtain logs (I do not know the whole logic behind Airflow's log retrieval, so I am not sure if this is a good idea). I think that a bool flag like AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD could determine whether Airflow's offset field is created, and AIRFLOW__ELASTICSEARCH__OFFSET_FIELD could determine which name to use either to create the field and retrieve logs OR just to retrieve the logs.
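A rough Python sketch of the proposed behaviour follows. Everything in it is hypothetical: AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD does not exist in Airflow and is only the flag suggested above, and read_flag and prepare_record are illustrative names, not provider code.

```python
import time
from typing import Any, Dict, Mapping


def read_flag(env: Mapping[str, str],
              name: str = "AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD",
              default: bool = True) -> bool:
    """Parse the proposed (hypothetical) boolean flag from an environment mapping."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


def prepare_record(record: Dict[str, Any], offset_field: str,
                   add_offset_field: bool) -> Dict[str, Any]:
    """Return a copy of the record, adding an offset only when the flag allows it."""
    out = dict(record)  # do not mutate the caller's record
    if add_offset_field:
        # Writer mode: the offset is created under the configured name.
        out[offset_field] = time.time_ns()
    # Otherwise the configured name is used for retrieval only, so an offset
    # set by an external shipper such as Fluent Bit is left untouched.
    return out


# Hypothetical record that already carries the shipper's offset.
shipped = {"message": "task started", "custom_offset": 42}
flag = read_flag({"AIRFLOW__ELASTICSEARCH__ADD_OFFSET_FIELD": "false"})
kept = prepare_record(shipped, "custom_offset", add_offset_field=flag)
```

Defaulting the flag to true would keep the current behaviour for existing deployments, while setting it to false would let the third-party offset win.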
How to reproduce
Use Airflow v2.3.3. Use Fluent Bit v1.9.6 and add the Offset_Key to its INPUT config. Use ElasticSearch to store logs and read logs from ElasticSearch in the Airflow UI.
Operating System
AKS
Versions of Apache Airflow Providers
Working case (Airflow 2.1.3):
- apache-airflow-providers-amazon==2.1.0
- apache-airflow-providers-celery==2.0.0
- apache-airflow-providers-cncf-kubernetes==2.0.2
- apache-airflow-providers-docker==2.1.0
- apache-airflow-providers-elasticsearch==2.0.2
- apache-airflow-providers-ftp==2.0.0
- apache-airflow-providers-google==5.0.0
- apache-airflow-providers-grpc==2.0.0
- apache-airflow-providers-hashicorp==2.0.0
- apache-airflow-providers-http==2.0.0
- apache-airflow-providers-imap==2.0.0
- apache-airflow-providers-microsoft-azure==3.1.0
- apache-airflow-providers-mysql==2.1.0
- apache-airflow-providers-odbc==2.0.0
- apache-airflow-providers-postgres==2.0.0
- apache-airflow-providers-redis==2.0.0
- apache-airflow-providers-sendgrid==2.0.0
- apache-airflow-providers-sftp==2.1.0
- apache-airflow-providers-slack==4.0.0
- apache-airflow-providers-sqlite==2.0.0
- apache-airflow-providers-ssh==2.1.0
Not working case (Airflow v2.3.3):
- apache-airflow-providers-amazon==4.0.0
- apache-airflow-providers-celery==3.0.0
- apache-airflow-providers-cncf-kubernetes==4.1.0
- apache-airflow-providers-docker==3.0.0
- apache-airflow-providers-elasticsearch==4.0.0
- apache-airflow-providers-ftp==3.0.0
- apache-airflow-providers-google==8.1.0
- apache-airflow-providers-grpc==3.0.0
- apache-airflow-providers-hashicorp==3.0.0
- apache-airflow-providers-http==3.0.0
- apache-airflow-providers-imap==3.0.0
- apache-airflow-providers-microsoft-azure==4.0.0
- apache-airflow-providers-mysql==3.0.0
- apache-airflow-providers-odbc==3.0.0
- apache-airflow-providers-postgres==5.0.0
- apache-airflow-providers-redis==3.0.0
- apache-airflow-providers-sendgrid==3.0.0
- apache-airflow-providers-sftp==3.0.0
- apache-airflow-providers-slack==5.0.0
- apache-airflow-providers-sqlite==3.0.0
- apache-airflow-providers-ssh==3.0.0
Airflow v2.3.3 works with apache-airflow-providers-elasticsearch==2.0.2.
Deployment
Other 3rd-party Helm chart
Deployment details
We are using Airflow Community Helm chart + Azure Kubernetes Service
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR! (If the fix will be provided in the far future I can work on the PR to get it sooner)
Code of Conduct
- I agree to follow this project’s Code of Conduct
https://github.com/apache/airflow/pull/25236 to skip the fetch error and turn it into warning @PatrykKlimowicz
COOOL. I am merging it now then 😃.
We release providers ~monthly. The last release was last week, so expect this one in ~3 weeks or so.