Airflow web UI is slow
Apache Airflow version:
1.10.10
Kubernetes version (if you are using kubernetes) (use kubectl version):
1.13.12
Environment:
- Cloud provider or hardware configuration: Azure
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others:
What happened:
Every HTTP request to the UI takes at least 5 seconds, even for static content. The /admin/metrics/ and /health endpoints and the 404 page have the same problem.
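One way to put numbers on this is to time a few requests from a script. The following is only a rough sketch using the Python standard library; the base URL `http://localhost:8080` and the `/no-such-page` path are assumptions for illustration and need to be adjusted to the actual deployment.

```python
import time
import urllib.error
import urllib.request


def time_request(url, timeout=30.0):
    """Return (HTTP status, elapsed seconds) for a single GET request."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # Error pages count too: the 404 page is reported as equally slow.
        status = err.code
    return status, time.perf_counter() - start


if __name__ == "__main__":
    base = "http://localhost:8080"  # assumption: replace with your webserver URL
    # /no-such-page is a made-up path used only to trigger a 404.
    for path in ("/health", "/admin/metrics/", "/no-such-page"):
        try:
            status, elapsed = time_request(base + path)
            print(f"{path}: HTTP {status} in {elapsed:.2f}s")
        except OSError as exc:
            # Connection refused, DNS failure, timeout, and similar.
            print(f"{path}: request failed ({exc})")
```

If every path, including the 404, reports roughly the same multi-second delay, the time is probably being lost before the request reaches any Airflow view code.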
Here is a graph showing the CPU usage of all Airflow components:
Each container has a 1s limit (left Y axis) so none of them is currently CPU bound.
What you expected to happen:
How to reproduce it:
Anything else we need to know:
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:26 (6 by maintainers)
Top GitHub Comments
I fixed this by changing the default gunicorn worker class from sync to an asynchronous one, namely gevent. Please see the AWS thread below for the weird behaviour between gunicorn and ELBs.
https://forums.aws.amazon.com/thread.jspa?messageID=419138
So simply set AIRFLOW__WEBSERVER__WORKER_CLASS: "gevent" in your config, and things should get better.
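If the webserver is started from a script rather than from a Kubernetes manifest, the same setting can be passed through the process environment. This is only a sketch: `webserver_env` is a hypothetical helper, and it assumes the `gevent` package is installed alongside gunicorn in the webserver image.

```python
import os


def webserver_env(base=None, worker_class="gevent"):
    """Return a copy of the environment with the gunicorn worker class set.

    `webserver_env` is a hypothetical helper, not part of Airflow.
    """
    env = dict(os.environ if base is None else base)
    # Environment-variable form of the [webserver] worker_class option.
    env["AIRFLOW__WEBSERVER__WORKER_CLASS"] = worker_class
    return env


# Possible usage (left commented out so importing this file starts nothing):
# import subprocess
# subprocess.run(["airflow", "webserver"], env=webserver_env(), check=True)
```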
I had a similar issue with Airflow running on a Kubernetes cluster that I was, fortunately, able to solve.
Regular HTTP connections were taking a minimum of 5 seconds to complete. Even using curl to fetch static content, like a CSS file, took 5 seconds. The logs of the Airflow web process didn’t show anything unusual, although following them in real time showed each GET appearing in the logs with the same 5-second delay that the curl command was seeing.
Bypassing everything and running curl directly from inside the container, and thus bypassing all the Kubernetes networking, still showed the delays.
It took me a few days to realize what was going on with my setup.
As it turned out, it was due to the ‘type: LoadBalancer’ service I was using to expose the Airflow webserver outside the cluster. The load balancer was an external network load balancer that connected to the service via a NodePort on each virtual machine in the cluster. For whatever reason, this meant that a large number of connections were kept open to the webserver at all times.
In a 20-node cluster this meant 20 connections.
So when I killed the LoadBalancer service and started using nginx-ingress instead, the problem resolved itself instantly. No more delays. The admin web UI went back to normal.
I am not exactly sure what was going on here, but I suspect that having a large number of connections always open was causing the gunicorn process to delay routing new connections to the pool of webserver worker processes. I was only using 4 processes at the time.
So if you are seeing these strange 5-second delays, use netstat or a similar tool to count the number of “ESTABLISHED” connections to the webserver process. If you have a lot of connections and you are using a service of ‘type: LoadBalancer’, try switching to an ingress controller. Increasing the number of worker processes to exceed the number of established connections will probably work too.
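The netstat check can be scripted roughly like this. It is a sketch that assumes Linux `netstat -tn` column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function names are made up for illustration, and the "connections >= workers" test is only the rule of thumb suggested above, not a documented gunicorn limit.

```python
import subprocess


def count_established(netstat_output, port):
    """Count ESTABLISHED TCP connections whose local address ends in :port."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if state == "ESTABLISHED" and local_addr.rsplit(":", 1)[-1] == str(port):
            count += 1
    return count


def established_on_port(port):
    """Run `netstat -tn` (assumed to be installed) and count connections."""
    result = subprocess.run(
        ["netstat", "-tn"], capture_output=True, text=True, check=True
    )
    return count_established(result.stdout, port)


def workers_look_saturated(established, workers):
    """Rule of thumb from the comment: trouble if connections >= workers."""
    return established >= workers
```

With the figures from this comment, `workers_look_saturated(20, 4)` returns `True`. Run inside the webserver container, `established_on_port(8080)` would give the live count, assuming the webserver listens on port 8080.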
Hope that helps.