Airflow web UI is slow

Apache Airflow version:

1.10.10

Kubernetes version (if you are using kubernetes) (use kubectl version):

1.13.12

Environment:

Cloud provider or hardware configuration: Azure
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

What happened:

Every HTTP request to the UI takes at least 5 seconds, even for static content.

The /admin/metrics/ and /health endpoints and the 404 page have the same problem.

Here is a graph showing CPU usage of all Airflow components:

Each container has a 1s limit (left Y axis) so none of them is currently CPU bound.

What you expected to happen:

How to reproduce it:

Anything else we need to know:

Issue Analytics

State:
Created 3 years ago
Reactions: 1
Comments: 26 (6 by maintainers)

Top GitHub Comments

19 reactions

danielnazareth89 commented, Aug 28, 2020

I fixed this by changing the gunicorn worker class from the default, sync, to an asynchronous class, namely gevent. Please see the AWS thread below for the odd interaction between gunicorn and ELBs.

https://forums.aws.amazon.com/thread.jspa?messageID=419138

So simply set AIRFLOW__WEBSERVER__WORKER_CLASS: "gevent" in your config and things should improve.
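For anyone launching the webserver from a script rather than a deployment manifest, here is a minimal Python sketch of the same setting. It relies on Airflow's AIRFLOW__{SECTION}__{KEY} convention for environment-variable overrides. The helper names airflow_env_var and webserver_env are made up for illustration, and the sketch assumes the gevent package is installed alongside gunicorn.

```python
import os


def airflow_env_var(section, key):
    """Build the environment variable name that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def webserver_env(worker_class="gevent"):
    """Return a copy of the current environment with the worker class overridden.

    Hypothetical helper: it only prepares the environment and does not
    start or restart anything.
    """
    env = dict(os.environ)
    env[airflow_env_var("webserver", "worker_class")] = worker_class
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when starting `airflow webserver`. The setting only takes effect once the webserver is restarted.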

10 reactions

natemoseman commented, Aug 26, 2020

I had a similar issue with Airflow running on a Kubernetes cluster that I was, fortunately, able to solve.

Regular HTTP connections were taking a minimum of 5 seconds to complete. Even fetching static content, such as a CSS file, with curl took 5 seconds. The logs of the Airflow web process showed nothing unusual, although following them in real time showed each GET appearing with the same 5-second delay that curl was seeing.

Bypassing all the Kubernetes networking by running curl directly from inside the container still showed the delays.
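The curl measurement described above can also be repeated from Python, which makes it easy to collect several samples and see whether the delay is consistently around 5 seconds. This is only a sketch: time_request and summarize are illustrative names, and the URL has to be adapted to your own setup.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=30.0):
    """Return the wall-clock seconds needed to fetch url and read the whole body."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def summarize(durations):
    """Return the min, median and max of a list of durations in seconds."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For example, `summarize([time_request("http://localhost:8080/health") for _ in range(5)])` run inside the webserver container would show the spread of five requests. The host and port here are assumptions. Note that urlopen raises an HTTPError for 4xx and 5xx responses, so this sketch cannot time the 404 page as written.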

It took me a few days to realize what was going on with my setup.

As it turned out, it was due to the ‘type: LoadBalancer’ service I was using to expose the Airflow webserver outside the cluster. The load balancer was an external network load balancer that connected to the service via a NodePort on each virtual machine in the cluster. For whatever reason, this meant that a large number of connections to the webserver were kept open at all times.

In a 20-node cluster this meant 20 connections.

So when I removed the LoadBalancer service and started using nginx-ingress instead, the problem resolved itself instantly. No more delays. The admin web UI went back to normal.

I am not exactly sure what was going on here, but I suspect that having a large number of connections always open caused the gunicorn process to delay routing new connections to the pool of webserver worker processes. I was only using 4 worker processes at the time.

So if you are seeing these strange 5-second delays, use netstat or a similar tool to count the number of “ESTABLISHED” connections to the webserver process. If you have a lot of connections and you are using a service of ‘type: LoadBalancer’, try switching to an ingress controller. Increasing the number of worker processes to exceed the number of established connections will probably work too.
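The netstat check can be scripted. The sketch below parses the text printed by `netstat -tn` on Linux and counts ESTABLISHED connections on the webserver port, then suggests a worker count that exceeds that number. The function names are illustrative, port 8080 is assumed to be the webserver port, and the sample output is made-up data in the usual netstat column layout.

```python
def count_established(netstat_output, port):
    """Count ESTABLISHED TCP connections whose local address ends in :port."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips header lines and non-TCP sockets
        local_address, state = fields[3], fields[5]
        if state == "ESTABLISHED" and local_address.rsplit(":", 1)[-1] == str(port):
            count += 1
    return count


def workers_needed(established, current_workers):
    """Suggest a worker count that exceeds the number of open connections."""
    return max(current_workers, established + 1)


# Made-up sample in the layout printed by `netstat -tn` (illustrative only).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:8080           10.0.0.11:40122         ESTABLISHED
tcp        0      0 10.0.0.5:8080           10.0.0.12:40310         ESTABLISHED
tcp        0      0 10.0.0.5:5432           10.0.0.20:51000         ESTABLISHED
"""
```

On a real system the input could come from `subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout`. The suggested count maps to the workers option in the [webserver] section, that is AIRFLOW__WEBSERVER__WORKERS. Whether raising it actually removes the delay is the commenter's guess, not something this sketch verifies.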

Hope that helps.