Save cost for CI infrastructure by having more types of build machines
See original GitHub issueAfter https://github.com/apache/airflow/pull/14531 we are running the CI infrastructiure of ours using r5a.2xlarge
machines. They are needed for tests (thanks to having 8xCPUs and 64 GB RAM we can run full suite of tests in 15 minutes instead of 1h by running the tests in parallel. Likely (due to ~ 2x less cost of running tests (2x more price/ 4x less time) we are saving money. We also achieve 30%-40% speedups on Pylint, Static Checks (thanks to 8 CPUS available).
However not all jobs can use the memory/parallelism and we can use much cheaper machines without impacting CI execution time:
-
Docs (not yet but possible)
-
K8S tests (not yet but possible)
-
Building Images
-
Providers building and testing
-
Image verification
-
Image waiting (this one is particularly wasteful)
By introducing a smaller type of machines and assigning the “big” machine type only to the jobs that can utilise higher we can drive the cost down significantly.
Small machine = r5a.large with 2 CPUs and 16GB memory should likely be enough
Spot Price in Frankfurt:
r5.2xlarge = $0.1549 per Hour r5a.large = $0.0365 per Hour
This means that for non-test jobs we can have further 4 x smaller cost if we do it.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:1
- Comments:8 (8 by maintainers)
Top GitHub Comments
Before I go the “small instance” route, I am going to experiment now with the bigger instances. I think even now they will give us significant boost and cheaper prices (because they will be needed for shorter time). We got some 30 USD/day with parallel tests (which is ~ 800 USD/month give or take including lower traffic during weekends) which is still quite a bit too high.
I set the autoscaling like this now:
The tests will run ~2 times faster now and I hope it will bring the cost down (I am also working on few more optimizations).
I think we will not do it - we should only do something like that when/if we move to K8S driven CI.