Unit tests use up to 20GB of memory on circleci
Problem

While debugging intermittent test failures on PR #1410, @christopherbunn and I measured the memory usage of the unit tests on CircleCI and found that a complete end-to-end run can use up to 20GB at peak.

That's far more than I would have expected. The question is: why?
Observations
We SSH'ed into a CircleCI box running on main and ran the following with memory-profiler:

mprof run --include-children pytest evalml/ -n 8 --doctest-modules --cov=evalml --junitxml=test-reports/junit.xml --doctest-continue-on-failure -v

This produced the following plot, viewable with mprof plot:

[memory usage plot]
I ran this twice and got a similar plot, so the results appear to be consistent across runs.
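To narrow down which individual tests or functions are responsible, Python's standard-library tracemalloc can complement mprof's process-level view by reporting peak Python-heap allocations for a single callable. This is just a sketch; fake_test is a stand-in, not an actual evalml test:

```python
import tracemalloc

def measure_peak(fn, *args, **kwargs):
    """Run fn and return (result, peak Python heap allocation in bytes)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def fake_test():
    # stand-in for a real unit test: allocates a ~8MB list
    data = [0] * 1_000_000
    return len(data)

result, peak = measure_peak(fake_test)
print(f"peak: {peak / 1e6:.1f} MB")
```

Note that tracemalloc only sees allocations made through Python's allocator, so it undercounts memory held by C extensions like numpy; mprof's RSS-based view remains the ground truth for the CircleCI limit.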
This is dangerously close to the maximum memory allowed on the CircleCI worker size we're using. That's why we started looking into this: on #1410, we saw memory usage go 5GB higher for some reason.
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 2
- Comments: 11 (8 by maintainers)
Top GitHub Comments
We noticed that we can shave 1.5GB from just the automl tests (almost half!) by manually setting n_jobs=1 for all estimators used by automl (plots below). We verified that the value of n_jobs is a factor only in the few automl tests that don't mock fit and score. Based on this, we have come up with the current plan:

- Run each component with n_jobs=-1, to verify that works properly for that component.
- In tests that don't mock fit, set n_jobs=1 for all components to avoid memory and threading issues.
- Make sure we still cover n_jobs=-1, which I believe we currently are, since the default value of n_jobs for relevant estimators is -1.

Hopefully once this is done, we'll see some nice improvements in the overall memory footprint of the unit tests!
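The second step of the plan, pinning n_jobs=1 in tests that really call fit, could look roughly like this. A hypothetical sketch only: Component and pin_single_job are stand-ins, not evalml's actual classes or helpers:

```python
# Sketch of pinning n_jobs=1 in tests that actually call fit.
# 'Component' stands in for evalml's estimator wrappers.
class Component:
    def __init__(self, n_jobs=-1):
        # n_jobs=-1 (the current default) tells joblib to use all cores;
        # each worker process copies data, inflating peak memory.
        self.n_jobs = n_jobs

def pin_single_job(components):
    """Force n_jobs=1 so no extra worker processes are spawned."""
    for component in components:
        component.n_jobs = 1
    return components

pipeline = pin_single_job([Component(), Component(n_jobs=4)])
assert all(c.n_jobs == 1 for c in pipeline)
```

In practice this kind of override could live in a shared test fixture so individual tests don't need to remember it.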
@thehomebrewnerd @freddyaboulton We could use an inline import for the sklearn import (so it only runs when you call the mutual info function). I have seen us explicitly do this in a few other libraries. We generally do it to avoid circular imports. It would feel weird to do it just to save memory…
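For reference, the inline-import pattern mentioned above looks like this. Here the stdlib statistics module stands in for the heavy sklearn import, and lazy_mean is a hypothetical name, not the actual mutual info function:

```python
import sys

def lazy_mean(values):
    # Inline (deferred) import: the module is loaded on the first call,
    # not when this file is imported, so modules that merely import this
    # file don't pay the memory cost up front.
    import statistics
    return statistics.mean(values)

assert lazy_mean([1, 2, 3]) == 2
assert "statistics" in sys.modules  # loaded only after the first call
```

The trade-off is exactly the one raised in the comment: it saves import-time memory for callers that never use the function, at the cost of a slightly surprising import location.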