Unit tests use up to 20GB of memory on circleci
Problem

While debugging intermittent test failures on PR #1410, @christopherbunn and I measured the memory usage of the unit tests on CircleCI and found that a complete end-to-end run can use up to 20GB at peak.

That's far more than I would have expected. The question is: why?
Observations
We SSH'ed into a CircleCI box running on main and ran the following with memory-profiler:

mprof run --include-children pytest evalml/ -n 8 --doctest-modules --cov=evalml --junitxml=test-reports/junit.xml --doctest-continue-on-failure -v

This produced the following plot, viewable with mprof plot:

[memory usage plot]
I ran this twice and got a similar plot, so the results appear to be consistent across runs.
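To narrow down which individual tests or functions are responsible, Python's standard-library tracemalloc can complement mprof's process-level view by reporting peak Python-heap allocations for a single callable. This is just a sketch; fake_test is a stand-in, not an actual evalml test:

```python
import tracemalloc

def measure_peak(fn, *args, **kwargs):
    """Run fn and return (result, peak Python heap allocation in bytes)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def fake_test():
    # stand-in for a real unit test: allocates a ~8MB list
    data = [0] * 1_000_000
    return len(data)

result, peak = measure_peak(fake_test)
print(f"peak: {peak / 1e6:.1f} MB")
```

Note that tracemalloc only sees allocations made through Python's allocator, so it undercounts memory held by C extensions like numpy; mprof's RSS-based view remains the ground truth for the CircleCI limit.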
This is dangerously close to the maximum memory allowed on the CircleCI worker size we're using. That's why we started looking into this: on #1410, we saw memory usage go 5GB higher for some reason.
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 2
- Comments: 11 (8 by maintainers)
Top GitHub Comments
We noticed that we can shave 1.5GB from just the automl tests (almost half!) by manually setting n_jobs=1 for all estimators used by automl (plots below). We verified that the value of n_jobs is a factor only in the few automl tests that don't mock fit and score. Based on this, we have come up with the current plan:

- Run each component with n_jobs=-1, to verify that works properly for that component.
- In tests that don't mock fit, set n_jobs=1 for all components to avoid memory and threading issues.
- Make sure we still cover n_jobs=-1, which I believe we currently are, since the default value of n_jobs for relevant estimators is -1.

Hopefully once this is done, we'll see some nice improvements in the overall memory footprint of the unit tests!
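The second step of the plan, pinning n_jobs=1 in tests that really call fit, could look roughly like this. A hypothetical sketch only: Component and pin_single_job are stand-ins, not evalml's actual classes or helpers:

```python
# Sketch of pinning n_jobs=1 in tests that actually call fit.
# 'Component' stands in for evalml's estimator wrappers.
class Component:
    def __init__(self, n_jobs=-1):
        # n_jobs=-1 (the current default) tells joblib to use all cores;
        # each worker process copies data, inflating peak memory.
        self.n_jobs = n_jobs

def pin_single_job(components):
    """Force n_jobs=1 so no extra worker processes are spawned."""
    for component in components:
        component.n_jobs = 1
    return components

pipeline = pin_single_job([Component(), Component(n_jobs=4)])
assert all(c.n_jobs == 1 for c in pipeline)
```

In practice this kind of override could live in a shared test fixture so individual tests don't need to remember it.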
@thehomebrewnerd @freddyaboulton We could use an inline import for the sklearn import (so it only runs when you call the mutual info function). I have seen us explicitly do this in a few other libraries. We generally do it to avoid circular imports. It would feel weird to do it just to save memory…
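For reference, the inline-import pattern mentioned above looks like this. Here the stdlib statistics module stands in for the heavy sklearn import, and lazy_mean is a hypothetical name, not the actual mutual info function:

```python
import sys

def lazy_mean(values):
    # Inline (deferred) import: the module is loaded on the first call,
    # not when this file is imported, so modules that merely import this
    # file don't pay the memory cost up front.
    import statistics
    return statistics.mean(values)

assert lazy_mean([1, 2, 3]) == 2
assert "statistics" in sys.modules  # loaded only after the first call
```

The trade-off is exactly the one raised in the comment: it saves import-time memory for callers that never use the function, at the cost of a slightly surprising import location.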