[memo] High memory consumption and suspected sources
I am writing down the current memory usage as a memo, in case we encounter memory leak issues in the future. This post is based on the current implementation.
When we run a dataset of size 300B, AutoPyTorch consumes ~1.5 GB, and the following are the major sources of memory consumption:
| Source | Consumption [GB] |
|---|---|
| Import modules | 0.35 |
| Dask Client | 0.35 |
| Logger (thread safe) | 0.4 |
| Running `context.Process` in the multiprocessing module | 0.4 |
| Model | 0 ~ inf |
| Total | 1.5 ~ inf |
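For reference, a minimal sketch of how a single line item such as the Dask Client can be measured in isolation. This is only an illustration, not the exact script used for the table; it assumes `psutil` and `dask.distributed` are installed and measures the RSS delta of the current process around client creation:

```python
import gc

import psutil
from dask.distributed import Client


def rss_gb() -> float:
    # Resident set size of the current process in GB.
    return psutil.Process().memory_info().rss / 1024 ** 3


gc.collect()
before = rss_gb()

# Create a local Dask client (a stand-in for the client started internally).
client = Client(processes=False)

gc.collect()
after = rss_gb()
print(f"Dask Client overhead: {after - before:.2f} GB")

client.close()
```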
When we run a dataset of size 300MB (400,000 instances x 80 features), such as Albert, AutoPyTorch consumes ~2.5 GB, and the following are the major sources of memory consumption:
| Source | Consumption [GB] |
|---|---|
| Import modules | 0.35 |
| Dask Client | 0.35 |
| Logger (thread safe) | 0.4 |
| Dataset itself | 0.3 |
| `self.categories` in `InputValidator` | 0.3 |
| Running `context.Process` in the multiprocessing module | 0.4 |
| Model (e.g. LightGBM) | 0.4 ~ inf |
| Total | 2.5 ~ inf |
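As a complement to mprof, the Python-level allocations behind a single line item (e.g. `self.categories` in the `InputValidator`) can be attributed with the standard-library `tracemalloc`. A minimal sketch, where `run_region_of_interest()` is only a hypothetical placeholder for the code region being investigated:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical placeholder for the region of interest, e.g. fitting the
# input validator so that self.categories gets populated.
run_region_of_interest()

after = tracemalloc.take_snapshot()
# Top 10 allocation sites by net size difference between the two snapshots.
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)
```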
All the information was obtained by running:

    $ mprof run --include-children python -m examples.tabular.20_basics.example_tabular_classification

and by the logger that I set up for debugging. Note that I also added `time.sleep(0.5)` before and after each line of interest, to eliminate the influence of other elements, and checked each line in detail.
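For illustration, the `time.sleep(0.5)` trick looks roughly like this inside the profiled code; the sleeps create flat plateaus in the mprof timeline so that the jump caused by the line in between stands out (`build_suspected_object()` is a hypothetical placeholder):

```python
import time

time.sleep(0.5)  # flat plateau before the line of interest
obj = build_suspected_object()  # hypothetical line of interest
time.sleep(0.5)  # flat plateau after it
```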
Interesting 😃, I think the analysis should also be extended in the future to the following datasets:
- https://archive.ics.uci.edu/ml/datasets/covertype
- https://archive.ics.uci.edu/ml/datasets/HIGGS
- https://archive.ics.uci.edu/ml/datasets/Poker+Hand

They proved tricky.
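As a first step for those datasets, it may help to record their raw in-memory footprint once loaded. A minimal sketch for covertype using scikit-learn (HIGGS and Poker Hand would have to be obtained separately, e.g. via `fetch_openml` or a direct download):

```python
from sklearn.datasets import fetch_covtype

# Load the forest covertype dataset (~581k instances x 54 features).
covtype = fetch_covtype()
X, y = covtype.data, covtype.target
print(f"X: {X.shape}, dtype={X.dtype}, {X.nbytes / 1024 ** 3:.2f} GB")
print(f"y: {y.shape}, dtype={y.dtype}, {y.nbytes / 1024 ** 2:.2f} MB")
```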
Check if we can use a `generator` instead of an `np.ndarray`.
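A minimal sketch of what the generator idea could look like: instead of materializing the full `np.ndarray` up front, the data is yielded in batches so that only one batch is resident at a time (the batch producer below is a hypothetical stand-in for wherever the array is currently built):

```python
from typing import Iterator

import numpy as np


def iter_batches(n_rows: int, n_cols: int, batch_size: int = 4096) -> Iterator[np.ndarray]:
    # Hypothetical stand-in: yield the data batch by batch instead of
    # allocating one (n_rows, n_cols) array up front.
    for start in range(0, n_rows, batch_size):
        rows = min(batch_size, n_rows - start)
        yield np.random.rand(rows, n_cols)


# The consumer processes one batch at a time, so peak memory stays at
# roughly one batch instead of the full dataset.
total = 0.0
for batch in iter_batches(400_000, 80):
    total += batch.sum()
print(total)
```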