[memo] High memory consumption and the places of doubts


I am recording the current memory usage as a memo in case we encounter memory leak issues in the future. This post is based on the current implementation.

When we run a dataset with a size of 300B, AutoPytorch consumes ~1.5GB, and the major sources of memory consumption are as follows:

Source                                                 Consumption [GB]
Import modules                                         0.35
Dask Client                                            0.35
Logger (Thread safe)                                   0.4
Running of context.Process in multiprocessing module   0.4
Model                                                  0 ~ inf
Total                                                  1.5 ~ inf

When we run a dataset with a size of 300MB (400,000 instances x 80 features), such as Albert, AutoPytorch consumes ~2.5GB, and the major sources of memory consumption are as follows:

Source                                                 Consumption [GB]
Import modules                                         0.35
Dask Client                                            0.35
Logger (Thread safe)                                   0.4
Dataset itself                                         0.3
self.categories in InputValidator                      0.3
Running of context.Process in multiprocessing module   0.4
Model (e.g. LightGBM)                                  0.4 ~ inf
Total                                                  2.5 ~ inf

All the information was obtained by:

$ mprof run --include-children python -m examples.tabular.20_basics.example_tabular_classification

and a logger that I set up for debugging. Note that I also added time.sleep(0.5) before and after each line of interest to rule out the influence of other parts of the code, and checked each line in detail.
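The sleep-and-log approach can be sketched roughly as follows. This is not the code used for the measurements above: the helper name measure_block and its parameters are hypothetical, and it uses the standard-library tracemalloc module, which only tracks allocations made by Python in the current process, whereas mprof samples the memory of the whole process and its children. The pauses before and after the block serve the same purpose as time.sleep(0.5) in the memo: they leave a flat stretch in a sampled memory plot so that a jump can be attributed to one line.

```python
import logging
import time
import tracemalloc
from contextlib import contextmanager

logger = logging.getLogger("memory_memo")


@contextmanager
def measure_block(label, pause=0.5, results=None):
    """Sleep, run the wrapped block, sleep again, and log the change in
    Python-level allocated memory (hypothetical helper, see the lead-in)."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    # Pause first so a sampling profiler sees a flat baseline before the block.
    time.sleep(pause)
    before, _ = tracemalloc.get_traced_memory()
    try:
        yield
    finally:
        after, _ = tracemalloc.get_traced_memory()
        # Pause again so the new level is visible before the next line runs.
        time.sleep(pause)
        delta_mb = (after - before) / 1e6
        logger.info("%s: %+.2f MB", label, delta_mb)
        if results is not None:
            results[label] = delta_mb


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with measure_block("allocate 50 MB buffer"):
        buffer = bytearray(50_000_000)
```

Wrapping each suspicious line in its own block gives one log entry per line, which can then be matched against the plot produced by mprof.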

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

ArlindKadra commented, Apr 21, 2021

Interesting 😃. I think future analysis should also be extended to the following datasets:

https://archive.ics.uci.edu/ml/datasets/covertype
https://archive.ics.uci.edu/ml/datasets/HIGGS
https://archive.ics.uci.edu/ml/datasets/Poker+Hand

They proved tricky.

nabenabe0928 commented, Aug 12, 2021

Check if we can use a generator instead of np.ndarray.
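As a rough illustration of why a generator might help, the sketch below compares the peak Python-level memory of materializing all rows at once with streaming them in batches. It is an assumption about what the comment has in mind, not AutoPytorch code: all function names are hypothetical, the data is synthetic, and plain lists stand in for np.ndarray so that the example needs only the standard library.

```python
import tracemalloc


def iter_batches(n_rows, n_features, batch_size):
    """Lazily yield batches of synthetic rows (placeholder data)."""
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        yield [[float(i + j) for j in range(n_features)]
               for i in range(start, stop)]


def column_sums_streaming(n_rows, n_features, batch_size):
    """Accumulate column sums while holding only one batch at a time."""
    sums = [0.0] * n_features
    for batch in iter_batches(n_rows, n_features, batch_size):
        for row in batch:
            for j, value in enumerate(row):
                sums[j] += value
    return sums


def column_sums_materialized(n_rows, n_features):
    """Build the full dataset in memory first, then accumulate column sums."""
    data = [row
            for batch in iter_batches(n_rows, n_features, n_rows)
            for row in batch]
    sums = [0.0] * n_features
    for row in data:
        for j, value in enumerate(row):
            sums[j] += value
    return sums


def peak_bytes(func, *args):
    """Run func(*args) and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    streamed, peak_streamed = peak_bytes(column_sums_streaming, 20_000, 8, 500)
    full, peak_full = peak_bytes(column_sums_materialized, 20_000, 8)
    print(f"streaming peak:    {peak_streamed / 1e6:.2f} MB")
    print(f"materialized peak: {peak_full / 1e6:.2f} MB")
```

Whether this carries over to the real pipeline depends on whether the downstream components can consume data batch by batch; models that need the full array at once would not benefit.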
