[memo] High memory consumption and suspected sources
I am writing down the current memory usage as a memo, in case we encounter memory leak issues in the future. This post is based on the current implementation.
When we run a dataset of size 300B, AutoPyTorch consumes ~1.5 GB, and the following are the major sources of memory consumption:
| Source | Consumption [GB] |
|---|---|
| Import modules | 0.35 |
| Dask Client | 0.35 |
| Logger (thread safe) | 0.4 |
| Running `context.Process` in the multiprocessing module | 0.4 |
| Model | 0 ~ inf |
| Total | 1.5 ~ inf |
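For reference, a minimal sketch of how a single line item such as the Dask Client can be measured in isolation. This is only an illustration, not the exact script used for the table; it assumes `psutil` and `dask.distributed` are installed and measures the RSS delta of the current process around client creation:

```python
import gc

import psutil
from dask.distributed import Client


def rss_gb() -> float:
    # Resident set size of the current process in GB.
    return psutil.Process().memory_info().rss / 1024 ** 3


gc.collect()
before = rss_gb()

# Create a local Dask client (a stand-in for the client started internally).
client = Client(processes=False)

gc.collect()
after = rss_gb()
print(f"Dask Client overhead: {after - before:.2f} GB")

client.close()
```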
When we run a dataset of size 300MB (400,000 instances x 80 features), such as Albert, AutoPyTorch consumes ~2.5 GB, and the following are the major sources of memory consumption:
| Source | Consumption [GB] |
|---|---|
| Import modules | 0.35 |
| Dask Client | 0.35 |
| Logger (thread safe) | 0.4 |
| Dataset itself | 0.3 |
| `self.categories` in `InputValidator` | 0.3 |
| Running `context.Process` in the multiprocessing module | 0.4 |
| Model (e.g. LightGBM) | 0.4 ~ inf |
| Total | 2.5 ~ inf |
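As a complement to mprof, the Python-level allocations behind a single line item (e.g. `self.categories` in the `InputValidator`) can be attributed with the standard-library `tracemalloc`. A minimal sketch, where `run_region_of_interest()` is only a hypothetical placeholder for the code region being investigated:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical placeholder for the region of interest, e.g. fitting the
# input validator so that self.categories gets populated.
run_region_of_interest()

after = tracemalloc.take_snapshot()
# Top 10 allocation sites by net size difference between the two snapshots.
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)
```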
All the information was obtained by running:

    $ mprof run --include-children python -m examples.tabular.20_basics.example_tabular_classification

and by the logger that I set up for debugging. Note that I also added `time.sleep(0.5)` before and after each line of interest, to eliminate the influence of other elements, and checked each line in detail.
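For illustration, the `time.sleep(0.5)` trick looks roughly like this inside the profiled code; the sleeps create flat plateaus in the mprof timeline so that the jump caused by the line in between stands out (`build_suspected_object()` is a hypothetical placeholder):

```python
import time

time.sleep(0.5)  # flat plateau before the line of interest
obj = build_suspected_object()  # hypothetical line of interest
time.sleep(0.5)  # flat plateau after it
```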
Interesting 😃, I think the analysis should also be extended in the future to the following datasets:
- https://archive.ics.uci.edu/ml/datasets/covertype
- https://archive.ics.uci.edu/ml/datasets/HIGGS
- https://archive.ics.uci.edu/ml/datasets/Poker+Hand

They proved tricky.
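As a first step for those datasets, it may help to record their raw in-memory footprint once loaded. A minimal sketch for covertype using scikit-learn (HIGGS and Poker Hand would have to be obtained separately, e.g. via `fetch_openml` or a direct download):

```python
from sklearn.datasets import fetch_covtype

# Load the forest covertype dataset (~581k instances x 54 features).
covtype = fetch_covtype()
X, y = covtype.data, covtype.target
print(f"X: {X.shape}, dtype={X.dtype}, {X.nbytes / 1024 ** 3:.2f} GB")
print(f"y: {y.shape}, dtype={y.dtype}, {y.nbytes / 1024 ** 2:.2f} MB")
```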
Check if we can use a `generator` instead of an `np.ndarray`.
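A minimal sketch of what the generator idea could look like: instead of materializing the full `np.ndarray` up front, the data is yielded in batches so that only one batch is resident at a time (the batch producer below is a hypothetical stand-in for wherever the array is currently built):

```python
from typing import Iterator

import numpy as np


def iter_batches(n_rows: int, n_cols: int, batch_size: int = 4096) -> Iterator[np.ndarray]:
    # Hypothetical stand-in: yield the data batch by batch instead of
    # allocating one (n_rows, n_cols) array up front.
    for start in range(0, n_rows, batch_size):
        rows = min(batch_size, n_rows - start)
        yield np.random.rand(rows, n_cols)


# The consumer processes one batch at a time, so peak memory stays at
# roughly one batch instead of the full dataset.
total = 0.0
for batch in iter_batches(400_000, 80):
    total += batch.sum()
print(total)
```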