Possible memory leak when running inference with BLOOM 176B
System Info
- `Accelerate` version: 0.11.0
- Platform: Linux-4.18.0-305.25.1.el8_4.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.13
- Numpy version: 1.22.3
- PyTorch version (GPU?): 1.11.0a0+gitbc2c6ed (True)
- `Accelerate` default config: Not found
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the examples/ folder of Accelerate, or an officially supported `no_trainer` script in the examples folder of the transformers repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
Usage: `python scripts/inference/bloom-accelerate-server.py --model_name bigscience/bloom --dtype bf16 --log_file data.log --host $ADDRESS --port $PORT`
The memory blowup over time is discussed here: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/308#issuecomment-1205757494
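As one way to confirm the growth (a minimal sketch, not part of the original report; the PID-based lookup and polling interval are assumptions), the server process's resident memory can be polled with psutil while requests are sent:

```python
# Hypothetical monitor: polls the resident set size (RSS) of the running
# bloom-accelerate-server.py process to confirm that host memory grows
# across requests. Requires `pip install psutil`.
import sys
import time

import psutil


def watch_rss(pid: int, interval_s: float = 5.0) -> None:
    proc = psutil.Process(pid)
    while True:
        rss_gib = proc.memory_info().rss / 2**30
        print(f"{time.strftime('%H:%M:%S')}  RSS = {rss_gib:.2f} GiB")
        time.sleep(interval_s)


if __name__ == "__main__":
    watch_rss(int(sys.argv[1]))  # pass the server's PID
```

If RSS climbs steadily while the request load stays constant, that points to a leak rather than normal allocator behavior.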
Expected behavior
I don't think this memory leak should occur; memory usage should remain stable across requests.
Issue Analytics
- State: closed
- Created: a year ago
- Comments: 20 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
This is not an issue anymore. Thanks for helping, guys. Closing this 😃
What @ydshieh said, and more:
To track real memory usage or debug potential leaks, always call `gc.collect()` first, since Python's GC runs on a schedule and without an explicit collection you might miss an object and its associated memory release. But don't do any of the above in production.
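A minimal sketch of that measurement pattern (the helper name and the use of the torch.cuda and psutil counters are my additions, not from the comment):

```python
# Debug-only memory probe: force a GC pass before reading counters, so that
# objects whose release is merely pending (and any GPU tensors they hold)
# are actually freed first. Don't ship this in production code paths -
# gc.collect() and torch.cuda.synchronize() are expensive.
import gc

import psutil
import torch


def report_memory(tag: str) -> None:
    gc.collect()  # run Python's scheduled GC now
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for pending kernels before reading stats
        alloc_gib = torch.cuda.memory_allocated() / 2**30
        print(f"[{tag}] CUDA allocated: {alloc_gib:.2f} GiB")
    rss_gib = psutil.Process().memory_info().rss / 2**30
    print(f"[{tag}] host RSS: {rss_gib:.2f} GiB")
```

Calling report_memory() before and after each generate request shows whether allocations persist across iterations.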