
Possible memory leak when inferencing BLOOM 176B


System Info

- `Accelerate` version: 0.11.0
- Platform: Linux-4.18.0-305.25.1.el8_4.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.13
- Numpy version: 1.22.3
- PyTorch version (GPU?): 1.11.0a0+gitbc2c6ed (True)
- `Accelerate` default config:
	Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Script: https://github.com/mayank31398/Megatron-DeepSpeed/blob/add-generation-server/scripts/inference/bloom-accelerate-server.py

Usage: python scripts/inference/bloom-accelerate-server.py --model_name bigscience/bloom --dtype bf16 --log_file data.log --host $ADDRESS --port $PORT

Memory blowup over time discussed here: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/308#issuecomment-1205757494

Expected behavior

I don't think this memory leak should occur.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 20 (3 by maintainers)

Top GitHub Comments

3 reactions
mayank31398 commented, Sep 20, 2022

This is not an issue anymore. Thanks for helping guys. Closing this 😃

3 reactions
stas00 commented, Aug 23, 2022

what @ydshieh said and more:

to track real memory usage / debug potential leaks, always:

  1. call gc.collect() first - Python's GC runs on its own schedule, so without an explicit call you might miss an object and the release of its associated memory
  2. then clear the CUDA cache
  3. measure

but don't do any of the above in production.
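
A minimal sketch of that measurement pattern, assuming PyTorch and the standard `gc` / `torch.cuda` APIs; the model and inputs in the trailing usage comments are placeholders, not names from the original script:

```python
import gc
import torch

def measure_cuda_memory(tag=""):
    # 1. Run Python's GC explicitly: collection is scheduled, so an unreachable
    #    tensor may not have been freed yet when we take the reading.
    gc.collect()
    # 2. Release cached allocator blocks so the numbers reflect live tensors only.
    torch.cuda.empty_cache()
    # 3. Measure.
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"{tag} allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB")

# Example (debugging only; gc.collect() and empty_cache() add latency):
# measure_cuda_memory("before generate")
# output = model.generate(**inputs)   # hypothetical model/inputs
# measure_cuda_memory("after generate")
```

Wrapping each generation request in calls like these while debugging makes steady per-request growth in allocated memory (a real leak) distinguishable from growth that is only the allocator's cache.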


