
Loading Big Model exceeds max_memory

System Info

- `Accelerate` version: 0.11.0.dev0
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Numpy version: 1.21.6
- PyTorch version (GPU?): 1.11.0+cu113 (True)
- `Accelerate` default config:
	Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

https://colab.research.google.com/drive/1lh9rduNcnGNPHgqWfgTmRK_5k51gF75q?usp=sharing

Expected behavior

When using the "max_memory" parameter, the script should use no more than the specified amount of memory. However, it exceeds the max_memory limit, consumes all available memory, and crashes the Colab runtime.

Any idea why it doesn't respect the parameter?
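
For context, max_memory is the per-device cap dict that Accelerate's big-model utilities accept. The exact call from the Colab notebook isn't reproduced here; the sketch below shows the usual pattern, with the model name taken from the comment further down and memory limits that are purely illustrative:

from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton without allocating real weights.
config = AutoConfig.from_pretrained("bigscience/bloom-6b3")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Ask Accelerate for a placement that respects these per-device caps.
# Keys are GPU indices plus "cpu"; the sizes here are illustrative only.
device_map = infer_auto_device_map(model, max_memory={0: "10GiB", "cpu": "20GiB"})
print(device_map)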

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

1 reaction
xnohat commented, Jul 14, 2022

I successfully loaded a big model on Colab Pro by following these guides from Hugging Face:

https://huggingface.co/docs/transformers/big_models
https://huggingface.co/docs/accelerate/big_modeling

import os
import tempfile

import torch
from transformers import AutoModelForCausalLM

# Load the model in fp16 to halve the RAM footprint of the weights.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-6b3", torch_dtype=torch.float16,
    use_cache=True, low_cpu_mem_usage=True)

# Folder where weights that don't fit in RAM/VRAM will be offloaded.
offload_dir = '/content/offload'
os.makedirs(offload_dir, exist_ok=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Re-save the checkpoint in small shards so it can be reloaded shard by shard.
    model.save_pretrained(tmp_dir, max_shard_size="200MB")
    print('Temp Dir Path:', tmp_dir)
    print(sorted(os.listdir(tmp_dir)))
    new_model = AutoModelForCausalLM.from_pretrained(
        tmp_dir, low_cpu_mem_usage=True,
        device_map="auto", offload_folder=offload_dir)
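
A note on why the small max_shard_size likely helps: per the linked big_models guide, a checkpoint saved in shards and reloaded with low_cpu_mem_usage=True is read one shard at a time, so peak RAM stays close to the final model footprint plus a single ~200MB shard, rather than the full model plus the entire original checkpoint.
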
0 reactions
github-actions[bot] commented, Aug 7, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

