
use_multiprocessing = False doesn't seem to work?

See original GitHub issue

Hi!

I’ve implemented a T5.1-large fine-tuning pipeline in Colab, but I keep getting the following error, even after setting use_multiprocessing = False:

Epoch 1 of 1: 0% 0/1 [00:00<?, ?it/s]
Epochs 0/1. Running Loss: 2.2631: 100% 75331/75331 [5:26:14<00:00, 3.90it/s]
0% 1/93864 [00:30<797:05:36, 30.57s/it]

Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
    storage = cls._new_shared_fd(fd, size)
RuntimeError: unable to mmap 360 bytes from file <filename not specified>: Cannot allocate memory (12)

I’ve tried numerous combinations of batch size and eval batch size, to no avail. It seems as if use_multiprocessing = False is being ignored in model_args, or perhaps tokenizer multithreading is the issue?

TIA

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

2 reactions
supernovalx commented, Apr 24, 2022
os.environ["TOKENIZERS_PARALLELISM"] = "false"
model_args.dataloader_num_workers = 0
model_args.process_count = 1
model_args.use_multiprocessing_for_evaluation = False

This should do the trick.
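For context, a minimal sketch of how these settings fit together in a pipeline. The field names come from the comment above; the `T5ArgsSketch` class below is a hypothetical stand-in for the library's model-args object, used here only to illustrate the configuration (the commented-out model construction is an assumption about how the args would be passed, not a verified call):

```python
import os
from dataclasses import dataclass

# Disable tokenizer-level parallelism before any tokenizer is created;
# the Rust tokenizers read this environment variable at startup.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

@dataclass
class T5ArgsSketch:
    """Hypothetical stand-in for the library's model args, showing only
    the four fields named in the fix above."""
    use_multiprocessing: bool = False
    use_multiprocessing_for_evaluation: bool = False
    process_count: int = 1          # no worker pool for feature conversion
    dataloader_num_workers: int = 0  # load batches in the main process

model_args = T5ArgsSketch()

# In the real pipeline, these args would then be passed to the model, e.g.:
# model = T5Model("t5", "t5-large", args=model_args)
```

The common thread is that every source of extra processes and threads (tokenizer threads, feature-conversion workers, DataLoader workers, evaluation multiprocessing) is turned off, which avoids the shared-memory mmap failure in the traceback.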

0 reactions
stale[bot] commented, Sep 21, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
