TypeError: cannot pickle '_LazyModule' object

See original GitHub issue

@stas00 edit: please see https://github.com/huggingface/transformers/issues/12549#issuecomment-875287701 for the short reproduction script.


Environment info

  • transformers version: 4.9.0.dev0
  • Platform: Linux with Nvidia P40
  • Python version: 3.8.0
  • PyTorch version (GPU?): 1.8.0
  • Tensorflow version (GPU?):
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: yes

Who can help

@stas00, @patrickvonplaten, @LysandreJik

Information

Model I am using (Bert, XLNet …): GPT2

The problem arises when using:

  • the official example scripts: (give details below)
  • [√] my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • [√] my own task or dataset: (give details below)

To reproduce

I am running the minimal command:

python run_clm.py \
    --model_name_or_path /mycheckpoin/ \
    --train_file train.txt \
    --validation_file  eval.txt \
    --do_train \
    --do_eval \
    --output_dir ./models/ \
    --no_cuda False \
    --fp16 \
    --sharded_ddp simple \
    --num_train_epochs 3.0 \
    --disable_tqdm False \
    --save_steps 100 \
    --preprocessing_num_workers 32 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4

I modified the following parts of the script run_clm.py, with the rank parameter passed in as training_args.local_rank:

def init_process(rank, size, fn, backend='gloo'):
    """Initialize the distributed environment, then run `fn` in this process."""
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)

if __name__ == "__main__":
    # main()
    # size = int(os.environ['WORLD_SIZE'])
    size = int(torch.cuda.device_count())  # one process per visible GPU
    print(size)
    processes = []
    mp.set_start_method("spawn")
    for rank in range(size):
        p = mp.Process(target=init_process, args=(rank, size, main))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

The traceback is:

Process Process-2:
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/media/cfs/gonglixing/9Nctl/gpt_v2/run_clm_v3.py", line 511, in init_process
    fn(rank, size)
  File "/media/cfs/gonglixing/9Nctl/gpt_v2/run_clm_v3.py", line 367, in main
    tokenized_datasets = raw_datasets.map(
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/datasets/dataset_dict.py", line 471, in map
    {
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/datasets/dataset_dict.py", line 472, in <dictcomp>
    k: dataset.map(
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1736, in map
    transformed_shards = [r.get() for r in results]
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1736, in <listcomp>
    transformed_shards = [r.get() for r in results]
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/multiprocess/pool.py", line 771, in get
    raise self._value
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/multiprocess/pool.py", line 537, in _handle_tasks
    put(task)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/multiprocess/connection.py", line 209, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/multiprocess/reduction.py", line 54, in dumps
    cls(buf, protocol, *args, **kwds).dump(obj)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/dill/_dill.py", line 498, in dump
    StockPickler.dump(self, obj)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 487, in dump
    self.save(obj)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 901, in save_tuple
    save(element)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/dill/_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/dill/_dill.py", line 1493, in save_function
    pickler.save_reduce(_create_function, (obj.__code__,
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 692, in save_reduce
    save(args)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 901, in save_tuple
    save(element)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/site-packages/dill/_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/usr/local/anaconda3/envs/py38/lib/python3.8/pickle.py", line 578, in save
    rv = reduce(self.proto)
TypeError: cannot pickle '_LazyModule' object

If I run the following command based on the original script, it works well. The reason I don't use this command is that our cluster doesn't support passing parameters this way: "-m torch.distributed.launch --nproc_per_node=4"

python -m torch.distributed.launch --nproc_per_node=4 run_clm.py \
    --model_name_or_path /mycheckpoin/ \
    --train_file train.txt \
    --validation_file  eval.txt \
    --do_train \
    --do_eval \
    --output_dir ./models/ \
    --no_cuda False \
    --fp16 \
    --sharded_ddp simple \
    --num_train_epochs 3.0 \
    --disable_tqdm False \
    --save_steps 100 \
    --preprocessing_num_workers 32 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4
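
For reference, here is a minimal sketch of a launcher that mimics what torch.distributed.launch sets up, but without its command-line interface; the env-var values and the argument list are assumptions for a single-node run, and --local_rank is the flag the launcher would normally pass to the script:

import os
import subprocess

import torch

# Hypothetical stand-in for torch.distributed.launch: export the env vars the
# launcher would set, then start one fresh Python process per GPU.
world_size = torch.cuda.device_count()
procs = []
for rank in range(world_size):
    env = dict(os.environ,
               MASTER_ADDR='127.0.0.1', MASTER_PORT='29500',
               WORLD_SIZE=str(world_size), RANK=str(rank), LOCAL_RANK=str(rank))
    procs.append(subprocess.Popen(
        ['python', 'run_clm.py', '--local_rank', str(rank)],  # plus the args above
        env=env))
for p in procs:
    p.wait()

Because each rank is a fresh top-level invocation of the script, just as with torch.distributed.launch, this may sidestep the failure mode in the same way.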

Expected behavior

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 15 (10 by maintainers)

Top GitHub Comments

2 reactions
lhoestq commented, Jul 7, 2021

Note that we can easily make _LazyModule picklable. I can open a PR if needed to implement a __reduce__ method for _LazyModule. It’s the only object that prevents transformers from being picklable.

EDIT: here it is: https://github.com/huggingface/transformers/pull/12552

This is just a way to easily fix this issue, but I think we should definitely keep trying to figure out why it tried to pickle transformers in the first place. This might come from dill, which pickles the globals of some environments when pickling any object.
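
A minimal sketch of that kind of __reduce__ fix, using a toy stand-in class rather than transformers' actual _LazyModule internals (the constructor arguments here are assumptions):

import pickle
import types

class LazyModule(types.ModuleType):
    """Toy stand-in for a lazily-populated module object."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        self._name = name
        self._import_structure = import_structure

    def __reduce__(self):
        # Without this method, pickle falls back to object.__reduce_ex__ and
        # raises "TypeError: cannot pickle 'LazyModule' object". Returning
        # (callable, args) tells pickle to rebuild the object by calling the
        # constructor again rather than serializing its state.
        return (self.__class__, (self._name, self._import_structure))

m = LazyModule('demo', {'models': ['gpt2']})
m2 = pickle.loads(pickle.dumps(m))  # round-trips instead of raising

Presumably the linked PR does something along these lines for the real _LazyModule, which is what makes the transformers module object picklable.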

2 reactions
stas00 commented, Jul 7, 2021

OK, here is the minimal reproducible script. It seems totally unrelated to transformers, except for the import of transformers:

import logging
import math
import os
import sys
from dataclasses import dataclass, field
from typing import Optional

import datasets
from datasets import load_dataset

import transformers

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def main(rank, size):

    def tokenize_function(examples):
        return None

    raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
    tokenized_datasets = raw_datasets.map(
        tokenize_function,
        num_proc=32,
    )

def _mp_fn(index):
    # For xla_spawn (TPUs)
    main()

def init_process(rank, size, fn, backend='gloo'):
    """ Initialize the distributed environment. """
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)


if __name__ == "__main__":
    # main()
    # size = int(os.environ['WORLD_SIZE'])
    size = int(torch.cuda.device_count())
    print(size)
    processes = []
    mp.set_start_method("spawn")
    for rank in range(size):
        p = mp.Process(target=init_process, args=(rank, size, main))
        p.start()
        processes.append(p)

    for p in processes:
        p.join()

This still fails with the same error:

python run_clm.py
2
Reusing dataset wikitext (/home/stas/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20)
Reusing dataset wikitext (/home/stas/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20)
Process Process-1:
Process Process-2:
Traceback (most recent call last):
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/nvme1/code/huggingface/users/lancekung/run_clm.py", line 60, in init_process
    fn(rank, size)
  File "/mnt/nvme1/code/huggingface/users/lancekung/run_clm.py", line 46, in main
    tokenized_datasets = raw_datasets.map(
  File "/mnt/nvme1/code/huggingface/datasets-master/src/datasets/dataset_dict.py", line 471, in map
    {
  File "/mnt/nvme1/code/huggingface/datasets-master/src/datasets/dataset_dict.py", line 472, in <dictcomp>
    k: dataset.map(
  File "/mnt/nvme1/code/huggingface/datasets-master/src/datasets/arrow_dataset.py", line 1736, in map
    transformed_shards = [r.get() for r in results]
  File "/mnt/nvme1/code/huggingface/datasets-master/src/datasets/arrow_dataset.py", line 1736, in <listcomp>
    transformed_shards = [r.get() for r in results]
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/multiprocess/pool.py", line 771, in get
    raise self._value
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/multiprocess/pool.py", line 537, in _handle_tasks
    put(task)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/multiprocess/connection.py", line 209, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/multiprocess/reduction.py", line 54, in dumps
    cls(buf, protocol, *args, **kwds).dump(obj)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/dill/_dill.py", line 498, in dump
    StockPickler.dump(self, obj)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 487, in dump
    self.save(obj)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 901, in save_tuple
    save(element)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/dill/_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/dill/_dill.py", line 1493, in save_function
    pickler.save_reduce(_create_function, (obj.__code__,
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 692, in save_reduce
    save(args)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 901, in save_tuple
    save(element)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/dill/_dill.py", line 990, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 971, in save_dict
    self._batch_setitems(obj.items())
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 997, in _batch_setitems
    save(v)
  File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/pickle.py", line 578, in save
    rv = reduce(self.proto)
TypeError: cannot pickle '_LazyModule' object
(The second process prints an identical traceback, ending in the same TypeError.)

But if you either:

  • comment out import transformers, or
  • set num_proc=1 in datasets.map (instead of n>1),

then all is good.

@lhoestq, @albertvillanova - does this ring any bells? Clearly transformers loads some module lazily and trips up datasets even though transformers isn’t really used here directly. Thank you.
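
As a small self-contained simulation of the failing step in the traceback (FakeLazyModule below is a hypothetical stand-in, not the real class): dill can fall back to pickling a function's globals dict entry by entry, so a single unpicklable object in the module's globals breaks the dump even though the function never uses it.

import dill

class FakeLazyModule:
    """Hypothetical stand-in: any object whose __reduce__ refuses pickling."""

    def __reduce__(self):
        raise TypeError("cannot pickle 'FakeLazyModule' object")

def tokenize_function(examples):
    return None  # never touches the fake module below

# A globals-like dict as dill sees it in the worker: the function to run plus
# an unrelated, unpicklable module object that happens to live alongside it.
fake_globals = {'transformers': FakeLazyModule(), 'tokenize_function': tokenize_function}
dill.dumps(fake_globals)  # raises TypeError, mirroring the traceback above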

