_pickle.PicklingError: Can't pickle typing.Union[str, NoneType]: it's not the same object as typing.Union

Environment info

  • transformers version: 3.4.0
  • Platform: Linux
  • Python version: 3.6
  • PyTorch version (GPU?): 1.6, CUDA 10
  • Tensorflow version (GPU?):
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

When I try to train roberta-wwm from scratch on my dataset by following transformers’ run_mlm_wwm.py script, I get the error below. This is the command I ran, followed by the full output:

!python run_mlm_wwm.py --model_name_or_path hfl/chinese-roberta-wwm-ext --train_file ../../../../pretrain_data/pretrain_train.txt --validation_file ../../../../pretrain_data/pretrain_val.txt --train_ref_file ../../../../pretrain_data/ref_train.txt --validation_ref_file ../../../../pretrain_data/ref_val.txt --do_train --do_eval --output_dir ./output
All the weights of BertForMaskedLM were initialized from the model checkpoint at hfl/chinese-roberta-wwm-ext.
If your task is similar to the task the model of the checkpoint was trained on, you can already use BertForMaskedLM for predictions without further training.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Traceback (most recent call last):
  File "run_mlm_wwm.py", line 333, in <module>
    main()
  File "run_mlm_wwm.py", line 274, in main
    load_from_cache_file=not data_args.overwrite_cache,
  File "/usr/local/lib/python3.6/dist-packages/datasets/dataset_dict.py", line 300, in map
    for k, dataset in self.items()
  File "/usr/local/lib/python3.6/dist-packages/datasets/dataset_dict.py", line 300, in <dictcomp>
    for k, dataset in self.items()
  File "/usr/local/lib/python3.6/dist-packages/datasets/arrow_dataset.py", line 1256, in map
    update_data=update_data,
  File "/usr/local/lib/python3.6/dist-packages/datasets/arrow_dataset.py", line 156, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/datasets/fingerprint.py", line 158, in wrapper
    self._fingerprint, transform, kwargs_for_fingerprint
  File "/usr/local/lib/python3.6/dist-packages/datasets/fingerprint.py", line 105, in update_fingerprint
    hasher.update(transform_args[key])
  File "/usr/local/lib/python3.6/dist-packages/datasets/fingerprint.py", line 57, in update
    self.m.update(self.hash(value).encode("utf-8"))
  File "/usr/local/lib/python3.6/dist-packages/datasets/fingerprint.py", line 53, in hash
    return cls.hash_default(value)
  File "/usr/local/lib/python3.6/dist-packages/datasets/fingerprint.py", line 46, in hash_default
    return cls.hash_bytes(dumps(value))
  File "/usr/local/lib/python3.6/dist-packages/datasets/utils/py_utils.py", line 367, in dumps
    dump(obj, file)
  File "/usr/local/lib/python3.6/dist-packages/datasets/utils/py_utils.py", line 339, in dump
    Pickler(file, recurse=True).dump(obj)
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 454, in dump
    StockPickler.dump(self, obj)
  File "/usr/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 1447, in save_function
    obj.__dict__, fkwdefaults), obj=obj)
  File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 1178, in save_cell
    pickler.save_reduce(_create_cell, (f,), obj=obj)
  File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 605, in save_reduce
    save(cls)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 1374, in save_type
    obj.__bases__, _dict), obj=obj)
  File "/usr/lib/python3.6/pickle.py", line 610, in save_reduce
    save(args)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 941, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/lib/python3.6/dist-packages/dill/_dill.py", line 941, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 507, in save
    self.save_global(obj, rv)
  File "/usr/lib/python3.6/pickle.py", line 927, in save_global
    (obj, module_name, name))
_pickle.PicklingError: Can't pickle typing.Union[str, NoneType]: it's not the same object as typing.Union

Please help me.
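
For readers unfamiliar with the wording of this error: pickle saves classes and similar global objects by reference (module name plus attribute name), and it raises "it's not the same object as ..." when looking that name up returns a different object than the one being pickled. In the traceback above, the name apparently resolved to the bare typing.Union rather than to typing.Union[str, NoneType]. The sketch below reproduces the same identity check with the standard library only. The module name `fake_mod` and the classes `Thing` and `Replacement` are hypothetical, made up for the illustration, and it does not reproduce the datasets/dill code path itself.

```python
import pickle
import sys
import types

# Hypothetical module, registered so that pickle can import it by name.
fake_mod = types.ModuleType("fake_mod")
sys.modules["fake_mod"] = fake_mod


class Thing:
    pass


# Make pickle believe the class lives at fake_mod.Thing.
Thing.__module__ = "fake_mod"
Thing.__qualname__ = "Thing"
fake_mod.Thing = Thing

# While the name still points at the same object, pickling by reference works.
roundtrip_ok = pickle.loads(pickle.dumps(Thing)) is Thing


class Replacement:
    pass


# Rebind the name to a different object: the lookup no longer returns Thing.
fake_mod.Thing = Replacement

try:
    pickle.dumps(Thing)
    error = None
except pickle.PicklingError as exc:
    error = exc

print(roundtrip_ok)
print(error)
```

The second `pickle.dumps` call fails because `fake_mod.Thing` now refers to `Replacement`, which is the same kind of mismatch the error message in this issue describes.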

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

LysandreJik commented, Nov 11, 2020 (1 reaction)

Maybe @sgugger has an idea

stale[bot] commented, Jan 16, 2021 (0 reactions)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
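
The thread shown here does not include a confirmed fix. Because the traceback passes through dill's save_function and save_cell before failing, one way to narrow the problem down might be to check which closure variables of the function passed to `map` cannot be serialized. The helper below is a minimal sketch under that assumption. `find_unpicklable_closure_vars`, `make_tokenize` and the variable names are hypothetical, and it defaults to the standard-library `pickle.dumps`, which is stricter than dill (it rejects lambdas, for example). Passing `dill.dumps` as the `dumps` argument should be closer to what datasets does, although datasets uses its own Pickler settings.

```python
import pickle


def find_unpicklable_closure_vars(func, dumps=pickle.dumps):
    """Return {name: error} for closure variables of `func` that `dumps` rejects."""
    failures = {}
    # co_freevars and __closure__ are parallel: name i belongs to cell i.
    cells = func.__closure__ or ()
    for name, cell in zip(func.__code__.co_freevars, cells):
        try:
            value = cell.cell_contents
        except ValueError:
            continue  # the cell is empty (variable not assigned yet)
        try:
            dumps(value)
        except Exception as exc:  # pickling can raise several exception types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical example: a function that closes over a string and a lambda.
def make_tokenize(prefix, handler):
    def tokenize(text):
        return handler(prefix + text)
    return tokenize


tokenize = make_tokenize("x", lambda s: s.upper())
report = find_unpicklable_closure_vars(tokenize)
for name, message in report.items():
    print(name, "->", message)
```

In this toy case only `handler` is reported, because the standard-library pickle cannot serialize a lambda while the string `prefix` pickles without trouble.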


