AttributeError: 'NoneType' object has no attribute 'type' with overlap_comm=True and zero 2

When I train the model with ZeRO stage 2 and overlap_comm=True on a single P4d instance, I get the following error:

10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/engine.py", line 850, in backward
10.0.89.152: self.allreduce_gradients()
10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/engine.py", line 770, in allreduce_gradients
10.0.89.152: self.optimizer.overlapping_partition_gradients_reduce_epilogue()
10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 581, in overlapping_partition_gradients_reduce_epilogue
10.0.89.152: self.independent_gradient_partition_epilogue()
10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 470, in independent_gradient_partition_epilogue
10.0.89.152: self.reduce_ipg_grads()
10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 954, in reduce_ipg_grads
10.0.89.152: elements_per_buffer=self.elements_in_ipg_bucket)
10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 1123, in buffered_reduce_fallback
10.0.89.152: split_buckets = split_half_float_double(grads)
10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 40, in split_half_float_double
10.0.89.152: bucket = [t for t in tensors if t.type() == dtype]
10.0.89.152: File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 40, in <listcomp>
10.0.89.152: bucket = [t for t in tensors if t.type() == dtype]
10.0.89.152: AttributeError: 'NoneType' object has no attribute 'type'
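The last frames of the traceback show a list comprehension calling `t.type()` on every entry of a gradient list, so a single `None` entry is enough to raise this error. Below is a minimal CPU-only sketch of that failure mode. The helper `split_by_type` and the sample `grads` list are hypothetical stand-ins written for illustration; they are not DeepSpeed's code, and the `None` filter at the end only demonstrates the mechanism, it is not the fix the maintainers applied.

```python
import torch


def split_by_type(tensors, type_names):
    """Group tensors by the string returned from Tensor.type() (hypothetical helper)."""
    buckets = []
    for name in type_names:
        # Same shape as the comprehension in the traceback: fails if any t is None.
        bucket = [t for t in tensors if t.type() == name]
        if bucket:
            buckets.append(bucket)
    return buckets


cpu_types = ["torch.HalfTensor", "torch.FloatTensor", "torch.DoubleTensor"]

# Assumed scenario: one gradient slot was never filled and is still None.
grads = [torch.zeros(2), None, torch.zeros(2, dtype=torch.float64)]

try:
    split_by_type(grads, cpu_types)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# Illustration only: dropping None entries lets the grouping run.
present = [g for g in grads if g is not None]
buckets = split_by_type(present, cpu_types)
print(len(buckets))
```

Whether silently skipping a missing gradient is acceptable depends on why it is missing, so treat the filter as a diagnostic aid rather than a remedy.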

The model config is:

    "bert_model_config": {
        "vocab_size_or_config_json_file": 32003,
        "hidden_size": 1024,
        "num_hidden_layers": 38,
        "num_attention_heads": 16,
        "intermediate_size": 4096,
        "hidden_act": "gelu",
        "hidden_dropout_prob": 0.1,
        "attention_probs_dropout_prob": 0.1,
        "max_position_embeddings": 512,
        "initializer_range": 0.02
    }

If I turn off overlap_comm, it works.
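For reference, the workaround described above corresponds to a `zero_optimization` section along these lines. This is only a fragment, written as a Python dict and dumped to JSON; the rest of the DeepSpeed config (batch size, precision settings, optimizer, and so on) is not shown in the issue and is deliberately left out here.

```python
import json

# Fragment only: merge this into the existing DeepSpeed config.
# "stage" and "overlap_comm" are the two settings named in the issue;
# everything else in the real config is unknown and omitted.
zero_fragment = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": False,  # workaround reported above
    }
}

fragment_json = json.dumps(zero_fragment, indent=2)
print(fragment_json)
```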

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

1 reaction
szhengac commented, Mar 13, 2021

@tjruwase I just did some testing. It seems that PyTorch defers the hook call for tied parameters until the first time they are used in the computational graph, so the hook is actually called only once. I am closing this issue. Thanks for providing the fix.
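The behaviour described in this comment can be checked with a small experiment. The sketch below is an assumption-laden toy: it uses a plain tensor that appears twice in the graph as a stand-in for a tied parameter, and it uses `Tensor.register_hook`, which may differ from how DeepSpeed attaches its own hooks.

```python
import torch

# Toy stand-in for a tied parameter: one leaf tensor used twice in the graph.
weight = torch.ones(3, requires_grad=True)
calls = []


def record(grad):
    # Keep a copy of each gradient the hook receives.
    calls.append(grad.clone())


weight.register_hook(record)

x = torch.tensor([1.0, 2.0, 3.0])
# Two separate uses of the same tensor in one forward pass.
loss = (weight * x).sum() + (weight * 2.0).sum()
loss.backward()

hook_call_count = len(calls)
print(hook_call_count)
print(calls[0])
```

In this setup the hook should fire once per backward pass and receive the gradient already summed over both uses, which matches the observation that the hook is called only once.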

1 reaction
tjruwase commented, Mar 9, 2021

@szhengac, I have repro’d successfully. Thanks so much.
