ZeRO 3: IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

If I use the following snippet in the BingBERT example with ZeRO 3, I get IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1), regardless of whether the Transformer CUDA layer is enabled.

    with deepspeed.zero.Init(remote_device='cpu', pin_memory=True, enabled=True):
        model = BertMultiTask(args)

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
szhengac commented, Apr 15, 2021

@tjruwase

Traceback (most recent call last):
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/deepspeed_train.py", line 597, in <module>
    main()
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/deepspeed_train.py", line 586, in main
    model, optimizer = prepare_model_optimizer(args)
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/deepspeed_train.py", line 461, in prepare_model_optimizer
    model = BertMultiTask(args)
  File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 208, in __exit__
    _disable_class(subclass)
  File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 204, in _disable_class
    cls.__init__ = cls._old_init
AttributeError: type object 'LinearActivation' has no attribute '_old_init'
04/15/2021 06:17:35 - INFO - nvidia.modelingpreln -   Init BERT pretrain model
Traceback (most recent call last):
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/deepspeed_train.py", line 461, in prepare_model_optimizer
    model = BertMultiTask(args)
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/turing/models.py", line 123, in __init__
    self.network = BertForPreTrainingPreLN(bert_config, args)
  File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 164, in wrapper
    f(module, *args, **kwargs)
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/nvidia/modelingpreln.py", line 1119, in __init__
    config, self.bert.embeddings.word_embeddings.weight)
  File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 164, in wrapper
    f(module, *args, **kwargs)
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/nvidia/modelingpreln.py", line 760, in __init__
    bert_model_embedding_weights)
  File "/usr/local/lib64/python3.7/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 164, in wrapper
    f(module, *args, **kwargs)
  File "/fsx/szhengac/DeepSpeedExamples/bing_bert/nvidia/modelingpreln.py", line 712, in __init__
    self.decoder = nn.Linear(bert_model_embedding_weights.size(1),
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
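
The second traceback stops at bert_model_embedding_weights.size(1), and the message says the tensor had only one dimension at that point. One plausible reading (an assumption, not confirmed in this thread) is that the embedding weight had already been partitioned inside deepspeed.zero.Init, leaving a 1-D placeholder where the full matrix used to be. To my knowledge DeepSpeed keeps the original shape of a partitioned parameter in a ds_shape attribute. The full_shape helper, the stand-in objects, and the shape values below are hypothetical and only sketch how the shape could be read defensively:

```python
from types import SimpleNamespace


def full_shape(param):
    """Return the unpartitioned shape of a parameter as a tuple.

    Assumption: a ZeRO-3-partitioned parameter exposes its original shape
    as `ds_shape`; anything without that attribute is treated as a regular
    tensor and its own `shape` is used.
    """
    return tuple(getattr(param, "ds_shape", param.shape))


# Hypothetical stand-ins so the sketch runs without DeepSpeed or PyTorch.
# The shape values are illustrative, not taken from the issue.
partitioned = SimpleNamespace(shape=(1,), ds_shape=(30528, 1024))
regular = SimpleNamespace(shape=(30528, 1024))

# Both lookups yield the hidden size, even though the partitioned
# stand-in only holds a 1-D placeholder locally.
print(full_shape(partitioned)[1])
print(full_shape(regular)[1])
```

In the failing line, full_shape(bert_model_embedding_weights)[1] would stand in for .size(1). Whether that alone is enough depends on the DeepSpeed version and on how the decoder weight is tied to the embedding.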

0 reactions
szhengac commented, May 3, 2021

I was using an older version that did not include these two lines.
