
Adafactor gives RuntimeError: tensors must be 2-D

Environment info

  • transformers version: 4.2.2 (also tried with the latest version, v4.5.1)
  • Platform: Linux-4.4.0-1127-aws-x86_64-with-debian-stretch-sid
  • Python version: 3.6.13
  • PyTorch version (GPU?): 1.7.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: False

Who can help

@sgugger @patrickvonplaten

Information

Model I am using (Bert, XLNet …):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

In my code, I replaced AdamW (which works just fine) with Adafactor, and then I get the error below. The code also uses gradient checkpointing. Using Adafactor from FairSeq works well (a sketch of that workaround appears at the end of this thread).

# Replacing AdamW
# optimizer = AdamW([{'params': model.parameters()}], lr=args.lr, eps=args.epsilon)
# with Adafactor

optimizer = Adafactor(
    [{'params': model.parameters()}],
    lr=None,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    relative_step=True,
    scale_parameter=True,
    warmup_init=True,
)

Output:

home/ubuntu/transformers/src/transformers/optimization.py:557: UserWarning: This overload of add_ is deprecated:
        add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
        add_(Tensor other, *, Number alpha) (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370116979/work/torch/csrc/utils/python_arg_parser.cpp:882.)
  exp_avg_sq_row.mul_(beta2t).add_(1.0 - beta2t, update.mean(dim=-1))
  0%|▎                                                                                                                   | 19/6858 [00:37<3:42:15,  1.95s/it]
Traceback (most recent call last):
  File "main.py", line 519, in <module>
    main()
  File "main.py", line 510, in main
    train(allincl_model, epoch, optimizer, scheduler, criterion)
  File "main.py", line 384, in train
    optimizer.step()
  File "/home/ubuntu/transformers/src/transformers/optimization.py", line 561, in step
    update = self._approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col)
  File "/home/ubuntu/transformers/src/transformers/optimization.py", line 492, in _approx_sq_grad
    return torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0))
RuntimeError: tensors must be 2-D
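
For context on where the error comes from: for a parameter with more than two dimensions, Adafactor's factored second-moment statistics are themselves 2-D tensors, and the torch.mm call in _approx_sq_grad (line 492 in the traceback) only accepts 2-D matrices. A minimal sketch of the failing shapes, using the (2, 3, 4) parameter shape from the test script later in this thread:

import torch

# For a weight of shape (2, 3, 4), Adafactor keeps factored second-moment
# statistics that are already 2-D: exp_avg_sq_row has shape (2, 3) and
# exp_avg_sq_col has shape (2, 4).
r_factor = torch.rand(2, 3)
c_factor = torch.rand(2, 4)

# Simplified form of the call from the traceback (optimization.py line 492):
# after unsqueeze the operands are 3-D, but torch.mm only accepts 2-D matrices.
torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0))
# RuntimeError ("tensors must be 2-D" on the reporter's setup; the exact wording
# differs between PyTorch versions and devices)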

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

1 reaction
ybch14 commented, Dec 10, 2021

@patrickvonplaten Thank you for your PR and hope pytorch gets better 😃

1 reaction
ybch14 commented, Nov 4, 2021

@ybch14 - do you think this could also be fixed in the transformers Adafactor implementation?

Definitely, just change lines 506-508 of transformers/optimization.py as I mentioned above and you're done! I'm creating my custom optimizer only because I'm not familiar with the pull request process and am in a hurry with my development needs. I would really appreciate it if you could help initiate a pull request.
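
The exact replacement lines are not quoted in this thread, so the following is only a sketch of the kind of change being described: replacing the 2-D-only torch.mm in _approx_sq_grad with a broadcast multiply, along the lines of fairseq's implementation.

import torch

def _approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col):
    # Broadcasting the row and column factors works for parameters of any rank,
    # whereas torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0)) requires
    # both operands to be 2-D matrices.
    r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1, keepdim=True)).rsqrt().unsqueeze(-1)
    c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
    return torch.mul(r_factor, c_factor)

# Shape check with the factored statistics of a (2, 3, 4) parameter:
row, col = torch.rand(2, 3), torch.rand(2, 4)
print(_approx_sq_grad(row, col).shape)  # torch.Size([2, 3, 4])

With this broadcasting, the approximated squared gradient has the same shape as the parameter regardless of its rank, so higher-dimensional weights no longer trip the 2-D restriction.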

I will attach my local test code here to help with your local testing:

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers.optimization import Adafactor

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # A 3-D parameter triggers Adafactor's factored second-moment path,
        # which is where the error occurs.
        self.w = nn.Parameter(torch.randn(2, 3, 4), requires_grad=True)

    def forward(self):
        return self.w.mean().sigmoid()

device = torch.device("cuda")
target = torch.tensor(1.).to(device)
model = Model().to(device)
y = model()
loss = F.binary_cross_entropy(y, target)
loss.backward()
optimizer = Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)
optimizer.step()  # fails with RuntimeError: tensors must be 2-D on unpatched versions
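
For completeness, the original report notes that Adafactor from FairSeq does not hit this error. A rough sketch of that workaround, reusing the model from the test script above; the import path and argument names follow fairseq's Adafactor as I understand it, so double-check them against your installed fairseq version:

from fairseq.optim.adafactor import Adafactor as FairseqAdafactor

# Same hyperparameters as in the original report.
optimizer = FairseqAdafactor(
    model.parameters(),
    lr=None,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    relative_step=True,
    scale_parameter=True,
    warmup_init=True,
)
optimizer.step()  # completes without the 2-D error, per the original report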

