
Adafactor gives RuntimeError: tensors must be 2-D

Environment info

  • transformers version: 4.2.2 (also tried with the latest version, v4.5.1)
  • Platform: Linux-4.4.0-1127-aws-x86_64-with-debian-stretch-sid
  • Python version: 3.6.13
  • PyTorch version (GPU?): 1.7.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: False

Who can help

@sgugger @patrickvonplaten

Information

Model I am using (Bert, XLNet …):

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

In my code, I replaced AdamW (which works just fine) with Adafactor, and then I get the error below. The code also uses gradient checkpointing. Using Adafactor from FairSeq works well (a sketch of that workaround appears at the end of this thread).

# Replacing AdamW
# optimizer = AdamW([{'params': model.parameters()}], lr=args.lr, eps=args.epsilon)
# with Adafactor

optimizer = Adafactor(
    [{'params': model.parameters()}],
    lr=None,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    relative_step=True,
    scale_parameter=True,
    warmup_init=True,
)

Output:

home/ubuntu/transformers/src/transformers/optimization.py:557: UserWarning: This overload of add_ is deprecated:
        add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
        add_(Tensor other, *, Number alpha) (Triggered internally at  /opt/conda/conda-bld/pytorch_1607370116979/work/torch/csrc/utils/python_arg_parser.cpp:882.)
  exp_avg_sq_row.mul_(beta2t).add_(1.0 - beta2t, update.mean(dim=-1))
  0%|▎                                                                                                                   | 19/6858 [00:37<3:42:15,  1.95s/it]
Traceback (most recent call last):
  File "main.py", line 519, in <module>
    main()
  File "main.py", line 510, in main
    train(allincl_model, epoch, optimizer, scheduler, criterion)
  File "main.py", line 384, in train
    optimizer.step()
  File "/home/ubuntu/transformers/src/transformers/optimization.py", line 561, in step
    update = self._approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col)
  File "/home/ubuntu/transformers/src/transformers/optimization.py", line 492, in _approx_sq_grad
    return torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0))
RuntimeError: tensors must be 2-D
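
For context on where the error comes from: for a parameter with more than two dimensions, Adafactor's factored second-moment statistics are themselves 2-D tensors, and the torch.mm call in _approx_sq_grad (line 492 in the traceback) only accepts 2-D matrices. A minimal sketch of the failing shapes, using the (2, 3, 4) parameter shape from the test script later in this thread:

import torch

# For a weight of shape (2, 3, 4), Adafactor keeps factored second-moment
# statistics that are already 2-D: exp_avg_sq_row has shape (2, 3) and
# exp_avg_sq_col has shape (2, 4).
r_factor = torch.rand(2, 3)
c_factor = torch.rand(2, 4)

# Simplified form of the call from the traceback (optimization.py line 492):
# after unsqueeze the operands are 3-D, but torch.mm only accepts 2-D matrices.
torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0))
# RuntimeError ("tensors must be 2-D" on the reporter's setup; the exact wording
# differs between PyTorch versions and devices)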

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10 (4 by maintainers)

Top GitHub Comments

1 reaction
ybch14 commented, Dec 10, 2021

@patrickvonplaten Thank you for your PR and hope pytorch gets better 😃

1 reaction
ybch14 commented, Nov 4, 2021

@ybch14 - do you think this could also be fixed in the transformers Adafactor implementation?

Definitely, just change lines 506-508 of transformers/optimization.py as I mentioned above and you're done! I'm creating my custom optimizer only because I'm not familiar with the pull request process and am in a hurry with my development needs. I would really appreciate it if you could help initiate a pull request.
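
The exact replacement lines are not quoted in this thread, so the following is only a sketch of the kind of change being described: replacing the 2-D-only torch.mm in _approx_sq_grad with a broadcast multiply, along the lines of fairseq's implementation.

import torch

def _approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col):
    # Broadcasting the row and column factors works for parameters of any rank,
    # whereas torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0)) requires
    # both operands to be 2-D matrices.
    r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1, keepdim=True)).rsqrt().unsqueeze(-1)
    c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
    return torch.mul(r_factor, c_factor)

# Shape check with the factored statistics of a (2, 3, 4) parameter:
row, col = torch.rand(2, 3), torch.rand(2, 4)
print(_approx_sq_grad(row, col).shape)  # torch.Size([2, 3, 4])

With this broadcasting, the approximated squared gradient has the same shape as the parameter regardless of its rank, so higher-dimensional weights no longer trip the 2-D restriction.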

I will attach my local test code here to help with your local testing:

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers.optimization import Adafactor

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # A 3-D parameter triggers Adafactor's factored second-moment path,
        # which is where the error occurs.
        self.w = nn.Parameter(torch.randn(2, 3, 4), requires_grad=True)

    def forward(self):
        return self.w.mean().sigmoid()

device = torch.device("cuda")
target = torch.tensor(1.).to(device)
model = Model().to(device)
y = model()
loss = F.binary_cross_entropy(y, target)
loss.backward()
optimizer = Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)
optimizer.step()  # fails with RuntimeError: tensors must be 2-D on unpatched versions
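
For completeness, the original report notes that Adafactor from FairSeq does not hit this error. A rough sketch of that workaround, reusing the model from the test script above; the import path and argument names follow fairseq's Adafactor as I understand it, so double-check them against your installed fairseq version:

from fairseq.optim.adafactor import Adafactor as FairseqAdafactor

# Same hyperparameters as in the original report.
optimizer = FairseqAdafactor(
    model.parameters(),
    lr=None,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    relative_step=True,
    scale_parameter=True,
    warmup_init=True,
)
optimizer.step()  # completes without the 2-D error, per the original report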

