Adafactor gives RuntimeError: tensors must be 2-D
Environment info
- transformers version: 4.2.2 (also tried with the latest version, v4.5.1)
- Platform: Linux-4.4.0-1127-aws-x86_64-with-debian-stretch-sid
- Python version: 3.6.13
- PyTorch version (GPU?): 1.7.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: False
Who can help
Information
Model I am using (Bert, XLNet …):
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
In my code, I replaced AdamW (which works just fine) with Adafactor, and then I get the error below. The code also uses gradient checkpointing. Using the Adafactor implementation from fairseq works fine.
from transformers.optimization import Adafactor

# Replacing AdamW:
# optimizer = AdamW([{'params': model.parameters()}], lr=args.lr, eps=args.epsilon)
# with Adafactor (lr=None lets relative_step/warmup_init derive the learning rate):
optimizer = Adafactor(
    [{'params': model.parameters()}],
    lr=None,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    relative_step=True,
    scale_parameter=True,
    warmup_init=True,
)
Output:
/home/ubuntu/transformers/src/transformers/optimization.py:557: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /opt/conda/conda-bld/pytorch_1607370116979/work/torch/csrc/utils/python_arg_parser.cpp:882.)
exp_avg_sq_row.mul_(beta2t).add_(1.0 - beta2t, update.mean(dim=-1))
0%|▎ | 19/6858 [00:37<3:42:15, 1.95s/it]
Traceback (most recent call last):
File "main.py", line 519, in <module>
main()
File "main.py", line 510, in main
train(allincl_model, epoch, optimizer, scheduler, criterion)
File "main.py", line 384, in train
optimizer.step()
File "/home/ubuntu/transformers/src/transformers/optimization.py", line 561, in step
update = self._approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col)
File "/home/ubuntu/transformers/src/transformers/optimization.py", line 492, in _approx_sq_grad
return torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0))
RuntimeError: tensors must be 2-D
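For context, the failure is in Adafactor's factored second-moment update: for a parameter with three or more dimensions, exp_avg_sq_row and exp_avg_sq_col are themselves multi-dimensional, so the unsqueeze calls hand torch.mm 3-D tensors, and torch.mm only accepts 2-D inputs. A minimal sketch of just that failure (the shapes are made up for illustration):

import torch

# Stand-ins for the row/column second-moment accumulators of a 3-D
# parameter: both are already 2-D rather than 1-D.
r_factor = torch.rand(4, 8)
c_factor = torch.rand(4, 16)

# unsqueeze produces 3-D tensors, which torch.mm rejects
# (on PyTorch 1.7 the message is "RuntimeError: tensors must be 2-D").
torch.mm(r_factor.unsqueeze(-1), c_factor.unsqueeze(0))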
Top GitHub Comments
@patrickvonplaten Thank you for your PR, and I hope PyTorch gets better 😃
Definitely, just change lines 506-508 of transformers/optimization.py as I mentioned above and you're all done! I'm creating my own custom optimizer only because I'm not familiar with the pull-request process and am in a hurry with my development needs. I would really appreciate it if you could help initiate a pull request.
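A sketch of the change being referred to, mirroring the broadcasting approach used by fairseq (an illustration of the idea, not necessarily the commenter's exact patch):

import torch

# Inside the Adafactor class, replacing the torch.mm-based version:
@staticmethod
def _approx_sq_grad(exp_avg_sq_row, exp_avg_sq_col):
    # Broadcasting handles parameters of any rank: r_factor gains a trailing
    # singleton dimension and c_factor a second-to-last one, so their
    # elementwise product forms the rank-1 outer-product approximation
    # without requiring strictly 2-D inputs.
    r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1, keepdim=True)).rsqrt().unsqueeze(-1)
    c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
    return torch.mul(r_factor, c_factor)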
I will attach my local test code here to help with your local testing:
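A minimal test in that spirit (a hypothetical sketch, not the original attachment; the parameter shape and optimizer settings are assumptions):

import torch
from transformers.optimization import Adafactor

# A single 3-D parameter is enough to make Adafactor factor its
# second-moment statistics into 2-D row/column accumulators, the case
# the torch.mm call cannot handle.
param = torch.nn.Parameter(torch.randn(4, 4, 16))
optimizer = Adafactor(
    [param], lr=None, relative_step=True, scale_parameter=True, warmup_init=True
)

param.grad = torch.randn_like(param)
optimizer.step()  # on transformers 4.2.2: RuntimeError: tensors must be 2-D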