Problem when running train.py
When I run train.py, I get an error. What is the problem? The error message is as follows:
| epoch 001: 0%| | 0/820 [00:00<?, ?it/s]/home/suxia/anaconda3/envs/python36/lib/python3.6/site-packages/torch/autograd/function.py:41: UserWarning: mark_shared_storage is deprecated. Tensors with shared storages are automatically tracked. Note that calls to set_() are not tracked
'mark_shared_storage is deprecated. '
THCudaCheck FAIL file=/home/suxia/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
| WARNING: ran out of memory, skipping batch
Traceback (most recent call last):
  File "train.py", line 29, in <module>
    main(args)
  File "train.py", line 23, in main
    singleprocess_main(args)
  File "/home/suxia/fairseq-LM-0522/singleprocess_train.py", line 80, in main
    train(args, trainer, dataset, epoch, batch_offset)
  File "/home/suxia/fairseq-LM-0522/singleprocess_train.py", line 146, in train
    log_output = trainer.train_step(sample)
  File "/home/suxia/fairseq-LM-0522/fairseq/trainer.py", line 103, in train_step
    grad_norm, ooms_bwd = self._backward_and_opt(loss, grad_denom)
  File "/home/suxia/fairseq-LM-0522/fairseq/trainer.py", line 189, in _backward_and_opt
    p.grad.data.div_(grad_denom)
AttributeError: 'NoneType' object has no attribute 'data'
Looking forward to your reply, thank you!
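For context, the AttributeError at the bottom of the traceback usually means that some parameter's .grad is still None when the trainer rescales gradients, e.g. because the backward pass was skipped after the out-of-memory warning above. Below is a minimal sketch of a guard, not the actual fairseq code; model and grad_denom are assumed to come from the surrounding training loop:

# Sketch only: skip parameters whose gradient was never populated
# (p.grad stays None if backward never ran for them, e.g. after an OOM skip).
for p in model.parameters():
    if p.grad is not None:
        p.grad.data.div_(grad_denom)  # in-place rescale, mirrors the failing call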
Top GitHub Comments
@myleott Important notice! When I install NCCL (https://developer.nvidia.com/nccl/nccl-download) first and then build PyTorch and install fairseq, the dual GPUs work well. Otherwise, if NCCL is installed after building PyTorch, the result is an error like "RuntimeError: the distributed NCCL backend is not available; try to recompile the THD package with CUDA and NCCL 2+ support at /home/z/pytorch/torch/lib/THD/process_group/General.cpp:17".
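As a quick check of whether the installed PyTorch build actually picked up NCCL (assuming a PyTorch version that exposes torch.distributed.is_nccl_available()), something like the following can be run; if the last line prints False, PyTorch was likely built before NCCL was installed and needs to be rebuilt:

# Sanity check: was this PyTorch build compiled with CUDA and the NCCL backend?
import torch
import torch.distributed as dist

print(torch.cuda.is_available())      # CUDA support present?
print(dist.is_available())            # torch.distributed compiled in?
print(dist.is_nccl_available())       # NCCL backend compiled in?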
Also, please make sure your dictionary size is not too big, say no bigger than 50k tokens.
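If it helps, here is a small sketch for checking the dictionary size before training; it assumes a fairseq-style dict file (one "token count" pair per line) at a hypothetical path, which you would adjust to your setup:

# Count entries in a fairseq-style dictionary file (hypothetical path).
dict_path = "data-bin/dict.txt"
with open(dict_path, encoding="utf-8") as f:
    num_tokens = sum(1 for _ in f)
print("dictionary size:", num_tokens, "tokens")  # aim for <= 50k per the advice above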