Can't train without --use-cpu

Hi, when launching my training on a GPU, I get:

[ERROR:sockeye.train] Mismatch in arguments for training continuation.
[ERROR:sockeye.train] Differing arguments: use_cpu.

The command is:

python3 -m sockeye.train \
  --batch-size 80 \
  --batch-type sentence \
  --checkpoint-frequency 30000 \
  --decode-and-evaluate 100 \
  --decoder rnn \
  --embed-dropout 0.2 \
  --encoder rnn \
  --initial-learning-rate 0.0001 \
  --keep-last-params 4 \
  --learning-rate-reduce-factor 0.7 \
  --learning-rate-reduce-num-not-improved 2 \
  --max-num-checkpoint-not-improved 4 \
  --max-seq-len 100 \
  --num-embed 512:512 \
  --num-layers 8:8 \
  --optimized-metric bleu \
  --optimizer adam \
  --rnn-attention-in-upper-layers \
  --rnn-attention-type bilinear \
  --rnn-decoder-hidden-dropout 0.2 \
  --rnn-num-hidden 1024 \
  --use-cpu \
  --weight-init xavier \
  --weight-init-scale 3.0 \
  --weight-init-xavier-factor-type avg \
  -d train_data \
  -o sockeye-commit-suggester \
  -vs test.3000.bpe.diff \
  -vt test.3000.bpe.msg

What am I doing wrong? Many thanks to everybody.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

tdomhan commented, Jan 31, 2019 (1 reaction)

True. Maybe we could additionally extend the "Mismatch in arguments for training continuation." message, though, to tell the user that the existing directory should be deleted if training continuation is not intended.
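
The kind of check that produces this error, together with the extended hint proposed above, could be sketched roughly as follows. This is an illustration only, not Sockeye's actual implementation: the function names `differing_arguments` and `continuation_error`, the dictionary representation of the arguments, and the wording of the hint are all assumptions made for this example.

```python
from typing import Any, Dict, List


def differing_arguments(saved: Dict[str, Any], current: Dict[str, Any]) -> List[str]:
    """Return the sorted names of arguments whose values differ between two runs.

    Hypothetical helper: `saved` stands for the arguments stored by an earlier
    run, `current` for the arguments of the run being started now.
    """
    missing = object()  # sentinel so an absent argument never equals a real value
    names = set(saved) | set(current)
    return sorted(n for n in names if saved.get(n, missing) != current.get(n, missing))


def continuation_error(output_dir: str, diffs: List[str]) -> str:
    """Build an error text that also tells the user how to start from scratch."""
    return (
        "Mismatch in arguments for training continuation. "
        f"Differing arguments: {', '.join(diffs)}. "
        f"If you did not intend to continue training, delete '{output_dir}' "
        "or choose a different output directory."
    )


if __name__ == "__main__":
    # Assumed scenario: the first run used --use-cpu, the second one did not.
    saved_args = {"use_cpu": True, "batch_size": 80}
    current_args = {"use_cpu": False, "batch_size": 80}
    diffs = differing_arguments(saved_args, current_args)
    if diffs:
        print(continuation_error("sockeye-commit-suggester", diffs))
```

With a hint like this, the fix is visible in the error itself: either rerun with the original arguments to continue training, or point `-o` at a fresh directory.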

tdomhan commented, Jan 31, 2019 (1 reaction)

When you run Sockeye multiple times with the same arguments, it will try to resume training. Do you think a message that just prints something like:

Trying to resume training from existing directory %s.

would make this clearer?
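
A notice like the one suggested here could be emitted with Python's standard `logging` module. The sketch below is hypothetical: the helper name `announce_resume`, the logger name, and the rule "a non-empty output directory means resume" are assumptions for illustration, and Sockeye's real criterion for resuming may differ.

```python
import logging
import os

# Logger name chosen for this sketch only.
logger = logging.getLogger("resume_sketch")


def announce_resume(output_dir: str) -> bool:
    """Log a notice and return True if output_dir already exists and is non-empty.

    Assumption: an existing, non-empty output directory is treated as a signal
    that training should be resumed rather than started from scratch.
    """
    resuming = os.path.isdir(output_dir) and bool(os.listdir(output_dir))
    if resuming:
        # The logging module fills in %s lazily with the given argument.
        logger.info("Trying to resume training from existing directory %s.", output_dir)
    return resuming


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    announce_resume("sockeye-commit-suggester")
```

Printing this notice before the argument comparison would give the "Mismatch in arguments" error some context, since the user would already know that an earlier run was found.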
