Error when training on nightly

I’m getting this error when trying to train on the nightly version of spaCy:

... File "/usr/local/lib/python3.6/dist-packages/thinc/model.py", line 288, in __call__ return self._func(self, X, is_train=is_train) File "/usr/local/lib/python3.6/dist-packages/thinc/layers/softmax.py", line 32, in forward W = cast(Floats2d, model.get_param("W")) File "/usr/local/lib/python3.6/dist-packages/thinc/model.py", line 213, in get_param f"Parameter '{name}' for model '{self.name}' has not been allocated yet." KeyError: "Parameter 'W' for model 'softmax' has not been allocated yet."

How to reproduce the behaviour

!python -m spacy train 'config.cfg' --output='model' --gpu-id=0 --verbose --paths.train train.spacy --paths.dev test.spacy

config.cfg:

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "es"
pipeline = ["transformer","tagger","parser","ner"]
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null

[components]

[components.ner]
factory = "ner"
moves = null
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v1"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}

[components.parser]
factory = "parser"
learn_tokens = false
min_action_freq = 30
moves = null
update_with_oracle_cut_size = 100

[components.parser.model]
@architectures = "spacy.TransitionBasedParser.v1"
state_type = "parser"
extra_state_tokens = false
hidden_width = 128
maxout_pieces = 3
use_upper = false
nO = null

[components.parser.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}

[components.tagger]
factory = "tagger"

[components.tagger.model]
@architectures = "spacy.Tagger.v1"
nO = null

[components.tagger.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}

[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v1"
name = "mrm8488/RuPERTa-base"

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.transformer.model.tokenizer_config]
use_fast = true

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 500
gold_preproc = false
limit = 0
augmenter = null

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = ["tagger", "parser"]
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
get_length = null

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
dep_las_per_type = null
sents_p = null
sents_r = null
ents_per_type = null
tag_acc = 0.33
dep_uas = 0.17
dep_las = 0.17
sents_f = 0.0
ents_f = 0.33
ents_p = 0.0
ents_r = 0.0

[pretraining]

[initialize]
vectors = null
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null

[initialize.components]

[initialize.tokenizer]

Any clues on what may be happening are appreciated.
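
One way to narrow this down is to load and validate the config outside of the training loop. A minimal sketch, assuming the spaCy v3 utility functions load_config and load_model_from_config (the file name matches the config above; this is not an official debugging procedure, just a quick sanity check):

from spacy.util import load_config, load_model_from_config

config = load_config("config.cfg")
# auto_fill adds any defaults the file omits; validate checks the schema
nlp = load_model_from_config(config, auto_fill=True, validate=True)
print(nlp.pipe_names)  # expected: ['transformer', 'tagger', 'parser', 'ner']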

Your Environment

  • spaCy version: 3.0.0a35
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • Pipelines: es_core_news_md (3.0.0a0)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

1 reaction
svlandeg commented, Oct 9, 2020

Awesome, happy to hear it works now!

1 reaction
svlandeg commented, Oct 8, 2020

I just noticed the frozen components. I think there’s a bug related to those, that they are not disabled when calling the evaluation. I’ll have a further look. Your stack trace is helpful, thanks 😃
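
If frozen components are indeed being run during evaluation before they have been initialized, a temporary workaround on the user side could be to leave them out when loading the trained pipeline for scoring. A rough sketch, assuming a pipeline saved to model/model-best and the component names from the config above (the exclusion list is an assumption, not a confirmed fix):

import spacy

nlp = spacy.load("model/model-best", exclude=["tagger", "parser"])
doc = nlp("Texto de ejemplo para comprobar las entidades.")
print([(ent.text, ent.label_) for ent in doc.ents])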
