
Segmentation fault on Ubuntu while training NER or TextCat

See original GitHub issue

How to reproduce the behaviour

Hello! I’m running into a segmentation fault while training a model for NER and text classification. When I train exclusively on my local machine this problem does not occur (and still doesn’t); it only seems to appear on my VMs. Here is the context: I want to train a lot of models while slightly varying different parameters, such as the dropout, the batch size, the number of iterations, etc. For each set of parameters I also train 10 models, to make sure that a good score isn’t just luck. I first of all create a new blank model from a language:

import spacy

nlp = spacy.blank("fr")
# SENTENCIZER, TAGGER and PARSER are constants defined elsewhere in the project,
# presumably the factory-name strings "sentencizer", "tagger" and "parser"
nlp.add_pipe(nlp.create_pipe(SENTENCIZER))
nlp.add_pipe(nlp.create_pipe(TAGGER))
nlp.add_pipe(nlp.create_pipe(PARSER))
nlp.begin_training()

I then add word vectors from a gensim FastText model to the newly created model:

from gensim.models import FastText

# vectors_dir is the path to the saved gensim FastText model (defined elsewhere)
gensim_model = FastText.load(vectors_dir)
gensim_model.init_sims(replace=True)  # L2-normalise the vectors in place
nr_dim = gensim_model.wv.vector_size
nlp.vocab.reset_vectors(width=nr_dim)
for word in gensim_model.wv.index2word:
    vector = gensim_model.wv.get_vector(word)
    nlp.vocab[word]  # touch the vocab so a lexeme entry exists for the word
    nlp.vocab.set_vector(word, vector)
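
A quick sanity check (a minimal sketch, not part of the original post; "bonjour" is just an example word) to confirm the vectors actually ended up in the vocab before training:

print(nlp.vocab.vectors.shape)           # (number of vectors, nr_dim)
print(nlp.vocab["bonjour"].has_vector)   # True if the word was in the FastText vocabulary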

I then call another Python script to train the component I want, be it the NER or the TextCat pipe. In this project I have custom “multi-ner” and “multi-textcat” components that train each label as a separate submodel, as shown in the image attached to the original issue. The training uses 5,000 sentences for the NER and 2,000 for the TextCat; it needs a bit of RAM, but nothing the machines with 16 gigabytes can’t handle. After that, I modify the meta.json file to add some project-related info, and it’s done.
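
The training script itself is not shown in the issue; the following is only a rough sketch of what a spaCy 2.x training loop for the NER component could look like, with the dropout, batch size and iteration count exposed as parameters. train_data, the paths and the parameter values are placeholders, not the author’s actual code:

import random

import spacy
from spacy.util import minibatch

# Placeholder training data: (text, annotations) pairs in spaCy 2.x format
train_data = [
    ("Jean habite à Paris.", {"entities": [(0, 4, "PER"), (14, 19, "LOC")]}),
]

nlp = spacy.load("path/to/base_model")  # the blank model with vectors built above
if "ner" not in nlp.pipe_names:
    nlp.add_pipe(nlp.create_pipe("ner"))
ner = nlp.get_pipe("ner")
for _, annotations in train_data:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)

# Only update the NER weights; leave the other pipes untouched
other_pipes = [p for p in nlp.pipe_names if p != "ner"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    for itn in range(20):                              # number of iterations
        random.shuffle(train_data)
        losses = {}
        for batch in minibatch(train_data, size=32):   # batch size
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, drop=0.2,   # dropout
                       sgd=optimizer, losses=losses)
        print(itn, losses)

nlp.to_disk("path/to/output_model")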
As I said, I want to train a lot of models, so I train series of 10 models (each series with the same parameters). The 10 models aren’t trained simultaneously but one after the other. And here is the thing: while on my local machine I can train dozens of models without a single error, the behaviour is different on the VMs. After some training (usually I can train 2 models, so 2*20 iterations), I get a Segmentation Fault. It always happens when I try to load the model, and it can be just before a training run or just before the meta file changes. I don’t really know how to investigate this error or what I can do to solve it, so any help or tip is welcome! 😃
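
Not something suggested in the issue itself, but one generally useful way to get at least a Python-level traceback when the interpreter dies with a segmentation fault is the standard-library faulthandler module, enabled at the top of the training script:

import faulthandler

# Dump the Python traceback of all threads to stderr if the process
# receives a fatal signal such as SIGSEGV
faulthandler.enable()

# ... rest of the training / loading code ...

The same effect can be had without touching the code by running the script with python -X faulthandler, which at least narrows down which spaCy call the crash happens in.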

I was trying to be as exhaustive as possible, but as I am quite a newbie at writing issues I may have missed some important info; do not hesitate to ask for more details or a code preview!

Your Environment

I’m using 3 different VMs to train the models simultaneously. Their configurations differ slightly from one another, but the bug is the same. Below is also my local configuration, the error-free one. Each machine (VM or local) has 16 gigabytes of RAM, which appears to be more than enough.

Info about spaCy

  • Local: spaCy version 2.2.4, Python 3.6.5, Platform Windows-10-10.0.17134-SP0
  • VM 1: spaCy version 2.2.4, Python 3.6.5, Platform Linux-4.4.0-62-generic-x86_64-with-Ubuntu-16.04-xenial
  • VM 2 & 3: spaCy version 2.2.4, Python 3.6.9, Platform Linux-4.15.0-106-generic-x86_64-with-Ubuntu-18.04-bionic

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

1 reaction
mohateri commented, Jul 6, 2020

Thank you, it worked! Thanks for taking the time to help me find the problem.

1 reaction
svlandeg commented, Jul 3, 2020

About the serialization of custom components, the documentation provides more information here:

When spaCy loads a model via its meta.json, it will iterate over the “pipeline” setting, look up every component name in the internal factories and call nlp.create_pipe to initialize the individual components, like the tagger, parser or entity recognizer. If your model uses custom components, this won’t work – so you’ll have to tell spaCy where to find your component. You can do this by writing to the Language.factories:

from spacy.language import Language

# MyComponent is your custom pipeline component class
Language.factories["my_component"] = lambda nlp, **cfg: MyComponent(nlp, **cfg)

That’s also pretty much what the error message says there. Have you tried implementing this so you can just save out the entire model in one go?
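
To make that concrete, here is a minimal sketch of registering a custom component so that the whole pipeline can be written out with nlp.to_disk and loaded back with spacy.load in one go. The LabelFlagger class and the "label_flagger" name are made up for illustration; they are not the issue’s actual multi-ner / multi-textcat components:

import spacy
from spacy.language import Language
from spacy.tokens import Doc


class LabelFlagger(object):
    """Toy custom component that sets a boolean extension on the Doc."""

    name = "label_flagger"

    def __init__(self, nlp, **cfg):
        if not Doc.has_extension("flagged"):
            Doc.set_extension("flagged", default=False)

    def __call__(self, doc):
        doc._.flagged = any(token.like_num for token in doc)
        return doc


# Tell spaCy how to build the component from its name in meta.json
Language.factories["label_flagger"] = lambda nlp, **cfg: LabelFlagger(nlp, **cfg)

nlp = spacy.blank("fr")
nlp.add_pipe(nlp.create_pipe("label_flagger"))
nlp.to_disk("/tmp/model_with_custom_component")

# As long as the factory is registered in the loading process as well,
# the full pipeline (custom component included) loads back in one call:
nlp2 = spacy.load("/tmp/model_with_custom_component")
print(nlp2.pipe_names)  # ['label_flagger']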

Read more comments on GitHub >

Top Results From Across the Web

Resolving Segmentation Fault (“Core dumped”) in Ubuntu - Blog
Segmentation fault is when your system tries to access a page of memory that doesn't exist. Core dumped means when a part of...
Read more >
spaCy 101: Everything you need to know
Whether you're new to spaCy, or just want to brush up on some NLP basics and implementation details – this page should have...
Read more >
Segmentation Fault Core Dumped Ubuntu - Javatpoint
A segmentation fault appears when any program attempts for accessing a memory location that it's not permitted for accessing or attempts for accessing...
Read more >
Segmentation fault - PyTorch Forums
hey,I have met this problem recently. And I only use one GPU to train my model. When I make sampler as None in...
Read more >
How to remedy "Segmentation fault (core dumped)" error ...
I'm trying to get it all up and running in regards to training neural networks in Python using Keras (in an Anaconda environment)...
Read more >
