Not able to run textcat.pipe() after loading TextCategorizer from disk

Issue

I’m trying to load a self-trained TextCategorizer model and run it on documents that have already been pre-processed (e.g. by using textcat.pipe(docs)). I started with this tutorial: https://github.com/explosion/spaCy/blob/master/examples/training/train_textcat.py

The training and evaluation work just fine, but I get stuck when I try to reload the pipeline/model. I cannot store the model, only the pipeline (workaround: disable the other components). This shouldn’t be an issue: I just load the pipeline and then select the textcat component. But when I try to apply it to documents, I receive this error:

  File "<..>\Python37\lib\site-packages\thinc\api.py", line 295, in begin_update
    X, bp_layer = layer.begin_update(layer.ops.flatten(seqs_in, pad=pad), drop=drop)
  File "ops.pyx", line 150, in thinc.neural.ops.Ops.flatten
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

My class where I try to load the trained TextCategorizer:

import spacy
from spacy.pipeline import TextCategorizer


class SentenceSplitter:

    def __init__(self, path):
        # Load the whole saved pipeline, then pick out the textcat component.
        self.loaded_pipeline = spacy.load(path)
        self.categorizer = self.loaded_pipeline.get_pipe(TextCategorizer.name)

    def __call__(self, doc):
        for sentence in doc.sents:
            self.categorizer.pipe([sentence.as_doc()])  # This is not possible/fails

More or less the exact same thing is done during training (except that the model is created there rather than loaded), and there it works. Any ideas?
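
A side note on the class above, separate from the ValueError in the traceback: in spaCy, a component's pipe() method is a generator, so calling it without iterating over the result does not process anything. The sketch below illustrates this with a hypothetical stand-in class (FakeCategorizer is an assumption made for illustration and is not part of spaCy), so it runs without a trained model.

```python
from typing import Iterable, Iterator, List


class FakeCategorizer:
    """Hypothetical stand-in for a pipeline component; not part of spaCy."""

    def __init__(self) -> None:
        self.processed: List[str] = []

    def pipe(self, stream: Iterable[str]) -> Iterator[str]:
        # Generator method: the body runs only while the result is iterated.
        for item in stream:
            self.processed.append(item)
            yield item


categorizer = FakeCategorizer()

# Calling pipe() alone only builds a generator object; nothing is processed yet.
pending = categorizer.pipe(["first sentence", "second sentence"])
processed_before = list(categorizer.processed)

# Iterating (here via list()) is what actually drives the processing.
results = list(pending)
processed_after = list(categorizer.processed)

print(processed_before)
print(processed_after)
```

With the real component, the equivalent would be to consume the generator, for example list(self.categorizer.pipe([sentence.as_doc()])), or to loop over it.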

Environment

  • spaCy version: 2.2.4
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.7.9

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
afftek commented, Dec 9, 2020

Hi @aftek, do I understand it correctly that you’re still experiencing a (different) issue with the ensemble architecture? If so - could you post a minimal code snippet that reproduces the error?

Yes, I do have issues with the ensemble architecture: I get the same error message, but I’m not sure what the underlying issue is. I will try to find a minimal code snippet that fails and post it as soon as I have it.

1 reaction
svlandeg commented, Dec 7, 2020

Ooooh, apologies!
