Not able to run textcat.pipe() after loading TextCategorizer from disk

Issue

I’m trying to load a self-trained TextCategorizer model and run it on documents that have already been pre-processed (e.g. by using textcat.pipe(docs)). I started with this tutorial: https://github.com/explosion/spaCy/blob/master/examples/training/train_textcat.py

Training and evaluation work just fine, but I get stuck when I try to reload the pipeline/model. I cannot store the model on its own, only the whole pipeline (workaround: disable the other components). That shouldn’t be an issue: I just load the pipeline and then select the textcat component. But when I try to apply it to documents, I receive this error:

  File "<..>\Python37\lib\site-packages\thinc\api.py", line 295, in begin_update
    X, bp_layer = layer.begin_update(layer.ops.flatten(seqs_in, pad=pad), drop=drop)
  File "ops.pyx", line 150, in thinc.neural.ops.Ops.flatten
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

My class where I try to load the trained TextCategorizer:

import spacy
from spacy.pipeline import TextCategorizer


class SentenceSplitter:

    def __init__(self, path):
        # Load the full saved pipeline, then pick out the textcat component.
        self.loaded_pipeline = spacy.load(path)
        self.categorizer = self.loaded_pipeline.get_pipe(TextCategorizer.name)

    def __call__(self, doc):
        for sentence in doc.sents:
            self.categorizer.pipe([sentence.as_doc()])  # This is not possible/fails

More or less the exact same thing is done during training (except that the model is created there rather than loaded), and there it works. Any ideas?
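
One detail in the snippet above is separate from the traceback. As far as I know, component pipe() methods in spaCy 2.x are generator functions, so calling pipe() does not process anything until the result is iterated. The sketch below illustrates that behaviour with a hypothetical stand-in class (FakeCategorizer is not part of spaCy, and plain strings stand in for Doc objects). It does not reproduce or explain the ValueError.

```python
from typing import Iterable, Iterator, List


class FakeCategorizer:
    """Hypothetical stand-in for a pipeline component (not spaCy code)."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def pipe(self, docs: Iterable[str]) -> Iterator[str]:
        # Generator function: nothing below runs until the result is iterated.
        for doc in docs:
            self.seen.append(doc)
            yield doc


categorizer = FakeCategorizer()

# Calling pipe() only creates a generator object; no document is processed yet.
pending = categorizer.pipe(["first sentence", "second sentence"])
processed_before = list(categorizer.seen)

# Consuming the generator (here with list()) is what actually runs the component.
results = list(pending)
processed_after = list(categorizer.seen)

print(processed_before)
print(processed_after)
```

Applied to the class above, this would mean wrapping the call, for example list(self.categorizer.pipe([sentence.as_doc()])), so that the component actually runs and any error surfaces at that line.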

Environment

spaCy version: 2.2.4
Platform: Windows-10-10.0.19041-SP0
Python version: 3.7.9

Issue Analytics

State:
Created 3 years ago
Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction

afftek commented, Dec 9, 2020

Hi @aftek, do I understand it correctly that you’re still experiencing a (different) issue with the ensemble architecture? If so - could you post a minimal code snippet that reproduces the error?

Yes, I do have issues with the ensemble architecture: same error message, but I'm not sure what the underlying issue is. I will try to find a minimal code sample that fails and post it as soon as I have it.

1 reaction

svlandeg commented, Dec 7, 2020

Ooooh, apologies!
