Not able to run textcat.pipe() after loading TextCategorizer from disk
Issue
I’m trying to load a self-trained TextCategorizer model and run it on already pre-processed documents (e.g. by using textcat.pipe(docs)).
I started with this tutorial: https://github.com/explosion/spaCy/blob/master/examples/training/train_textcat.py
Training and evaluation work just fine, but I get stuck when I try to reload the pipeline/model. I cannot store the model on its own, only the whole pipeline (workaround: disable the other components). This shouldn’t be an issue: I just load the pipeline and then select the textcat pipe. But when I try to apply it to documents, I receive the error:
  File "<..>\Python37\lib\site-packages\thinc\api.py", line 295, in begin_update
    X, bp_layer = layer.begin_update(layer.ops.flatten(seqs_in, pad=pad), drop=drop)
  File "ops.pyx", line 150, in thinc.neural.ops.Ops.flatten
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate
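The last frame of the traceback is NumPy's concatenate, which raises exactly this ValueError when it is handed an empty sequence of arrays. Below is a minimal sketch that reproduces the message outside of spaCy and thinc. The helper name flatten_like is hypothetical and only stands in for a flatten step; it is not thinc's implementation.

```python
import numpy as np


def flatten_like(seqs):
    """Hypothetical stand-in for a flatten step: join per-document arrays."""
    return np.concatenate(seqs)


# A non-empty batch concatenates without problems.
ok = flatten_like([np.zeros((2, 3)), np.zeros((1, 3))])
print(ok.shape)

# An empty batch raises the same ValueError as in the traceback.
try:
    flatten_like([])
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

If this is what happens here, it would suggest that the list reaching flatten was empty at that point. That is an assumption worth checking, not a confirmed diagnosis.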
My class where I try to load the trained TextCategorizer:

    import spacy
    from spacy.pipeline import TextCategorizer

    class SentenceSplitter:
        def __init__(self, path):
            # Load the full pipeline from disk, then pick out the textcat component
            self.loaded_pipeline = spacy.load(path)
            self.categorizer = self.loaded_pipeline.get_pipe(TextCategorizer.name)

        def __call__(self, doc):
            for sentence in doc.sents:
                self.categorizer.pipe([sentence.as_doc()])  # This is not possible/fails
More or less the exact same thing is done during training (except that there the model is created rather than loaded), and there it works. Any ideas?
Environment
- spaCy version: 2.2.4
- Platform: Windows-10-10.0.19041-SP0
- Python version: 3.7.9
Issue Analytics
- State:
- Created 3 years ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
Yes, I do have issues with ensemble: same error message, not sure what the issue itself is. I will try to find the minimal code which fails and post it as soon as I have it.
Ooooh, apologies!