spaCy Tasks
I am a big fan of the spaCy library, and we should definitely support some of the framework's basic functionality: https://spacy.io/usage/spacy-101#features
In essence, I would like to build spaCy pipelines out of Prefect tasks, since Prefect lets us chain the output of one task into the next. As a good first step, we could easily support tokenization, which already encompasses a lot of functionality:
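import spacy
nlp = spacy.load('specified model')  # placeholder: the name of an installed spaCy model
tokens = nlp('task input')  # calling the pipeline returns a Doc, which iterates over its tokens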
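As a rough sketch of what the body of such a tokenization task might do, the helper below takes an already-loaded pipeline and returns the token strings. The names `tokenize` and `load_blank_english` are made up for illustration, and using `spacy.blank("en")` (a tokenizer-only English pipeline) instead of a trained model is an assumption to keep the example free of model downloads.

```python
def tokenize(nlp, text):
    """Run a spaCy-style pipeline on `text` and return the token strings."""
    doc = nlp(text)  # calling the pipeline returns a Doc, an iterable of Token objects
    return [token.text for token in doc]


def load_blank_english():
    """Hypothetical helper: build a tokenizer-only English pipeline."""
    import spacy  # imported lazily so tokenize() has no hard dependency on spaCy

    return spacy.blank("en")


# Usage (requires spaCy to be installed):
#   nlp = load_blank_english()
#   print(tokenize(nlp, "Prefect can chain spaCy tasks."))
```

Passing `nlp` in as an argument, rather than loading it inside the function, would let a single loaded pipeline be shared across several tasks.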
However, it would also be nice to break some of that functionality out into individual tasks, because Prefect is built to support smaller units of execution. Tasks for this first pass:
- tokenizer
- tagger
- parser
- ner
- textcat
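One possible way to split the pipeline into those units, sketched under assumptions: spaCy exposes `nlp.make_doc(text)` for tokenization only and `nlp.get_pipe(name)` to fetch a single component, and each component is a callable that takes a Doc and returns it. The function names below (`tokenize_step`, `component_step`, `run_steps`) are hypothetical, and which component names are available depends on the loaded model.

```python
def tokenize_step(nlp, text):
    """Tokenizer only: turn raw text into a Doc without running other components."""
    return nlp.make_doc(text)


def component_step(nlp, name, doc):
    """Apply one named pipeline component (e.g. "tagger", "parser", "ner") to a Doc."""
    component = nlp.get_pipe(name)
    return component(doc)


def run_steps(nlp, text, names):
    """Chain the steps by hand, in the order given by `names`."""
    doc = tokenize_step(nlp, text)
    for name in names:
        doc = component_step(nlp, name, doc)
    return doc


# Usage (requires spaCy and a trained model; the model name is an assumption):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   doc = run_steps(nlp, "here is my input", ["tagger", "parser", "ner"])
```

Each of these functions could then be wrapped as its own Prefect task, so that every component runs as a separate unit of execution. Some models have components that rely on earlier ones, so the order in `names` should follow the order in `nlp.pipe_names`.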
The optimal solution for this issue would be to support spaCy's processing-pipeline workflow (https://spacy.io/usage/spacy-101#pipelines). In Prefect, that flow could look something like this:
from prefect import Flow, Parameter

# The *Task classes below are the tasks proposed in this issue (sketch only)
with Flow('processing pipeline') as flow:
    text = Parameter('text')
    tagger = TaggerTask(text)
    parser = ParserTask(tagger)
    ner = NERTask(parser)
    doc = DoSomethingWithResultTask(ner)

flow.run(parameters=dict(text="here is my input"))
Issue Analytics
- Created 4 years ago
- Comments: 13 (3 by maintainers)
No problems when pickling the pipeline or portions of the pipeline object. I’ll keep going unless the individual token finding is a showstopper.
I’ll put together a PR for these tasks this week