I am a big fan of the spaCy library, and we should definitely support some of the framework's basic features (https://spacy.io/usage/spacy-101#features).

In essence, I would like to build spaCy pipelines out of Prefect tasks, since Prefect lets us chain together the outputs of each task. As a good first step, we could easily support tokenization, which encompasses a lot of functionality:

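import spacy
model_name = 'en_core_web_sm'  # user-specified; any installed spaCy model works
nlp = spacy.load(model_name)
doc = nlp('task input')  # runs the tokenizer and every pipeline component, returns a Doc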
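For a quick check that needs no trained model, the sketch below assumes spaCy v3 and uses `spacy.blank("en")`, which builds an English pipeline containing only the tokenizer. Calling `nlp` on a string returns a `Doc` whose tokens can be iterated directly:

```python
import spacy

# Assumption: spaCy v3 is installed. spacy.blank("en") creates an English
# pipeline with a tokenizer and no trained components, so nothing is downloaded.
nlp = spacy.blank("en")

# Calling nlp on text tokenizes it, then runs every component in nlp.pipeline.
doc = nlp("here is my input")

tokens = [token.text for token in doc]
print(tokens)          # ['here', 'is', 'my', 'input']
print(nlp.pipe_names)  # [] because a blank pipeline has no components
```

With a trained pipeline loaded through `spacy.load`, the same call would also fill in attributes such as `token.pos_`, `token.dep_` and `doc.ents`.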

However, it would also be nice to break some of this functionality out into individual tasks, because Prefect has mechanisms in place to support smaller units of execution. Tasks for this first pass:

  • tokenizer
  • tagger
  • parser
  • ner
  • textcat
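One way to split a spaCy pipeline into separately runnable steps is to call the tokenizer on its own and then apply each named component to the resulting `Doc`, in pipeline order. The sketch below assumes spaCy v3 (the string form `nlp.add_pipe("sentencizer")` is v3 syntax) and uses the rule-based `sentencizer` as a stand-in for trained components like the tagger, parser, NER and textcat, which require a downloaded model. The helpers `tokenize` and `apply_component` are hypothetical names, each roughly the size of one Prefect task:

```python
import spacy

# Assumption: spaCy v3; a blank English pipeline needs no model download.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # rule-based stand-in for a trained component


def tokenize(text):
    """First step (hypothetical helper): run only the tokenizer, producing a Doc."""
    return nlp.tokenizer(text)


def apply_component(name, doc):
    """Later steps (hypothetical helper): run one named component on a Doc."""
    component = nlp.get_pipe(name)
    return component(doc)


doc = tokenize("Hello world. Here is more.")
# Apply components in pipeline order, since later ones may rely on earlier ones.
for name in nlp.pipe_names:
    doc = apply_component(name, doc)

sentences = [sent.text for sent in doc.sents]
print(sentences)  # ['Hello world.', 'Here is more.']
```

A pipeline loaded with `spacy.load` exposes its components through the same `nlp.pipe_names` and `nlp.get_pipe` calls, so each entry there could map to its own task.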

The ideal outcome for this issue would be supporting the processing-pipeline workflow described at https://spacy.io/usage/spacy-101#pipelines.

In Prefect, this flow could look something like:

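from prefect import Flow, Parameter  # Prefect 0.x/1.x Flow API

# TaggerTask, ParserTask, NERTask and DoSomethingWithResultTask are the proposed
# task classes (they do not exist yet). Each is instantiated, then called on the
# previous task's output so Prefect records the dependency between them.
with Flow('processing pipeline') as flow:
    text = Parameter('text')
    tagger = TaggerTask()(text)
    parser = ParserTask()(tagger)
    ner = NERTask()(parser)
    doc = DoSomethingWithResultTask()(ner)

flow.run(parameters=dict(text="here is my input"))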

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 13 (3 by maintainers)

Top GitHub Comments

2 reactions
zangell44 commented, May 16, 2019

No problems when pickling the pipeline or portions of the pipeline object. I’ll keep going unless the individual token finding is a showstopper.
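Passing results between tasks can require serializing them, for example when tasks run on separate workers. A minimal check, again assuming spaCy v3 and a blank English pipeline, is to round-trip a `Doc` through the standard `pickle` module:

```python
import pickle

import spacy

# Assumption: spaCy v3 with a blank English pipeline (tokenizer only).
nlp = spacy.blank("en")
doc = nlp("here is my input")

# Serialize and restore the Doc, as a task result might be sent between workers.
payload = pickle.dumps(doc)
restored = pickle.loads(payload)

print([token.text for token in restored])  # ['here', 'is', 'my', 'input']
```

spaCy's documentation notes that a pickled `Doc` carries the shared `Vocab`, so payloads can get large; `Doc.to_bytes`/`Doc.from_bytes` and `DocBin` are the documented alternatives.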

2 reactions
zangell44 commented, May 14, 2019

I’ll put together a PR for these tasks this week.


Top Results From Across the Web

spaCy · Industrial-strength Natural Language Processing in ...
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more....

Natural Language Processing With spaCy in Python
spaCy is a free and open-source library for Natural Language Processing (NLP) in Python with a lot of in-built capabilities.

spaCy NLP Tutorial - Analytics Vidhya
spaCy is my go-to library for Natural Language Processing (NLP) tasks. I'd venture to say that's the case for the majority of NLP...

spaCy Tasks - Prefect Docs
This module contains a collection of tasks for interacting with the spaCy library. ... Task for processing text with a spaCy pipeline.

spaCy Tutorial – Complete Writeup - Machine Learning Plus
spaCy comes with pretrained NLP models that can perform most common NLP tasks, such as tokenization, parts of speech (POS) tagging, named entity...
