Update scispacy version on streamlit demo


I am getting different results for the same input text when I use the streamlit demo vs. when I run the code locally. The text in question:

text = "The structural unit of the secretory Na+-K+-2Cl- cotransporter (NKCC1) is a homodimer."

NER results using "en_ner_jnlpba_md" on streamlit demo

[image: screenshot of the demo's NER output, not preserved]

Then, running things locally:

import spacy

model = "en_ner_jnlpba_md"
nlp = spacy.load(model)

doc = nlp("The structural unit of the secretory Na+-K+-2Cl- cotransporter (NKCC1) is a homodimer.")

for ent in doc.ents:
    print(f"{ent.text}\t{ent.label_}")

"""
The above prints:

NKCC1	DNA
homodimer	PROTEIN
"""

My pyproject.toml has the following dependencies:

scispacy = "^0.4.0"
en_ner_jnlpba_md = {url = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_jnlpba_md-0.4.0.tar.gz"}

Any idea what might be causing this? I consider the streamlit demo response to be more correct, and am interested in getting the same result locally!
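For anyone comparing a local setup against a hosted demo, it may help to first confirm which versions are actually installed, since the pinned versions in pyproject.toml and the resolved ones can differ. Below is a minimal sketch using only the standard library. The helper name `installed_versions` is made up for illustration, and the package names are simply the ones mentioned in this issue; it assumes the model was installed as a regular package under the name `en_ner_jnlpba_md`.

```python
from importlib import metadata

# Package names taken from this issue; adjust them to your own environment.
PACKAGES = ("spacy", "scispacy", "en_ner_jnlpba_md")


def installed_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    (Hypothetical helper, not part of spaCy or scispaCy.)
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}\t{version or 'not installed'}")
```

Running the same check in the environment behind the demo (if you have access to it) would show whether the two setups are using different spaCy or model releases.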


Also, I am only showing one example here, but I found I could quickly come up with other examples where the streamlit demo specialized NER results were better (IMO) than the results I got locally. A second example is:

text = "Fourteen residues of the U1 snRNP-specific U1A protein are required for homodimerization, cooperative RNA binding, and inhibition of polyadenylation."

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions
mbrunecky commented, Jul 26, 2021

As the original author of explosion/spaCy#8138 (which has been closed), I am still trying to figure out what has changed. I have a case where the ‘accuracy’ in the downstream application has dropped by over 20%, even though the spaCy training validation scores dropped by less than 5%.

There is a clear, consistent case: for a triplet of entities such as "JOHN BROWN and JANE BROWN as trustees of JOHN AND JANE FAMILY TRUST", spaCy 2 correctly predicts all 3 entities, whereas spaCy 3 predicts only the first one (JANE BROWN) in 200 out of 1000 test documents.

Honnibal suggested there was some change in ‘dropping entities’ that cannot be predicted, and perhaps that change is doing more than envisioned. I am trying to see whether I can reproduce the same behavior using other data sets.

1 reaction
MichalMalyska commented, Mar 23, 2021

You’re totally right. I couldn’t find which version of en_ner_jnlpba_md they are using on the streamlit demo, but given that en_core_sci_lg was older, it wouldn’t surprise me if en_ner_jnlpba_md was too.

EDIT:

With version 0.3.0 of en_ner_jnlpba_md and spacy 2.3.2 I got:

secretory Na+-K+-2Cl- cotransporter	PROTEIN
NKCC1	PROTEIN
homodimer	PROTEIN

while with 0.4.0 (and spacy 3.0.5) I got:

NKCC1	DNA
homodimer	PROTEIN
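To make differences like this easier to spot across many test sentences, one option is to diff the (text, label) pairs produced by each model version. The sketch below is plain Python with no spaCy dependency; `diff_entities` is a hypothetical helper name, and the sample data is just the two outputs reported in this comment. In practice the pairs could be collected with something like `[(ent.text, ent.label_) for ent in doc.ents]` in each environment.

```python
def diff_entities(old, new):
    """Compare two lists of (entity_text, label) pairs.

    Returns a tuple (dropped, added, relabeled):
      dropped   - entity texts found only in `old`
      added     - entity texts found only in `new`
      relabeled - {text: (old_label, new_label)} for texts whose label changed

    Note: entities are keyed by their text, so repeated mentions of the same
    text within one document are collapsed to the last label seen.
    """
    old_labels = dict(old)
    new_labels = dict(new)
    dropped = [text for text in old_labels if text not in new_labels]
    added = [text for text in new_labels if text not in old_labels]
    relabeled = {
        text: (old_labels[text], new_labels[text])
        for text in old_labels
        if text in new_labels and old_labels[text] != new_labels[text]
    }
    return dropped, added, relabeled


# Outputs reported above for model 0.3.0 (spacy 2.3.2) and 0.4.0 (spacy 3.0.5).
v030 = [
    ("secretory Na+-K+-2Cl- cotransporter", "PROTEIN"),
    ("NKCC1", "PROTEIN"),
    ("homodimer", "PROTEIN"),
]
v040 = [("NKCC1", "DNA"), ("homodimer", "PROTEIN")]

dropped, added, relabeled = diff_entities(v030, v040)
print("dropped:", dropped)      # ['secretory Na+-K+-2Cl- cotransporter']
print("added:", added)          # []
print("relabeled:", relabeled)  # {'NKCC1': ('PROTEIN', 'DNA')}
```

For this sentence the diff shows one entity lost entirely and one relabeled from PROTEIN to DNA, which matches what the original poster saw locally.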
