Update scispacy version on streamlit demo
I am getting different results for the same input text when I use the streamlit demo vs. when I run the code locally. The text in question:
text = "The structural unit of the secretory Na+-K+-2Cl- cotransporter (NKCC1) is a homodimer."
NER results using "en_ner_jnlpba_md" on streamlit demo
Then, running things locally:
import spacy
model = "en_ner_jnlpba_md"
nlp = spacy.load(model)
doc = nlp("The structural unit of the secretory Na+-K+-2Cl- cotransporter (NKCC1) is a homodimer.")
for ent in doc.ents:
    print(f"{ent.text}\t{ent.label_}")
"""
The above prints:
NKCC1 DNA
homodimer PROTEIN
"""
My pyproject.toml has the following dependencies:
scispacy = "^0.4.0"
en_ner_jnlpba_md = {url = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_jnlpba_md-0.4.0.tar.gz"}
Any idea what might be causing this? I consider the streamlit demo response to be more correct, and am interested in getting the same result locally!
Also, I am only showing one example here, but I found I could quickly come up with other examples where the streamlit demo specialized NER results were better (IMO) than the results I got locally. A second example is:
text = "Fourteen residues of the U1 snRNP-specific U1A protein are required for homodimerization, cooperative RNA binding, and inhibition of polyadenylation."
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (4 by maintainers)
Top GitHub Comments
As the original author of explosion/spaCy#8138 (which has been closed), I am still trying to figure out what has changed. I have a case where the ‘accuracy’ in the downstream application has dropped by over 20%, even though the spaCy training validation scores dropped by less than 5%. There is a clear, consistent case involving a triplet of entities such as: JOHN BROWN and JANE BROWN as trustees of JOHN AND JANE FAMILY TRUST. spaCy 2 correctly predicts all 3 entities above, whereas spaCy 3 only predicts the first one (JANE BROWN) in 200 out of 1000 test documents. Honnibal suggested there was some change in ‘dropping entities’ that cannot be predicted, and perhaps that change is doing more than envisioned. I am trying to see if I can reproduce the same behavior using other data sets.
You’re totally right. I couldn’t find which version of en_ner_jnlpba_md they are using on streamlit demo, but given that en_core_sci_lg was older, it wouldn’t surprise me if the en_ner_jnlpba_md was too.
EDIT:
with version 0.3.0 of en_ner_jnlpba_md and spacy 2.3.2 I got:
secretory Na+-K+-2Cl- cotransporter PROTEIN
NKCC1 PROTEIN
homodimer PROTEIN
while with 0.4.0 (and spacy 3.0.5) I got:
NKCC1 DNA
homodimer PROTEIN
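Since the difference seems to come down to which versions are installed, it may help to print them locally before comparing against the demo. Below is a minimal sketch using only the standard library. The helper name installed_versions is hypothetical, and it assumes the model was installed under the distribution name en_ner_jnlpba_md (as in the pyproject.toml above); anything that cannot be found is reported as None rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if it is not installed}.

    Hypothetical helper for illustration; not part of spacy or scispacy.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package (or model) is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed distribution names; adjust if your environment differs.
    names = ["spacy", "scispacy", "en_ner_jnlpba_md"]
    for name, ver in installed_versions(names).items():
        print(f"{name}\t{ver or 'not installed'}")
```

If the output shows spacy 3.x with the 0.4.0 model, that matches the second set of results above; the 0.3.0 model with spacy 2.3.x matches the first.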