Custom (local) UMLS subsets
Hi scispacy team,
First of all, thanks for creating a great tool; I think it’s very useful!
I have a couple of questions related to the UMLS Entity Linker:
Generic UMLS linker: From the paper, I understand that scispacy links entities to UMLS concepts from “sections 0, 1, 2 and 9 (SNOMED) of the UMLS 2017 AA release”. Is that still correct? I think it would be useful to add this information to the README as well.
Custom UMLS linker: Somewhat related to #234, would it also be possible to link the entities to a local UMLS subset (installed with MetamorphoSys) for people with a UMLS license?
The reason I’m asking is twofold:
- UMLS is released twice a year, in the first weeks of May and November. The current version is 2020AA and contains new concepts, such as COVID-19, that scispacy will currently not detect. Unless you’re planning to update the model frequently, I’d like to be able to use the most recent concepts.
- UMLS is highly customisable; users can select their own subsets of the many vocabularies. It would be great if scispacy supported this customisability as well.
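If you start from a full UMLS release rather than a MetamorphoSys subset, the vocabulary selection can also be approximated after the fact by filtering MRCONSO.RRF on its source abbreviation (SAB) column. The following is a minimal sketch, assuming the standard pipe-delimited MRCONSO layout described in the UMLS Reference Manual; `filter_mrconso` is a hypothetical helper, not part of scispacy, and the sample rows are made up for illustration.

```python
from typing import Iterable, Iterator, List, Set

# Column positions in MRCONSO.RRF (18 pipe-delimited fields plus a trailing "|").
CUI, LAT, SAB, STR, SUPPRESS = 0, 1, 11, 14, 16


def filter_mrconso(
    lines: Iterable[str], sources: Set[str], language: str = "ENG"
) -> Iterator[List[str]]:
    """Yield parsed MRCONSO rows whose source vocabulary (SAB) is in `sources`.

    Rows in other languages and rows flagged as suppressible or obsolete
    (SUPPRESS != "N") are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 18:
            continue  # blank or malformed line
        if fields[LAT] != language:
            continue
        if fields[SAB] not in sources:
            continue
        if fields[SUPPRESS] != "N":
            continue
        yield fields


if __name__ == "__main__":
    # Made-up rows; real CUIs, codes and strings come from your own release.
    sample = [
        "C9999991|ENG|P|L1|PF|S1|Y|A1||||SNOMEDCT_US|PT|123|Example disorder|9|N||",
        "C9999991|FRE|P|L2|PF|S2|Y|A2||||MSHFRE|MH|D1|Exemple|3|N||",
        "C9999992|ENG|P|L3|PF|S3|Y|A3||||MSH|MH|D2|Example drug|0|O||",
    ]
    kept = filter_mrconso(sample, sources={"SNOMEDCT_US", "MSH"})
    print([row[STR] for row in kept])  # ['Example disorder']
```

Filtering lazily with a generator keeps memory flat, which matters because a full MRCONSO.RRF has millions of rows.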
My understanding is that this is possible by:
- Converting a UMLS MRCONSO.RRF file to JSON using export_umls_json.py
- Generating a KnowledgeBase object
- Training a new linker using create_tfidf_ann_index()
Is this correct? Any help or more detailed instructions would be greatly appreciated!
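The first two steps can be sketched without scispacy itself. The sketch below groups English MRCONSO.RRF rows by CUI and optionally attaches semantic type IDs (TUIs) from MRSTY.RRF, then writes JSONL records. It assumes scispacy's KnowledgeBase reads one JSON object per line with the fields concept_id, canonical_name, aliases, types (and an optional definition, omitted here). This is not export_umls_json.py; the helper names and the canonical-name heuristic are my own assumptions, and the sample rows are made up.

```python
import json
from typing import Dict, Iterable, List

# MRCONSO.RRF column positions (pipe-delimited; see the UMLS Reference Manual).
CUI, LAT, TS, STT, ISPREF, STR = 0, 1, 2, 4, 6, 14


def mrconso_to_concepts(
    mrconso_lines: Iterable[str], mrsty_lines: Iterable[str] = ()
) -> List[dict]:
    """Group English MRCONSO rows by CUI into concept records (assumed layout)."""
    concepts: Dict[str, dict] = {}
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 18 or fields[LAT] != "ENG":
            continue
        cui, name = fields[CUI], fields[STR]
        concept = concepts.setdefault(
            cui, {"concept_id": cui, "canonical_name": None, "aliases": [], "types": []}
        )
        # Heuristic (an assumption): the canonical name is the row that is the
        # preferred term (TS=P), preferred form (STT=PF) and preferred atom (ISPREF=Y).
        is_preferred = fields[TS] == "P" and fields[STT] == "PF" and fields[ISPREF] == "Y"
        if is_preferred and concept["canonical_name"] is None:
            concept["canonical_name"] = name
        if name not in concept["aliases"]:
            concept["aliases"].append(name)

    # MRSTY.RRF rows look like CUI|TUI|STN|STY|ATUI|CVF| ; keep the TUIs as types.
    for line in mrsty_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2 and fields[0] in concepts:
            types = concepts[fields[0]]["types"]
            if fields[1] not in types:
                types.append(fields[1])

    records = []
    for concept in concepts.values():
        if concept["canonical_name"] is None:
            # No preferred row survived filtering: fall back to the first alias.
            concept["canonical_name"] = concept["aliases"][0]
        concept["aliases"] = [a for a in concept["aliases"] if a != concept["canonical_name"]]
        records.append(concept)
    return records


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    rows = [
        "C9999991|ENG|S|L1|PF|S1|Y|A1||||MSH|ET|D1|Disorder, example|0|N||",
        "C9999991|ENG|P|L2|PF|S2|Y|A2||||SNOMEDCT_US|PT|123|Example disorder|9|N||",
    ]
    sty = ["C9999991|T047|B2.2.1.2.1|Disease or Syndrome|AT1||"]
    records = mrconso_to_concepts(rows, sty)
    print(records[0]["canonical_name"], records[0]["aliases"], records[0]["types"])
    # Example disorder ['Disorder, example'] ['T047']
```

From there, the next step would presumably be `KnowledgeBase("umls_2020aa.jsonl")` followed by `create_tfidf_ann_index(out_path, kb)`, assuming those signatures (`scispacy.linking_utils.KnowledgeBase(file_path)` and `scispacy.candidate_generation.create_tfidf_ann_index(out_path, kb)`) match the installed scispacy version.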
Top GitHub Comments
I was able to get this working, although I needed a couple of minor tweaks. I added:
from scispacy.linking_utils import KnowledgeBase
I applied the code shown by @DeNeutoy, and then I was able to call add_pipe with the new linker name:
umls_nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "umls2020"})
Note that linker_name is changed to the custom value “umls2020”.

Hi @DeNeutoy,
Thanks for the alternative! I’m adopting it, because it’s indeed a bit nicer than my previous solution. I completely understand the reasons for implementing it as you did 😃 Providing a function that does this global mutation with intelligent errors sounds like a nice addition to scispacy!

Edit: I just posted another error here that turned out to be my own mistake, so I have deleted it again.
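The function suggested above could look roughly like the hypothetical sketch below. `register_custom_linker` is not a scispacy API, and the two plain dicts stand in for the module-level registries that the workaround mutates (in scispacy these should be `DEFAULT_PATHS` and `DEFAULT_KNOWLEDGE_BASES` in `scispacy.candidate_generation`, but treat that as an assumption to verify). A local `LinkerPaths` NamedTuple mirrors the field names of scispacy's class so the sketch runs without scispacy installed.

```python
import os
from typing import Callable, MutableMapping, NamedTuple


class LinkerPaths(NamedTuple):
    """Local stand-in with the same field names as scispacy's LinkerPaths."""

    ann_index: str
    tfidf_vectorizer: str
    tfidf_vectors: str
    concept_aliases_list: str


def register_custom_linker(
    name: str,
    paths: LinkerPaths,
    kb_factory: Callable[[], object],
    path_registry: MutableMapping[str, LinkerPaths],
    kb_registry: MutableMapping[str, Callable[[], object]],
    overwrite: bool = False,
) -> None:
    """Hypothetical helper: validate everything, then mutate both registries."""
    if not overwrite and (name in path_registry or name in kb_registry):
        raise ValueError(
            f"A linker named {name!r} is already registered; pass overwrite=True to replace it."
        )
    # Only local files are checked here; remote URLs would need a different test.
    missing = [
        f"{field}={path}"
        for field, path in paths._asdict().items()
        if not os.path.exists(path)
    ]
    if missing:
        raise FileNotFoundError("Missing linker index files: " + ", ".join(missing))
    if not callable(kb_factory):
        raise TypeError("kb_factory must be a callable that builds the knowledge base.")
    # Register in both places only after every check has passed.
    path_registry[name] = paths
    kb_registry[name] = kb_factory
```

Running all checks before touching either dict means a failed call never leaves the two registries out of sync. With scispacy installed, the two registries would be passed in place of the plain dicts, after which `add_pipe` with `"linker_name": "umls2020"` works as in the comment above.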