How to visualize named entities in custom colors

There’s an option in spaCy that lets us use custom colors for named entity visualization. I’m trying to use the same option in scispacy for its named entities. I created two lists, one of entity labels and one of randomly generated colors, and put them in an options dictionary like this:

options = {"ents": entities, "colors": colors}

Here, entities is a list of the NE labels in the scispacy NER models and colors is a list of the same size. But passing these options to either displacy.serve or displacy.render (for Jupyter) does not work. I’m passing the options like this:

displacy.serve(doc, style="ent", options=options)

I wonder whether the color option only works for spaCy’s predefined named entities, or whether there’s something wrong with the way I’m using it.
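
One thing worth checking here: spaCy's displaCy documentation describes the "colors" option as a dictionary that maps entity labels to color values, not as a list parallel to "ents". The following is a minimal sketch of pairing two equal-length lists into that shape. The helper name build_options and the sample labels and colors are illustrative assumptions, not part of spaCy or scispacy.

```python
def build_options(entities, colors_list):
    """Pair each entity label with its color and return displaCy-style options."""
    if len(entities) != len(colors_list):
        raise ValueError("entities and colors_list must have the same length")
    # "ents" lists the labels to highlight; "colors" maps label -> color string.
    return {"ents": list(entities), "colors": dict(zip(entities, colors_list))}


# Illustrative labels and colors, not taken from any particular model.
entities = ["DNA", "RNA", "PROTEIN"]
colors_list = ["#82E0AA", "#AED6F1", "#F9E79F"]
options = build_options(entities, colors_list)
print(options["colors"])
```

The resulting dictionary can be passed unchanged as the options argument of displacy.render or displacy.serve with style="ent".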

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10

Top GitHub Comments

7 reactions
phosseini commented, Aug 5, 2019

@DeNeutoy This is the exact code I’m using:

import scispacy
import spacy
from spacy import displacy

# nlp = spacy.load("en_ner_jnlpba_md")
nlp = spacy.load("en_core_sci_md")

text = """The purpose of our study was to learn the distribution characteristics of cancer stem cell markers (CD24, CD44) in invasive carcinomas with different grade and molecular subtype. For research was used 1324 postoperative breast cancer samples, from which were selected 393 patient with invasive ductal carcinoma samples examined 2008-2012 in Laboratory of "Pathgeo Union of Pathologist" is and N.Kipshidze Central University Hospital. The age range is between 23-73 year. For all cases were performed immunohistochemical study using ER, PR, Her2, Ki67, CK5- molecular markers (Leica Microsystems). For identify cancer stem cells mononuclear antibodies CD24 (BIOCARE MEDICAL, CD44 - Clone 156-3C11; CD24 - Clone SN3b) were used. Association of CD44/CD24 expression in different subtypes of cells, between clinicopathological parameters and different biological characteristics were performed by Pearson correlation and usind X2 tests. Obtained quantitative statistical analyses were performed by using SPSS V.19.0 program. Statistically significant were considered 95% of confidence interval. The data shows, that towards G1-G3, amount of CD44 positive cases increased twice. CD44 positive cases are evenly distributed between Luminal A, Luminal B, HER2+, triple negative basal like cell subtypes and in significantly less (4,8 times) in Her2+ cases. Maximum amount of CD44 negative cases is shown in Luminal A subtype, which could be possible cause of better prognosis and high sensitivity for chemotherapy. For one's part such aggressive subtypes of breast cancer as Luminal B and basal like cell type, are characterized by CD44 positive and antigen high expression, which can be reason of aggressive nature of this types and also reason of chemotherapy resistance. 
As well as amount of CD24 positive cases according to malignancy degree, also antigen expression features does not show any type of correlation between malignancy degree and CD24 positivity or with CD24 expression features, or presence of stem cells. That can be the reason of tumor aggressivity and chemoresistance. exceptions are Her2 positive tumors because they have different base of carcinogenesis."""

doc = nlp(text)
options = get_entity_options()
displacy.render(doc, style='ent', options=options)

Here, get_entity_options() is a function I wrote to build the color options, as follows (everybody, feel free to use it if you find it useful):

import random 

def get_entity_options(random_colors=False):
    """
    generating color options for visualizing the named entities
    """
    def color_generator(number_of_colors):
        color = ["#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)])
                 for i in range(number_of_colors)]
        return color

    entities = ["GGP", "SO", "TAXON", "CHEBI", "GO", "CL", 
                "DNA", "CELL_TYPE", "CELL_LINE", "RNA", "PROTEIN",
                "DISEASE", "CHEMICAL",
                "CANCER", "ORGAN", "TISSUE", "ORGANISM", "CELL", "AMINO_ACID", "GENE_OR_GENE_PRODUCT", "SIMPLE_CHEMICAL", "ANATOMICAL_SYSTEM", "IMMATERIAL_ANATOMICAL_ENTITY", "MULTI-TISSUE_STRUCTURE", "DEVELOPING_ANATOMICAL_STRUCTURE", "ORGANISM_SUBDIVISION", "CELLULAR_COMPONENT"]
    
    colors = {"ENT":"#E8DAEF"}
    
    if random_colors:
        color = color_generator(len(entities))
        for i in range(len(entities)):
            colors[entities[i]] = color[i]
    else:
        entities_cat_1 = {"GGP":"#F9E79F", "SO":"#F7DC6F", "TAXON":"#F4D03F", "CHEBI":"#FAD7A0", "GO":"#F8C471", "CL":"#F5B041"}
        entities_cat_2 = {"DNA":"#82E0AA", "CELL_TYPE":"#AED6F1", "CELL_LINE":"#E8DAEF", "RNA":"#82E0AA", "PROTEIN":"#82E0AA"}
        entities_cat_3 = {"DISEASE":"#D7BDE2", "CHEMICAL":"#D2B4DE"}
        entities_cat_4 = {"CANCER":"#ABEBC6", "ORGAN":"#82E0AA", "TISSUE":"#A9DFBF", "ORGANISM":"#A2D9CE", "CELL":"#76D7C4", "AMINO_ACID":"#85C1E9", "GENE_OR_GENE_PRODUCT":"#AED6F1", "SIMPLE_CHEMICAL":"#76D7C4", "ANATOMICAL_SYSTEM":"#82E0AA", "IMMATERIAL_ANATOMICAL_ENTITY":"#A2D9CE", "MULTI-TISSUE_STRUCTURE":"#85C1E9", "DEVELOPING_ANATOMICAL_STRUCTURE":"#A9DFBF", "ORGANISM_SUBDIVISION":"#58D68D", "CELLULAR_COMPONENT":"#7FB3D5"}

        entities_cats = [entities_cat_1, entities_cat_2, entities_cat_3, entities_cat_4]
        for item in entities_cats:
            colors = {**colors, **item}
    
    options = {"ents": entities, "colors": colors}
    
    return options

Using the full model (en_core_sci_md), I can’t see any visualization, but when I switch to a specific NER model (e.g. en_ner_jnlpba_md), I do see it.
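
A possible explanation, offered as an assumption rather than a confirmed diagnosis: displaCy highlights only the labels listed under "ents", and the full model may tag every span with one generic label (believed to be "ENTITY") that is missing from the list above, so the "ENT" key in colors matches nothing. The sketch below builds the options from whatever labels a doc actually contains. The helper options_from_labels and the PALETTE values are hypothetical, and the "ENTITY" label should be verified against the loaded model.

```python
from itertools import cycle

# Hypothetical fallback palette; any CSS color strings would do.
PALETTE = ["#F9E79F", "#82E0AA", "#AED6F1", "#D7BDE2", "#FAD7A0"]


def options_from_labels(labels, overrides=None):
    """Build displaCy-style options covering every label in `labels`."""
    unique = sorted(set(labels))
    # Cycle through the palette so any number of labels gets a color.
    colors = dict(zip(unique, cycle(PALETTE)))
    # Explicit overrides take precedence over palette colors.
    colors.update(overrides or {})
    return {"ents": unique, "colors": colors}


# With a real doc the labels would come from: [ent.label_ for ent in doc.ents]
# "ENTITY" is the assumed label emitted by the full model.
labels = ["ENTITY", "ENTITY", "ENTITY"]
options = options_from_labels(labels, overrides={"ENTITY": "#E8DAEF"})
print(options)
```

Deriving "ents" from the doc avoids hard-coding a label list that can drift out of sync with the model in use.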

5 reactions
victoriastuart commented, Nov 26, 2019

@phosseini: very cool, thank you! I saved your code (for my own use and tests) as a module, giving the following results! 😃


entity_options.py

## Source: https://github.com/allenai/scispacy/issues/141#issuecomment-518274586
## Author: https://github.com/phosseini
##   File: /mnt/Vancouver/apps/spacy/entity_options.py
##    Env: Python 3.7 venv:
##    Use:
##          import entity_options
##          from entity_options import get_entity_options
##          displacy.serve(doc, style="ent", options=get_entity_options(random_colors=True))
##    Ent: https://github.com/allenai/scispacy/issues/79#issuecomment-557766506 ## CRAFT entities

import random 

def get_entity_options(random_colors=False):
    """ generating color options for visualizing the named entities """

    def color_generator(number_of_colors):
        color = ["#"+''.join([random.choice('0123456789ABCDEF') for j in range(6)]) for i in range(number_of_colors)]
        return color

    entities = ["GGP", "SO", "TAXON", "CHEBI", "GO", "CL", "DNA", "CELL_TYPE", "CELL_LINE", "RNA", "PROTEIN", \
                "DISEASE", "CHEMICAL", "CANCER", "ORGAN", "TISSUE", "ORGANISM", "CELL", "AMINO_ACID", \
                "GENE_OR_GENE_PRODUCT", "SIMPLE_CHEMICAL", "ANATOMICAL_SYSTEM", "IMMATERIAL_ANATOMICAL_ENTITY", \
                "MULTI-TISSUE_STRUCTURE", "DEVELOPING_ANATOMICAL_STRUCTURE", "ORGANISM_SUBDIVISION", "CELLULAR_COMPONENT"]

    colors = {"ENT":"#E8DAEF"}

    if random_colors:
        color = color_generator(len(entities))
        for i in range(len(entities)):
            colors[entities[i]] = color[i]
    else:
        entities_cat_1 = {"GGP":"#F9E79F", "SO":"#F7DC6F", "TAXON":"#F4D03F", "CHEBI":"#FAD7A0", "GO":"#F8C471", "CL":"#F5B041"}
        entities_cat_2 = {"DNA":"#82E0AA", "CELL_TYPE":"#AED6F1", "CELL_LINE":"#E8DAEF", "RNA":"#82E0AA", "PROTEIN":"#82E0AA"}
        entities_cat_3 = {"DISEASE":"#D7BDE2", "CHEMICAL":"#D2B4DE"}
        entities_cat_4 = {"CANCER":"#ABEBC6", "ORGAN":"#82E0AA", "TISSUE":"#A9DFBF", "ORGANISM":"#A2D9CE", "CELL":"#76D7C4", \
                          "AMINO_ACID":"#85C1E9", "GENE_OR_GENE_PRODUCT":"#AED6F1", "SIMPLE_CHEMICAL":"#76D7C4", "ANATOMICAL_SYSTEM":"#82E0AA", \
                          "IMMATERIAL_ANATOMICAL_ENTITY":"#A2D9CE", "MULTI-TISSUE_STRUCTURE":"#85C1E9", "DEVELOPING_ANATOMICAL_STRUCTURE":"#A9DFBF", \
                          "ORGANISM_SUBDIVISION":"#58D68D", "CELLULAR_COMPONENT":"#7FB3D5"}

        entities_cats = [entities_cat_1, entities_cat_2, entities_cat_3, entities_cat_4]
        for item in entities_cats:
            colors = {**colors, **item}

    options = {"ents": entities, "colors": colors}
    # print(options)
    return options
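
Because color_generator draws from the global random generator, random_colors=True yields a different palette on every call. If stable colors across runs are wanted, one option is a private, seeded generator. This is a minimal sketch under that assumption; seeded_colors is a hypothetical helper and not part of the module above.

```python
import random


def seeded_colors(labels, seed=0):
    """Map each label to a random hex color, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return {
        label: "#" + "".join(rng.choice("0123456789ABCDEF") for _ in range(6))
        for label in labels
    }


first = seeded_colors(["DNA", "RNA", "PROTEIN"], seed=42)
second = seeded_colors(["DNA", "RNA", "PROTEIN"], seed=42)
print(first == second)  # same seed, same colors
```

The returned dictionary has the same label-to-color shape as the colors entry built by get_entity_options().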

Python 3.7 venv

(py3.7) [victoria@victoria spacy]$ date; pwd; ls -l

  Tue 26 Nov 2019 01:45:28 PM PST
  /mnt/Vancouver/apps/spacy
  total 20
  -rw-r--r-- 1 victoria victoria 2287 Nov 26 13:42 entity_options.py
  drwxr-xr-x 2 victoria victoria 4096 Nov 26 13:34 __pycache__
  -rw------- 1 victoria victoria 3560 Nov 26 12:02 readme-victoria-spacy.txt
  drwxr-xr-x 3 victoria victoria 4096 Nov 19 11:41 scispacy
  -rw-r--r-- 1 victoria victoria 2624 Nov 26 11:59 spacy_srl.py


(py3.7) [victoria@victoria spacy]$ pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_ner_craft_md-0.2.4.tar.gz
  ...
  Successfully installed blis-0.4.1 catalogue-0.0.8 en-ner-craft-md-0.2.4 preshed-3.0.2 spacy-2.2.3 thinc-7.3.1

(py3.7) [victoria@victoria spacy]$ env | grep -i virtual
VIRTUAL_ENV=/home/victoria/venv/py3.7

(py3.7) [victoria@victoria spacy]$ python --version
Python 3.7.4

(py3.7) [victoria@victoria spacy]$ python
  Python 3.7.4 (default, Nov 20 2019, 11:36:53) 
  [GCC 9.2.0] on linux
  Type "help", "copyright", "credits" or "license" for more information.

>>> import spacy
>>> from spacy import displacy

>>> text = "26902145. Breast cancer susceptibility gene 1 (BRCA1) is a tumor suppressor protein that functions to maintain genomic stability through critical roles in DNA repair, cell-cycle arrest, and transcriptional control. The androgen receptor (AR) is expressed in more than 70% of breast cancers and has been implicated in breast cancer pathogenesis. However, little is known about the role of BRCA1 in AR-mediated cell proliferation in human breast cancer. Here, we report that a high expression of AR in breast cancer patients was associated with shorter overall survival (OS) using a tissue microarray with 149 non-metastatic breast cancer patient samples. We reveal that overexpression of BRCA1 significantly inhibited expression of AR through activation of SIRT1 in breast cancer cells. Meanwhile, SIRT1 induction or treatment with a SIRT1 agonist, resveratrol, inhibits AR-stimulated proliferation. Importantly, this mechanism is manifested in breast cancer patient samples and TCGA database, which showed that low SIRT1 gene expression in tumor tissues compared with normal adjacent tissues predicts poor prognosis in patients with breast cancer. Taken together, our findings suggest that BRCA1 attenuates AR-stimulated proliferation of breast cancer cells via SIRT1 mediated pathway. | 30714292. Breast cancer susceptibility gene 1 (BRCA1) has been implicated in modulating metabolism via transcriptional regulation. However, direct metabolic targets of BRCA1 and the underlying regulatory mechanisms are still unknown. Here, we identified several metabolic genes, including the gene which encodes glutamate‐oxaloacetate transaminase 2 (GOT2), a key enzyme for aspartate biosynthesis, which are repressed by BRCA1. We report that BRCA1 forms a co‐repressor complex with ZBRK1 that coordinately represses GOT 2 expression via a ZBRK1 recognition element in the promoter of GOT2. Impairment of this complex results in upregulation of GOT2, which in turn increases aspartate and alpha ketoglutarate production, leading to rapid cell proliferation of breast cancer cells. Importantly, we found that GOT2 can serve as an independent prognostic factor for overall survival and disease‐free survival of patients with breast cancer, especially triple‐negative breast cancer. Interestingly, we also demonstrated that GOT2 overexpression sensitized breast cancer cells to methotrexate, suggesting a promising precision therapeutic strategy for breast cancer treatment. In summary, our findings reveal that BRCA1 modulates aspartate biosynthesis through transcriptional repression of GOT2, and provides a biological basis for treatment choices in breast cancer. | BRCA1/2. BRCA1 and BRCA2 (BRCA1/2) are human genes that produce tumor suppressor proteins."

>>> nlp = spacy.load("en_ner_craft_md")
>>> doc = nlp(text)

>>> import entity_options
>>> from entity_options import get_entity_options

>>> get_entity_options()                               ## default: (random_colors=False)
{'colors': {'AMINO_ACID': '#85C1E9',
            'ANATOMICAL_SYSTEM': '#82E0AA',
            'CANCER': '#ABEBC6',
            'CELL': '#76D7C4',
            'CELLULAR_COMPONENT': '#7FB3D5',
            'CELL_LINE': '#E8DAEF',
            'CELL_TYPE': '#AED6F1',
            'CHEBI': '#FAD7A0',
            'CHEMICAL': '#D2B4DE',
            'CL': '#F5B041',
            'DEVELOPING_ANATOMICAL_STRUCTURE': '#A9DFBF',
            'DISEASE': '#D7BDE2',
            'DNA': '#82E0AA',
            'ENT': '#E8DAEF',
            'GENE_OR_GENE_PRODUCT': '#AED6F1',
            'GGP': '#F9E79F',
            'GO': '#F8C471',
            'IMMATERIAL_ANATOMICAL_ENTITY': '#A2D9CE',
            'MULTI-TISSUE_STRUCTURE': '#85C1E9',
            'ORGAN': '#82E0AA',
            'ORGANISM': '#A2D9CE',
            'ORGANISM_SUBDIVISION': '#58D68D',
            'PROTEIN': '#82E0AA',
            'RNA': '#82E0AA',
            'SIMPLE_CHEMICAL': '#76D7C4',
            'SO': '#F7DC6F',
            'TAXON': '#F4D03F',
            'TISSUE': '#A9DFBF'},
 'ents': ['GGP',
          'SO',
          'TAXON',
          'CHEBI',
          'GO',
          'CL',
          'DNA',
          'CELL_TYPE',
          'CELL_LINE',
          'RNA',
          'PROTEIN',
          'DISEASE',
          'CHEMICAL',
          'CANCER',
          'ORGAN',
          'TISSUE',
          'ORGANISM',
          'CELL',
          'AMINO_ACID',
          'GENE_OR_GENE_PRODUCT',
          'SIMPLE_CHEMICAL',
          'ANATOMICAL_SYSTEM',
          'IMMATERIAL_ANATOMICAL_ENTITY',
          'MULTI-TISSUE_STRUCTURE',
          'DEVELOPING_ANATOMICAL_STRUCTURE',
          'ORGANISM_SUBDIVISION',
          'CELLULAR_COMPONENT']}

>>> get_entity_options(random_colors=True)
{'colors': {'AMINO_ACID': '#30CBF7',
            'ANATOMICAL_SYSTEM': '#6DF980',
            'CANCER': '#1AE0F9',
            'CELL': '#5813C7',
            'CELLULAR_COMPONENT': '#0D350E',
            'CELL_LINE': '#1AA436',
            'CELL_TYPE': '#F837CC',
            'CHEBI': '#54B69E',
            'CHEMICAL': '#BADCA1',
            'CL': '#D845FB',
            'DEVELOPING_ANATOMICAL_STRUCTURE': '#0D9CB4',
            'DISEASE': '#78A2E5',
            'DNA': '#CAD406',
            'ENT': '#E8DAEF',
            'GENE_OR_GENE_PRODUCT': '#EC2144',
            'GGP': '#A6AA7D',
            'GO': '#8312F0',
            'IMMATERIAL_ANATOMICAL_ENTITY': '#F7E433',
            'MULTI-TISSUE_STRUCTURE': '#221891',
            'ORGAN': '#786BC0',
            'ORGANISM': '#43534C',
            'ORGANISM_SUBDIVISION': '#B6F342',
            'PROTEIN': '#4454D9',
            'RNA': '#64C158',
            'SIMPLE_CHEMICAL': '#F8616A',
            'SO': '#344E4D',
            'TAXON': '#63B69D',
            'TISSUE': '#0DE67C'},
 'ents': ['GGP',
          'SO',
          'TAXON',
          'CHEBI',
          'GO',
          'CL',
          'DNA',
          'CELL_TYPE',
          'CELL_LINE',
          'RNA',
          'PROTEIN',
          'DISEASE',
          'CHEMICAL',
          'CANCER',
          'ORGAN',
          'TISSUE',
          'ORGANISM',
          'CELL',
          'AMINO_ACID',
          'GENE_OR_GENE_PRODUCT',
          'SIMPLE_CHEMICAL',
          'ANATOMICAL_SYSTEM',
          'IMMATERIAL_ANATOMICAL_ENTITY',
          'MULTI-TISSUE_STRUCTURE',
          'DEVELOPING_ANATOMICAL_STRUCTURE',
          'ORGANISM_SUBDIVISION',
          'CELLULAR_COMPONENT']}

## default: get_entity_options(random_colors=False)
## displacy.serve(doc, style="ent", options=get_entity_options())

>>> displacy.serve(doc, style="ent", options=get_entity_options(random_colors=True))

  Using the 'ent' visualizer
  Serving on http://0.0.0.0:5000 ...
  127.0.0.1 -- -- [26/Nov/2019 13:43:47] "GET / HTTP/1.1" 200 20529

Screenshots

random_colors=False: [screenshot: spacy_tagged_text_browser-2019-11-26c]

random_colors=True: [screenshot: spacy_tagged_text_browser-2019-11-26d]
