Missing genes in PanelApp panel
Hello, me again!
I’m experiencing some missing genes in the PanelApp panels loaded into scout. I was testing with scout v4.54 and tried again today with your most recent release, v4.56 (re-loading the PanelApp panels with scout -db load panel --panel-app), and I still seem to be experiencing the same thing. Apologies that I’m not giving a super tidy reproducible example; I’m just explaining what I’ve observed. If it’s not easily reproduced, I’ll spend more time creating a reproducible example 😃
I checked several panels in the scout user interface (they’re all the most recent versions). Here are my observations.
PanelApp panels in scout with all the genes that are found in the green/high confidence list in the PanelApp website:
- Arrhythmogenic cardiomyopathy
- Atypical haemolytic uraemic syndrome
- Arthrogryposis
- Familial hidradenitis suppurativa
- GI tract tumours
- Infantile nystagmus
- Inherited bleeding disorders
- Inherited ovarian cancer (without breast cancer)
- Intracerebral calcification disorders
- Limb girdle muscular dystrophy
PanelApp panels in scout that are missing genes that are found in the green/high confidence list in the PanelApp website:
Genes that were missing from these panels in scout that were “CNV Loss”:
- 17q12 recurrent (RCAD syndrome) region (includes HNF1B) Loss (I’m assuming the HGNC symbol is HNF1B)
- 22q11.2 recurrent (DGS/VCFS) region (proximal, A-B) (includes TBX1) Loss (I’m assuming the HGNC symbol is TBX1)
- 22q11.2 recurrent (DGS/VCFS) region (proximal, A-D) (includes TBX1) Loss (I’m assuming the HGNC symbol is TBX1)
- Xp11.23 region (includes MAOA and MAOB) Loss (I’m assuming the HGNC symbol is MAOA and MAOB)
- ISCA-37440-Loss 2p21 region (includes PREPL and SLC3A1) Loss (I’m assuming the HGNC symbol is PREPL and SLC3A1)
Genes that were missing from these panels in scout that were “CNV Gain”:
- 22q11.21 recurrent (Cat eye syndrome) region (includes CECR2) Gain (I’m assuming the HGNC symbol is CECR2)
- 8p23.1 recurrent region (includes GATA4) Gain (I’m assuming the HGNC symbol is GATA4)
- 7q36.3 ZRS (SHH cis-regulatory) duplication region (within LMBR1 intron 5) Gain (I’m assuming the HGNC symbol is LMBR1)
Genes that were missing from these panels that were “STR”:
- ATN1_CAG (ATN1 is in OMIM and has an associated HGNC gene symbol (ATN1))
- CACNA1A_CAG (CACNA1A is in OMIM and has an associated HGNC gene symbol (CACNA1A))
- CSTB_CCCCGCCCCGCG (CSTB is in OMIM and has an associated HGNC gene symbol (CSTB))
- TBP_CAG (TBP is in OMIM and has an associated HGNC gene symbol (TBP))
Genes that were missing from these panels that were other types:
- C5orf42 (C5orf42 is in OMIM and has an associated HGNC gene symbol (CPLANE1))
- C12orf65 (C12orf65 is in OMIM and has an associated HGNC gene symbol (MTRFR))
- C19orf70 (C19orf70 is in OMIM and has an associated HGNC gene symbol (MICOS13))
22q11.21 recurrent (Cat eye syndrome) region (includes CECR2) Gain, 8p23.1 recurrent region (includes GATA4) Gain and 7q36.3 ZRS (SHH cis-regulatory) duplication region (within LMBR1 intron 5) Gain are present in scout and can be accessed by what I assume are their HGNC symbols. For example:
scout --port 27018 -db test-database view hgnc --hgnc-symbol CECR2
My partial output:
#hgnc_id hgnc_symbol aliases
1840 CECR2 CECR2, KIAA1740
scout --port 27018 -db test-database view hgnc --hgnc-id 1840
My partial output:
#hgnc_id hgnc_symbol aliases
1840 CECR2 CECR2, KIAA1740
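To make the CNV region case concrete, here is a minimal sketch of how the gene symbols could be pulled out of the PanelApp region names listed above. It is not scout code: `genes_from_region_name` is a hypothetical helper, and it assumes the symbols always appear as "(includes X and Y)" or "(within X intron N)", which is only what the names in this issue show.

```python
import re

# Hypothetical helper, not part of scout. The two patterns below are
# assumptions based only on the region names quoted in this issue.
_INCLUDES = re.compile(r"\(includes ([^)]+)\)")
_WITHIN = re.compile(r"\(within (\S+) intron \d+\)")


def genes_from_region_name(name):
    """Return the gene symbols mentioned in a PanelApp region name."""
    match = _INCLUDES.search(name)
    if match:
        # "MAOA and MAOB" -> ["MAOA", "MAOB"]; commas are also accepted.
        return [s for s in re.split(r",\s*|\s+and\s+", match.group(1)) if s]
    match = _WITHIN.search(name)
    if match:
        return [match.group(1)]
    return []


print(genes_from_region_name("Xp11.23 region (includes MAOA and MAOB) Loss"))
```

Whether a region entry should be expanded into its genes at all is a design decision for the maintainers; the sketch only shows that the symbols are recoverable from the name.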
ATN1_CAG, CACNA1A_CAG, CSTB_CCCCGCCCCGCG and TBP_CAG are present in scout and can be accessed by what I assume are their HGNC symbols if you drop everything after the underscore. For example:
scout --port 27018 -db test-database view hgnc --hgnc-symbol ATN1_CAG
My partial output:
#hgnc_id hgnc_symbol aliases
2022-06-16 17:01:03 leviathan scout.commands.view.hgnc[2619765] INFO No results found
scout --port 27018 -db test-database view hgnc --hgnc-id 3033
My partial output:
#hgnc_id hgnc_symbol aliases
3033 ATN1 B37, ATN1, D12S755E, DRPLA
scout --port 27018 -db test-database view hgnc --hgnc-symbol ATN1
My partial output:
#hgnc_id hgnc_symbol aliases
3033 ATN1 B37, ATN1, D12S755E, DRPLA
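The "drop everything after the underscore" observation for the STR entries can be sketched in a couple of lines. `split_str_name` is a hypothetical name, and the assumption that PanelApp STR names are always `<symbol>_<repeat motif>` comes only from the four examples above.

```python
def split_str_name(name):
    """Split a PanelApp STR name into (gene symbol, repeat motif).

    Hypothetical helper, not part of scout. Names without an underscore
    are returned unchanged with a motif of None.
    """
    symbol, _, motif = name.partition("_")
    return symbol, motif or None


symbol, motif = split_str_name("ATN1_CAG")
print(symbol, motif)
```

The returned symbol (here ATN1) is the value that the `--hgnc-symbol` lookup above does find.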
C5orf42, C12orf65 and C19orf70 are present in scout and can be accessed by the HGNC symbols used by PanelApp, their current HGNC symbols and their HGNC id. For example:
scout --port 27018 -db test-database view hgnc --hgnc-symbol C5orf42
My partial output:
#hgnc_id hgnc_symbol aliases
25801 CPLANE1 JBTS17, CPLANE1, FLJ13231, Hug, C5orf42
scout --port 27018 -db test-database view hgnc --hgnc-symbol CPLANE1
My partial output:
#hgnc_id hgnc_symbol aliases
25801 CPLANE1 JBTS17, Hug, CPLANE1, C5orf42, FLJ13231
scout --port 27018 -db test-database view hgnc --hgnc-id 25801
My partial output:
#hgnc_id hgnc_symbol aliases
25801 CPLANE1 JBTS17, Hug, CPLANE1, C5orf42, FLJ13231
It looks like this is where scout maps the genes in the PanelApp data to HGNC symbols. My Python isn’t great, so I haven’t looked much further into the code. Either way, it seems there are some “edge cases” that this function doesn’t currently account for, so those genes are treated as not present in the database and skipped when loading the PanelApp panels. Another thing to note: I didn’t see any of these “edge case” genes in the reduced HGNC dataset that scout uses for the demo database, so these situations might be missed when testing with this reduced dataset.
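As an illustration of the kind of lookup I mean, here is a rough sketch that resolves a PanelApp name first by current HGNC symbol, then by alias, and keeps a list of anything it had to skip. It is not scout's implementation: the function names are hypothetical, and the gene records are just the rows from the `view hgnc` output above.

```python
# Toy gene records copied from the `scout view hgnc` output in this issue.
GENES = [
    {"hgnc_id": 1840, "hgnc_symbol": "CECR2", "aliases": ["CECR2", "KIAA1740"]},
    {"hgnc_id": 3033, "hgnc_symbol": "ATN1",
     "aliases": ["B37", "ATN1", "D12S755E", "DRPLA"]},
    {"hgnc_id": 25801, "hgnc_symbol": "CPLANE1",
     "aliases": ["JBTS17", "Hug", "CPLANE1", "C5orf42", "FLJ13231"]},
]


def build_index(genes):
    """Index genes by current symbol and by alias (hypothetical helper)."""
    by_symbol = {g["hgnc_symbol"]: g["hgnc_id"] for g in genes}
    by_alias = {}
    for g in genes:
        for alias in g["aliases"]:
            by_alias.setdefault(alias, set()).add(g["hgnc_id"])
    return by_symbol, by_alias


def resolve(name, by_symbol, by_alias):
    """Return an HGNC id, or None if the name is unknown or ambiguous."""
    if name in by_symbol:
        return by_symbol[name]
    ids = by_alias.get(name, set())
    # Only trust an alias when it points at exactly one gene.
    return next(iter(ids)) if len(ids) == 1 else None


def map_panel_names(names, genes):
    """Map panel entries to HGNC ids and report the ones that were skipped."""
    by_symbol, by_alias = build_index(genes)
    resolved, skipped = {}, []
    for name in names:
        hgnc_id = resolve(name, by_symbol, by_alias)
        if hgnc_id is None:
            skipped.append(name)
        else:
            resolved[name] = hgnc_id
    return resolved, skipped


resolved, skipped = map_panel_names(["C5orf42", "ATN1_CAG", "CECR2"], GENES)
print(resolved, skipped)
```

With this toy data, C5orf42 resolves through its alias, while ATN1_CAG ends up in the skipped list unless the STR suffix is stripped first. Reporting the skipped names instead of silently dropping them would make gaps like the ones above easier to spot.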
Hope this is useful. Again, let me know if you have trouble reproducing this and I’ll spend some time creating a reproducible example 😃
Top GitHub Comments
Yep, you are so right. When trying:
scout load panel --panel-app --panel-id 112
One can note
Thank you for reporting the problem. We are probably going to release a new software version with this patch later today. It will be the last release before the summer holidays this year!