Requirements for UKB GWAS
To run a basic GWAS on UKB data, here are some of the operations we’ll need support for:
- bgen reader (https://github.com/pystatgen/sgkit-bgen/pull/1)
- plink reader (https://github.com/pystatgen/sgkit-plink/pull/1, https://github.com/pystatgen/sgkit-plink/issues/6)
- Variant allele frequency/count (https://github.com/pystatgen/sgkit/issues/29)
- Variant call rate/count (https://github.com/pystatgen/sgkit/issues/29)
- Variant HWE test (https://github.com/pystatgen/sgkit/issues/28)
- Sample call rate/count (https://github.com/pystatgen/sgkit/issues/29)
- An `is_autosome` function to filter variants
- A function to convert genotype probabilities to hard calls (https://github.com/pystatgen/sgkit/issues/346)
- A linear regression function (https://github.com/pystatgen/sgkit/pull/52)
- A variant annotation function like VEP. There are plenty of other ways to get this, but an internal function would be great.
- A phenotype normalization pipeline. I don’t expect much of this to become part of sgkit, but there might be some generalizable phenotype-specific functions that are worth considering for inclusion.
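As a rough illustration of the call rate, allele frequency, and hard-call items above, here is a minimal NumPy sketch. It is not sgkit's API: the array layouts (`calls[variant, sample, ploidy]` with `-1` for a missing allele, `probs[variant, sample, genotype]`), the function names, and the `0.9` threshold are all assumptions made for the example, and variants are assumed to be biallelic.

```python
import numpy as np

# Hypothetical toy data: calls[variant, sample, ploidy], -1 marks a missing allele.
calls = np.array([
    [[0, 0], [0, 1], [1, 1], [-1, -1]],
    [[0, 1], [0, 1], [0, 0], [0, 0]],
])

def variant_call_rate(calls):
    """Fraction of samples with a fully called genotype, per variant."""
    called = (calls >= 0).all(axis=-1)
    return called.mean(axis=1)

def sample_call_rate(calls):
    """Fraction of variants with a fully called genotype, per sample."""
    called = (calls >= 0).all(axis=-1)
    return called.mean(axis=0)

def alt_allele_frequency(calls):
    """Alternate allele frequency per variant, ignoring missing alleles.

    Assumes biallelic variants (allele codes 0 and 1). A variant with no
    called alleles yields NaN because of the division by zero.
    """
    alt_count = (calls == 1).sum(axis=(1, 2))
    total = (calls >= 0).sum(axis=(1, 2))
    return alt_count / total

def hard_calls(probs, threshold=0.9):
    """Pick the most probable genotype per call, or -1 if it is below threshold."""
    best = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= threshold
    return np.where(confident, best, -1)

# Hypothetical genotype probabilities for one variant and two samples.
probs = np.array([[[0.95, 0.05, 0.0], [0.4, 0.5, 0.1]]])

print(variant_call_rate(calls))     # per-variant call rate
print(sample_call_rate(calls))      # per-sample call rate
print(alt_allele_frequency(calls))  # per-variant alternate allele frequency
print(hard_calls(probs))            # genotype index, or -1 where uncertain
```

The same reductions should carry over to Xarray or Dask arrays with little change, since they only use elementwise comparisons and axis-wise sums and means.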
There may be a few more beyond that, but I think anything remaining should be reasonable with Xarray/Dask alone.
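For the linear regression item, a per-variant ordinary least squares fit can be sketched with plain NumPy as below. This is only an assumption-laden illustration, not the implementation from the linked pull request: the function name, the `(variants, samples)` dosage layout, the absence of covariates beyond an intercept, and the toy data are all hypothetical.

```python
import numpy as np

def gwas_linear_regression(dosage, phenotype):
    """Regress the phenotype on each variant's dosage, with an intercept.

    dosage: array of shape (variants, samples)
    phenotype: array of shape (samples,)
    Returns per-variant effect sizes and their standard errors.
    """
    n_variants, n_samples = dosage.shape
    betas = np.empty(n_variants)
    std_errs = np.empty(n_variants)
    for i in range(n_variants):
        # Design matrix: intercept column plus the dosage for this variant.
        X = np.column_stack([np.ones(n_samples), dosage[i]])
        coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
        resid = phenotype - X @ coef
        # Residual variance with two fitted parameters (intercept and slope).
        sigma2 = resid @ resid / (n_samples - 2)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        betas[i] = coef[1]
        std_errs[i] = np.sqrt(cov[1, 1])
    return betas, std_errs

# Hypothetical toy data: 2 variants, 6 samples.
dosage = np.array([
    [0.0, 1.0, 2.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
])
phenotype = np.array([1.0, 1.6, 1.9, 1.4, 1.1, 2.1])

betas, std_errs = gwas_linear_regression(dosage, phenotype)
print(betas, std_errs)
```

The explicit loop keeps the sketch readable; at UKB scale one would more likely vectorize across variants and run the computation chunk by chunk with Dask.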
Issue Analytics
- State:
- Created 3 years ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
@eric-czech I just had a nice chat with @zietzm and @ntatonetti, who are at Columbia and are experts in handling complex phenotypes and running many GWAS against them. They’re interested in using `sgkit` and possibly contributing back, particularly on the phenotype side.
Would you be open to making https://github.com/related-sciences/ukb-gwas-pipeline-nealelab public soon and potentially working with @zietzm to factor the phenotype handling code into its own repo, maybe something like `sgkit-pheno` or `phenokit`?
I’ve been thinking about this one too. I think we’re going to feel Dask’s poor handling of nested data when working with phenotypes, and I’d prefer to keep Spark out of this project as a dependency, so I think we should put that code into a separate repo if we find we do need Spark.