Replicating Hail's GWAS Tutorial

I decided to work on https://github.com/pystatgen/sgkit/issues/88 by trying to implement the Hail GWAS Tutorial with sgkit.

I’ll update this issue with my experiences. I barely know what I’m doing, so this should be fun.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 19 (5 by maintainers)

Top GitHub Comments

2 reactions
tomwhite commented, Feb 23, 2021

With #471, it’s possible to create a histogram of DP values:

import sgkit as sg
import xarray as xr
from sgkit.io.vcf import vcf_to_zarr

vcf_to_zarr(VCF_FILE, "1kg.zarr", format_fields=["DP"])
ds = sg.load_dataset("1kg.zarr")
dp = ds.call_DP.where(ds.call_DP >= 0)  # mask missing (negative) values as NaN
xr.plot.hist(dp, range=(0, 30), bins=30)

[Figure: histogram of DP values]
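
To see what the missing-value filter does without converting a VCF, here is a minimal sketch on synthetic data. The array below is a hypothetical stand-in for ds.call_DP, and the use of -1 as the missing-value sentinel is an assumption made for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds.call_DP: 3 variants x 4 samples.
# Assumption: -1 marks a missing call.
call_dp = xr.DataArray(
    np.array([[5, -1, 12, 30], [0, 7, -1, -1], [18, 3, 9, 41]]),
    dims=("variants", "samples"),
    name="call_DP",
)

# where() keeps values that satisfy the condition and replaces the rest with NaN
dp = call_dp.where(call_dp >= 0)

# Drop the NaN entries, then count values per bin over the same range as the plot
valid = dp.values[~np.isnan(dp.values)]
counts, edges = np.histogram(valid, bins=30, range=(0, 30))

print(int(dp.isnull().sum()))  # 3 missing calls were masked
print(int(counts.sum()))       # 8 values fall in [0, 30]; 41 is out of range
```

Note that the integer array becomes floating point after where(), because NaN cannot be stored in an integer dtype.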

1 reaction
tomwhite commented, Apr 27, 2021
  • The datasets have a substantial number of variables, so Xarray collapses them in the HTML view and new users might miss them. It would be nice if we could control this better. Also, the dataset attributes (which we are less interested in) are shown expanded; it would be better if they were collapsed.

Now that https://github.com/pydata/xarray/pull/5126 is in, when the next version of Xarray is released (0.17.1), we can put the following at the top of the notebook to get the effect we want:

xr.set_options(display_expand_attrs=False, display_expand_data_vars=True)
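
As a rough sketch of how these options behave, the snippet below applies them to a small, hypothetical dataset (the variable and attribute names are made up for illustration). It assumes an Xarray release that includes the options from the pull request above, and it uses set_options as a context manager so the settings apply only inside the block rather than to the whole notebook.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with one data variable and one attribute
ds = xr.Dataset(
    {"call_DP": (("variants", "samples"), np.zeros((2, 3), dtype="int8"))},
    attrs={"source": "hypothetical example"},
)

# Inside the block, data variables are shown expanded and attributes collapsed
with xr.set_options(display_expand_attrs=False, display_expand_data_vars=True):
    html = ds._repr_html_()  # the representation a notebook would render
```

Calling xr.set_options(...) once at the top of the notebook, as in the comment above, sets the same options globally instead.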

