proposal to set up language-agnostic testing of inheritance modes
Putting this here in gemini issues for lack of a better place …
Testing variants under different inheritance modes is hard.
It’s implemented in multiple places including
- gemini
- xbrowse
- genmod
and likely more.
I propose to make a simple-to-parse, programming-language and implementation-agnostic text format to be used for testing inheritance tools.
Minimally, the format must indicate:
- genotypes
- family structure
- inheritance mode(s) that should be determined by the tool
It will support only single families.
A first draft in JSON format would look like this:
{
  "name_of_test": {
    "genotypes": ["0/1", "0/1", "1/1"],
    "pedigree": ["dad:u:m", "mom:u:f", "kid:a:f:dad:mom"],
    "modes": {"autosomal-recessive": true, "autosomal-dominant": false},
    "comment": "simple autosomal-recessive test"
  },
  "dominant-1": {
    "genotypes": ["0/0", "0/1", "0/1"],
    "pedigree": ["dad:u:m", "mom:a:f", "kid:a:f:dad:mom"],
    "modes": {"autosomal-dominant": true, "x-linked-dominant": true},
    "comment": "autosomal and x-linked-dominant"
  }
}
where pedigree is a flat list in which each sample is essentially a condensed line from a ped file, with ‘:’-delimited values of:
+ sample_id
+ affection-status: 'u'naffected, 'a'ffected, 'c'arrier, '?'unknown
+ sex: 'm'ale, 'f'emale, 'u'nknown
+ dad-id (optional)
+ mom-id (optional)
This is more concise than the actual ped format, but still allows specifying multi-generation pedigrees.
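To make the pedigree encoding concrete, here is a minimal Python sketch of a parser for it. The names `Sample`, `parse_sample` and `parse_pedigree` are hypothetical, not part of any existing tool, and the handling of missing parents (trailing fields omitted or left empty) is my assumption about how "optional" would work.

```python
from dataclasses import dataclass
from typing import List, Optional

# Codes taken from the field list above.
AFFECTION_CODES = {"u", "a", "c", "?"}
SEX_CODES = {"m", "f", "u"}


@dataclass
class Sample:
    sample_id: str
    affection: str
    sex: str
    dad_id: Optional[str] = None
    mom_id: Optional[str] = None


def parse_sample(entry: str) -> Sample:
    """Parse one ':'-delimited pedigree entry such as 'kid:a:f:dad:mom'."""
    fields = entry.split(":")
    if not 3 <= len(fields) <= 5:
        raise ValueError(f"expected 3 to 5 fields, got {entry!r}")
    # Assumption: omitted or empty parent fields mean "no parent listed".
    fields += [""] * (5 - len(fields))
    sample_id, affection, sex, dad_id, mom_id = fields
    if affection not in AFFECTION_CODES:
        raise ValueError(f"bad affection status in {entry!r}")
    if sex not in SEX_CODES:
        raise ValueError(f"bad sex in {entry!r}")
    return Sample(sample_id, affection, sex, dad_id or None, mom_id or None)


def parse_pedigree(entries: List[str]) -> List[Sample]:
    """Parse a whole family and check that every parent id is a listed sample."""
    samples = [parse_sample(e) for e in entries]
    known = {s.sample_id for s in samples}
    for s in samples:
        for parent in (s.dad_id, s.mom_id):
            if parent is not None and parent not in known:
                raise ValueError(f"{s.sample_id}: unknown parent {parent!r}")
    return samples


family = parse_pedigree(["dad:u:m", "mom:u:f", "kid:a:f:dad:mom"])
```

Since parents are referenced by id within the same list, grandparents are just more entries in that list.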
Modes indicate what patterns the proposed test meets. Available modes are:
- ‘autosomal-dominant’
- ‘autosomal-recessive’
- ‘de-novo’
- ‘x-linked-recessive’
- ‘x-linked-dominant’
- ‘x-linked-denovo’
- ‘compound-het’ # will require a nested list of genotypes.
With this in place, any implementation can use this (hopefully) community-driven and -vetted set of tests.
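As a rough illustration of how an implementation might consume the tests, here is a Python sketch of a runner. `run_tests` and `toy_predictor` are hypothetical names. I am assuming each tool can be wrapped in a function that takes the genotypes and pedigree lists and returns the set of mode names it calls, and that test files spell the modes with hyphens as in the list above.

```python
import json

KNOWN_MODES = {
    "autosomal-dominant", "autosomal-recessive", "de-novo",
    "x-linked-recessive", "x-linked-dominant", "x-linked-denovo",
    "compound-het",
}


def run_tests(tests, predict_modes):
    """Return (test_name, message) pairs for every mismatch.

    predict_modes is a hypothetical adapter around the tool under test:
    it takes (genotypes, pedigree) and returns a set of mode names.
    """
    failures = []
    for name, case in tests.items():
        unknown = set(case["modes"]) - KNOWN_MODES
        if unknown:
            failures.append((name, f"unknown modes: {sorted(unknown)}"))
            continue
        called = predict_modes(case["genotypes"], case["pedigree"])
        for mode, expected in case["modes"].items():
            got = mode in called
            if got != expected:
                failures.append((name, f"{mode}: expected {expected}, got {got}"))
    return failures


TESTS_JSON = """
{
  "recessive-1": {
    "genotypes": ["0/1", "0/1", "1/1"],
    "pedigree": ["dad:u:m", "mom:u:f", "kid:a:f:dad:mom"],
    "modes": {"autosomal-recessive": true, "autosomal-dominant": false}
  },
  "dominant-1": {
    "genotypes": ["0/0", "0/1", "0/1"],
    "pedigree": ["dad:u:m", "mom:a:f", "kid:a:f:dad:mom"],
    "modes": {"autosomal-dominant": true, "x-linked-dominant": true}
  }
}
"""


def toy_predictor(genotypes, pedigree):
    # Placeholder only: a real adapter would call gemini, genmod, etc.
    return {"autosomal-recessive"}


failures = run_tests(json.loads(TESTS_JSON), toy_predictor)
for name, message in failures:
    print(f"FAIL {name}: {message}")
```

Modes that a test does not mention are not checked, so a test only asserts what its author was sure about.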
cc: @bw2 @moonso @jxchong @arq5x
If anyone is interested, perhaps we can start a new repo and have peer-review on pull-requests that add test-cases to the above JSON.
The repo will consist only of the json tests and docs in the README.
We are planning to do this anyway for gemini, as we support expanding genotypes into separate tables, so we will need a set of implementation-independent tests.
Please let me know any comments and CC anyone else who might be interested.
-Brent
Top GitHub Comments
Sure, I agree on the problem of searching for errors in the VCF. So if we have a well-defined JSON format, could we add a small script in this new package that converts the JSON to .ped and .vcf?
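A conversion along those lines might look like the following Python sketch. `to_ped` and `to_vcf` are hypothetical names, and several things here are assumptions rather than part of the proposal: genotypes are taken to be in the same order as the pedigree entries, carriers are written as unaffected in the ped output, the test name doubles as the family id, and the variant position and alleles are arbitrary placeholders.

```python
# PED codes: sex 1=male, 2=female, 0=unknown; phenotype 1=unaffected,
# 2=affected, 0=missing. Writing carriers ('c') as unaffected is an assumption.
SEX_CODE = {"m": "1", "f": "2", "u": "0"}
PHENO_CODE = {"u": "1", "a": "2", "c": "1", "?": "0"}


def to_ped(name, pedigree):
    """Build tab-delimited PED text, using the test name as the family id."""
    lines = []
    for entry in pedigree:
        fields = entry.split(":")
        fields += [""] * (5 - len(fields))  # pad missing parent ids
        sample_id, affection, sex, dad, mom = fields
        lines.append("\t".join([
            name, sample_id, dad or "0", mom or "0",
            SEX_CODE[sex], PHENO_CODE[affection],
        ]))
    return "\n".join(lines) + "\n"


def to_vcf(pedigree, genotypes, chrom="1", pos=100, ref="A", alt="T"):
    """Build a one-variant VCF; chrom/pos/ref/alt are placeholder values."""
    sample_ids = [entry.split(":")[0] for entry in pedigree]
    if len(sample_ids) != len(genotypes):
        raise ValueError("need exactly one genotype per sample")
    header = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                   "INFO", "FORMAT", *sample_ids]),
    ]
    record = [chrom, str(pos), ".", ref, alt, ".", ".", ".", "GT", *genotypes]
    return "\n".join(header + ["\t".join(record)]) + "\n"


pedigree = ["dad:u:m", "mom:u:f", "kid:a:f:dad:mom"]
ped_text = to_ped("recessive-1", pedigree)
vcf_text = to_vcf(pedigree, ["0/1", "0/1", "1/1"])
```

For the x-linked modes the `chrom` argument would presumably need to be set to X, since most tools key on the chromosome name.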
I have a start for this here: https://github.com/quinlan-lab/mendacity