Remove slash from all data variable names
We’re currently naming variables like call/genotype, variant/id, sample/id, etc., but I think we should switch to call_genotype, variant_id, sample_id, etc.
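The following is a minimal sketch of the proposed renaming. It assumes a mechanical rule in which each '/' becomes '_'. The helper name rename_mapping is hypothetical and is not part of sgkit or xarray. It also refuses to produce a mapping if two variables would end up with the same name.

```python
def rename_mapping(names):
    """Map slash-delimited names (e.g. 'call/genotype') to the proposed
    underscore form (e.g. 'call_genotype'); names without '/' are skipped."""
    names = list(names)
    mapping = {n: n.replace("/", "_") for n in names if "/" in n}
    # Guard against collisions, e.g. if both 'call/genotype' and
    # 'call_genotype' already exist in the same dataset.
    unchanged = set(names) - set(mapping)
    targets = list(mapping.values())
    if len(set(targets)) != len(targets) or unchanged & set(targets):
        raise ValueError("renaming would produce duplicate variable names")
    return mapping


print(rename_mapping(["call/genotype", "variant/id", "sample/id"]))
# {'call/genotype': 'call_genotype', 'variant/id': 'variant_id', 'sample/id': 'sample_id'}
```

To migrate an existing xarray dataset, a mapping like this could be passed to Dataset.rename, e.g. ds.rename(rename_mapping(ds.variables)).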
The disadvantages of using the slashes are:
- Xarray stores these as separate Zarr groups, which means you can’t load an sgkit dataset with a single command. Instead, you have to do something like this:
    import xarray as xr
    ds = xr.merge([xr.open_zarr(path, group=g) for g in ['call', 'variant', 'sample']])
  There is no clear advantage to having the variables split up on disk by this grouping. If they were instead grouped by something more meaningful, like contig, the partitioning would make more sense, but creating directories based on similar variable names does not.
- Assigning variables requires a kwargs splat rather than the simpler ds.assign(call_genotype=...) syntax, e.g. ds.assign(**{'call/genotype': ...})
- I’ve found that for some datasets, you can’t pass custom Zarr encodings to Xarray when variables have ‘/’ in their names. The bug has been hard to reproduce on a small dataset, so I’m not sure why yet.
- You cannot autocomplete variable names on a dataset instance
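The kwargs splat and the autocompletion points both come from a Python rule. Keyword arguments, as in ds.assign(call_genotype=...), and attribute access, as in ds.call_genotype, only work for names that are valid Python identifiers. Attribute access is what tab completion offers.

The stdlib sketch below illustrates that rule. The function usable_as_attribute is illustrative only. assign is a hypothetical stand-in for a method that takes **kwargs the way Dataset.assign does.

```python
import keyword


def usable_as_attribute(name: str) -> bool:
    """True if `name` can be written as f(name=...) or obj.name in Python."""
    return name.isidentifier() and not keyword.iskeyword(name)


def assign(**variables):
    """Hypothetical stand-in for a **kwargs-based method like Dataset.assign."""
    return variables


for name in ["call/genotype", "call_genotype", "variant/id", "variant_id"]:
    print(name, usable_as_attribute(name))
# call/genotype False
# call_genotype True
# variant/id False
# variant_id True

assign(call_genotype=1)         # plain keyword syntax works
assign(**{"call/genotype": 1})  # a slash name needs a dict splat;
                                # assign(call/genotype=1) is a SyntaxError
```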
The only disadvantage I can see to dropping the ‘/’ is losing a convenient delimiter for extracting the group name for a set of variables, like “variant” or “call”. I don’t think that’s difficult to live without, and underscore case is more common in other PyData projects anyway.
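Without the slash, a variable's group can still be recovered from its prefix. This works as long as the group names themselves (call, variant, sample) contain no underscore.

Below is a sketch under that assumption. The helpers group_of and group_variables are hypothetical, and the example variable names are illustrative only.

```python
from collections import defaultdict
from typing import Optional

GROUPS = ("call", "variant", "sample")  # assumed group prefixes


def group_of(name: str) -> Optional[str]:
    """Return the group prefix before the first '_', or None if unknown."""
    prefix, sep, _ = name.partition("_")
    return prefix if sep and prefix in GROUPS else None


def group_variables(names):
    """Collect variable names into lists keyed by their group prefix."""
    groups = defaultdict(list)
    for name in names:
        group = group_of(name)
        if group is not None:
            groups[group].append(name)
    return dict(groups)


names = ["call_genotype", "variant_id", "variant_position", "sample_id"]
print(group_variables(names))
# {'call': ['call_genotype'], 'variant': ['variant_id', 'variant_position'], 'sample': ['sample_id']}
```

On an xarray Dataset, the same prefix test could select one group, e.g. ds[[v for v in ds.data_vars if v.startswith('variant_')]].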
@alimanfoo or @tomwhite, do you have any objections to this?
Top GitHub Comments
Yes, and I don’t want to use them: https://github.com/pystatgen/sgkit/issues/17
Hi Eric, no objection, sounds like several things will fit with xarray more naturally if we avoid slash so happy to go with your suggestion.
On Sat, 1 Aug 2020, 15:22 Eric Czech, notifications@github.com wrote: