
Remove slash from all data variable names


We’re currently naming variables like call/genotype, variant/id, sample/id, etc., but I think we should switch to call_genotype, variant_id, sample_id, etc.

The disadvantages of using the slashes are:

  • Xarray stores these as separate Zarr groups, which means you can’t load an sgkit dataset with a single command. You have to instead do something like this: ds = xr.merge([xr.open_zarr(path, group=g) for g in ['call', 'variant', 'sample']]). There is no clear advantage to having the variables split up on disk by this grouping. If they were instead grouped by something more meaningful like contig, the partitioning would make more sense, but creating directories based on groups of similar variables does not.
  • Assigning variables requires a kwargs splat, e.g. ds.assign(**{'call/genotype': ...}), rather than the simpler ds.assign(call_genotype=...) syntax.
  • I’ve found that for some datasets, you can’t pass custom Zarr encodings to Xarray when variables have ‘/’ in the name; the bug has been hard to reproduce on a small dataset, so I’m not sure why yet.
  • You cannot autocomplete variable names via attribute access on a dataset instance (e.g. ds.call_genotype).
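The assignment and autocompletion points both come down to a Python rule: keyword arguments and attribute names must be valid identifiers, and 'call/genotype' is not one. The sketch below checks this with the built-in str.isidentifier() and builds a slash-to-underscore mapping. The helpers assign and to_underscore_names are hypothetical stand-ins, and the assumption is that a mapping like this would be passed to xarray's Dataset.rename to migrate existing data.

```python
from typing import Dict, Iterable


def assign(**variables: object) -> Dict[str, object]:
    """Hypothetical stand-in for a kwargs-based API such as Dataset.assign."""
    return dict(variables)


def to_underscore_names(names: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map 'group/name' variable names to 'group_name'."""
    return {name: name.replace("/", "_") for name in names if "/" in name}


old_names = ["call/genotype", "variant/id", "sample/id"]

# Slash names are not valid identifiers, so they can't be written as keyword
# arguments or used for attribute access (ds.call/genotype parses as a division).
print([n.isidentifier() for n in old_names])  # [False, False, False]

# With slashes, the only way to pass such a name as a keyword is a dict splat.
print(assign(**{"call/genotype": 0}))  # {'call/genotype': 0}

# Build the old-name -> new-name mapping for a rename.
mapping = to_underscore_names(old_names)
print(mapping)
# {'call/genotype': 'call_genotype', 'variant/id': 'variant_id', 'sample/id': 'sample_id'}

# The new names work as plain keyword arguments.
print(assign(call_genotype=0))  # {'call_genotype': 0}
```

Names that already lack a slash are left out of the mapping, so the helper can be run repeatedly on a partially migrated set of names.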

The only disadvantage I can see to dropping the ‘/’ is that it offers a convenient delimiter for extracting the group name (like “variant” or “call”) for a set of variables. I don’t think that’s difficult to live without, and underscore case is more common in other pydata projects anyhow.
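If the group name is still needed, it can be recovered from underscore names by splitting at the first underscore, as long as the group prefixes themselves ('call', 'variant', 'sample') contain no underscores. A minimal sketch under that assumption; group_variables is a hypothetical helper, not part of sgkit or xarray, and 'variant_allele' is an illustrative name only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_variables(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group variable names by the prefix before the first underscore.

    Assumes prefixes such as 'call' or 'variant' contain no underscores.
    A name with no underscore is grouped under itself.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        prefix, _, _ = name.partition("_")  # split at the first '_' only
        groups[prefix].append(name)
    return dict(groups)


# 'variant_allele' is a hypothetical example name.
names = ["call_genotype", "variant_id", "variant_allele", "sample_id"]
print(group_variables(names))
# {'call': ['call_genotype'], 'variant': ['variant_id', 'variant_allele'], 'sample': ['sample_id']}
```

With xarray, one of the resulting lists could be used to select a subset, e.g. ds[groups['variant']], since indexing a Dataset with a list of names returns a Dataset containing just those variables.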

@alimanfoo or @tomwhite do you have any objections to this?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 11 (1 by maintainers)

Top GitHub Comments

eric-czech commented, Aug 3, 2020 (1 reaction)

Yes, and I don’t want to use them: https://github.com/pystatgen/sgkit/issues/17

alimanfoo commented, Aug 1, 2020 (1 reaction)

Hi Eric, no objection. It sounds like several things will fit with xarray more naturally if we avoid slashes, so happy to go with your suggestion.

On Sat, 1 Aug 2020, 15:22 Eric Czech wrote the issue description above: https://github.com/pystatgen/sgkit/issues/81
