CLI documentation and path handling

Describe the bug

  • Running sfaira create-dataloader created the data loader in the top-level directory of the sfaira clone, not under sfaira/data/dataloaders/loaders/ as described in step 3.
  • Let's use the same path argument definition for annotate and for test. Right now they differ: test requires absolute paths. I would go for sfaira/data/dataloaders/loaders/ as the default for path.
  • Can sfaira/unit_tests/template_data/ be the default for test-data throughout, e.g. in annotate-dataloader? @le-ander This also relates to where the data loader directory lives during generation, i.e. in place in the clone or somewhere else.
  • validate, annotate and test use different ways of supplying the DOI; can we use the same one in all of them? The individual yamls discovered in validate with the *yaml operator could also be discovered with os.listdir. I like supplying the DOI as a separate argument, i.e. how it works in annotate: it makes the call very clear and the command easy to copy and paste (slightly better than the questionnaire in test, I think). We could optionally fall back to the questionnaire in all of them.
  • Take special characters and white spaces out of meta data items that are used to assemble loader file names, e.g. human_lamina propria of mucosa of colon_2019_10x technology_Kinchen_001.yaml should not contain white spaces. Here “lamina propria of mucosa of colon” is a meta data label, as is “10x technology”. In sfaira, I use data/base/dataset:clean_string() for this a lot; maybe you can route all string conversions through one (maybe this) function.
  • Can we move the data loader unit test out of the unit_module and into the CLI module? I would be OK with not running it with pytest but just as a standard function. I find the current format slightly confusing, both when reading the CLI and when reading the unit tests; this is of course an effect of how we developed this. If we want, we can still add unit tests for the entire CLI later.
  • Step 9 of the docs states # sfaira annotate <path>`` TODO, but this should be sfaira annotate-dataloader sfaira/data/dataloaders/loaders --doi=<DOI> --test-data=sfaira/unit_tests/template_data/
  • In step 11, update # sfaira test-dataloader <path>`` TODO
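
The file-name point above can be illustrated with a small sketch. It does not reproduce sfaira's clean_string(); sanitize_label and assemble_loader_filename are hypothetical helpers, and the order of the file-name parts is only inferred from the example file name in this issue.

```python
import re


def sanitize_label(label: str) -> str:
    """Hypothetical helper: drop every character that is not a letter or digit."""
    return re.sub(r"[^A-Za-z0-9]+", "", label)


def assemble_loader_filename(organism, organ, year, assay, author, index):
    """Join sanitized meta data items into a loader file name (assumed layout)."""
    parts = [organism, organ, str(year), assay, author, f"{index:03d}"]
    # Every part goes through the same function, so no white space or
    # special character can leak into the file name.
    return "_".join(sanitize_label(part) for part in parts) + ".yaml"


if __name__ == "__main__":
    print(assemble_loader_filename(
        "human", "lamina propria of mucosa of colon", 2019,
        "10x technology", "Kinchen", 1,
    ))
```

Routing every label through one function is the design choice suggested in the issue: the underscore then only ever appears as the separator between parts.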

What works for me right now:

  • sfaira annotate-dataloader sfaira/data/dataloaders/loaders --doi=<DOI> --test-data=sfaira/unit_tests/template_data/
  • sfaira annotate-dataloader sfaira/data/dataloaders/loaders/ --doi=<DOI> --test-data=sfaira/unit_tests/template_data/
  • sfaira test-dataloader <full path to sfaira top level directory> --test-data=<full path to sfaira top level directory>/sfaira/unit_tests/template_data/ --doi=<DOI>

These points should result in every CLI call receiving the same arguments:

  • path -> let's rename to path_loader (optional, expected in sfaira/data/dataloaders/loaders/ by default)
  • doi (optional -> triggers the questionnaire if not given)
  • test-data -> let's rename to path-data (optional, expected in )
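
A minimal sketch of how the shared arguments listed above could be wired up so that annotate-dataloader and test-dataloader accept the same options. It uses argparse from the standard library purely for illustration and is not sfaira's actual CLI code. The names build_parser, resolve_doi and parse are hypothetical, and the default for --path-data is an assumption borrowed from the test-data path used earlier in this issue, since the issue leaves that default open.

```python
import argparse
import os

# Default taken from the issue text.
DEFAULT_PATH_LOADER = "sfaira/data/dataloaders/loaders/"
# Assumption: the issue does not state this default; borrowed from the
# --test-data value used in the working commands above.
DEFAULT_PATH_DATA = "sfaira/unit_tests/template_data/"


def build_parser() -> argparse.ArgumentParser:
    """Give every subcommand the same three optional arguments."""
    parser = argparse.ArgumentParser(prog="sfaira")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("annotate-dataloader", "test-dataloader"):
        cmd = subparsers.add_parser(name)
        cmd.add_argument("--doi", default=None)
        cmd.add_argument("--path_loader", default=DEFAULT_PATH_LOADER)
        cmd.add_argument("--path-data", default=DEFAULT_PATH_DATA)
    return parser


def resolve_doi(doi, ask=input):
    """Fall back to an interactive question when no DOI was passed."""
    if doi is None:
        doi = ask("DOI of the data loader: ")
    return doi.strip()


def parse(argv, ask=input):
    args = build_parser().parse_args(argv)
    args.doi = resolve_doi(args.doi, ask)
    # Resolve both paths the same way so that relative and absolute
    # inputs behave identically in every subcommand.
    args.path_loader = os.path.abspath(args.path_loader)
    args.path_data = os.path.abspath(args.path_data)
    return args
```

With this layout, a call without --doi prompts for it, while a call with --doi=<DOI> can be copied and pasted as a single command.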

The final interface would then be:

  • sfaira annotate-dataloader [--doi] [--path-data,--path_loader]

  • sfaira annotate-dataloader [--doi] [--path-data,--path_loader]

  • sfaira test-dataloader [--doi] [--path-data,--path_loader]

  • Document this in .rst

  • Update figure in docs with merged validate & annotate

  • add sample source attribute to create-dataloader and make it a required attribute

  • make one of gene_id_ensembl_var_key / gene_id_symbols_var_key a required attribute

  • rename assay to assay_sc

  • add missing fields to yaml templates and document them (see eg. here for all the current fields: https://github.com/theislab/sfaira/blob/bug/cli_documentation/sfaira/data/dataloaders/loaders/d10_1101_2020_10_12_335331/human_blood_2020_10x_hao_001.yaml)

  • ask the user explicitly whether they have cell type annotations and then inform them whether or not to run annotate-dataloader
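
The two "required attribute" items above could be checked along these lines. This is a sketch under stated assumptions, not sfaira's validation code: missing_required is a hypothetical helper, the key name sample_source is assumed, and the meta data is treated as a flat dictionary rather than the real yaml layout.

```python
GENE_ID_KEYS = ("gene_id_ensembl_var_key", "gene_id_symbols_var_key")


def missing_required(meta: dict) -> list:
    """Return a description of each required entry that is absent or empty."""
    problems = []
    # At least one of the two gene identifier keys has to be set.
    if not any(meta.get(key) for key in GENE_ID_KEYS):
        problems.append("one of " + " / ".join(GENE_ID_KEYS))
    # "sample_source" is an assumed key name for the sample source attribute.
    if not meta.get("sample_source"):
        problems.append("sample_source")
    return problems
```

An empty return value means the loader meta data passes both checks; otherwise the list can be printed back to the user as-is.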

I started improving some things here, https://github.com/theislab/sfaira/pull/320; maybe we can settle this issue on that PR. I have already improved the reporting of a number of file-not-found issues there. I think that once we streamline the paths, many of the hard-to-comprehend issues will already be addressed.
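
As an illustration of clearer file-not-found reporting combined with os.listdir-based discovery, the lookup could resemble the sketch below. It is not taken from the PR. doi_to_dirname and list_loader_yamls are hypothetical names, and the DOI-to-directory mapping is an assumption inferred from the example URL above (d10_1101_2020_10_12_335331).

```python
import os
import re


def doi_to_dirname(doi: str) -> str:
    """Assumed mapping: prefix 'd' and replace non-alphanumerics by '_'."""
    return "d" + re.sub(r"[^A-Za-z0-9]", "_", doi)


def list_loader_yamls(path_loader: str, doi: str) -> list:
    """List the yaml files of one data loader, failing with a clear message."""
    loader_dir = os.path.join(path_loader, doi_to_dirname(doi))
    if not os.path.isdir(loader_dir):
        raise FileNotFoundError(
            f"No data loader directory for DOI {doi!r}: expected {loader_dir}"
        )
    return sorted(
        os.path.join(loader_dir, name)
        for name in os.listdir(loader_dir)
        if name.endswith(".yaml")
    )
```

Naming the expected directory in the error message tells the user directly whether the loader path or the DOI is the part that is off.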

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

le-ander commented, Jul 1, 2021 (1 reaction)

Hey @Zethson, I have completed all the main points above to improve the CLI but did not manage to finish a couple of minor things, which I have documented as a list at the bottom of the first comment above. Do you think you could take over from here and finish up the open points together with #314 when you’re back at work again? I’ll be back Tuesday next week. Thanks a lot!

Zethson commented, Jun 29, 2021 (1 reaction)

All right, then let’s roll with this 😃