CLI documentation and path handling
Describe the bug
- Running `sfaira create-dataloader` created the data loader in the top-level directory of the sfaira clone, not under `sfaira/data/dataloaders/loaders/` as described in step 3.
- Let's use the same path argument definition for annotate and for test; right now they are different, and test requires absolute paths. I would go for `sfaira/data/dataloaders/loaders/` as the default for `path`.
- Can `sfaira/unit_tests/template_data/` be the default of `test-data` throughout, e.g. in `annotate-dataloader`? @le-ander This also relates to where the data loader directory is during generation, i.e. in place in the clone or somewhere else.
- validate, annotate and test use different ways of supplying the DOI; can we use the same in all of them? The individual yamls discovered in validate with the `*yaml` operator could also be discovered with `os.listdir`. I like supplying the DOI as a separate argument, i.e. how it works in annotate: it makes it very clear and also easy to copy-paste the command (slightly better than the questionnaire in test, I think?). We could optionally fall back to the questionnaire in all of them?
- Take special characters and white spaces out of meta data items that are used to assemble loader file names, e.g. `human_lamina propria of mucosa of colon_2019_10x technology_Kinchen_001.yaml` should not contain white spaces; here “lamina propria of mucosa of colon” is a meta data label, and similarly “10x technology”. In sfaira, we use `data/base/dataset:clean_string()` for this a lot; maybe you can relay all string conversions to one (maybe this) function.
- Can we move the data loader unit test out of the unit_module and into the CLI module? I would be OK with not running it with pytest but just as a standard function. I find the current format slightly confusing, both when reading the CLI and when reading the unit tests; this is of course an effect of how we developed this. If we want, we can still add unit tests for the entire CLI later.
- Step 9) of the documentation states `# sfaira annotate <path>` TODO, but this should be `sfaira annotate-dataloader sfaira/data/dataloaders/loaders --doi=<DOI> --test-data=sfaira/unit_tests/template_data/`
- In step 11, update `# sfaira test-dataloader <path>` TODO
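To illustrate the file-name point in the list above, here is a minimal sketch of routing every file-name component through a single cleaning function. The helpers `sanitize_component` and `assemble_file_name` are hypothetical names made up for this example; this is not sfaira's `clean_string()`, whose exact behaviour is not shown in this issue. The rule used here (drop everything that is not a letter or digit) is an assumption.

```python
import re
from typing import Sequence


def sanitize_component(label: str) -> str:
    """Hypothetical helper: drop every character that is not a letter or digit."""
    return re.sub(r"[^A-Za-z0-9]+", "", label)


def assemble_file_name(parts: Sequence[str], suffix: str = ".yaml") -> str:
    """Hypothetical helper: clean each metadata item, then join with underscores."""
    return "_".join(sanitize_component(p) for p in parts) + suffix


name = assemble_file_name(
    ["human", "lamina propria of mucosa of colon", "2019", "10x technology", "Kinchen", "001"]
)
print(name)  # human_laminapropriaofmucosaofcolon_2019_10xtechnology_Kinchen_001.yaml
```

Because the underscore is the separator between components, the sketch removes white space inside a label rather than replacing it with another underscore.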
What works for me right now:
sfaira annotate-dataloader sfaira/data/dataloaders/loaders --doi=<DOI> --test-data=sfaira/unit_tests/template_data/
sfaira annotate-dataloader sfaira/data/dataloaders/loaders/ --doi=<DOI> --test-data=sfaira/unit_tests/template_data/
sfaira test-dataloader <full path to sfaira top level directory> --test-data=<full path to sfaira top level directory>/sfaira/unit_tests/template_data/ --doi=<DOI>
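One way to let all commands accept relative and absolute paths alike, with or without a trailing slash, is to normalise the argument in one place before use. The following is a sketch only: `resolve_path` is a hypothetical helper, not part of sfaira, and the default directory is the one proposed in this issue.

```python
from pathlib import Path
from typing import Optional

# Default proposed in this issue; interpreted relative to the working directory.
DEFAULT_LOADER_DIR = Path("sfaira/data/dataloaders/loaders")


def resolve_path(path: Optional[str], default: Path, base: Optional[Path] = None) -> Path:
    """Hypothetical helper: return an absolute path for a user-supplied argument."""
    base = Path.cwd() if base is None else base
    # Fall back to the default when the argument was not given.
    candidate = Path(path).expanduser() if path else default
    # Relative paths are interpreted relative to `base`.
    if not candidate.is_absolute():
        candidate = base / candidate
    return candidate.resolve()


with_slash = resolve_path("sfaira/data/dataloaders/loaders/", DEFAULT_LOADER_DIR)
without_slash = resolve_path("sfaira/data/dataloaders/loaders", DEFAULT_LOADER_DIR)
fallback = resolve_path(None, DEFAULT_LOADER_DIR)
```

All three calls yield the same absolute path, so downstream code would no longer need to care which form the user typed.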
These points should result in every CLI call receiving the same arguments:
- path -> let's rename to `path_loader` (optional, expected in `sfaira/data/dataloaders/loaders/` by default)
- doi (optional -> triggers the questionnaire if not given)
- test-data -> let's rename to `path-data` (optional, expected in )
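As a rough illustration of the shared argument set, the sketch below uses the standard-library `argparse` as a stand-in; it does not reflect how the sfaira CLI is actually implemented. `add_shared_arguments`, `build_parser` and `get_doi` are hypothetical names, and the default of `--path-data` is left unset because the issue does not state it.

```python
import argparse
from typing import Callable


def add_shared_arguments(parser: argparse.ArgumentParser) -> None:
    """Attach the same three optional arguments to a subcommand parser."""
    parser.add_argument("--doi", default=None, help="DOI; prompted for if omitted")
    parser.add_argument("--path_loader", default="sfaira/data/dataloaders/loaders/")
    # Default intentionally None: the issue leaves the expected location blank.
    parser.add_argument("--path-data", dest="path_data", default=None)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sfaira")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("annotate-dataloader", "test-dataloader"):
        add_shared_arguments(sub.add_parser(name))
    return parser


def get_doi(args: argparse.Namespace, ask: Callable[[str], str] = input) -> str:
    """Use --doi when given; otherwise fall back to asking the user."""
    if args.doi is not None:
        return args.doi
    return ask("DOI: ").strip()


parser = build_parser()
args = parser.parse_args(["annotate-dataloader", "--doi=10.1000/example"])  # made-up DOI
```

Passing the prompt function as a parameter keeps the questionnaire fallback testable without user interaction.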
The final interface would then be:
- sfaira annotate-dataloader [--doi] [--path-data,--path_loader]
- sfaira annotate-dataloader [--doi] [--path-data,--path_loader]
- sfaira test-dataloader [--doi] [--path-data,--path_loader]
- Document this in .rst
- Update the figure in the docs with merged validate & annotate
- Add a sample source attribute to create-dataloader and make it a required attribute
- Make one of gene_id_ensembl_var_key / gene_id_symbols_var_key a required attribute
- Rename assay to assay_sc
- Add missing fields to the yaml templates and document them (see e.g. here for all the current fields: https://github.com/theislab/sfaira/blob/bug/cli_documentation/sfaira/data/dataloaders/loaders/d10_1101_2020_10_12_335331/human_blood_2020_10x_hao_001.yaml)
- Ask the user explicitly whether they have cell type annotations and then inform them whether or not to run annotate-dataloader
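The two "required attribute" items above could be enforced with a small check along the following lines. This is a sketch under assumptions: `missing_required` is a hypothetical function, the key name `sample_source` is assumed from the wording "sample source attribute", and the metadata is treated as a flat dictionary even though the real yaml templates may be nested.

```python
from typing import Any, Dict, List

# At least one of these two keys must be set, as requested in the list above.
GENE_ID_KEYS = ("gene_id_ensembl_var_key", "gene_id_symbols_var_key")


def missing_required(meta: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    # "sample_source" is an assumed key name for the sample source attribute.
    if not meta.get("sample_source"):
        problems.append("sample_source is required")
    if not any(meta.get(key) for key in GENE_ID_KEYS):
        problems.append("one of " + " / ".join(GENE_ID_KEYS) + " is required")
    return problems


# Example values are placeholders, not taken from a real loader.
ok = missing_required({"sample_source": "placeholder", "gene_id_symbols_var_key": "index"})
bad = missing_required({"assay_sc": "placeholder"})
```

Returning a list of messages rather than raising on the first problem lets the CLI report everything that is missing in one pass.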
I started improving some things here: https://github.com/theislab/sfaira/pull/320. Maybe we can settle this issue on that PR. I have already improved the reporting of a number of file-not-found issues there. I think that if we streamline the paths, a lot of the hard-to-comprehend issues will already be addressed.
Issue Analytics
- Created 2 years ago
- Reactions:1
- Comments:13 (13 by maintainers)
Top GitHub Comments
Hey @Zethson, I have completed all the main points above to improve the CLI but did not manage to finish a couple of minor things, which I have documented as a list at the bottom of the first comment above. Do you think you could take over from here and finish up the open points together with #314 when you’re back at work again? I’ll be back Tuesday next week. Thanks a lot!
All right, then let’s roll with this 😃