sfaira finalize-dataloader broken after cellxgene 3.0 schema introduction
The CLI command `finalize-dataloader` seems to be broken on `dev`: it fails with `KeyError: '2.0.0'` (traceback below). This seems to be related to the recent introduction of the new cellxgene schema. It looks like it's simple to fix, but I don't fully grasp the cellxgene schema handling yet, so @davidsebfischer, can you take a look?
```
/opt/python/bin/sfaira:8 in <module>

     5 from sfaira.cli import main
     6 if __name__ == '__main__':
     7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
❱    8     sys.exit(main())
     9

/lustre/groups/ml01/code/katelyn.li/sfaira/sfaira/cli.py:74 in main

    71     # Is the latest sfaira version installed? Upgrade if not!
    72     if not UpgradeCommand.check_sfaira_latest():
    73         print('[bold blue]Run [green]sfaira upgrade [blue]to get the latest version.')
❱   74     sfaira_cli()
    75
    76
    77 @click.group()

/opt/python/lib/python3.8/site-packages/click/core.py:1128 in __call__

  1125
  1126     def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
  1127         """Alias for :meth:`main`."""
❱ 1128         return self.main(*args, **kwargs)
  1129
  1130
  1131 class Command(BaseCommand):

/opt/python/lib/python3.8/site-packages/click/core.py:1053 in main

  1050         try:
  1051             try:
  1052                 with self.make_context(prog_name, args, **extra) as ctx:
❱ 1053                     rv = self.invoke(ctx)
  1054                     if not standalone_mode:
  1055                         return rv
  1056                     # it's not safe to `ctx.exit(rv)` here!

/opt/python/lib/python3.8/site-packages/click/core.py:1659 in invoke

  1656                 super().invoke(ctx)
  1657                 sub_ctx = cmd.make_context(cmd_name, args, parent=ctx)
  1658                 with sub_ctx:
❱ 1659                     return _process_result(sub_ctx.command.invoke(sub_ctx))
  1660
  1661         # In chain mode we create the contexts step by step, but after the
  1662         # base command has been invoked. Because at that point we do not

/opt/python/lib/python3.8/site-packages/click/core.py:1395 in invoke

  1392             echo(style(message, fg="red"), err=True)
  1393
  1394         if self.callback is not None:
❱ 1395             return ctx.invoke(self.callback, **ctx.params)
  1396
  1397     def shell_complete(self, ctx: Context, incomplete: str) -> t.List["CompletionItem"]:
  1398         """Return a list of completions for the incomplete value. Looks

/opt/python/lib/python3.8/site-packages/click/core.py:754 in invoke

   751
   752         with augment_usage_errors(__self):
   753             with ctx:
❱  754                 return __callback(*args, **kwargs)
   755
   756     def forward(
   757         __self, __cmd: "Command", *args: t.Any, **kwargs: t.Any  # noqa: B902

/lustre/groups/ml01/code/katelyn.li/sfaira/sfaira/cli.py:228 in finalize_dataloader

   225     Formats .tsvs and runs a full data loader test.
   226     """
   227     path_loader, path_data, _ = set_paths(loader=path_loader, data=path_data)
❱  228     _full_test(path_loader=path_loader, path_data=path_data, doi=doi, schema=schema, clean_tsvs=True, in_phase_3=True)
   229
   230
   231 @sfaira_cli.command()

/lustre/groups/ml01/code/katelyn.li/sfaira/sfaira/cli.py:188 in _full_test

   185         dataloader_validator = DataloaderValidator(path_loader=path_loader, doi=doi, schema=schema)
   186         dataloader_validator.validate()
   187         dataloader_tester = DataloaderTester(path_loader, path_data, doi)
❱  188         dataloader_tester.test_dataloader(clean_tsvs=clean_tsvs, in_phase_3=in_phase_3)
   189     else:
   190         print('[bold red]The supplied DOI is malformed!')  # noqa: W605
   191

/lustre/groups/ml01/code/katelyn.li/sfaira/sfaira/commands/test_dataloader.py:35 in test_dataloader

    32         Runs a predefined unit test on a given dataloader.
    33         """
    34         self.doi_sfaira_repr = clean_doi(self.doi)
❱   35         self._test_dataloader(clean_tsvs=clean_tsvs, in_phase_3=in_phase_3)
    36
    37     def _get_ds(self):
    38         return get_ds(doi_sfaira_repr=self.doi_sfaira_repr, path_data=self.path_data, path_loader=self.path_loader)

/lustre/groups/ml01/code/katelyn.li/sfaira/sfaira/commands/test_dataloader.py:129 in _test_dataloader

   126                         print(f'[bold red]Did not find column {val} for {x} in data set {k}, found: '
   127                             f'{v.adata.var.columns}.')
   128                         sys.exit()
❱  129             v.streamline_var(match_to_release=None, schema="cellxgene:" + "2.0.0")
   130             signal_proc = np.asarray(v.adata.X.sum()).sum()
   131             if signal_proc < 0.01 * signal_raw and v.feature_type != "peak":
   132                 print('[bold red]Mapping your feature space to a reference annotation resulted in a heavy loss of '

/lustre/groups/ml01/code/katelyn.li/sfaira/sfaira/data/dataloaders/base/dataset.py:283 in streamline_var

   280         :param verbose: Report feature transformation statistics.
   281         """
   282         self._assert_loaded()
❱  283         adata_target_ids = get_target_ids(schema=schema)
   284         if isinstance(match_to_release, dict):
   285             match_to_release = match_to_release[self.organism]
   286         if subset_layer is not None:

/lustre/groups/ml01/code/katelyn.li/sfaira/sfaira/data/dataloaders/base/dataset.py:35 in get_target_ids

    32             v = schema.split(":")[1]
    33         else:
    34             v = DEFAULT_SCHEMA
❱   35         target_ids = AdataIdsCellxgeneVersions[v]
    36     else:
    37         raise ValueError(f"did not recognize schema {schema}")
    38     return target_ids

KeyError: '2.0.0'
```
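The last two frames point at the cause: `_test_dataloader` hard-codes `schema="cellxgene:" + "2.0.0"`, and `get_target_ids` indexes `AdataIdsCellxgeneVersions` with that version directly, which (judging by the `KeyError`) no longer has a `"2.0.0"` entry. The sketch below is a self-contained, hypothetical reconstruction of that lookup, not sfaira's actual code: the dictionary contents, `DEFAULT_SCHEMA = "3.0.0"` and the `startswith("cellxgene")` check are assumptions. It illustrates two possible mitigations: raise a readable error that lists the supported versions, and have the caller pass the unversioned schema so the default is used instead of a pinned version.

```python
# Hypothetical stand-ins for sfaira internals; the values are assumptions for illustration.
DEFAULT_SCHEMA = "3.0.0"
AdataIdsCellxgeneVersions = {
    "3.0.0": "AdataIdsCellxgene_v3_0_0",  # placeholder for the real ID container
}


def get_target_ids(schema: str):
    """Resolve a schema string such as 'cellxgene' or 'cellxgene:3.0.0'."""
    if schema.startswith("cellxgene"):
        # An explicit version follows the colon; otherwise fall back to the default.
        v = schema.split(":")[1] if ":" in schema else DEFAULT_SCHEMA
        if v not in AdataIdsCellxgeneVersions:
            # Replace the bare KeyError with an actionable message.
            supported = ", ".join(sorted(AdataIdsCellxgeneVersions))
            raise ValueError(
                f"cellxgene schema version {v!r} is not supported; "
                f"supported versions: {supported}"
            )
        return AdataIdsCellxgeneVersions[v]
    raise ValueError(f"did not recognize schema {schema}")


if __name__ == "__main__":
    # Caller side: request the unversioned schema instead of pinning "2.0.0".
    print(get_target_ids("cellxgene"))
    try:
        get_target_ids("cellxgene:2.0.0")
    except ValueError as e:
        print(e)
```

Whether the test should pin a schema version or defer to the default is a judgment call: deferring avoids the same breakage at the next schema bump, while pinning keeps the test reproducible but must be updated together with `AdataIdsCellxgeneVersions`.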
Comments:

"cell", "na", "nucleus", see also https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/3.0.0/schema.md#suspension_type

Fixed
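The three values quoted in the first comment are the allowed values of the `suspension_type` field in the linked cellxgene 3.0.0 schema. As an illustration only, a loader's per-cell annotation could be checked against them with a small helper such as the hypothetical `invalid_suspension_types` below (not part of sfaira). It only checks membership and ignores any further constraints the schema section may place on the field.

```python
# Allowed suspension_type values as quoted in the comment (cellxgene schema 3.0.0).
ALLOWED_SUSPENSION_TYPES = frozenset({"cell", "nucleus", "na"})


def invalid_suspension_types(values):
    """Hypothetical helper: return the sorted values not allowed for suspension_type."""
    return sorted(set(values) - ALLOWED_SUSPENSION_TYPES)


if __name__ == "__main__":
    # Made-up per-cell annotations, for illustration only.
    obs_values = ["cell", "nucleus", "single nucleus", "na", "cell"]
    bad = invalid_suspension_types(obs_values)
    if bad:
        print(f"suspension_type has values outside the schema: {bad}")
```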