Performance improvement: sfaira.data.store.io_dao.read_dao
The sfaira.data.store.io_dao.read_dao
function spends most of its time reading in a pickle file:
https://github.com/theislab/sfaira/blob/aeaa60ff128046b7564aa9cccd6293a9300e5a31/sfaira/data/store/io_dao.py#L117
Maybe there is a more efficient way to read in the necessary data.
Here’s a screenshot of the profiler output: [profiler screenshot not preserved in this excerpt]
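For reference, the linked line boils down to a plain pickle load. Below is a minimal sketch of that pattern, assuming the DAO store serialises metadata such as the `.uns` mapping via Python's pickle module; the file name is illustrative, not sfaira's actual on-disk layout:

```python
import pickle

# Illustrative path only; the real DAO store layout may differ.
with open("store/adata_uns.pickle", "rb") as f:
    uns = pickle.load(f)  # this single call dominated the profile
```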
Issue Analytics
- Created: 2 years ago
- Comments: 8 (4 by maintainers)
Top Results From Across the Web

Data stores — sfaira v0.3.11+5.g8ebd63b documentation
The DAO store format is an on-disk representation of single-cell data which is optimised for generator-based access and distributed access. In brief, DAO...

Sfaira accelerates data and model reuse in single cell genomics
Sfaira is a data and model zoo that automates common steps in exploratory single-cell RNA-seq analysis. a Overview workflow of sfaira data ...
Top GitHub Comments
Just checked - the profiler output does not really make much sense, because the `uns` data files are super small and should therefore be fast to read in.

Ran the pyinstrument profiler instead of cProfile and things look much more sensible now: [profiler screenshot not preserved in this excerpt]
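For anyone reproducing this comparison, the wall-clock profile could be captured roughly as follows; `read_dao` and its module path come from the issue itself, while the store path is a placeholder and the call's arguments are simplified:

```python
from pyinstrument import Profiler
from sfaira.data.store.io_dao import read_dao

profiler = Profiler()
profiler.start()
adata = read_dao("path/to/dao_store")  # placeholder path; arguments simplified
profiler.stop()
# pyinstrument's wall-clock call tree surfaces where the time is
# actually spent, which cProfile's cumulative counters obscured here.
print(profiler.output_text(unicode=True, color=False))
```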
Looking at this, it doesn’t make much sense to change the current implementation away from using pickle.
This is fixed now with the above-mentioned pull request. The issue was that the AnnData `OverloadedDict` saved lots of unnecessary data, which made the saved pickle files extremely large.
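The pull request itself is not linked in this excerpt, but the described root cause suggests a fix along these lines. This is a hedged sketch with a hypothetical helper name (`dump_uns`); the actual change in sfaira may look different:

```python
import pickle

def dump_uns(uns, path):
    # Converting AnnData's OverloadedDict (a MutableMapping) to a plain
    # dict before pickling drops the wrapper's bookkeeping state, so only
    # the actual key/value payload ends up in the serialised file.
    with open(path, "wb") as f:
        pickle.dump(dict(uns), f, protocol=pickle.HIGHEST_PROTOCOL)
```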