Preprocessing of raw SpikeGLX data with spikeinterface
Hi!
We (with @grahamfindlay) intend to sort very long (~48 h) Neuropixels 1.0 recordings, and we are now hitting serious bottlenecks in terms of (pre)processing time and disk usage.
So far, we have been using Bill Karsh's CatGT tool to preprocess the raw SpikeGLX files before feeding the preprocessed data into sorters via spikeinterface. CatGT takes care of concatenating the successive "trigger" files (e.g. run_name_g{gate_index}_t{trigger_index}.ap.bin), while also performing various preprocessing steps, the most crucial of which (for us) are sample alignment, artifact removal ("gfix"), and common average referencing.
Since spikeinterface seems able to perform the crucial sample-alignment step, and also promises to offer the tools for the full "destriping" preprocessing presented in the IBL preprocessing white paper and implemented here, we would like to follow Samuel's suggestion (https://github.com/SpikeInterface/spikeinterface/issues/1010) and perform the full preprocessing and sorting from raw SpikeGLX traces, bypassing CatGT.
For us, taking this step would be a serious gain in disk usage (since we would avoid the unnecessary step of writing the intermediate binary recording.dat file), and it would keep us aligned with the evolving preprocessing/sorting standards. In short, using spikeinterface from start to finish sounds amazing and we'd be really happy to make the switch.
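For concreteness, the kind of lazy preprocessing chain we have in mind would look roughly like this (just a sketch on our side; we are assuming spikeinterface's read_spikeglx, phase_shift, highpass_filter and common_reference functions, and the folder path is made up):

import spikeinterface.full as si

# lazy read of one SpikeGLX gate folder (path is hypothetical)
rec = si.read_spikeglx('/data/run_name_g0', stream_id='imec0.ap')

# inter-sample shift correction ("sample alignment")
rec = si.phase_shift(rec)

# high-pass filter then common median reference, all lazy
rec = si.highpass_filter(rec, freq_min=300.)
rec = si.common_reference(rec, reference='global', operator='median')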
So, here are a couple questions:
- Have there been any evolutions regarding the preprocessing of Neuropixels data presented here: https://spikeinterface.github.io/blog/spikeinterface-destripe/ ? In particular the "kfilt" destriping?
- Does spikeinterface offer a gfix-type automatic detection of short, large-amplitude artifacts?
- The trickiest (although not crucial for us in the short term): is spikeinterface/neo equipped to handle the concatenation of raw files across gates/triggers, given that there may be overlaps or gaps (that should be zero-filled) across these files? As Bill Karsh put it,
“all files with the same base run-name share parameters and come from the same underlying SpikeGLX hardware run (a continuous stream of consecutive samples), so have a time relation that allows them to be sewn back together (but with possible gaps and/or overlaps that need to be trimmed). The metadata ‘firstSample’ item is the starting sample number of this file (in that common underlying stream). CatGT can sew g and t series files back together”
Thank you so much for your continuous help, and let me say it is very exciting to see this project make such huge advances!
Tom
Thanks @samuelgarcia for the thorough response.
Hi @JoeZiminski Great to hear that, we’ll be happy to keep in touch regarding the timeline for the kfilt averaging, thank you! We’ve mostly used gfix to clean stimulation artifacts, which is something that can be done by spikeinterface with the remove_artifacts function, so it is not crucial for us, but in general it might be interesting for spikeinterface to offer CatGT’s preprocessing options. As you see fit!
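For reference, this is roughly how we picture the remove_artifacts route with externally detected stimulation times (only a sketch; the trigger times and window sizes below are made up, and rec is the lazy recording from the sketch in the original post):

# stimulation onsets detected outside spikeinterface (hypothetical values, in seconds)
stim_times_s = [12.5, 60.2, 125.8]
stim_frames = [int(t * rec.get_sampling_frequency()) for t in stim_times_s]

# zero out a short window around each artifact, lazily (single-segment case)
rec_clean = si.remove_artifacts(rec, list_triggers=stim_frames, ms_before=0.5, ms_after=3.0, mode='zeros')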
Not yet, but Joe Ziminski from London is working on it in coordination with Olivier Winter.
No, we do not have this. Depending on the complexity, this could be easy to do or not. All preprocessing has to be "lazy", i.e. done on demand by get_traces(). Is this gfix essentially a detector with a simple threshold followed by zero masking, or is it a more complex processing? (We have remove_artifacts(), but the indices have to be provided externally.)
The "concatenation" of "multi segment" (aka multi binary file) is also handled in a lazy mode, with 2 different flavors in spikeinterface ("append" = true multi segment, or "concatenate" = virtual mono segment). We do not fill with zeros because we handle the multi-segment problem explicitly.
In spikeinterface, you can already do this:
rec = si.read_binary(file_paths=['file1.dat', 'file2.dat', 'file3.dat'], ...)
and rec will be a 3-segment recording. Please have a deep look at this: https://spikeinterface.readthedocs.io/en/latest/modules/core/plot_5_append_concatenate_segments.html#sphx-glr-modules-core-plot-5-append-concatenate-segments-py
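To make the two flavors above concrete, a minimal sketch (rec1/rec2/rec3 stand for the per-trigger recordings):

# "append" = keep the files as separate segments of one recording
rec_multi = si.append_recordings([rec1, rec2, rec3])

# "concatenate" = expose the files as one virtual mono-segment recording
rec_mono = si.concatenate_recordings([rec1, rec2, rec3])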
Important: at the moment no sorter handles the "multi segment" problem correctly, except the experimental ones inside spikeinterface ("spykingcircus2" quite advanced, "tridesclous" not working yet). This is why in many cases you need to write everything to a single gigantic file with rec.save(), which is somehow the same as using CatGT, but more flexible.
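For completeness, that save step can be chunked and parallelized, something like this (a sketch; the folder name and job parameters are arbitrary, and rec_mono is the concatenated recording from the example above):

# write the fully preprocessed recording to disk once, in parallel chunks
rec_saved = rec_mono.save(folder='preprocessed_for_sorting', n_jobs=8, chunk_duration='1s')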