Preprocessing of raw SpikeGLX data with spikeinterface

Hi!

We (with @grahamfindlay) intend to sort very long (~48 h) Neuropixels 1.0 recordings, and we are now hitting serious bottlenecks in terms of (pre)processing time and disk usage.

So far, we have been using Bill Karsh’s CatGT tool to preprocess the raw SpikeGLX files before feeding the preprocessed data into sorters via spikeinterface. CatGT concatenates the successive “trigger” files (e.g. run_name_g{gate_index}_t{trigger_index}.ap.bin) and, at the same time, performs various preprocessing steps, the most crucial of which (for us) are sample alignment, artifact removal (“gfix”) and common average referencing.

Since spikeinterface seems to be able to perform the crucial sample alignment step, and also promises to offer the tools to perform the full “destriping” preprocessing presented in the IBL preprocessing white paper and implemented here, we would like to follow Samuel’s suggestion (https://github.com/SpikeInterface/spikeinterface/issues/1010) and perform the full preprocessing and sorting from raw SpikeGLX traces, bypassing CatGT.

For us, taking this step would be a serious gain in terms of disk usage (since we will avoid the useless step of writing the binary recording.dat file), and in terms of following the evolving preprocessing/sorting standards. In short, using spikeinterface from start to finish sounds amazing and we’d be really happy to make the switch.

So, here are a couple questions:

  • Have there been any developments regarding the preprocessing of Neuropixels data presented here: https://spikeinterface.github.io/blog/spikeinterface-destripe/ ? And in particular the “kfilt” destriping?
  • Does spikeinterface offer a gfix-type automatic detection of short, large-amplitude artifacts?
  • The trickiest (although not crucial for us in the short term): is spikeinterface/neo equipped to handle the concatenation of raw files across gates/triggers, since there may be overlaps or gaps (that should be 0-filled) across these files? As Bill Karsh put it:

“all files with the same base run-name share parameters and come from the same underlying SpikeGLX hardware run (a continuous stream of consecutive samples), so have a time relation that allows them to be sewn back together (but with possible gaps and/or overlaps that need to be trimmed). The metadata ‘firstSample’ item is the starting sample number of this file (in that common underlying stream). CatGT can sew g and t series files back together”
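
To make the stitching problem concrete, here is a minimal NumPy sketch of sewing chunks back together from their 'firstSample' values, zero-filling gaps and trimming overlaps. It is an illustration of the idea only, not CatGT's implementation: the function name `stitch_by_first_sample` and the choice to keep the earlier chunk's samples in an overlap are assumptions, and a real 48 h recording would need to be processed chunk by chunk rather than held in memory.

```python
import numpy as np


def stitch_by_first_sample(chunks):
    """Stitch (first_sample, traces) chunks into one array.

    Hypothetical helper: `traces` has shape (n_samples, n_channels) and
    `first_sample` is the chunk's starting index in the common stream.
    Gaps are zero-filled; in an overlap, the earlier chunk's samples are kept.
    """
    chunks = sorted(chunks, key=lambda c: c[0])
    start = chunks[0][0]
    end = max(first + len(traces) for first, traces in chunks)
    n_channels = chunks[0][1].shape[1]
    out = np.zeros((end - start, n_channels), dtype=chunks[0][1].dtype)

    written_until = start  # absolute index of the next sample not yet written
    for first, traces in chunks:
        skip = max(0, written_until - first)  # overlapping samples to trim
        if skip >= len(traces):
            continue  # chunk entirely covered by earlier data
        lo = first + skip - start
        out[lo:lo + len(traces) - skip] = traces[skip:]
        written_until = first + len(traces)
    return out


# Three toy chunks: b overlaps a by two samples, c starts after a two-sample gap.
a = np.full((4, 2), 1, dtype=np.int16)  # samples 100..103
b = np.full((4, 2), 2, dtype=np.int16)  # samples 102..105
c = np.full((2, 2), 3, dtype=np.int16)  # samples 108..109
stitched = stitch_by_first_sample([(100, a), (102, b), (108, c)])
print(stitched[:, 0].tolist())  # [1, 1, 1, 1, 2, 2, 0, 0, 3, 3]
```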

Thank you so much for your continuous help, and let me say it is very exciting to see this project make such huge advances! Tom

Issue Analytics

  • State:open
  • Created a year ago
  • Comments:13 (11 by maintainers)

Top GitHub Comments

TomBugnon commented, Oct 19, 2022

Thanks @samuelgarcia for the thorough response.

Hi @JoeZiminski, great to hear that! We’ll be happy to keep in touch regarding the timeline for the kfilt averaging, thank you. We’ve mostly used gfix to clean stimulation artifacts, which is something spikeinterface can already do with the remove_artifacts function, so it is not crucial for us, but in general it might be interesting for spikeinterface to offer CatGT’s preprocessing options. As you see fit!

samuelgarcia commented, Oct 18, 2022

Have there been any developments regarding the preprocessing of Neuropixels data presented here: https://spikeinterface.github.io/blog/spikeinterface-destripe/ ? And in particular the “kfilt” destriping?

Not yet, but Joe Ziminski from London is working on it in coordination with Olivier Winter.

Does spikeinterface offer a gfix-type automatic detection of short, large-amplitude artifacts?

No, we do not have this. Depending on the complexity, this could be easy to implement or not. All preprocessing has to be “lazy”, i.e. computed on demand by get_traces(). Is gfix essentially a detector with a simple threshold followed by zero masking, or is it more complex processing? (We have remove_artifacts(), but the indices have to be provided externally.)
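
For reference, the “simple threshold, then zero masking” case could look roughly like the NumPy sketch below. This is an assumption about what a simple detector might do, not CatGT's actual gfix algorithm; the function names, the threshold and the window sizes are all hypothetical. The detected indices are the kind of externally provided input that remove_artifacts() expects.

```python
import numpy as np


def detect_artifact_samples(traces, threshold):
    """Return sample indices where any channel exceeds `threshold` in absolute value."""
    return np.flatnonzero(np.any(np.abs(traces) > threshold, axis=1))


def zero_mask(traces, sample_indices, n_before, n_after):
    """Return a copy of `traces` with a window around each index set to zero."""
    cleaned = traces.copy()
    for idx in sample_indices:
        lo = max(0, idx - n_before)
        hi = min(len(traces), idx + n_after + 1)
        cleaned[lo:hi] = 0
    return cleaned


# Toy data: a low-amplitude sine on 4 channels with a 3-sample artifact injected.
traces = np.tile(np.sin(np.arange(1000) / 10.0)[:, None], (1, 4))
traces[500:503, :] = 200.0

artifact_idx = detect_artifact_samples(traces, threshold=50.0)
cleaned = zero_mask(traces, artifact_idx, n_before=2, n_after=2)
print(artifact_idx.tolist())  # [500, 501, 502]
```

To stay “lazy”, the same two steps would be applied to each chunk as it is requested rather than to the whole recording at once.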

The trickiest (although not crucial for us in the short term): is spikeinterface/neo equipped to handle the concatenation of raw files across gates/triggers, since there may be overlaps or gaps (that should be 0-filled) across these files? As Bill Karsh put it: https://github.com/SpikeInterface/spikeinterface/issues/628#issuecomment-1130216875

The “concatenation” of “multi-segment” recordings (i.e. multiple binary files) is also handled lazily, in two different flavors in spikeinterface: “append” (a true multi-segment recording) or “concatenate” (a virtual mono-segment recording). We do not fill with zeros because we handle the multi-segment problem explicitly.

In spikeinterface, you can already do this: rec = si.read_binary(file_paths=['file1.dat', 'file2.dat', 'file3.dat'], ...) and rec will be a 3-segment recording.

Please have a deep look at this : https://spikeinterface.readthedocs.io/en/latest/modules/core/plot_5_append_concatenate_segments.html#sphx-glr-modules-core-plot-5-append-concatenate-segments-py
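
To illustrate the difference between the two flavors, here is a small NumPy sketch of a “virtual mono-segment” view over several segments. It is not spikeinterface's implementation: the class `ConcatenatedView` is hypothetical, and plain arrays stand in for what would be memory-mapped binary files. The point is only that reads are resolved on demand and nothing is copied until traces are requested.

```python
import numpy as np


class ConcatenatedView:
    """Present several segments as one virtual segment (hypothetical sketch)."""

    def __init__(self, segments):
        self.segments = segments
        lengths = [len(s) for s in segments]
        # offsets[i] is the global index of the first sample of segment i
        self.offsets = np.concatenate([[0], np.cumsum(lengths)])

    def get_num_samples(self):
        return int(self.offsets[-1])

    def get_traces(self, start_frame, end_frame):
        """Read [start_frame, end_frame), touching only the segments needed."""
        pieces = []
        for i, seg in enumerate(self.segments):
            seg_start, seg_end = self.offsets[i], self.offsets[i + 1]
            lo = max(start_frame, seg_start)
            hi = min(end_frame, seg_end)
            if lo < hi:
                pieces.append(seg[lo - seg_start:hi - seg_start])
        if not pieces:
            return np.empty((0,) + self.segments[0].shape[1:], dtype=self.segments[0].dtype)
        return np.concatenate(pieces, axis=0)


# "Append" flavor: keep three separate segments, each indexed on its own.
segments = [np.full((3, 2), k, dtype=np.int16) for k in (1, 2, 3)]

# "Concatenate" flavor: one virtual segment with a single global sample index.
view = ConcatenatedView(segments)
window = view.get_traces(2, 7)  # spans the end of segment 0 through the start of segment 2
print(view.get_num_samples(), window[:, 0].tolist())  # 9 [1, 2, 2, 2, 3]
```

Note that this view simply abuts the segments; it does not account for gaps or overlaps between files, which is consistent with the remark above that no zero-filling is done.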

Important: at the moment, no sorter handles the “multi-segment” problem correctly, except the experimental ones inside spikeinterface (“spykingcircus2” is quite advanced, “tridesclous” is not working yet). This is why in many cases you need to write everything to a single gigantic file with rec.save(), which is somewhat the same as using CatGT but more flexible.
