Discuss preprocessing.annotate_break() API

After #9445 was accidentally merged prematurely, we should continue discussing the API of the new annotate_break() function. It currently looks like this:

https://github.com/mne-tools/mne-python/blob/903f66aabd769db88a7ba7973abd624ed666695a/mne/preprocessing/artifact_detection.py#L341-L348

Now it appears there was some disagreement regarding intended behavior and the resulting API. Specifically, no consensus was reached regarding the meaning and treatment of the parameter currently called min_break_duration. I’ll paste here some statements from the related discussion:

@drammock

I think @hoechenberger is saying “min_duration refers to length of a break period, i.e. the total time between events (before accounting for t_start_before_prev and t_stop_before_next)”, whereas I think @larsoner is thinking of min_duration as the minimum duration of what ends up getting annotated.

To me the naming is problematic. Any parameter called min_duration should IMO definitely refer to the minimum duration of the resulting annotations (this is, I think, how @larsoner interpreted it). If it turns out to make more sense to parameterize the duration between events (rather than the duration of resultant annotations) then we definitely need a new name for that parameter.

@drammock

to me, it’s more natural to think of this problem as:

how long of a buffer should I leave after an event before ignoring data?

how long before the next event do I need to start keeping data again?

if the remaining time in between is shorter than X, don’t bother annotating it.

I think this indicates that we really need to get the param names and docstring super clear. For the way I’m thinking about it I might say left_buffer, right_buffer, min_duration (where buffer could also be margin or padding or gap) and another possibility is pre_event_buffer and post_event_buffer. Those names might also work for @hoechenberger’s API (where you specify minimum break rather than minimum annotation length) if everyone prefers that, assuming min_duration becomes min_break_duration or (better?) min_inter_event_duration.
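As a rough sketch of the three questions above (not MNE's implementation), the buffer-first view might look like the following. The function name annotate_gaps is hypothetical; left_buffer, right_buffer and min_duration are the names proposed in this comment, and the sketch only looks at event times, assuming they are given in seconds.

```python
from typing import List, Tuple


def annotate_gaps(event_times: List[float], left_buffer: float,
                  right_buffer: float,
                  min_duration: float) -> List[Tuple[float, float]]:
    """Return (onset, duration) pairs for event-free spans worth annotating.

    Hypothetical helper: min_duration is the minimum length of the
    resulting annotation, after both buffers have been subtracted.
    """
    spans = []
    times = sorted(event_times)
    for prev, nxt in zip(times, times[1:]):
        onset = prev + left_buffer   # stop keeping data this long after an event
        end = nxt - right_buffer     # start keeping data this long before the next
        duration = end - onset
        if duration >= min_duration:  # remainders shorter than this are skipped
            spans.append((onset, duration))
    return spans


events = [0.0, 2.0, 4.0, 30.0, 32.0]
print(annotate_gaps(events, left_buffer=5.0, right_buffer=5.0, min_duration=1.0))
```

Here only the 26-second gap between the events at 4 s and 30 s survives the two 5-second buffers, so it is the only span returned.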

@sappelhoff

I agree with the current approach. When I record and analyze my data, I know approximately how long the breaks in the paradigm were. For example every 20 minutes, there was a 1 minute break. And I know that apart from these breaks my paradigm is fast-paced. So I know I can set min_break_duration = 30, and then care about tstart… and tstop. Seems way simpler to me, but apparently that’s controversial, so more opinions would be good.
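For comparison, the break-first view described here could be sketched as below. This is an illustration under assumptions, not the library's code: annotate_breaks_sketch is a hypothetical name, the default values are placeholders, and only the gaps between event times (in seconds) are checked.

```python
def annotate_breaks_sketch(event_times, min_break_duration=15.0,
                           t_start_after_previous=5.0,
                           t_stop_before_next=5.0):
    """Return (onset, duration) pairs for breaks between events.

    Hypothetical helper: min_break_duration is compared against the full
    time between two consecutive events, before the buffers are applied.
    """
    spans = []
    times = sorted(event_times)
    for prev, nxt in zip(times, times[1:]):
        if nxt - prev < min_break_duration:
            continue  # not long enough to count as a break
        onset = prev + t_start_after_previous
        duration = (nxt - t_stop_before_next) - onset
        spans.append((onset, duration))
    return spans


# Fast-paced trials with one break of about a minute in between.
events = [0.0, 1.0, 2.0, 62.0, 63.0]
print(annotate_breaks_sketch(events, min_break_duration=30.0))
```

With min_break_duration=30 the one-minute pause is detected without any arithmetic on the caller's side, and the buffers then trim 5 seconds from each end of the annotated span.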

@drammock

To add a little context / argue for my version: what I’m calling left_buffer and right_buffer are always going to be experiment-dependent, depending on how long your “trial” is / when during the trial you stamp events (i.e., do you include an end-of-trial event or not). In contrast, the min_annotation_duration (as @larsoner and I conceived of it) is not experiment-dependent at all, and might have a sensible default like 1 second or 5 seconds. People should still be able to control it, but it’s possible to set a default that wouldn’t break for certain experiment designs. In @hoechenberger’s proposed API, min_break_duration is confounded with the left and right buffers and is therefore also dependent on experimental design, so there’s no way to set a safe default (I don’t think?)

@hoechenberger

I suppose we’re also thinking about different use cases here… I mostly agree with what @sappelhoff said: I would typically know that, for a given study I ran, breaks last, say, 1+ mins or so, and then I would just set min_break_duration to that value to capture those breaks, whereas …

“In contrast, the min_annotation_duration (as @larsoner and I conceived of it) is not experiment-dependent at all, and might have a sensible default like 1 second or 5 seconds.”

… this seems to indicate you’re really thinking about capturing rather short event-free segments too? This is not what I initially had in mind when creating this function. So maybe this is also a reason why we have a bit of confusion here?

What I like about the min_break_duration approach is that I don’t have to do any math to capture a (sufficiently long) break. It will cause headaches for short break periods though, as you suggested, yes. But that’s actually not what I typically consider a break in the paradigms I’ve worked with in the past

@drammock

I’m thinking of past experiments I’ve done where I allowed participants to self-determine break length between blocks. So some breaks might be 15 seconds, others might be 5 minutes. But either one might involve big nasty EEG noise due to the participant stretching or scratching their nose.

cc @drammock @larsoner @cbrnr @agramfort @sappelhoff

Issue Analytics

State:
Created 2 years ago
Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction

larsoner commented, Jun 9, 2021

“How about renaming the variable to something like event free period? event_free_period or event_free_t? Would that minimise ambiguity?”

The quoted comment that you read was actually old, it’s no longer called min_duration but actually min_break_duration which I think is already unambiguous enough. My confusion stemmed from thinking about the old name min_duration as the minimum annotation duration rather than the minimum break between events duration.

“is it true that y = x - x1 - x2”

Yes I think for the most part you can get from one representation to the other, though there are situations where the current API can create very short annotations. For example, with min_break_duration=10.1 and t_start_after_previous=t_stop_before_next=5 you can end up with annotations as short as 0.1 sec. However this seems easy enough to overcome by just not setting min_break_duration to be so close to the sum of the buffer limits.
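The relation between the two representations can be written out as a small sketch. Both helper names are hypothetical; the arithmetic is just y = x - x1 - x2, with x the minimum break duration and x1, x2 the two buffers.

```python
def shortest_annotation(min_break_duration, t_start_after_previous,
                        t_stop_before_next):
    """Shortest annotation a break of exactly min_break_duration can yield."""
    return min_break_duration - t_start_after_previous - t_stop_before_next


def required_min_break(min_annotation_duration, t_start_after_previous,
                       t_stop_before_next):
    """Inverse direction: break length needed for a given minimum annotation."""
    return min_annotation_duration + t_start_after_previous + t_stop_before_next


# Rounded to hide floating-point noise in 10.1 - 5 - 5.
print(round(shortest_annotation(10.1, 5, 5), 6))
print(required_min_break(1, 5, 5))
```

So a caller who wants annotations of at least 1 second with two 5-second buffers would pick a minimum break duration of 11 seconds.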

Also it occurred to me that a “min annotation duration” is easily achievable with a one-liner like:

import numpy as np
ann = raw.annotations  # description and duration are attributes, not keys
ann.delete(np.where((ann.description == 'BAD_break') & (ann.duration < 1))[0])

So I think in the end I can live with the current API

0 reactions

drammock commented, Jun 9, 2021

I’m still not convinced. To me it feels weird that when using a function that makes annotations, you specify the minimum annotation duration with 3 parameters (min_break_dur - t_start_after_prev - t_end_before_next) instead of with one parameter. It just feels error-prone and unintuitive to me. But it seems I’m out-voted this time.
