Discuss preprocessing.annotate_break() API
After #9445 was accidentally merged prematurely, we should continue discussing the API of the new annotate_break() function. It currently looks like this:
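The original snippet did not survive in this copy of the thread. As a stand-in, here is a hedged, toy re-implementation of the behavior as described later in the discussion — this is not MNE's actual code; the parameter names are taken from the comments below, and the defaults are guesses:

```python
def annotate_break(event_onsets, min_break_duration=15.0,
                   t_start_after_previous=5.0, t_stop_before_next=5.0):
    """Toy sketch: find inter-event gaps of at least ``min_break_duration``
    seconds and return them as (onset, duration) pairs, trimmed by the two
    buffer parameters."""
    breaks = []
    for prev, nxt in zip(event_onsets, event_onsets[1:]):
        gap = nxt - prev
        if gap < min_break_duration:
            continue  # gap too short to count as a break
        onset = prev + t_start_after_previous
        duration = gap - t_start_after_previous - t_stop_before_next
        if duration > 0:  # the buffers may swallow the whole gap
            breaks.append((onset, duration))
    return breaks

# Events at 0 s, 20 s, 21 s, and 60 s: only the 20 s and 39 s gaps qualify.
print(annotate_break([0.0, 20.0, 21.0, 60.0]))
# → [(5.0, 10.0), (26.0, 29.0)]
```

Note that `min_break_duration` thresholds the raw gap between events, while the annotation that comes out is shorter by the sum of the two buffers; that asymmetry is exactly what the thread below argues about.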
Now it appears there was some disagreement regarding the intended behavior and the resulting API. Specifically, no consensus was reached regarding the meaning and treatment of the parameter currently called min_break_duration. I'll paste some statements from the related discussion here:
I think @hoechenberger is saying “min_duration refers to length of a break period, i.e. the total time between events (before accounting for t_start_before_prev and t_stop_before_next)”, whereas I think @larsoner is thinking of min_duration as the minimum duration of what ends up getting annotated.
To me the naming is problematic. Any parameter called min_duration should IMO definitely refer to the minimum duration of the resulting annotations (this is, I think, how @larsoner interpreted it). If it turns out to make more sense to parameterize the duration between events (rather than the duration of resultant annotations) then we definitely need a new name for that parameter.
to me, it’s more natural to think of this problem as:
- how long of a buffer should I leave after an event before ignoring data?
- how long before the next event do I need to start keeping data again?
- if the remaining time in between is shorter than X, don't bother annotating it.

I think this indicates that we really need to get the param names and docstring super clear. For the way I'm thinking about it I might say left_buffer, right_buffer, min_duration (where buffer could also be margin or padding or gap); another possibility is pre_event_buffer and post_event_buffer. Those names might also work for @hoechenberger's API (where you specify minimum break rather than minimum annotation length) if everyone prefers that, assuming min_duration becomes min_break_duration or (better?) min_inter_event_duration.
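That alternative parameterization could be sketched like this (hypothetical names from the comment above; a toy sketch, not anything that exists in MNE). The key difference is that `min_duration` thresholds the resulting annotation itself, so it can have a safe, experiment-independent default:

```python
def annotate_break_alt(event_onsets, left_buffer=5.0, right_buffer=5.0,
                       min_duration=1.0):
    """Toy sketch: ``min_duration`` applies to the *annotation*, not the
    raw gap between events."""
    breaks = []
    for prev, nxt in zip(event_onsets, event_onsets[1:]):
        onset = prev + left_buffer            # resume ignoring data after the buffer
        duration = (nxt - right_buffer) - onset
        if duration >= min_duration:          # threshold the annotation itself
            breaks.append((onset, duration))
    return breaks
```

With the same buffer values, both parameterizations produce the same annotations; they differ only in which quantity the user thresholds directly.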
I agree with the current approach. When I record and analyze my data, I know approximately how long the breaks in the paradigm were. For example every 20 minutes, there was a 1 minute break. And I know that apart from these breaks my paradigm is fast-paced. So I know I can set min_break_duration = 30, and then care about tstart… and tstop. Seems way simpler to me, but apparently that’s controversial, so more opinions would be good.
To add a little context / argue for my version: what I’m calling left_buffer and right_buffer are always going to be experiment-dependent, depending on how long your “trial” is / when during the trial you stamp events (i.e., do you include an end-of-trial event or not). In contrast, the min_annotation_duration (as @larsoner and I conceived of it) is not experiment-dependent at all, and might have a sensible default like 1 second or 5 seconds. People should still be able to control it, but it’s possible to set a default that wouldn’t break for certain experiment designs. In @hoechenberger’s proposed API, min_break_duration is confounded with the left and right buffers and is therefore also dependent on experimental design, so there’s no way to set a safe default (I don’t think?)
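If the arithmetic implied here is right, the two parameterizations differ only by an additive offset (names below are hypothetical, for illustration only):

```python
def min_break_from_annotation(min_annotation_duration, left_buffer, right_buffer):
    # Shortest inter-event gap that still yields an annotation of at least
    # min_annotation_duration once both buffers are trimmed off.
    return min_annotation_duration + left_buffer + right_buffer

def min_annotation_from_break(min_break_duration, left_buffer, right_buffer):
    # Shortest annotation the gap-based parameterization can produce.
    return min_break_duration - left_buffer - right_buffer
```

This additive coupling is the confound being described: any default for min_break_duration inherits the experiment dependence of the two buffers.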
I suppose we’re also thinking about different use cases here… I mostly agree with what @sappelhoff said: I would typically know that, for a given study I ran, breaks last, say, 1+ mins or so, and then I would just set min_break_duration to that value to capture those breaks, whereas …
As for "In contrast, the min_annotation_duration (as @larsoner and I conceived of it) is not experiment-dependent at all, and might have a sensible default like 1 second or 5 seconds": this seems to indicate you're really thinking about capturing rather short event-free segments too? This is not what I initially had in mind when creating this function. So maybe this is also a reason why we have a bit of confusion here?
What I like about the min_break_duration approach is that I don't have to do any math to capture a (sufficiently long) break. It will cause headaches for short break periods though, as you suggested, yes. But that's actually not what I typically consider a break in the paradigms I've worked with in the past.
I’m thinking of past experiments I’ve done where I allowed participants to self-determine break length between blocks. So some breaks might be 15 seconds, others might be 5 minutes. But either one might involve big nasty EEG noise due to the participant stretching or scratching their nose.
Issue Analytics
- Created 2 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
The quoted comment that you read was actually old; it's no longer called min_duration but min_break_duration, which I think is already unambiguous enough. My confusion stemmed from thinking about the old name min_duration as the minimum annotation duration rather than the minimum duration of the break between events.

Yes, I think for the most part you can get from one representation to the other, though there are situations where the current API can create very short annotations. For example, with min_break_duration=10.1 and t_start_after_previous=t_stop_before_next=5 you can end up with annotations as short as 0.1 s. However, this seems easy enough to overcome by just not setting min_break_duration so close to the sum of the buffer limits.

Also it occurred to me that a "min annotation duration" is easily achievable with a one-liner like:
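The actual one-liner was not preserved in this copy of the thread; what follows is a hedged guess at the kind of thing meant, shown on plain (onset, duration) tuples rather than on an mne.Annotations object:

```python
min_annotation_duration = 1.0
breaks = [(5.0, 0.1), (40.0, 12.0), (90.0, 3.0)]  # (onset, duration) pairs

# Keep only annotations at least min_annotation_duration seconds long:
breaks = [b for b in breaks if b[1] >= min_annotation_duration]
print(breaks)  # → [(40.0, 12.0), (90.0, 3.0)]
```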
So I think in the end I can live with the current API.
I'm still not convinced. To me it feels weird that, when using a function that makes annotations, you specify the minimum annotation duration via three parameters (min_break_dur - t_start_after_prev - t_end_before_next) instead of with one parameter. It just feels error-prone and unintuitive to me. But it seems I'm out-voted this time.