on possible (minor) tweaks to Annotation.to_samples
See original GitHub issueholy moly I can’t thank past us (but mostly @bmcfee and @justinsalamon) for making #109 happen, just bailed me out of an annoying issue I’m having with open-tag JAMS and issues around what to do with null events (separate issue, will cease tangent).
In working with this super useful feature, I’ve a couple ideas / questions for the crowd. Not super confident these are worth the effort yet, but this is why this is a question and not a PR…
- What if the instantaneous values object is a
tuple
instead of alist
? This would aid in the reuse of other objects (e.g.set
,collections.Counter
) which need a hashable type for reduction. This wouldn’t necessarily be super useful for continuous / float values, but categorical (e.g. tags, like in my case) would be way helpful. The two counterarguments I see are: I understand from looking at the source how a mutable list makes this so much easier to implement; and, if this isn’t a standard case, it’s easy enough to lambda-map the results into a hashable type after and then do this. - Thoughts on adding a
fill_value
field to the method’s interface? In my case, only positive intervals are labeled, and so I get back empty lists where there is no range. It’d be great to backfill the null class at sample time, and at first blush this seems like an easy feature … the only issue I see is, what default parameter would give the current result? It can’t beNone
, because one could truly wantNone
as the backfill value, e.g.[[None], [None], ... ]
. It shouldn’t be[]
, because semantically one would expect[[[]], [[]], ... ]
. Any ideas on this one?
If nothing else, I skimmed the issue that originally spawned this feature, and didn’t see a discussion of either (1) or (2) above, and figured they’d be worth adding to the conversation. Tagging this as wontfix is a potentially fine outcome.
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (5 by maintainers)
Top Results From Across the Web
Avoiding top pitfalls in annotation projects | by Maria Mestre
This can help you sample examples to annotate across all your labelling categories, and assess the size of each category.
Read more >Purposeful Annotation: A "Close Reading" Strategy that ...
When my students have a text they can write on, the idea, then, is to annotate in a way that supports our purpose...
Read more >How to Annotate with Bounding Boxes [Guide & Examples]
Bounding box annotation is the process of labeling objects in an image with rectangular boxes. Learn how to do bounding box annotation efficiently...
Read more >Best Practices for Managing Data Annotation Projects
Consider Strategic Adjustments if Additional Annotations Are Needed . ... Where possible, examples should consist of up-to-date screenshots of an annotated ...
Read more >Annotation setup
Most annotation guidelines will need a lot of detail and examples to ensure that human annotators consistently annotate text. The example presented here...
Read more >
Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free
Top Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
I’m leaning toward leaving it as is. Tuples, being immutable, can be a little unwieldy for a lot of the things we want to use values outputs for (eg slicing down to a fixed vocab).
I like this in theory, but as you say, the API for it seems awkward, especially when you consider that it should be consistent across all namespaces. You could do it in two steps by having a flag to control backfill, and a separate fill_value parameter to handle the data itself.
I guess that’s valid. If a user supplies a bad fill value, that’s on them.
So to recap, here’s the current logic:
value
for each sample position as an empty list. (Repeat for confidences.)The reason for all the list hackery is that observations can overlap, so the
to_samples
definition for a value at timet
is the union of value fields. Any sample that’s outside of any labeled interval retains the empty list as its values.The proposed change would allow a user to change this by providing a list of default values that it initializes with instead of the empty list. In writing this up, I see two problems with this idea that had escaped my attention before:
I’m beginning to think this not worth implementing. It’s easy enough for a user to post-process the values array as follows:
and then get on with their life. I think I prefer this solution over trying to implement something general-purpose that leads to awkward and confusing API decisions.