
Next generation jams


This issue is intended to consolidate many of the long-standing issues and offline discussions we’ve had around revising the jams specification for a variety of applications and use-cases.


Goals of revising the schema

  1. Migrate to a fully json-schema compliant spec #178 (instead of hybrid / dynamic namespaces).
    • This would facilitate a number of applications, like #19 (quick look / web view and edit), #40 (collection management), and #86 (fragmented storage).
  2. Add versioning to the schema definitions. This way, old files can still validate according to their specified jams version. This in turn makes it easier to evolve the schema without breaking compatibility.
  3. Simplify (and accelerate) the validation code on the Python side.

Revision phase 1: full json-schema definition

The first step is to move all namespace definitions into full jsonschema definitions. In the proposed change, a namespace definition now becomes a secondary schema for the Annotation object.

Annotation objects must validate against both the template schema (our current annotation schema def) and exactly one of the pre-defined namespace schemas. Each namespace schema defines an exact match on the Annotation.namespace field, in addition to whatever constraints it places on the value and confidence fields.
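As a rough sketch of that structure, the Python snippet below assembles the composed schema as plain dicts. The template fields, the `tag_open` and `tempo` constraints, and the helper `namespace_schema` are illustrative assumptions, not copied from the actual JAMS namespace definitions. Note that `const` requires JSON Schema draft-06 or later; a single-element `enum` gives the same exact match under draft-04.

```python
import json

# Illustrative template for the shared Annotation structure (abbreviated).
ANNOTATION_TEMPLATE = {
    "type": "object",
    "properties": {
        "namespace": {"type": "string"},
        "data": {},  # observation structure is left to the namespace schemas
        "sandbox": {"type": "object"},
    },
    "required": ["namespace", "data"],
}


def namespace_schema(name, value_schema, confidence_schema=None):
    """Build one namespace schema: an exact match on `namespace`,
    plus constraints on each observation's value (and confidence)."""
    observation = {"type": "object", "properties": {"value": value_schema}}
    if confidence_schema is not None:
        observation["properties"]["confidence"] = confidence_schema
    return {
        "properties": {
            "namespace": {"const": name},
            "data": {"type": "array", "items": observation},
        }
    }


# Illustrative constraints, not the real JAMS namespace definitions.
NAMESPACES = [
    namespace_schema("tag_open", {"type": "string"}),
    namespace_schema(
        "tempo",
        {"type": "number", "minimum": 0},
        {"type": "number", "minimum": 0, "maximum": 1},
    ),
]

# An Annotation must satisfy the template AND exactly one namespace schema.
ANNOTATION_SCHEMA = {"allOf": [ANNOTATION_TEMPLATE, {"oneOf": NAMESPACES}]}

schema_text = json.dumps(ANNOTATION_SCHEMA, indent=2)
```

Because each namespace schema pins a different `const`, at most one of them can match a given annotation, so `oneOf` expresses the "exactly one" requirement directly. The resulting dict could then be handed to any draft-06+ validator, such as the `jsonschema` package.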

The is_sparse flag will be removed, as it is not part of jsonschema. (We’ll come back to this later.)

This phase will complete #178.

Revision phase 2: hosted and versioned schema

Completing phase 1 will result in a fully json-schema compatible implementation of our specification, against which all current JAMS files should validate.

The next step (phase 2) is to place this schema under version control and host it remotely (e.g., `jams.github.io/schema/v0.3/schema.json` or something similar). We can then revise the schema to include a version number in its definition, so that jams files can self-identify which version they are valid under.
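A minimal sketch of how a reader might pick the schema a file declares, assuming the version string is stored in `file_metadata.jams_version` and that schemas are published under a URL pattern like the one above. The `https://` prefix, the template string, and `schema_url_for` are hypothetical.

```python
# Hypothetical URL template, modeled on the example location in the proposal.
SCHEMA_URL_TEMPLATE = "https://jams.github.io/schema/v{version}/schema.json"


def schema_url_for(jam):
    """Return the schema URL a JAMS document says it conforms to.

    Assumes the version is recorded in file_metadata.jams_version as a
    string like "0.3.1", and (an assumption) that patch releases share
    a schema, so only major.minor selects the URL.
    """
    version = jam.get("file_metadata", {}).get("jams_version")
    if not version:
        raise ValueError("document does not declare a jams_version")
    major_minor = ".".join(str(version).split(".")[:2])
    return SCHEMA_URL_TEMPLATE.format(version=major_minor)
```

For example, a file declaring `"jams_version": "0.3.4"` would resolve to the `v0.3` schema, while an old file without a declared version would need a fallback policy of its own.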

With the remote schema implementation, it should be possible/easy to promote all jams definitions to top-level objects, so that you can independently validate an Annotation or FileMetadata object without having it belong to a full JAMS file.
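One way to validate an Annotation or FileMetadata object on its own, sketched under the assumption that the full schema keeps its objects in a top-level `definitions` table and refers to them internally with `#/definitions/...` references. `standalone_schema` is a hypothetical helper, not part of the jams package.

```python
import copy


def standalone_schema(full_schema, name):
    """Return a schema that validates a single `name` object on its own.

    Copies definitions[name] to the root and re-attaches the full
    `definitions` table, so internal refs like "#/definitions/Curator"
    still resolve against the new root document.
    """
    definitions = full_schema.get("definitions", {})
    if name not in definitions:
        raise KeyError(f"no top-level definition named {name!r}")
    schema = copy.deepcopy(definitions[name])
    schema["definitions"] = copy.deepcopy(definitions)
    if "$schema" in full_schema:
        schema["$schema"] = full_schema["$schema"]  # keep the draft version
    return schema
```

Once the schema is hosted, the same effect could likely be had without copying anything, by validating against a one-key schema such as `{"$ref": "https://jams.github.io/schema/v0.3/schema.json#/definitions/Annotation"}` (hypothetical URL) and letting the validator resolve the remote reference.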

This phase will complete #86 and facilitate #40 by allowing partial storage.

Revision phase 3: extending the Annotation class

As mentioned in #24, the current annotation structure might be a bit too rigid for more general media objects. @justinsalamon and I discussed this offline and arrived at the following proposal:

  • Rename Annotation def to IntervalAnnotation, in which observations are (time, duration, value, confidence) tuples
  • Add new annotation types
    • StaticAnnotation: just (value, confidence)
    • BoundingBoxAnnotation: (x, y, width, height, value, confidence)
    • TimeBoundingBoxAnnotation: (time, x, y, duration, width, height, value, confidence)
    • possibly others: polygons, instantaneous samples, etc…
  • Annotation validation now becomes and(oneOf([Interval, Static, BoundingBox, ...]), oneOf([namespaces]))
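The "exactly one extent type" half of that rule can be sketched in plain Python, assuming each extent type is identified by the exact set of fields in its observations. The field lists come from the bullets above; `EXTENT_FIELDS` and `extent_type` are hypothetical names.

```python
# Field sets for each proposed extent type, taken from the list above.
EXTENT_FIELDS = {
    "IntervalAnnotation": {"time", "duration", "value", "confidence"},
    "StaticAnnotation": {"value", "confidence"},
    "BoundingBoxAnnotation": {"x", "y", "width", "height", "value", "confidence"},
    "TimeBoundingBoxAnnotation": {
        "time", "x", "y", "duration", "width", "height", "value", "confidence",
    },
}


def extent_type(observation):
    """Return the single extent type whose fields exactly match.

    Mirrors oneOf over schemas that list every field in "required" and
    set "additionalProperties": false: exactly one must match.
    """
    keys = set(observation)
    matches = [name for name, fields in EXTENT_FIELDS.items() if keys == fields]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one extent match, got {matches}")
    return matches[0]
```

The exact key-set comparison matters: in JSON Schema terms, each extent schema would need `additionalProperties: false`. Without it, a TimeBoundingBoxAnnotation observation would also satisfy the Static, Interval, and BoundingBox schemas, and `oneOf` would reject it for matching more than one.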

This provides maximal flexibility in combining different annotation contents (tags, etc.) with annotation extents (time intervals, bounding boxes, etc.). Including a StaticAnnotation type also provides a way to resolve #206.

Phase 3 completes the proposed changes to the schema.

Alongside the schema changes, we also want to generalize parts of the Python implementation. Notably, it would be good to extend the search function to match on annotation contents as well. This way, we could find and excerpt annotations by value (e.g., time intervals labeled “guitar” or bounding boxes labeled “face”). This isn’t a huge change from what the search function already does, but it will take a bit more implementation work.
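A minimal sketch of searching by annotation contents, written against the plain JSON layout (a list of annotation dicts whose `data` is a list of observation dicts) rather than the jams Python objects. `search_by_value` and its arguments are hypothetical.

```python
def search_by_value(annotations, namespace, predicate):
    """Yield (annotation_index, observation) pairs whose value matches.

    `annotations` is a list of annotation dicts in the sparse JSON layout:
    each has a "namespace" and a "data" list of observation dicts.
    """
    for i, ann in enumerate(annotations):
        if ann.get("namespace") != namespace:
            continue
        for obs in ann.get("data", []):
            if predicate(obs.get("value")):
                yield i, obs
```

For instance, `search_by_value(jam["annotations"], "tag_open", lambda v: v == "guitar")` would yield every interval tagged as guitar, and the returned time and duration fields are what an excerpting step would need.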

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
MCMcCallum commented, Jan 2, 2021

I’ve been taking a crack at this over the break. I’m most of the way there, though I’ve realized that to make this work we may have to alter the JAMS schema a little, which would make some currently valid JAMS data invalid under the latest version…

Currently, we have a list of annotations, each of which contains a list of observations, and that list of observations can be either sparse or dense. I’m proposing that we change this to always be a list of observations (i.e., no dense / not dense distinction), where the observation type therein can be either a single observation (in the sparse case) or an observation containing lists of values (in the dense case). This moves all current JAMS dense observations down one level, into the observation type, rather than making them a different Annotation type overall.

This way, the Annotation type itself holds all the data-independent properties (e.g., Curator, sandbox, etc.), and only its data attribute is defined by the observation type (both the data and namespace attributes will be defined by the namespace). This data attribute is always an array of observations; for the current DenseObservation types that exist out there in the wild, it will be a single-element array whose observation contains the value, confidence, time, and duration arrays.

This greatly simplifies the code and schema, but will change the schema for dense observations from something like:

{
    "annotations": [
        {
            "data": {
                "value": [ 1.0, 0.5 ],
                "time": [ 1.0, 2.0 ],
                "confidence": [ 0.9, 0.9 ],
                "duration": [ 1.0, 1.0 ]
            }
        }
    ]
}

to something like:

{
    "annotations": [
        {
            "data": [
                {
                    "values": [ 1.0, 0.5 ],
                    "times": [ 1.0, 2.0 ],
                    "confidences": [ 0.9, 0.9 ],
                    "durations": [ 1.0, 1.0 ]
                }
            ]
        }
    ]
}
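Existing files could likely be upgraded mechanically. The sketch below assumes exactly the two layouts shown: a dense `data` dict of parallel arrays (old) and a list of observations with pluralized field names (new). `migrate_annotation` and `migrate_jam` are hypothetical names.

```python
import copy

# Old dense field names -> proposed plural names, per the example above.
RENAMES = {
    "value": "values",
    "time": "times",
    "confidence": "confidences",
    "duration": "durations",
}


def migrate_annotation(annotation):
    """Convert one annotation dict to the proposed list-of-observations layout."""
    new = copy.deepcopy(annotation)
    data = new.get("data")
    if isinstance(data, dict):
        # Old dense layout: a dict of parallel arrays becomes a
        # single-element list holding one dense observation.
        new["data"] = [{RENAMES.get(k, k): v for k, v in data.items()}]
    # Sparse layout (already a list of observations) is left unchanged.
    return new


def migrate_jam(jam):
    """Return a migrated copy of a whole JAMS document; the input is not modified."""
    new = copy.deepcopy(jam)
    if "annotations" in new:
        new["annotations"] = [migrate_annotation(a) for a in new["annotations"]]
    return new
```

Applied to the first JSON example, `migrate_jam` produces the second one; sparse annotations pass through untouched.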

This has the added benefit that a single Annotation can hold multiple dense observations. For example, a pitch annotation could hold multiple pitch contours, each beginning and ending according to a vocal activity detector; or, in an annotation application where the annotator draws contours over a waveform, each drawn contour could be sampled and represented as a single DenseObservation.
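To illustrate the pitch-contour case, here is a sketch that splits one frame-wise contour into several dense observations wherever a voicing flag (for example, from a vocal activity detector) is off. It assumes the pluralized `times`/`values` field names from the example above and omits confidences and durations for brevity; `split_contour` is hypothetical.

```python
def split_contour(times, values, voiced):
    """Split one pitch contour into dense observations at unvoiced frames.

    `voiced` is a per-frame boolean (e.g. from a vocal activity detector).
    Each maximal run of voiced frames becomes one dense observation.
    """
    observations, current = [], None
    for t, v, on in zip(times, values, voiced):
        if on:
            if current is None:
                current = {"times": [], "values": []}  # start a new contour
            current["times"].append(t)
            current["values"].append(v)
        elif current is not None:
            observations.append(current)  # an unvoiced frame closes the contour
            current = None
    if current is not None:
        observations.append(current)
    return observations
```

The result drops straight into the proposed layout as an annotation's `data` array, e.g. `{"namespace": ..., "data": split_contour(times, f0, voiced)}`.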

At phase 3 of this issue, we can then further include a dense sampled observation type, e.g.:

{
    "annotations": [
        {
            "data": [
                {
                    "values": [ 1.0, 0.5, 0.3 ],
                    "start_time": 1.0,
                    "sample_rate": 1000.0
                }
            ]
        }
    ]
}
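Readers that expect explicit time stamps could expand such a sampled observation on the fly. This sketch assumes sample i falls at `start_time + i / sample_rate`; `sampled_times` and `to_explicit` are hypothetical names.

```python
def sampled_times(observation):
    """Explicit time stamps for a sampled dense observation.

    Assumes sample i of `values` falls at start_time + i / sample_rate.
    """
    rate = observation["sample_rate"]
    if rate <= 0:
        raise ValueError("sample_rate must be positive")
    start = observation["start_time"]
    # Compute each time from its index instead of accumulating 1/rate,
    # so rounding error does not grow along long sequences.
    return [start + i / rate for i in range(len(observation["values"]))]


def to_explicit(observation):
    """Rewrite a sampled observation in the explicit values/times layout."""
    return {
        "values": list(observation["values"]),
        "times": sampled_times(observation),
    }
```

Storing only `start_time` and `sample_rate` keeps long, regularly sampled series (such as frame-wise pitch) compact, while this expansion recovers the explicit form when needed.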
1 reaction
bmcfee commented, Oct 14, 2020

Had a chat with @rabitt about some of this at ISMIR, and she pointed out that we currently have a bit of a blind spot when it comes to annotations of symbolic data. Concretely, objects like a score or a MIDI file may not have a fixed “duration” (in seconds), but may have analogous extent specifications in terms of beats or ticks.

This seems soluble in the proposed framework by introducing extent types for symbolic data. We may need to wiggle a bit on the top-level schema (JAMS object) to make this work, but I think it would be worth doing in the long run.
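As a rough illustration of a symbolic extent, the sketch below stores positions in MIDI-style ticks and converts them to seconds only when a constant tempo is supplied. The observation fields `tick` and `tick_duration` and both helper names are hypothetical.

```python
def ticks_to_seconds(ticks, ticks_per_beat, tempo_bpm):
    """Convert a tick count to seconds, assuming one constant tempo.

    seconds = (ticks / ticks_per_beat) * (60 / tempo_bpm)
    """
    if ticks_per_beat <= 0 or tempo_bpm <= 0:
        raise ValueError("ticks_per_beat and tempo_bpm must be positive")
    return (ticks / ticks_per_beat) * 60.0 / tempo_bpm


def symbolic_to_interval(observation, ticks_per_beat, tempo_bpm):
    """Map a hypothetical tick-based observation onto a time/duration one."""
    return {
        "time": ticks_to_seconds(observation["tick"], ticks_per_beat, tempo_bpm),
        "duration": ticks_to_seconds(
            observation["tick_duration"], ticks_per_beat, tempo_bpm
        ),
        "value": observation["value"],
        "confidence": observation.get("confidence"),
    }
```

With tempo changes, the conversion would need a full tempo map rather than a single BPM, which is one argument for keeping symbolic extents native in the schema instead of forcing them into seconds.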
