Taking just intonation seriously

At the risk of opening a horrible can of worms, I’m opening this discussion to get thoughts, opinions, and feedback on the possibility of extending librosa to support just intonation where appropriate. I’ll preface this by saying that I barely know what I’m talking about, and welcome discussion from anyone who’s well-versed in this stuff.

Background

We currently have implicit assumptions of equal temperament (ET) throughout the package. This arose from a few related motivations:

  • Simplicity of implementation and interpretation
  • Compatibility with MIDI and other quasi-standards
  • It’s what most people want and expect anyway

Practically speaking, ET assumptions are most important in the following parts of the code:

  • CQT: the assumption that octaves are evenly divided is baked into the algorithm. (Less so for IIRT, which can also generate semitone filters.)
  • Chroma features: partially this inherits from CQT assumptions, but there’s also the issue of having a default of 12 bins, which implicitly forces enharmonic equivalence.
  • Unit conversion: midi compatibility
  • Note spelling and display, eg, hz_to_note maps first through hz_to_midi and then through midi_to_note. MIDI is 12TET, and this carries over to note spelling. Similarly for the Indian notation converters.
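To make the note-spelling point concrete, here is a minimal pure-Python sketch of the hz → midi → note round trip. It does not call librosa: `hz_to_midi_sketch` and `midi_to_note_sketch` are hypothetical stand-ins that use the standard MIDI formula with A4 = 440 Hz and sharps-only spelling. It shows a just major third (5/4) and a Pythagorean major third (81/64) above A4 both collapsing to the same 12TET note name.

```python
import math

# Sharps-only pitch class names; an assumption of this sketch.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi_sketch(hz):
    """Fractional MIDI number for a frequency, with A4 = 440 Hz = MIDI 69."""
    return 69 + 12 * math.log2(hz / 440.0)


def midi_to_note_sketch(midi):
    """Round to the nearest 12TET note and spell it (sharps only)."""
    n = int(round(midi))
    return f"{NOTE_NAMES[n % 12]}{n // 12 - 1}"


just_third = 440.0 * 5 / 4    # 5-limit major third above A4
pyth_third = 440.0 * 81 / 64  # 3-limit (Pythagorean) major third above A4

for hz in (just_third, pyth_third):
    print(hz, midi_to_note_sketch(hz_to_midi_sketch(hz)))
```

Both frequencies land on the same MIDI number after rounding, so any spelling logic that runs after the MIDI step cannot tell them apart.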

What’s the problem?

Generally speaking, ET covers a broad range of use cases. It’s commonly said that other systems can be well approximated by a sufficiently fine-grained ET, so is there really anything to do here?

It could also be argued that this kind of functionality is out of scope for librosa, and would be better left either as its own package or in a more symbolic-oriented package like music21.

Two counter-arguments to the above come to my mind, but I’m sure there are more:

  1. Using an expanded ET representation to approximate, say, 7-limit just intonation (JI) is fine, but it doesn’t resolve the note spelling issue. Doing that part properly still requires some logic that is not (yet) implemented.
  2. We are, at present, forcing ET assumptions into the Indian notation systems. This issue was briefly discussed during the implementation of #641 / #1203 , and we decided that it was a reasonable compromise given the constraints. That said, it’s an awkward assumption, and it would be nice to relax it.
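As a rough illustration of the first counter-argument, the sketch below measures how closely an n-tone ET grid approximates a given JI ratio, in cents. The function names are made up for this example. It only addresses pitch accuracy: even where the error is small, nothing in the computation says how the resulting bin should be spelled.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def et_error_cents(ratio, bpo):
    """Signed error (in cents) of the nearest bpo-tone ET step to `ratio`."""
    step = 1200 / bpo
    target = cents(ratio)
    nearest = round(target / step) * step
    return nearest - target


# The harmonic seventh 7/4 is a 7-limit interval that 12TET handles poorly.
for bpo in (12, 24, 31, 72):
    print(bpo, round(et_error_cents(7 / 4, bpo), 2))
```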

What would a solution look like?

The world of non-ET systems is huge, and I don’t think we could or should try to be completionist here. I think we can get pretty far by first looking into just intonation (JI) systems, and specifically the p-limit systems where intervals are generated by products and ratios of powers of primes up to a specified p.
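For readers less familiar with the terminology, a ratio belongs to the p-limit system when no prime larger than p appears in its numerator or denominator. The helper below is a hypothetical illustration of that definition (it is not part of librosa), using simple trial division.

```python
from fractions import Fraction


def prime_limit(ratio):
    """Largest prime dividing the numerator or denominator of a positive ratio.

    Returns 1 for the unison (1/1), which has no prime factors.
    """
    ratio = Fraction(ratio)
    n = ratio.numerator * ratio.denominator
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is a single prime larger than any found so far.
        largest = n
    return largest


def is_p_limit(ratio, p):
    """True if `ratio` uses only primes up to and including p."""
    return prime_limit(ratio) <= p


print(prime_limit(Fraction(81, 64)))  # Pythagorean major third
print(prime_limit(Fraction(5, 4)))    # just major third
print(prime_limit(Fraction(7, 4)))    # harmonic seventh
```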

To ground the discussion, let’s use CQT as a running example. (Yes, it will be weird because the bin-spacing is non-uniform. But we can still keep the Q-factor of each filter constant, so I’m not going to worry about those details for now.) For example, we might have something like the following:

# 12TET starting at C1, basically our current default
librosa.cqt(y=y, sr=sr, fmin=32.7, bins_per_octave=12, temperament="et")
# 12-tone 3-limit (Pythagorean) JI starting at C1
librosa.cqt(y=y, sr=sr, fmin=32.7, bins_per_octave=12, temperament="ji3")
# 12-tone 5-limit JI starting at A2
librosa.cqt(y=y, sr=sr, fmin=110, bins_per_octave=12, temperament="ji5")
# 24-tone 7-limit JI starting at A2
librosa.cqt(y=y, sr=sr, fmin=110, bins_per_octave=24, temperament="ji7")

(We could have helpful aliases here, eg pythagorean=ji3, ptolemaic=ji5, ji=ji7, etc.)

The core operation that we need to worry about here is interval generation. We’ll need to define a function that generates the frequency ratios for a single octave (based at some root frequency which we’ll usually denote as 1).

  • In the ET context, intervals are simple: the nth interval is at a ratio 2**(n/bpo) above the root (for bpo bins per octave).
  • In the Pythagorean (3-limit) system, intervals are also simple, but quite different. There’s a natural ordering induced by taking alternating powers of 3: (0, 1, -1, 2, -2, 3, -3, ...) and wrapping into the octave to get ratios (1, 3/2, 4/3, 9/8, 16/9, 27/16, 32/27, ...).
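The two simple cases above can be sketched in a few lines of Python. The function names are placeholders chosen for this example, not an existing or proposed librosa API. Ratios for the Pythagorean case are kept as exact fractions and returned in generation order; sorting them gives ascending pitch order.

```python
from fractions import Fraction


def octave_reduce(ratio):
    """Wrap a positive ratio into the half-open octave [1, 2)."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def et_intervals(bpo):
    """Equal temperament: the nth interval is 2**(n/bpo) above the root."""
    return [2.0 ** (n / bpo) for n in range(bpo)]


def pythagorean_intervals(bpo):
    """3-limit JI: powers of 3 in the order 0, 1, -1, 2, -2, ..., octave-reduced."""
    ratios = []
    for i in range(bpo):
        # i = 0, 1, 2, 3, 4, ... maps to powers 0, 1, -1, 2, -2, ...
        power = (i + 1) // 2 * (1 if i % 2 else -1)
        ratios.append(octave_reduce(Fraction(3) ** power))
    return ratios


print(pythagorean_intervals(7))          # generation order
print(sorted(pythagorean_intervals(7)))  # ascending pitch order
```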

After that, it gets more complex, as there are choices to be made about which powers of prime factors are introduced and in which order. Different people have proposed different ways of doing this (see, eg, Partch). However, as far as I can tell: A) none of them are universally agreed upon, and more importantly, B) they’re almost always pinned to a specific number of bins per octave and pre-computed, rather than arising from a clear mathematical principle or algorithm. The latter point is critical if we want to support variable numbers of bins per octave (which we do).

Desiderata

It will be helpful to think about what we really want and need out of a p-limit JI interval generator. The following seem like reasonable requirements to me, but I’m definitely not an expert in this stuff.

  1. Octave equivalence. I think this is a reasonable assumption, and it will make our lives easier in general.
  2. Transposition independence. The sequence of intervals should not depend critically on the choice of the root pitch. (Yes, there are practical arguments for why one might want that kind of dependency eg for mapping onto physical instruments, but I’m not sure how much sense they would make in our present context.)
  3. Should support arbitrarily many bins per octave (BPO).
  4. When BPO=7, it should generate a “reasonable” diatonic scale.* (Probably major)
  5. When BPO=12, it should generate some kind of chromatic scale.
  6. Interval sets should be monotonic with respect to BPO. Going from 12 to 13 to 19 should not remove any previously selected intervals. * **

*: not actually satisfied by our ET implementation, but I can live with that. ET is designed to maximize spacing between intervals, and that’s fundamentally incompatible with these properties.

**: I’m not married to this property, but it would be nice if we can have it.
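The monotonicity requirement (item 6) is easy to test mechanically. The sketch below uses hypothetical helper names and represents ET intervals exactly as fractions of an octave. It checks whether one interval set is contained in a larger one, and illustrates the footnote: ET sets are nested only when one bin count divides the other, whereas the Pythagorean ordering is nested by construction.

```python
from fractions import Fraction


def et_steps(bpo):
    """ET intervals as exact fractions of an octave: {0, 1/bpo, 2/bpo, ...}."""
    return {Fraction(n, bpo) for n in range(bpo)}


def pythagorean_set(bpo):
    """First `bpo` octave-reduced powers of 3, taken in the order 0, 1, -1, 2, -2, ..."""
    ratios = set()
    for i in range(bpo):
        power = (i + 1) // 2 * (1 if i % 2 else -1)
        ratio = Fraction(3) ** power
        while ratio >= 2:
            ratio /= 2
        while ratio < 1:
            ratio *= 2
        ratios.add(ratio)
    return ratios


def is_monotonic(generator, bpo_small, bpo_large):
    """True if every interval at bpo_small is still present at bpo_large."""
    return generator(bpo_small) <= generator(bpo_large)


print(is_monotonic(et_steps, 12, 13))         # ET: 12 -> 13
print(is_monotonic(et_steps, 12, 24))         # ET: 12 -> 24
print(is_monotonic(pythagorean_set, 12, 13))  # 3-limit: 12 -> 13
```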

Interval-generating algorithms

As mentioned above, the 3-limit case is trivial, and arises from a fundamental principle of sorting intervals according to the absolute value of the power of 3 in their factorizations. (Only 2s and 3s are allowed, and 2s don’t count if we assume octave equivalence. We break ties in favor of positive powers over negative powers.)

We can think of this as putting the powers of 3 on a number line, expanding in both directions:

← ... 3^-3, 3^-2, 3^-1, 3^0, 3^1, 3^2, 3^3, ... →

and taking increasing steps out from the origin.

In higher-order p-limit systems, the prime factors can be arranged on a (π(p)-1)-dimensional lattice instead of a (π(3)-1=1)-dimensional line. The question then becomes: when do we take a step in the 3-direction vs the 5-direction vs the 7-direction, etc etc? I’ve implemented a few heuristics for this offline, and they all seem to fail in one way or another according to the above listed desiderata.
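To illustrate the lattice view, the following sketch maps a ratio to its exponent vector over the odd primes up to p, dropping the exponent of 2 under the octave-equivalence assumption. The names `primes_up_to` and `lattice_coords` are invented for this example. The length of the returned tuple is π(p) - 1.

```python
from fractions import Fraction


def primes_up_to(p):
    """All primes <= p, by trial division (fine for the small p used here)."""
    return [q for q in range(2, p + 1)
            if all(q % d for d in range(2, int(q ** 0.5) + 1))]


def lattice_coords(ratio, p):
    """Exponents of the odd primes <= p in `ratio`; factors of 2 are ignored."""
    ratio = Fraction(ratio)
    num, den = ratio.numerator, ratio.denominator
    coords = []
    for q in primes_up_to(p):
        e = 0
        while num % q == 0:
            num //= q
            e += 1
        while den % q == 0:
            den //= q
            e -= 1
        if q != 2:
            coords.append(e)
    if num != 1 or den != 1:
        raise ValueError(f"{ratio} is not a {p}-limit ratio")
    return tuple(coords)


print(lattice_coords(Fraction(4, 3), 3))   # one step in the -3 direction
print(lattice_coords(Fraction(15, 8), 5))  # one step each along 3 and 5
print(lattice_coords(Fraction(7, 4), 7))   # one step along 7
```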

Yesterday, @jordan-lenchitz pointed me at Tenney’s “harmonic crystal growth” scheme: https://www.tandfonline.com/doi/full/10.1080/07494460701671525 , which as far as I understand it, is a greedy algorithm that starts at the center (0,0,…) and iteratively adds the interval r which minimizes the sum (or average) “harmonic distance” between r and all previously selected intervals. The harmonic distance between two intervals a, b is defined as log2(a) + log2(b) - 2 log2 GCD(a, b), or equivalently,

log2 LCM(a, b) - log2 GCD(a, b)

Note: I’m not sure I understand how negative powers are handled here, if at all. My hunch is that this would be generalized to fractions by defining LCM and GCD in terms of the max and min of prime factorization powers from a and b, but it’s not apparently made explicit in the paper.
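One way to make that hunch concrete: if LCM and GCD are defined by taking the max and min of each prime exponent, then LCM(a, b) / GCD(a, b) is the product of p**|e_a(p) - e_b(p)| over all primes, which is exactly numerator × denominator of the reduced ratio a/b. The sketch below relies on that identity. It is one reading of the definition, not something confirmed by the paper, and the function names are made up.

```python
import math
from fractions import Fraction


def harmonic_distance(a, b):
    """log2(n * d), where n/d is a/b in lowest terms.

    Works for positive integers and fractions alike, under the assumption
    that LCM/GCD generalize via max/min of prime exponents.
    """
    r = Fraction(a) / Fraction(b)
    return math.log2(r.numerator * r.denominator)


def harmonic_distance_int(a, b):
    """The integer-only form quoted above: log2(a) + log2(b) - 2 log2 GCD(a, b)."""
    g = math.gcd(a, b)
    return math.log2(a) + math.log2(b) - 2 * math.log2(g)


# The two forms agree on integers ...
print(harmonic_distance(4, 6), harmonic_distance_int(4, 6))
# ... and the first also accepts ratios with negative prime powers.
print(harmonic_distance(Fraction(3, 2), Fraction(5, 4)))
```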

This seems like a promising approach, and going by Sabat & Tenney https://www.tandfonline.com/doi/full/10.1080/07494460701671533 it looks like it ticks the boxes listed above. There are, apparently, ties to be broken here, I assume arising from how negative and positive powers are treated. The authors describe random tie-breaking, but that would be terrible for consistency and reproducibility in a software library context. Any insight here would be appreciated, but it looks like it could at least be worth some investigation.
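For experimentation, here is a simplified sketch of a greedy growth procedure with deterministic tie-breaking. It is not Tenney's exact algorithm, and several choices are assumptions of this sketch: points are exponent vectors over the odd primes (factors of 2 are ignored for octave equivalence), candidates are the lattice neighbors of already-selected points, the distance is the sum of |exponent difference| × log2(prime), and ties are broken in favor of positive powers, lowest prime first.

```python
import math
from fractions import Fraction


def to_ratio(exps, primes):
    """Convert an exponent vector to an octave-reduced ratio in [1, 2)."""
    value = Fraction(1)
    for e, p in zip(exps, primes):
        value *= Fraction(p) ** e
    while value >= 2:
        value /= 2
    while value < 1:
        value *= 2
    return value


def distance(x, y, primes):
    """Harmonic distance between lattice points, ignoring factors of 2."""
    return sum(abs(a - b) * math.log2(p) for a, b, p in zip(x, y, primes))


def neighbors(point):
    """Lattice points one step away along each prime axis."""
    for i in range(len(point)):
        for step in (1, -1):
            yield point[:i] + (point[i] + step,) + point[i + 1:]


def grow(n, primes=(3, 5)):
    """Greedily select n intervals (n >= 1), starting from the unison."""
    selected = [(0,) * len(primes)]
    while len(selected) < n:
        candidates = {q for pt in selected for q in neighbors(pt)} - set(selected)

        def key(c):
            cost = sum(distance(c, s, primes) for s in selected)
            # Round to absorb floating-point noise, then break ties
            # deterministically, preferring positive exponents.
            return (round(cost, 9), tuple(-e for e in c))

        selected.append(min(candidates, key=key))
    return [to_ratio(pt, primes) for pt in selected]


print(grow(7))
print(sorted(grow(12)))
```

Because each step only appends to the selected set and the tie-break is a total order, the output for n bins is always a prefix of the output for n + 1 bins, which gives the monotonicity property for free. Whether the resulting scales are musically reasonable at 7 or 12 bins would still need to be checked.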

What else?

Even if a suitable interval generator can be constructed, we’d still need to build out some functionality to notate things correctly (ie without assuming enharmonic equivalence), which is pretty non-trivial. Various schemes are out there (eg extended Helmholtz-Ellis), and it should be tedious but possible to map out intervals to note names.

The biggest problem that I can see here will be the actual rendering of symbols: we already have enough trouble with unicode for double-accidentals (#1331), and the thought of handling these symbols makes me queasy, especially since they don’t easily down-render to 7-bit ascii.

We may or may not additionally need some more converters to handle JI ratios mapping to Hz or midi, but that should be pretty straightforward.
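A minimal sketch of what such converters could look like is given below. The names are placeholders, not existing librosa functions, and the MIDI output is fractional rather than quantized.

```python
import math
from fractions import Fraction


def ratio_to_hz(ratio, root_hz):
    """Frequency of a JI ratio above a root frequency."""
    return float(ratio) * root_hz


def ratio_to_cents(ratio):
    """Size of a JI ratio in cents."""
    return 1200 * math.log2(float(ratio))


def ratio_to_midi(ratio, root_midi):
    """Fractional MIDI number of a JI ratio above a root MIDI note."""
    return root_midi + 12 * math.log2(float(ratio))


fifth = Fraction(3, 2)
print(ratio_to_hz(fifth, 440.0))  # just fifth above A4, in Hz
print(ratio_to_cents(fifth))      # slightly wider than 700 cents
print(ratio_to_midi(fifth, 69))   # slightly above MIDI note 76
```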

So, all that said, I’m still not convinced that this is a good idea or worth the undertaking, but I do want to kick the discussion off. Any and all feedback is welcome.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 58 (47 by maintainers)

Top GitHub Comments

bmcfee commented, Aug 29, 2022

Since the VQT and FJS functionality seems to be converging, it might be a good time to revisit the implications for Hindustani and Carnatic notations. (I’d like to resolve these first before expanding functionality to other traditions.)

In #641 we made a conscious decision to simplify things to a 12TET base, following this comment from @kaustuvkanti : https://github.com/librosa/librosa/issues/641#issuecomment-643563955 - essentially, the svara are treated unambiguously for Hindustani notation, and as enharmonics determined by the provided melakarta raga for Carnatic. (This too is a limitation, but seems to be a reasonable one.) This is actually an approximation, as svara that map to the same 12TET interval (eg R2 and G1) derive from different ratios in the 22 sruti scheme.

It’s now possible to construct a VQT using exactly the sruti22 interval set - it might make sense to pack that as a predefined collection. If we do that, we could then extend the unit converters (or implement new ones) to notate Svara without quantizing through a 12tet grid.

I guess this would end up implementing a function like mela_to_sruti, which converts a mela string / number specification to a subset of the 22 intervals, analogous to how key_to_degrees takes a key specification (eg Eb:maj) and returns a subset of pitch class indices. We could then have sruti_to_svara (? I’m not sure that’s a coherent name) that would translate an interval index to notation. This would serve as an alternative to mela_to_svara. I’m not sure yet how this would thread through the rest of the API, but I wanted to get opinions before marching down this path.

bmcfee commented, Aug 3, 2022

  • Would it make sense to infer the note name from fmin if it’s not explicitly given?

I think this probably is what we want here; eg something like this:

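import numpy as np
from librosa import hz_to_note

def hz_to_fjs(freqs, *, fmin=None, unison=None, unicode=False):
    # intervals_to_fjs is the interval-to-FJS converter discussed in this thread
    freqs = np.asarray(freqs)
    if fmin is None:
        # Default the reference frequency to the lowest input frequency
        fmin = np.min(freqs)
    if unison is None:
        # Infer the unison note name (without octave) from fmin
        unison = hz_to_note(fmin, octave=False, unicode=False)

    # Express each frequency as a ratio relative to fmin
    intervals = freqs / fmin

    return intervals_to_fjs(intervals, unison=unison, unicode=unicode)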