Improve IPA transcriber for OE
The current IPA transcriber for OE largely ignores the phonological environment, relying mostly on unconditioned mappings of letters to IPA symbols.
To give a basic example, c currently yields /k/, so the first person singular ic is mapped to /ik/. But according to this resource on OE phonology:
When followed by a front vowel (æ, e, i) or the diphthongs ea or eo, or when preceded by the letter i AND not followed by a back vowel, c is pronounced /ʧ/.
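The quoted rule can be expressed as a check on the letters immediately before and after c. Below is a minimal Python sketch, not the transcriber's actual code. The function names are hypothetical, the back-vowel set {a, o, u} is an assumption because the quote does not list the back vowels, digraphs such as sc and cg are ignored, and every other letter is copied through unchanged purely for illustration.

```python
# Hypothetical sketch of the conditioned rule for OE "c"; not the project's code.
FRONT_VOWELS = {"æ", "e", "i"}  # front vowels as listed in the quoted rule
BACK_VOWELS = {"a", "o", "u"}   # assumption: the quote does not enumerate back vowels


def c_value(word: str, index: int) -> str:
    """Return 'ʧ' or 'k' for the 'c' at word[index]."""
    preceding = word[index - 1] if index > 0 else ""
    following = word[index + 1] if index + 1 < len(word) else ""
    # The diphthongs ea and eo begin with e, so the front-vowel check covers them.
    before_front_vowel = following in FRONT_VOWELS
    # Preceded by i AND not followed by a back vowel (word end counts as "not back").
    after_i = preceding == "i" and following not in BACK_VOWELS
    return "ʧ" if before_front_vowel or after_i else "k"


def transcribe(word: str) -> str:
    """Letter-by-letter transcription in which only 'c' is conditioned.

    Simplification: digraphs such as 'sc' and 'cg' are not handled, and
    every other letter is copied through unchanged.
    """
    return "".join(c_value(word, i) if ch == "c" else ch for i, ch in enumerate(word))


print(transcribe("ic"))     # iʧ
print(transcribe("cuman"))  # kuman
```

Under this rule ic comes out as /iʧ/ rather than /ik/, while cuman keeps /k/ before the back vowel u.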
The proposal is to improve the IPA transcriber by encoding conditional phonological rules. This process is a nearly ideal use case for test-driven development. A basic set of tests will be put in place as a target for the initial implementation of the rules. More demanding tests may then be constructed as drivers of iterative improvement.
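A test-driven setup along these lines might keep the rules as data and the target cases as a table. The sketch below is hypothetical: Rule, RULES, BASIC_CASES and failing_cases are illustrative names, rules are tried in order with the first match winning, and the back-vowel set is again an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

FRONT = {"æ", "e", "i"}
BACK = {"a", "o", "u"}  # assumption: back vowels are not listed in the quoted rule


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule format: grapheme -> IPA when condition(prev, next) holds."""
    grapheme: str
    ipa: str
    condition: Callable[[str, str], bool]


# Ordered rules: the conditioned /ʧ/ case first, then the elsewhere case /k/.
RULES = [
    Rule("c", "ʧ", lambda prev, nxt: nxt in FRONT or (prev == "i" and nxt not in BACK)),
    Rule("c", "k", lambda prev, nxt: True),
]


def transcribe(word: str) -> str:
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        for rule in RULES:
            if rule.grapheme == ch and rule.condition(prev, nxt):
                out.append(rule.ipa)
                break
        else:
            out.append(ch)  # no rule matched: copy the letter unchanged
    return "".join(out)


# Basic target cases that the initial implementation must pass.
BASIC_CASES = {
    "ic": "iʧ",        # the example from this issue
    "cild": "ʧild",    # c before the front vowel i
    "cuman": "kuman",  # c before the back vowel u
    "folc": "folk",    # word-final c not preceded by i
}


def failing_cases(cases: dict[str, str]) -> list[str]:
    """Return the words whose transcription differs from the expected output."""
    return [word for word, expected in cases.items() if transcribe(word) != expected]


print(failing_cases(BASIC_CASES))  # []
```

An empty result from failing_cases(BASIC_CASES) means the initial rules meet the basic target; more demanding cases can then be added to the table to drive the next round of rules.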
There are downstream uses for IPA transcription, so I’d like to address this issue sooner rather than later.
Created 5 years ago · 14 comments (7 by maintainers)
Top GitHub Comments
@clemsciences you’re right: for simple rules, that is, item-specific ones with simple specifications of environmental features, the mechanism is wordy. I think the more complex rules benefit from this format, though.
I have therefore added a simplified mechanism on top of the existing one, roughly similar to the rules in your system:
Enabled by this:
Wow, it does a much better job, but it’s really verbose (does higher complexity lead to more code?). I’m going to read your code more carefully tomorrow.
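For comparison, a compact notation for simple rules can be layered over regular expressions. The following is a hypothetical sketch, not the mechanism in either commenter's code: it borrows the familiar "target > output / left_right" sound-change notation, compiles each rule into a Python regex with lookbehind and lookahead, and applies the rules in order, each to the output of the previous one. The back-vowel set is an assumption.

```python
from __future__ import annotations

import re

# Hypothetical compact notation: "target > output / left_right", where "_" marks
# the target's position and left/right are regex fragments. The environment
# part is optional ("c > k" applies everywhere).


def compile_rule(spec: str) -> tuple[re.Pattern, str]:
    change, _, env = spec.partition("/")
    target, output = (part.strip() for part in change.split(">"))
    left, right = (env.strip() or "_").split("_")
    pattern = re.escape(target)
    if left:
        pattern = f"(?<={left})" + pattern  # Python lookbehind must be fixed-width
    if right:
        pattern += f"(?={right})"
    return re.compile(pattern), output


def apply_rules(word: str, specs: list[str]) -> str:
    """Apply rules in order; each rule rewrites the output of the previous one."""
    for spec in specs:
        pattern, output = compile_rule(spec)
        word = pattern.sub(output, word)
    return word


OE_C_RULES = [
    "c > ʧ / _[æei]",       # before a front vowel (covers ea, eo)
    "c > ʧ / i_(?![aou])",  # after i, unless a back vowel follows (assumed set)
    "c > k",                # elsewhere
]

print(apply_rules("ic", OE_C_RULES))     # iʧ
print(apply_rules("cuman", OE_C_RULES))  # kuman
```

Ordered application keeps simple rules short, but a later rule sees the output of earlier ones, and Python's lookbehind only accepts fixed-width left contexts, so more complex environments may still call for the fuller mechanism.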