
Unassigned/non-standard (compound) language and dialect codes

Wiktionary has entries for several languages and dialects with unofficial codes that we can’t scrape. Some examples include:

Central Franconian: gmw-cfr
Old Galician/Portuguese: roa-opt
Westrobothnian: gmq-bot

There are possibly others as well. The first part of each code is a valid ISO 639-5 language group (collective) code, while the second part looks like a temporary assignment.
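A minimal Python sketch of the code structure described above: it splits a compound code into its group prefix and suffix and validates only the prefix. It assumes every such code consists of two three-letter lowercase parts joined by a hyphen, and the ISO_639_5_GROUPS table and split_compound_code function are hypothetical illustrations covering only the three examples, not part of any existing scraper.

```python
import re
from typing import Tuple

# Hypothetical, deliberately tiny subset of ISO 639-5 collective codes that
# covers only the group prefixes of the examples above; a real version
# would load the full ISO 639-5 list.
ISO_639_5_GROUPS = {
    "gmq": "North Germanic languages",
    "gmw": "West Germanic languages",
    "roa": "Romance languages",
}

# Assumed shape: three lowercase letters, a hyphen, three lowercase letters.
COMPOUND_CODE = re.compile(r"([a-z]{3})-([a-z]{3})")


def split_compound_code(code: str) -> Tuple[str, str]:
    """Split a compound code into (group, suffix), validating only the group.

    Raises ValueError if the code does not have the assumed shape or if its
    group prefix is not in ISO_639_5_GROUPS. The suffix is not checked.
    """
    match = COMPOUND_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a compound code: {code!r}")
    group, suffix = match.groups()
    if group not in ISO_639_5_GROUPS:
        raise ValueError(f"unknown language group {group!r} in {code!r}")
    return group, suffix


for example in ("gmw-cfr", "roa-opt", "gmq-bot"):
    group, suffix = split_compound_code(example)
    print(f"{example}: group {group} ({ISO_639_5_GROUPS[group]}), suffix {suffix}")
```

Only the prefix can be checked against a standard here; the suffix has no official registry, so the sketch accepts any three letters.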

This issue is not a bug. It is simply intended for bookkeeping purposes. I suppose this is not related to #329.

Issue Analytics

Created: 2 years ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

agutkin commented, Jun 28, 2021

Yes, precisely.

agutkin commented, Jun 25, 2021

Looking at unmatched_languages.json, it turns out that the Wiktionary language codes are constructed rather systematically.

The ones that are probably most problematic (in terms of the work involved in supporting them) are the *-proto languages, but the remaining five or six are probably reasonably easy to support. I guess what we have here is an edge case where the Wiktionary code maps to a non-existent compound ISO code: the first part has to be a valid ISO language group name and should therefore be verifiable, while the second part can come from the configuration file.
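The approach suggested in this comment could look roughly like the following Python sketch: the group prefix of each code is verified against a list of known group codes, while the rest of the entry is trusted because it comes from a configuration file. The configuration format, KNOWN_GROUPS and load_compound_codes are all hypothetical; the project's actual unmatched_languages.json and configuration files may be structured quite differently.

```python
import json
from typing import Dict, Set

# Hypothetical subset of ISO 639-5 collective (language group) codes;
# a real version would load the complete list.
KNOWN_GROUPS: Set[str] = {"gmq", "gmw", "roa"}

# Hypothetical configuration file contents: maps each unofficial
# Wiktionary code to the language name it should be reported under.
CONFIG_TEXT = """
{
  "gmw-cfr": "Central Franconian",
  "roa-opt": "Old Galician/Portuguese",
  "gmq-bot": "Westrobothnian"
}
"""


def load_compound_codes(config_text: str) -> Dict[str, str]:
    """Return {code: name} for the configured compound codes.

    The group prefix (text before the first hyphen) must be in KNOWN_GROUPS;
    the suffix is accepted as-is because it comes from the configuration.
    Raises ValueError on the first entry that fails these checks.
    """
    config = json.loads(config_text)
    accepted: Dict[str, str] = {}
    for code, name in config.items():
        group, sep, suffix = code.partition("-")
        if not sep or not suffix:
            raise ValueError(f"{code!r} is not a compound code")
        if group not in KNOWN_GROUPS:
            raise ValueError(f"{code!r}: unknown language group {group!r}")
        accepted[code] = name
    return accepted


codes = load_compound_codes(CONFIG_TEXT)
print(codes["roa-opt"])  # prints: Old Galician/Portuguese
```

Keeping the suffix in configuration rather than code means new unofficial codes can be supported by editing a data file, while a typo in the group prefix is still caught at load time.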
