Unassigned/non-standard (compound) language and dialect codes
Wiktionary has entries for several languages and dialects with unofficial codes that we can’t scrape. Some examples include:
- Central Franconian: `gmw-cfr`
- Old Galician/Portuguese: `roa-opt`
- Westrobothnian: `gmq-bot`
There may be others. The first part of each code is a valid ISO 639-5 language group code, while the second part looks like a temporary assignment.
This issue is not a bug; it is intended simply for book-keeping purposes. I suppose it is not related to #329.
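To make the two-part structure described above concrete, here is a minimal sketch that splits the example codes on the first hyphen. The function name `split_compound_code` is hypothetical and not part of any existing library; it only illustrates the prefix/suffix observation.

```python
from typing import Tuple


def split_compound_code(code: str) -> Tuple[str, str]:
    """Split a compound code such as 'gmw-cfr' into (group prefix, suffix).

    Hypothetical helper: raises ValueError if the code has no hyphen
    or if either part is empty.
    """
    prefix, sep, suffix = code.partition("-")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a compound code: {code!r}")
    return prefix, suffix


# The three examples listed in this issue.
EXAMPLES = {
    "Central Franconian": "gmw-cfr",
    "Old Galician/Portuguese": "roa-opt",
    "Westrobothnian": "gmq-bot",
}

for name, code in EXAMPLES.items():
    prefix, suffix = split_compound_code(code)
    print(f"{name}: group={prefix}, suffix={suffix}")
```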
Top GitHub Comments
Yes, precisely.
Looking at `unmatched_languages.json`, it turns out that the Wiktionary language codes are constructed rather systematically.
The ones that are probably most problematic (in terms of the work involved to support them) are the `*-proto` languages, but the remaining five or six are probably reasonably easy to support. I guess what we have here is an edge case where the Wiktionary code maps to a non-existent compound ISO code: the first part has to be a valid ISO language group name and should be verifiable, while the second part can come from the configuration file.
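The approach suggested in this comment might look roughly like the sketch below: check the first part against a set of known language group codes, and look the second part up in configuration. Everything here is an assumption made for illustration. `KNOWN_GROUPS` is a hand-picked subset of group codes, `SUFFIX_CONFIG` stands in for whatever the real configuration file would contain, and `resolve_compound_code` is a hypothetical name. Codes matching the `*-proto` pattern mentioned above are deliberately left unresolved.

```python
from typing import Dict, Optional

# Assumption: a tiny subset of language group codes, enough for the examples.
KNOWN_GROUPS: Dict[str, str] = {
    "gmw": "West Germanic",
    "gmq": "North Germanic",
    "roa": "Romance",
}

# Assumption: stands in for entries loaded from a configuration file,
# keyed by group prefix and then by the non-standard suffix.
SUFFIX_CONFIG: Dict[str, Dict[str, str]] = {
    "gmw": {"cfr": "Central Franconian"},
    "roa": {"opt": "Old Galician/Portuguese"},
    "gmq": {"bot": "Westrobothnian"},
}


def resolve_compound_code(code: str) -> Optional[str]:
    """Return a language name for a compound code, or None if unsupported."""
    prefix, sep, suffix = code.partition("-")
    if not sep:
        return None  # not a compound code at all
    if suffix == "proto":
        return None  # proto-languages are out of scope for this sketch
    if prefix not in KNOWN_GROUPS:
        return None  # the first part must be a verifiable group code
    return SUFFIX_CONFIG.get(prefix, {}).get(suffix)


print(resolve_compound_code("gmw-cfr"))
print(resolve_compound_code("xxx-cfr"))
```

Keeping the group check separate from the configuration lookup means a typo in the configuration cannot introduce a code whose prefix is not a recognized group.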