Add support for BCP 47 and output IANA language subtags

By default, Franc returns ISO-639-3 three-letter language tags, as listed in the Supported Languages table.

We would like Franc to optionally output IANA language subtags, in line with the W3C recommendation for specifying the value of the lang attribute in HTML (and the xml:lang attribute in XML) documents.

(Two- and three-letter) IANA language codes are used as the primary language subtags in the language tag syntax defined by the IETF’s BCP 47. A tag may be further specified by adding subtags for “extended language”, script, region, dialect variants, etc. (RFC 5646 describes the syntax in full). Adding such fine-grained secondary qualifiers is, I guess, out of Franc’s scope, but it would be very helpful if Franc could at least return the IANA primary language subtags, which on their own are enough to comply with the spec.
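To illustrate the tag structure described above, here is a minimal sketch that pulls the primary language subtag out of a longer tag. The function name `primaryLanguageSubtag` is made up for this example, and it assumes an ordinary tag that starts with a language subtag; private-use (`x-...`) and grandfathered (`i-...`) tags are not handled.

```javascript
// Hypothetical helper: return the primary language subtag of a BCP 47 tag.
// Assumes the tag starts with a language subtag, e.g. "en-US" or "zh-Hant-TW".
function primaryLanguageSubtag(tag) {
  // Subtags are separated by hyphens; the first one is the primary language.
  const first = tag.split('-')[0]
  // Tags are case-insensitive; language subtags are conventionally lowercase.
  return first.toLowerCase()
}

console.log(primaryLanguageSubtag('nl')) // prints "nl"
console.log(primaryLanguageSubtag('en-US')) // prints "en"
console.log(primaryLanguageSubtag('zh-Hant-TW')) // prints "zh"
```

A primary subtag on its own, such as `nl`, is already a complete tag, which is why returning only that part would be enough for the use case in this issue.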

On the Web, as the IETF and W3C agree, IANA language subtags and BCP 47 seem to be the de facto industry standard (at least more so than ISO 639-3). Moreover, the naming convention for TeX hyphenation pattern files (as used by, among others, OpenOffice) uses ISO 639-1 codes, which also overlap better with IANA language subtags.

If Franc output IANA language subtags, the return values could be used as-is, without any post-processing or re-mapping, in, for example, CSS rules specifying hyphenation:

@media print {
  :lang(nl) { hyphenate-patterns: url(hyphenation/hyph-nl.pat); }
}
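As a rough sketch of that workflow, the snippet below builds the same kind of rule from a language subtag. The function name `hyphenationRule` is hypothetical, the file layout is copied from the example above, and the input is assumed to be a bare two- or three-letter primary subtag such as the one a detector might return.

```javascript
// Hypothetical helper: build a CSS hyphenation rule for one language subtag.
function hyphenationRule(lang) {
  // Accept only a bare two- or three-letter subtag so that nothing
  // unexpected ends up inside the generated CSS.
  if (!/^[a-z]{2,3}$/.test(lang)) {
    throw new Error('Expected a two- or three-letter language subtag')
  }
  return `:lang(${lang}) { hyphenate-patterns: url(hyphenation/hyph-${lang}.pat); }`
}

console.log(hyphenationRule('nl'))
// prints ":lang(nl) { hyphenate-patterns: url(hyphenation/hyph-nl.pat); }"
```

With three-letter ISO 639-3 output, `nld` would first have to be mapped to `nl` before a rule like this could match the document's lang attribute.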

@wooorm:

  1. What is the rationale for Franc defaulting to ISO 639-3 (only)? Is it a “better” standard, and, if so, why?
  2. If you agree that it would be a good idea for Franc to support BCP 47 and output IANA language subtags as an option, how would you prefer it to be implemented, and would you accept a PR? (We’d happily contribute.) Would it suffice to add and map them in data/support.json?

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

2 reactions
davidar commented, Apr 16, 2018

Yes, or with my own https://github.com/wooorm/iso-639-3

For reference:

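// Uses CommonJS `require`, as in the original comment; recent releases
// of both packages may need `import` instead.
const franc = require('franc')
const iso639 = require('iso-639-3')

// Map each ISO 639-3 code to its two-letter ISO 639-1 code, where one exists.
const shortLang = {}
for (const {iso6391, iso6393} of iso639) shortLang[iso6393] = iso6391

// `md` holds the text to analyse; keep the three-letter code as a fallback.
let lang = franc(md)
if (shortLang[lang]) lang = shortLang[lang]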

1 reaction
wooorm commented, Mar 17, 2016

OK, thanks!

First off, I do understand why you want BCP-47 tags. That’s a good use case. And, I agree that the solution would be pretty light, as it would not need the complete IANA registry.

But, I do think the solution would be better placed in another module, instead of in the core of Franc. E.g., the following (not yet working) code:

var franc = require('franc');
var toBCP47 = require('iso-639-3-to-bcp-47');

var lang = toBCP47(franc('An English language document with words.'));
console.log(lang);

Yields:

'en'

Would that work?
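For illustration only, here is a minimal sketch of what such a mapping function could look like. It is not the actual `iso-639-3-to-bcp-47` module: the table `TWO_LETTER` is a small hand-written excerpt, and the fallback assumes that a language without a two-letter code is tagged with its three-letter code, which holds in most cases.

```javascript
// Hypothetical, hand-written excerpt: ISO 639-3 codes that have a
// two-letter ISO 639-1 equivalent. A real module would generate the
// full table from registry data.
const TWO_LETTER = {
  eng: 'en',
  nld: 'nl',
  deu: 'de',
  fra: 'fr',
  spa: 'es'
}

// Hypothetical helper, named after the module proposed above.
function toBCP47(iso6393) {
  // Prefer the two-letter subtag when one exists; otherwise keep the
  // three-letter code, which then serves as the primary language subtag.
  return Object.prototype.hasOwnProperty.call(TWO_LETTER, iso6393)
    ? TWO_LETTER[iso6393]
    : iso6393
}

console.log(toBCP47('eng')) // prints "en"
console.log(toBCP47('sco')) // prints "sco" (Scots has no two-letter code)
console.log(toBCP47('und')) // prints "und" (Franc's "undetermined" value)
```

Keeping the table in a separate function like this means Franc's core output stays ISO 639-3, and callers opt in to the shorter subtags.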
