Language Tags

Draft Best Practices for Language Tags in Bibliographic Linked Data

June 30, 2016

 

Language tags are used in RDF to record the language, script, region and other characteristics of text strings.  Unlike MARC, which uses language codes at the level of bibliographic records, language tags are assigned at the level of the individual property, which allows a great deal of useful specificity.

The current standard for language tags is the Internet Engineering Task Force’s Request for Comments 5646 (IETF RFC 5646), dated September 2009, which forms part of Best Current Practice 47 (BCP 47).  (In IETF usage, a “Request for Comments” can be a final standards document, despite its name.)

 

IETF RFC 5646

https://tools.ietf.org/html/bcp47

 

Useful explanation of how to apply language tags from W3C

https://www.w3.org/International/articles/language-tags/

 

IANA Language Subtag Registry (use Ctrl-F to search it)

http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

 

Key points from the standard

Primary language subtags are two or three letters long.  When a language has both a two-letter and a three-letter ISO 639 code, use the two-letter form.

Script and region subtags should be omitted when they add no distinguishing value.

Redundant and grandfathered tags should be avoided.

 

Best practices for bibliographic data

Capitalization

Follow the capitalization of subtags given in the IANA registry.
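
As a rough illustration, the registry's capitalization follows the convention described in RFC 5646: language subtags in lowercase, four-letter script subtags in title case, two-letter region subtags in uppercase, and everything else in lowercase.  The Python sketch below applies that convention by position and length only; the function name normalize_case is hypothetical, and the code does not check that a subtag is actually registered.

```python
def normalize_case(tag: str) -> str:
    """Apply conventional casing to each subtag of a language tag."""
    subtags = tag.split("-")
    result = [subtags[0].lower()]  # primary language subtag: lowercase
    for position, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            # A one-character subtag starts an extension or private-use
            # sequence; everything from here on is simply lowercased.
            result.extend(s.lower() for s in subtags[position:])
            break
        if len(subtag) == 4 and subtag.isalpha():
            result.append(subtag.title())   # script, e.g. Cyrl
        elif len(subtag) == 2 and subtag.isalpha():
            result.append(subtag.upper())   # region, e.g. US
        else:
            result.append(subtag.lower())   # variants and other subtags
    return "-".join(result)


print(normalize_case("UZ-cyrl"))
print(normalize_case("ru-ALALC97"))
```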

 

Extended Language Subtags

Do not use extended language (extlang) subtags; use the corresponding single language subtag instead.
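
One possible way to apply this rule in software is sketched below.  It relies on two points from RFC 5646: an extlang subtag is always three letters and directly follows the primary language subtag, and every extlang is also registered as a primary language subtag in its own right.  The function name drop_extlang_prefix is hypothetical, and the sketch does not consult the registry, so it assumes the input tag is otherwise well formed.

```python
def drop_extlang_prefix(tag: str) -> str:
    """Rewrite 'language-extlang-...' as 'extlang-...'.

    Tags without an extlang subtag are returned unchanged.
    """
    subtags = tag.split("-")
    has_extlang = (
        len(subtags) >= 2
        and len(subtags[0]) in (2, 3)   # ordinary primary language subtag
        and len(subtags[1]) == 3
        and subtags[1].isalpha()        # three letters: an extlang, not a region
    )
    if has_extlang:
        return "-".join(subtags[1:])
    return tag


print(drop_extlang_prefix("zh-yue"))
print(drop_extlang_prefix("zh-cmn-Hans"))
```

Three-digit region subtags such as 419 are left alone because they are numeric rather than alphabetic.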

 

Script Subtags

Do not add a script subtag for the primary script of a language.  In particular, check the Suppress-Script field in the IANA registry, which names the script subtag that should be omitted from tags for that language.

Subtag: fr

Description: French

Suppress-Script: Latn

 

When text is in a script other than the primary one of a language, or a language routinely uses more than one script, add a script subtag.

Uzbek uses both Latin and Cyrillic scripts:

uz-Cyrl

uz-Latn
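
The script rules above could be automated along the following lines.  This is a minimal sketch: the SUPPRESS_SCRIPT table is a hand-copied, illustrative excerpt (a real implementation would load every Suppress-Script value from the IANA registry), and the function name drop_suppressed_script is hypothetical.

```python
# Illustrative excerpt only; values must be checked against the IANA registry.
SUPPRESS_SCRIPT = {
    "fr": "Latn",
    "en": "Latn",
}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag when it matches the language's Suppress-Script."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag
    suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
    if suppressed is not None and subtags[1].lower() == suppressed.lower():
        return "-".join([subtags[0]] + subtags[2:])
    return tag


print(drop_suppressed_script("fr-Latn"))   # script is redundant for French
print(drop_suppressed_script("uz-Cyrl"))   # Uzbek keeps its script subtag
```

Languages that are absent from the table, such as Uzbek, keep whatever script subtag they were given.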

 

Region Subtags

Language tags for English as the “language of cataloging”, used with notes and other non-transcription properties, should not have a region subtag, since RDA is an international cataloging code and bibliographic data are distributed worldwide.

“Includes bibliographies and index”@en

NOT

“Includes bibliographies and index”@en-US
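
A batch cleanup of cataloger-supplied English notes might strip region subtags as sketched below.  The function name cataloging_english_tag is hypothetical, and the sketch identifies region subtags by shape alone (two letters or three digits after the language subtag) rather than by looking them up in the registry.

```python
def cataloging_english_tag(tag: str) -> str:
    """Drop any region subtag from an English language tag.

    Tags for other languages are returned unchanged.
    """
    subtags = tag.split("-")
    if subtags[0].lower() != "en":
        return tag
    kept = [subtags[0]]
    for position, subtag in enumerate(subtags[1:], start=1):
        if len(subtag) == 1:
            # Extension or private-use sequence: copy the rest untouched.
            kept.extend(subtags[position:])
            break
        is_region = (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        )
        if not is_region:
            kept.append(subtag)
    return "-".join(kept)


print(cataloging_english_tag("en-US"))
```

This should be applied only to properties in the language of cataloging, not to transcribed text, where a region subtag may carry real information.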

 

Variant Subtags

For text romanized according to the 1997 edition of the ALA-LC tables, add the variant subtag “alalc97”.

“Neotpravlennoe pis’mo”@ru-alalc97

There is currently no approved subtag for later ALA-LC tables.

 

Extension and Private-Use Subtags

Do not use extension and private-use subtags.
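
A simple check for this rule is sketched below.  In RFC 5646, extension and private-use sequences are each introduced by a one-character subtag (“x” for private use, another letter or digit for an extension), so any one-character subtag is enough to flag a tag.  The function name has_extension_or_private_use is hypothetical.

```python
def has_extension_or_private_use(tag: str) -> bool:
    """Return True if the tag contains any one-character (singleton) subtag."""
    return any(len(subtag) == 1 for subtag in tag.split("-"))


for candidate in ("ru-alalc97", "en-x-local", "de-u-co-phonebk"):
    print(candidate, has_extension_or_private_use(candidate))
```

The check also flags irregular grandfathered tags that begin with “i”, which this document recommends avoiding in any case.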