This document defines a guide for mapping the International Components for Unicode (ICU) Resource Bundle file format to XLIFF (XML Localization Interchange File Format).
Because different tools may provide different filters to extract the content of ICU Resource Bundles, it is important for interoperability that they represent the extracted data in an identical manner in the XLIFF document.
The intent of this document is to provide a set of guidelines for representing data contained in ICU Resource Bundles as XLIFF content. It offers recommended mappings for the data types in ICU Resource Bundles that developers of XLIFF filters can implement, and that users of XLIFF utilities can rely on to ensure better interoperability between tools.
XLIFF is specified in two "flavors". Indicate which of these variants you are using by selecting the appropriate schema. The schema may be specified in the XLIFF document itself or in an OASIS catalog. The namespace is the same for both variants. Thus, if you want to validate the document, the tool used knows which variant you are using. Each variant has its own schema that defines which elements and attributes are allowed in certain circumstances.
As newer versions of XLIFF are approved, sometimes changes are made that render some elements, attributes or constructs in older versions obsolete. Obsolete items are deprecated and should not be used even though they are allowed. The XLIFF specification details which items are deprecated and what new constructs to use.
Transitional - Applications that produce older versions of XLIFF may still use deprecated items. Use this variant to validate XLIFF documents that you read. Deprecated elements and attributes are allowed.
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2
xliff-core-1.2-transitional.xsd'
Strict - Deprecated elements and attributes are not allowed. Obsolete items from previous versions of XLIFF are deprecated and should not be used when writing new XLIFF documents. Use this variant to validate XLIFF documents that you create.
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2
xliff-core-1.2-strict.xsd'
An ICU Resource Bundle is a collection of resources. There are two main types of ICU resources: simple and complex. Simple resources each contain a single piece of data. There are five types of simple resources: string, integer, integer vector, binary and alias. Complex resources contain other simple and complex resources. There are two types of complex resources: table and array.
An ICU Resource Bundle consists of a top-level table resource containing all the resources in the bundle. It is recommended that the name of this bundle be the locale name. The name might also be the application name followed by _ followed by the locale name, but the first form is preferred. The data of each resource is enclosed in { and }.
The following table shows how to specify each resource type:
| Resource Type | Type Name | Resource Data |
| string | :string | a text string optionally enclosed in quotes |
| integer | :integer or :int | a decimal or hexadecimal integer |
| integer vector | :intvector | a comma separated list of integers |
| binary | :binary or :bin | a sequence of hexadecimal digits optionally enclosed in quotes |
| alias | :alias | a path enclosed in quotes |
| table | :table | one or more named resources |
| array | :array | one or more unnamed resources separated by commas |
A string resource is a text string that may be enclosed in double quotes ("). The string must be quoted if it contains whitespace or any of the characters : , { }.
If the string is not enclosed in quotes, it is simply a sequence of non whitespace characters. Strings can contain the following escape sequences:
| Escape | Followed By | Represents |
| \u | four hex digits | a 16 bit Unicode code point |
| \U | eight hex digits | a 32 bit Unicode code point |
| \x | two hex digits | a Unicode code point between U+0000 and U+00FF |
| \x{ | one to eight hex digits then } | a Unicode code point |
| \ | one to three octal digits | a Unicode code point between U+0000 and U+00FF |
| \c | any character | a control code (low-order five bits of character's code point) |
| \a | | ASCII audible alert (U+0007) |
| \b | | ASCII backspace (U+0008) |
| \e | | ASCII escape (U+001B) |
| \f | | ASCII form feed (U+000C) |
| \n | | ASCII newline (U+000A) |
| \r | | ASCII carriage return (U+000D) |
| \t | | ASCII tab (U+0009) |
| \v | | ASCII vertical tab (U+000B) |
| \ | any other character | the character (e.g. \\, \", \}) |
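The escape table can be read as a small decoding routine. The following Python sketch is one possible way a filter might convert escape sequences to characters; the function name `unescape` is hypothetical and is not part of ICU. It leaves out the \c escape, and it does not enforce the U+0000 to U+00FF range for octal escapes.

```python
import re

# Single-character escapes from the escape table.
_NAMED = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

# One alternative per kind of escape. The \c escape is not handled in this
# sketch: it falls through to the "any other character" alternative.
_ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})"
    r"|U([0-9A-Fa-f]{8})"
    r"|x\{([0-9A-Fa-f]{1,8})\}"
    r"|x([0-9A-Fa-f]{2})"
    r"|([0-7]{1,3})"
    r"|(.))",
    re.DOTALL,
)


def unescape(text):
    """Replace escape sequences in a string resource with their characters."""
    def replace(match):
        u4, u8, braced, x2, octal, other = match.groups()
        hex_digits = u4 or u8 or braced or x2
        if hex_digits:
            return chr(int(hex_digits, 16))
        if octal:
            return chr(int(octal, 8))
        # A named escape such as \n, or else the escaped character itself.
        return _NAMED.get(other, other)
    return _ESCAPE.sub(replace, text)
```

The braced form is tried before the two-digit \x form so that `\x{65E5}` is not misread as `\x` followed by literal text.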
quoted_string :string {
"This is a simple quoted string."
}
unquoted_string :string {
Supercalifragilisticexpialidocious
}
string_with_escapes :string {
"The ideograph for \"sun\" is \u65E5."
}
A string resource can be composed of multiple strings separated by
whitespace characters. Quoted strings are simply concatenated
together. For example, the following two definitions produce the
same resource:
only_a_test :string {
"This is "
"only a test."
}
only_a_test :string {
"This is only a test."
}
If the strings are not quoted, they are concatenated with a space
between them. Again, the following two definitions produce the same
resource:
only_a_test :string {
This
is
only
a
test.
}
only_a_test :string {
This is only a test.
}
(Note that in this example, each word is actually a separate string.)
String resources can contain descriptors for variable data, enclosed in { and }. The variable data is supplied at runtime using the ICU MessageFormat and ChoiceFormat interfaces. Consult the ICU User Guide for more information about these interfaces. If MessageFormat or ChoiceFormat descriptors are present then either the string must be quoted or the { and } must be escaped.
(Note: ChoiceFormat messages are difficult to translate without a detailed knowledge of their syntax. For this reason, it is recommended that their use be avoided.)
A string resource can also be specified with the type name :include and a string value that is the name of a file containing the string resource data. The string resource data in this file can contain the same escape sequences as a regular string resource.
An integer resource is specified as a decimal integer, or as a hexadecimal integer written as 0x followed by a sequence of hexadecimal digits. A negative value is preceded by a minus sign. The value is a 32 bit signed value. Here are some examples of integer resources:
window_height :integer {
600
}
window_offset :integer {
-200
}
checksum :integer {
0xBCFE3759
}
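The integer syntax shown in these examples could be read by a filter along the following lines. This Python sketch uses a hypothetical helper name, `parse_integer`. It also assumes that a hexadecimal value above 0x7FFFFFFF, such as the checksum example, denotes the 32 bit signed value with the same bit pattern; this guide does not state that rule explicitly.

```python
def parse_integer(text):
    """Parse an integer resource value written in decimal or as 0x-prefixed hex."""
    text = text.strip()
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if digits[:2].lower() == "0x":
        value = int(digits[2:], 16)
    else:
        value = int(digits, 10)
    if negative:
        value = -value
    # Assumption: an unsigned 32 bit pattern such as 0xBCFE3759 stands for the
    # signed value with the same bits (two's complement).
    if 0x7FFFFFFF < value <= 0xFFFFFFFF:
        value -= 1 << 32
    if not -(1 << 31) <= value <= 0x7FFFFFFF:
        raise ValueError(f"{text!r} does not fit in 32 bits")
    return value
```

Note that the XLIFF mapping described later in this guide keeps the original notation, so a filter would typically use a routine like this for validation only and still write the value as it appears in the source.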
font_sizes :intvector {
8,
10,
12,
18,
24,
36
}
single_font_size :intvector {
12
}
empty :intvector {
}
md5_sum :binary {bcfe765be0fdfab22c5f9efd12c52abc}
A binary resource can also be specified with the type name :import and a quoted path for the file containing the
binary data. For example:
logo :import {"logo.gif"}
authors :alias {"root/authors"}
(Note: in general an alias resource that is used in one Resource
Bundle may not be appropriate in another Resource Bundle. For this
reason it is recommended that the use of alias resources be avoided.)
primary_colors :table {
red :string {"Red"}
orange :string {"Orange"}
yellow :string {"Yellow"}
green :string {"Green"}
blue :string {"Blue"}
indigo :string {"Indigo"}
violet :string {"Violet"}
}
fonts :table {
default_size :integer {12}
font_sizes :intvector {
8,
10,
12,
16,
24,
36
}
font_families :array {
:string {"Times"},
:string {"Helvetica"},
:string {"Courier"}
}
}
file_menu_items :array {
:string {"Cut"},
:string {"Copy"},
:string {"Paste"},
:string {"Delete"}
}
fonts :array {
:integer {12},
:intvector {
8,
10,
12,
16,
24,
36
},
:array {
:string {"Times"},
:string {"Helvetica"},
:string {"Courier"}
}
}
If all the resources in an array are strings, they can be specified without the :string type name or the enclosing
{ and }. For example:
fish :array {
"One fish",
"Two fish",
"Red fish",
"Blue fish"
}
The type name of a resource can be omitted, in which case the type is implied by what follows the
opening {. A resource name followed by a type
name or a { means that the resource is a table. A type
name or a { means that the resource is an array. A
string followed by a comma means that the resource is an array of
strings. A string followed by a } means that the
resource is a string. Here are some examples of resources with
implied types:
color_table {
red :string {"Red"}
orange :string {"Orange"}
yellow :string {"Yellow"}
green :string {"Green"}
blue :string {"Blue"}
indigo :string {"Indigo"}
violet :string {"Violet"}
}
file_menu_array {
:string {"Cut"},
:string {"Copy"},
:string {"Paste"},
:string {"Delete"}
}
fish_string_array {
"One fish",
"Two fish",
"Red fish",
"Blue fish"
}
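The rules for implied types can be sketched as a lookup on the first two tokens that follow the opening { of a resource. The Python below is only an illustration: the function name `implied_type` is hypothetical, and it assumes the input has already been split into tokens (type names such as `:string`, the punctuation marks `{`, `}` and `,`, and names or strings).

```python
def implied_type(first, second):
    """Infer a resource's type from the first two tokens after its opening '{'."""
    def starts_resource(token):
        # A type name or an opening brace begins the body of a resource.
        return token == "{" or token.startswith(":")

    if starts_resource(first):
        return "array"    # an unnamed resource comes first
    if starts_resource(second):
        return "table"    # a resource name, then its type name or '{'
    if second == ",":
        return "array"    # a bare string followed by a comma
    if second == "}":
        return "string"   # a bare string that closes the resource
    raise ValueError(f"cannot infer a type from {first!r} {second!r}")
```

For the three examples above, the token pairs `red` `:string`, `:string` `{`, and `"One fish"` `,` give table, array, and array respectively.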
Resource Bundle files can contain single-line comments that start with // and extend to
the end of the line, and multi-line comments that start
with /* and end with */. Here
are some examples of comments:
/*
* The resources for a fictitious Hello World application. The application
* displays a single window with a logo and the hello message.
*/
sample :table {
....
}
// The names of
// the translators
translators :array {
"John E. English",
"Alan Smithee"
}
checksum :integer { // a CRC checksum of
0xBCFE3759 // the application binary
}
Comments that start with /** and end
with */ are documentation comments. These comments
apply to the following resource and may contain two special tokens.
The token @translate tells the translator whether
or not the resource should be translated. The
token @note begins a special note to the
translator. All of the comment text following one of these tokens, up to
the end of the comment or the next @, is associated
with that token. It is not necessary to have whitespace after the
token. Leading and trailing whitespace and *
characters are ignored. Here are some examples of documentation
comments:
/**
* The names of the translators.
*
* @note replace these with your names.
*/
translators :array {
"John E. English",
"Allen Smythee"
}
/**
* The width of the application window.
*
* @translate yes
* @note Be sure that the window is
* wide enough to contain the
* translated greeting.
*/
window_width :integer {
600
}
/**
* The height of the application window
*
* @translate yes
* @noteThere is no space after the token!
*/
window_height :integer {
400
}
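The rules for documentation comments can be illustrated with a short parser. This Python sketch is not how genrb is implemented; the function name `parse_doc_comment` and the shape of the returned dictionary are assumptions made for the example. It treats everything before the first @ as the description, and it joins the lines of a multi-line note with single spaces.

```python
def parse_doc_comment(comment):
    """Split a /** ... */ comment into its description and special tokens."""
    body = comment.strip()
    if not (body.startswith("/**") and body.endswith("*/")):
        raise ValueError("not a documentation comment")
    # Strip the delimiters, then the whitespace and '*' padding of each line.
    lines = [line.strip(" \t*") for line in body[3:-2].splitlines()]
    text = " ".join(line for line in lines if line)
    # Text before the first '@' is the description; each later piece starts
    # with a token name and runs to the next '@' or the end of the comment.
    description, *pieces = text.split("@")
    result = {"description": description.strip(), "translate": None, "note": None}
    for piece in pieces:
        for token in ("translate", "note"):
            if piece.startswith(token):
                # Whitespace after the token is optional.
                result[token] = piece[len(token):].strip()
    return result
```

Applied to the window_height comment above, this yields the description "The height of the application window", a translate value of "yes", and the note "There is no space after the token!".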
/**
* The resources for a fictitious Hello World application. The application
* displays a single window with a logo and the hello message.
*/
en :table {
/**
* @note This is the message that the application displays to the user.
*/
hello :string {"Hello, world!"}
/**
* The height of the application window.
*
* @note Make sure this is tall enough to display the translated message.
*/
window_height :integer {200}
/**
* The width of the application window.
*
* @note Make sure this is wide enough to display the translated message.
*/
window_width :integer {600}
/**
* The application version number
*
* @translate no
*/
version :intvector {
1, // major version
2, // minor version
3 // patch level
}
/**
* The MD5 checksum of the application.
*
* @translate no
*/
md5_sum :binary {bcfe765be0fdfab22c5f9efd12c52abc}
/**
* The logo to be displayed in the application window.
*
* @translate no
*/
logo :import {"logo.gif"}
/*
* The Authors. Just use the name from the root bundle.
*/
authors :alias {"root/authors"}
/*
* The translators.
*/
translators :array {
:string {"John E. English"},
:string {"Alan Smithee"},
:string {"Allen Smythee"}
}
/**
* The application menus.
*
* @note Keep the menus and the menu items in this order.
*/
menus :table {
file_menu :table {
name :string {"File"}
items :array {
:string {"New"},
:string {"Open..."},
:string {"Save"},
:string {"Save As..."},
:string {"Exit"}
}
}
edit_menu :table {
name :string {"Edit"}
items :array {
:string {"Cut"},
:string {"Copy"},
:string {"Paste"},
:string {"Delete"}
}
}
help_menu :table {
name :string {"Help"}
items :array {
:string {"Help Topics"},
:string {"About Hello World"}
}
}
}
}
This section discusses the general considerations to take into
account when extracting data from ICU Resource Bundles. The
ICU genrb tool can convert Resource Bundle source files to
XLIFF files. The ICU4J XLIFF2ICUConverter tool
converts XLIFF files to ICU Resource Bundle source files. For
consistency, these tools should be used whenever possible.
The ICU genrb tool converts the Resource Bundle
text files into binary resource files. It is possible to extract
data from either the text or the binary files. It is recommended to
extract the data from the text files because the comments, and the
meta-information that they contain, such as @translate
and @note, are not included in the binary files.
Where practical, it is recommended to maintain resource data as XLIFF files and convert to ICU format only as a build step. For more information on the localization workflow, please see Localizing with XLIFF & ICU in the ICU User Guide.
ICU Resource Bundle file format uses ID based translation units just like other major resource formats.
The source of ICU Resource Bundles can be in any one of the major encodings supported by ICU. However, it is strongly recommended that the files be stored in UTF-8 encoding.
When extracting data from ICU Resource Bundle files, filters
should respect the encoding of the file and provide conversion to
UTF-8 for storing data in XLIFF.
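As a minimal illustration of the encoding rule, the Python sketch below re-encodes the bytes of a source file as UTF-8. The function name `to_utf8` is hypothetical, and the encoding names are those of Python's codecs, which do not necessarily match ICU converter names. Dropping a leading byte order mark is also an assumption of this sketch.

```python
def to_utf8(raw, source_encoding):
    """Re-encode the bytes of a Resource Bundle source file as UTF-8."""
    text = raw.decode(source_encoding)
    # Assumption: a leading byte order mark is not wanted in the XLIFF content.
    return text.lstrip("\ufeff").encode("utf-8")
```

A filter would call this with the encoding declared for, or detected in, the Resource Bundle file before parsing the resources.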
<?xml version="1.0" encoding="utf-8"?>
<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'
xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd'>
<file xml:space = "preserve" source-language = "en" datatype = "x-icu-resource-bundle"
original = "en.txt" date = "2007-04-11T19:03:09Z">
<header>
<tool tool-id = "genrb-3.3-icu-3.7.1" tool-name = "genrb"/>
</header>
<body>
<group id = "en" restype = "x-icu-table">
... the resources in the resource bundle...
</group>
</body>
</file>
</xliff>
Each ICU Resource Bundle file maps to one XLIFF <file> element. XLIFF
representations of ICU Resource Bundle files should have the
datatype attribute with a value of
"x-icu-resource-bundle", and the
original attribute set to the name of the
Resource Bundle file from which the data was extracted.
The <file> element should have an
xml:space attribute with a value of
"preserve", establishing this as the default for the
whole file. Filters should retain the whitespace in individual
resources.
The optional XLIFF <header> element contains
a <tool> element which identifies the tool that
extracted the data from the ICU Resource Bundle.
The XLIFF <body>
element contains translation units, which may be grouped using
hierarchical <group> elements to represent table
and array resources. Since the top-level resource in a Resource
Bundle is a table resource, all of the resources in the file will
be inside of a <group> element.
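The file-level structure described above could be generated as follows. This is a sketch using Python's standard xml.etree.ElementTree module; the function name `xliff_skeleton` and its parameters are hypothetical, and the date and xsi:schemaLocation attributes are omitted for brevity.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def _q(tag):
    """Qualify a tag name with the XLIFF namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def xliff_skeleton(bundle_name, original, source_language, tool_id, tool_name):
    """Build xliff/file/header/body plus the top-level table group."""
    ET.register_namespace("", XLIFF_NS)  # serialize XLIFF as the default namespace
    xliff = ET.Element(_q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(xliff, _q("file"), {
        XML_SPACE: "preserve",
        "source-language": source_language,
        "datatype": "x-icu-resource-bundle",
        "original": original,
    })
    header = ET.SubElement(file_el, _q("header"))
    ET.SubElement(header, _q("tool"), {"tool-id": tool_id, "tool-name": tool_name})
    body = ET.SubElement(file_el, _q("body"))
    # The top-level resource of a bundle is a table, so it becomes a group.
    group = ET.SubElement(body, _q("group"),
                          {"id": bundle_name, "restype": "x-icu-table"})
    return xliff, group
```

The function returns the top-level group as well as the root, so that a caller can add the mapped resources to it before serializing the root with `ET.tostring`.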
Documentation comments should be extracted and included as
comments inside the next <group>,
<trans-unit> or <bin-unit>
generated after the comment is extracted. If the comment
contains @translate, this should be used to set
the translate attribute on the
next <group>,
<trans-unit> or <bin-unit>
generated after the comment is extracted. It is not necessary to
generate a translate attribute for @translate
yes since this is the default.
If the comment contains @note, this should be
extracted to a <note> element in the
next <group>,
<trans-unit> or <bin-unit>
generated after the comment is extracted.
For example:
/**
* These are top level comments for the bundle. Tag name: root
* @translate yes
* @note Comments for tag named root
*/
root{
/**
* The CRC checksum for
* the application binary.
*
* @translate no
* @noteThis was calculated by development.
*/
checksum :integer {
0xBCFE3759
}
}
would be mapped to:
<file xml:space = "preserve" source-language = "en" datatype =
"x-icu-resource-bundle" original = "root.txt" date = "2007-01-17T20:58:58Z">
<header>
<tool tool-id = "genrb-3.3-icu-3.7.1" tool-name = "genrb" />
</header>
<body>
<group id = "root" restype = "x-icu-table">
<!--These are top level comments for the bundle. Tag name: root-->
<note>Comments for tag named root</note>
<trans-unit id = "checksum" resname = "checksum" translate = "no">
<!--The CRC checksum for the application binary.-->
<source>0xBCFE3759</source>
<note>This was calculated by development.</note>
</trans-unit>
</group>
</body>
</file>
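The comment mapping can be sketched as a helper that decorates an element that has already been created. The Python below uses xml.etree.ElementTree; the function name `annotate` and its parameters are hypothetical, and namespace handling is left out to keep the example short.

```python
import xml.etree.ElementTree as ET


def annotate(element, description=None, translate=None, note=None):
    """Attach documentation-comment data to a group, trans-unit or bin-unit."""
    if description:
        # The comment text becomes an XML comment at the start of the element.
        element.insert(0, ET.Comment(description))
    if translate == "no":
        # translate="yes" is the default, so only "no" is written out.
        element.set("translate", "no")
    if note:
        # The note is appended after whatever children are already present.
        ET.SubElement(element, "note").text = note
    return element
```

Because the note is appended last, a caller would annotate a trans-unit after adding its source element, and annotate a group before adding its children, which reproduces the element order of the example above.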
Each string resource should be mapped to a single <source> element within a
<trans-unit>. Filters should concatenate
multiple strings into a single string and convert all escape
sequences to the corresponding characters.
For example the string resources:
hello :string {
"Hello, world!"
}
string_with_escapes :string {
"The ideograph for \"sun\" is \u65E5."
}
only_a_test :string {
This
is
only
a
test.
}
would be mapped to:
<trans-unit id = "hello" resname = "hello">
<source>Hello, world!</source>
</trans-unit>
<trans-unit id = "string_with_escapes" resname = "string_with_escapes">
<source>The ideograph for "sun" is 日.</source>
</trans-unit>
<trans-unit id = "only_a_test" resname = "only_a_test">
<source>This is only a test.</source>
</trans-unit>
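A filter might implement the string mapping roughly as shown below. This Python sketch assumes that the pieces of the string resource have already been tokenized and unescaped, and that the pieces are either all quoted or all unquoted. The function name `string_trans_unit` is hypothetical and namespace handling is omitted.

```python
import xml.etree.ElementTree as ET


def string_trans_unit(name, pieces, quoted):
    """Build the trans-unit for a string resource made of one or more pieces."""
    # Quoted pieces are joined directly; unquoted pieces are joined with a space.
    separator = "" if quoted else " "
    unit = ET.Element("trans-unit", {"id": name, "resname": name})
    ET.SubElement(unit, "source").text = separator.join(pieces)
    return unit
```

For instance, `string_trans_unit("only_a_test", ["This", "is", "only", "a", "test."], quoted=False)` produces a source element containing "This is only a test.".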
If a string resource is specified with :include, filters
should read the contents of the included file and process it as if
it had been specified directly. For example:
/**
* Included text.
*/
included_text :include {"mystring.txt"}
would be mapped to:
<trans-unit id = "included_text" resname = "included_text">
<!--Included text.-->
<source>Contents of mystring.txt</source>
</trans-unit>
MessageFormat descriptors in string resources
represent variable data that is supplied at runtime, and should not
be translated. These descriptors should be mapped to a
<ph> element. ChoiceFormat
descriptors in string resources represent a combination of variable
data that is supplied at runtime and text that should be
translated. These descriptors should also be mapped to a
<ph> element and the translatable text within
them should be mapped to a <sub> element. For
example:
msgFormat :string {
"At {1,time} on {1,date}, there was {2} on planet {0,number,integer}."
}
choiceFormat :string {
"Folder {0} contains {1,choice,0#no files|1#one file|1<{1,number,integer} files}."
}
would be mapped to:
<source>At <ph id="1">{1,time}</ph> on <ph id="2">{1,date}</ph>,
there was <ph id="3">{2}</ph> on planet <ph id="4">{0,number,integer}</ph>.
</source>
<source>Folder <ph id="1">{0}</ph> contains
<ph id="2">{1,choice,0#<sub>no files</sub>|1#<sub>one file</sub>|1&lt;{1,number,integer}
<sub> files</sub>}</ph>.
</source>
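The placeholder mapping for MessageFormat descriptors can be approximated by tracking brace depth. The Python sketch below wraps each top-level descriptor in a numbered ph element and escapes XML special characters. The function name `wrap_descriptors` is hypothetical. The sketch does not add sub elements around the translatable text of ChoiceFormat descriptors, and it ignores MessageFormat apostrophe quoting.

```python
from xml.sax.saxutils import escape


def wrap_descriptors(message):
    """Wrap each top-level {...} descriptor of a message in a numbered <ph>."""
    out = []
    depth = 0
    start = 0       # index where the current text run or descriptor began
    next_id = 1
    for index, char in enumerate(message):
        if char == "{":
            if depth == 0:
                # Flush the plain text that precedes this descriptor.
                out.append(escape(message[start:index]))
                start = index
            depth += 1
        elif char == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                descriptor = escape(message[start:index + 1])
                out.append(f'<ph id="{next_id}">{descriptor}</ph>')
                next_id += 1
                start = index + 1
    # Whatever remains after the last descriptor is plain text.
    out.append(escape(message[start:]))
    return "".join(out)
```

Nested braces, as in the ChoiceFormat example, stay inside the enclosing placeholder because only a return to depth zero closes it.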
(Note: extra whitespace has been added to these XLIFF examples for
readability. In both examples the <source>
element should be on a single line.)
Each integer resource should be mapped to a single
<source> element inside of a
<trans-unit> element. The <trans-unit>
element should have a restype attribute with a value
of "x-icu-integer". There is no need to convert
integer resources specified as hexadecimal numbers, i.e. those
starting with 0x, to decimal notation.
For example:
/**
* The height of the application window.
*
* @note Make sure this is tall enough to display the translated message.
*/
window_height :integer {200}
/**
* The width of the application window.
*
* @note Make sure this is wide enough to display the translated message.
*/
window_width :integer {600}
/**
* The CRC checksum for
* the application binary.
*
* @translate no
* @noteThis was calculated by development.
*/
checksum :integer {0xBCFE3759}
would be mapped to:
<trans-unit id = "window_height" resname = "window_height" restype = "x-icu-integer">
<!--The height of the application window.-->
<source>200</source>
<note>Make sure this is tall enough to display the translated message.</note>
</trans-unit>
<trans-unit id = "window_width" resname = "window_width" restype = "x-icu-integer">
<!--The width of the application window.-->
<source>600</source>
<note>Make sure this is wide enough to display the translated message.</note>
</trans-unit>
<trans-unit id = "checksum" resname = "checksum" restype = "x-icu-integer" translate = "no">
<!--The CRC checksum for the application binary.-->
<source>0xBCFE3759</source>
<note>This was calculated by development.</note>
</trans-unit>
Each binary resource should be mapped
to a single <bin-source> element inside of a
<bin-unit> element. The
<bin-unit> should have a restype
attribute with a value of "x-icu-binary" and a
mime-type attribute with the value
"application/octet-stream". The
<bin-source> element should contain an
<internal-file> element that specifies the
binary data.
For example:
/**
* The MD5 checksum of the application.
*
* @translate no
*/
md5_sum :binary {bcfe765be0fdfab22c5f9efd12c52abc}
would be mapped to:
<bin-unit id = "md5_sum" resname = "md5_sum" mime-type = "application/octet-stream"
restype = "x-icu-binary" translate = "no">
<!--The MD5 checksum of the application.-->
<bin-source>
<internal-file form = "application/octet-stream" crc = "187654673">BCFE765BE0FDFAB22C5F9EFD12C52ABC</internal-file>
</bin-source>
</bin-unit>
Each binary resource specified with :import should be
mapped to a single <external-file> element
inside of a <bin-source> element inside of a
<bin-unit> element. The <bin-unit> element should
have a restype
attribute with a value of "x-icu-binary".
For example:
/**
* The logo to be displayed in the application window.
*
* @translate no
*/
logo :import {"logo.gif"}
would be mapped to:
<bin-unit id = "logo" resname = "logo"
mime-type = "application/octet-stream"
restype = "x-icu-binary" translate = "no">
<!--The logo to be displayed in the application window.-->
<bin-source>
<external-file href = "logo.gif"/>
</bin-source>
</bin-unit>
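Both forms of binary resource lead to the same outer structure, which the following Python sketch makes explicit. The function name `bin_unit` and its parameters are hypothetical. The crc attribute shown in the inline example is left out because this guide does not say how it is computed, and upper-casing the hexadecimal digits simply follows that example.

```python
import xml.etree.ElementTree as ET


def bin_unit(name, hex_digits=None, href=None):
    """Build a bin-unit for inline binary data or for an :import path."""
    if (hex_digits is None) == (href is None):
        raise ValueError("pass exactly one of hex_digits or href")
    unit = ET.Element("bin-unit", {
        "id": name,
        "resname": name,
        "mime-type": "application/octet-stream",
        "restype": "x-icu-binary",
    })
    source = ET.SubElement(unit, "bin-source")
    if href is not None:
        # An imported file is referenced, not embedded.
        ET.SubElement(source, "external-file", {"href": href})
    else:
        bytes.fromhex(hex_digits)  # raises ValueError if the digits are malformed
        internal = ET.SubElement(source, "internal-file",
                                 {"form": "application/octet-stream"})
        internal.text = hex_digits.upper()
    return unit
```

The translate attribute and any comment text would be added separately, from the documentation comment that precedes the resource.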
Each alias resource should be mapped
to a single <source> element inside of a
<trans-unit> element. The
<trans-unit> should have a restype
attribute with a value of "x-icu-alias". The alias
path should be mapped to a <ph> element with an
id attribute with a value that is the path. For an
alias resource the translate attribute should always
be set to "no".
For example:
authors :alias {"root/authors"}
would be mapped to:
<trans-unit id = "authors" resname = "authors" restype = "x-icu-alias" translate = "no">
<source><ph id="root/authors"/></source>
</trans-unit>
Each array resource should be mapped to a <group> element with a
restype attribute with a value of
"x-icu-array". Because the resources contained in the
array are not named, their id attributes should have a
value that is the name of the array followed by a _
followed by the resource's index in the array. For example:
menu_items :array {
:string {"Cut"},
:string {"Copy"},
:string {"Paste"},
:string {"Delete"}
}
would be mapped to:
<group id = "menu_items" resname = "menu_items" restype = "x-icu-array">
<trans-unit id = "menu_items_0">
<source>Cut</source>
</trans-unit>
<trans-unit id = "menu_items_1">
<source>Copy</source>
</trans-unit>
<trans-unit id = "menu_items_2">
<source>Paste</source>
</trans-unit>
<trans-unit id = "menu_items_3">
<source>Delete</source>
</trans-unit>
</group>
Each integer vector resource should be mapped to a <group> element with a
restype attribute with a value of
"x-icu-intvector". Each integer value in the integer
vector should be mapped to a single <source>
element inside a <trans-unit> within the
<group> element. The
<trans-unit> element should have
an id attribute
with a value that is the name of the integer vector followed by a
_ followed by the index of the integer. For example:
/**
* The application version number
*/
version :intvector {
1, // major version
2, // minor version
3 // patch level
}
would be mapped to:
<group id = "version" resname = "version" restype = "x-icu-intvector">
<!--The application version number-->
<trans-unit id = "version_0" restype = "x-icu-integer">
<source>1</source>
</trans-unit>
<trans-unit id = "version_1" restype = "x-icu-integer">
<source>2</source>
</trans-unit>
<trans-unit id = "version_2" restype = "x-icu-integer">
<source>3</source>
</trans-unit>
</group>
Each table resource should be mapped to a <group> element with a
restype attribute with a value of
"x-icu-table". Each resource in the table should have
an id attribute that is the table name followed by
_ followed by the name of the resource. For example:
/**
* The names of the primary colors
*/
primary_colors{
red :string {"Red"}
orange :string {"Orange"}
yellow :string {"Yellow"}
green :string {"Green"}
blue :string {"Blue"}
indigo :string {"Indigo"}
violet :string {"Violet"}
}
would be mapped to:
<group id = "primary_colors" resname = "primary_colors" restype = "x-icu-table">
<!--The names of the primary colors-->
<trans-unit id = "primary_colors_red">
<source>Red</source>
</trans-unit>
<trans-unit id = "primary_colors_orange">
<source>Orange</source>
</trans-unit>
<trans-unit id = "primary_colors_yellow">
<source>Yellow</source>
</trans-unit>
<trans-unit id = "primary_colors_green">
<source>Green</source>
</trans-unit>
<trans-unit id = "primary_colors_blue">
<source>Blue</source>
</trans-unit>
<trans-unit id = "primary_colors_indigo">
<source>Indigo</source>
</trans-unit>
<trans-unit id = "primary_colors_violet">
<source>Violet</source>
</trans-unit>
</group>
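The id rules for arrays, integer vectors and tables follow one pattern: the container name, then _, then an index or a resource name. The Python sketch below shows this for containers that hold only simple values. The function name `container_group` and the `kind` argument are hypothetical; nested containers and namespace handling are not covered.

```python
import xml.etree.ElementTree as ET

RESTYPES = {
    "array": "x-icu-array",
    "intvector": "x-icu-intvector",
    "table": "x-icu-table",
}


def container_group(kind, name, items):
    """Map an array, integer vector or table of simple values to a group.

    `items` is a list for an array or integer vector, and a dict of
    resource name -> string for a table.
    """
    group = ET.Element("group", {"id": name, "resname": name,
                                 "restype": RESTYPES[kind]})
    # Arrays and integer vectors are keyed by index, tables by resource name.
    pairs = items.items() if kind == "table" else enumerate(items)
    for key, value in pairs:
        attributes = {"id": f"{name}_{key}"}
        if kind == "intvector":
            attributes["restype"] = "x-icu-integer"
        unit = ET.SubElement(group, "trans-unit", attributes)
        ET.SubElement(unit, "source").text = str(value)
    return group
```

With the primary_colors table above, the first child receives the id primary_colors_red. An array named menu_items would yield menu_items_0, menu_items_1 and so on.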
The following people have contributed to this document:
Here is an example of an ICU Resource Bundle file converted to XLIFF:
[ICU Userguide] The ICU Userguide. http://www.icu-project.org/userguide/localizing.html
[OASIS] Organization for the Advancement of Structured Information Standards Web site.
[RFC 4646] RFC 4646, Tags for the Identification of Languages. Phillips and Davis, Sept 2006.
[XML 1.0] Extensible Markup Language (XML) 1.0 (Third Edition). W3C (World Wide Web Consortium), Feb 2004.
[XLIFF 1.1] XLIFF 1.1 Specification. OASIS XLIFF Technical Committee, October 2003.
[XLIFF 1.2] XLIFF 1.2 Draft Specification. OASIS XLIFF Technical Committee, May 2006.
[XLIFF Tools] The XLIFF Tools Project. http://xliff-tools.freedesktop.org/