Ram Viswanadha, 2002-11-07
Introduction
The current data organization and inheritance mechanism do not address the need for multiple resources per service. Services are therefore forced to work around the problem by inventing artificial locales, which leads to maintenance and interchange problems. The long-term goal for ICU4C and ICU4J resource data is to maintain the data in the Common XML Locale Repository and to convert it to the specific formats that individual projects consume.
Design Goals
· The data should be easily converted from one format to another without loss of information
· The design should be extensible and should cater to future needs of services
· Concepts devised during the design of the Locale Data Markup Language (LDML) should be incorporated, e.g. multiple resource elements for a given service and intra-resource inheritance. For more information, see the LDML specification.
· Resource data organization and inheritance should be independent of the service that requires the data
· Inheritance and locale lookup should be backward compatible
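The backward-compatibility goal refers to the existing locale lookup, in which a locale ID is truncated one field at a time until the root bundle is reached. A minimal sketch of that lookup order follows, assuming underscore-separated locale IDs and a shared bundle named "root"; the function name fallback_chain is hypothetical and is not part of ICU.

```python
def fallback_chain(locale_id):
    """Return the lookup order for a locale ID, most specific first.

    Assumption: IDs are underscore-separated and every chain ends at "root".
    """
    if locale_id in ("", "root"):
        parts = []
    else:
        parts = locale_id.split("_")
    # Drop one trailing field at a time: es_ES_TRADITIONAL, es_ES, es.
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


print(fallback_chain("es_ES_TRADITIONAL"))
# ['es_ES_TRADITIONAL', 'es_ES', 'es', 'root']
```

Any new inheritance scheme that visits bundles in this same order stays compatible with callers that rely on the current behavior.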
Problems
Collation
· Cannot provide locales that match all the Microsoft ones without needlessly duplicating data, for example TRADITIONAL for every Spanish locale.
· Cannot override the default collation by changing the data
· The default collation cannot be specified in the data; it is assumed
· The proposed Cumulative Fallback mechanism solves some of these problems but cannot incorporate the ideas from LDML
Calendars
· The current design cannot accommodate multiple calendars
· The proposed design works around a limitation of the current inheritance model, in which only top-level resources can be inherited; that workaround is incompatible with the collation service
Currency Elements, Number Format, Date Format
· No concept of multiple resource elements
Proposed Solution
· Implement multi-level inheritance of tagged resources
· Re-organize data in resource bundles to reflect XML specification
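Multi-level inheritance means that a nested element, not only a top-level resource, can be inherited from a parent locale. The sketch below illustrates the idea under stated assumptions: bundles are modeled as nested dictionaries, the "es" collation data is invented for illustration, and the names BUNDLES, fallback_chain and lookup are hypothetical rather than ICU APIs.

```python
# Hypothetical bundles; only the root Collations table mirrors this proposal.
BUNDLES = {
    "root": {
        "Collations": {
            "Default": "Neutral",
            "Neutral": {"Version": "0.0", "Sequence": ""},
        },
    },
    "es": {
        "Collations": {
            # Invented placeholder data for a tagged collation resource.
            "Traditional": {"Version": "1.0", "Sequence": "<rules>"},
        },
    },
    "es_ES": {},  # defines nothing of its own
}


def fallback_chain(locale_id):
    """Most specific locale first, ending at root."""
    parts = locale_id.split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)] + ["root"]


def lookup(locale_id, path):
    """Resolve a '/'-separated path, trying each locale in the chain."""
    keys = path.split("/")
    for name in fallback_chain(locale_id):
        node = BUNDLES.get(name, {})
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break  # path missing in this bundle; try the parent locale
            node = node[key]
        else:
            return node  # every key matched in this bundle
    raise KeyError(path)


print(lookup("es_ES", "Collations/Traditional/Version"))  # 1.0
print(lookup("es_ES", "Collations/Default"))              # Neutral
```

In this sketch es_ES obtains the Traditional collation from es and the default collation name from root, so neither has to be duplicated in every Spanish locale.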
Example:
root{
Version{"1.1"}
Scripts{
Latin{"Romana"}
Cyrillic{"Kyrillica"}
}
Languages{
ab{"Abkhazian"}
aa{"Afar"}
af{"Afrikaans"}
sq{"Albanian"}
}
Countries{
AF{"Afghanistan"}
AL{"Albania"}
DZ{"Algeria"}
AD{"Andorra"}
AO{"Angola"}
US{"United States"}
}
Variants{
"%%B" { "Bokm\u00e5l" } // Norwegian variant display name
"%%NY" { "Nynorsk" } // Norwegian variant display name
"%%AL" { "\u00C5land" } // Aland variant display name
"%%POSIX" { "POSIX" }
}
Keys{
collation { "Collation" }
currency { "Currency" }
}
Types{
collation{
"%%PHONEBOOK" { "Phonebook Order" }
"%%PINYIN" { "Pinyin Order" }
"%%TRADITIONAL" { "Traditional" }
"%%STROKE" { "Stroke Order" }
"%%DIRECT" { "Direct Order" }
}
}
ExemplarCharacters{ "[a-z]" }
// Delimiters: this resource is an array, so a
// child locale can override the whole resource
// but not individual elements
Delimiters{
"'", // quotation start
"'", // quotation end
"\"", // double quotation start
"\"" // double quotation end
}
MeasurementSystem{"US"}
PaperSize{
height{"123" }
width {"12"}
units {"mm"}
}
LocalPatternChars { "GyMdkHmsSEDFwWahKzYe" }
Calendars{
Default{"Gregorian"}
Gregorian{
MonthNames {
"January",
"February",
"March",
"April",
"May",
"June",
"July",
"August",
"September",
"October",
"November",
"December",
}
MonthAbbreviations {
"Jan",
"Feb",
"Mar",
"Apr",
"May",
"Jun",
"Jul",
"Aug",
"Sep",
"Oct",
"Nov",
"Dec",
}
DayAbbreviations {
"Sun",
"Mon",
"Tue",
"Wed",
"Thu",
"Fri",
"Sat",
}
DayNames {
"Sunday",
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
}
AmPmMarkers {
"AM",
"PM",
}
Week{
WeekEndStart{
"fri",
"18:00"
}
WeekEndEnd{
"sun",
"18:00"
}
}
DateTimeElements:intvector {
1, // First day of the week
1, // Min days in a week
}
Eras {
"BC",
"AD",
}
DateFormat{
Default:int{2} //index of default pattern
Patterns{
"EEEE, MMMM d, yyyy",
"MMMM d, yyyy",
"MMM d, yyyy",
"M/d/yy", // Changing this will break binary compatibility.
}
}
TimeFormat{
Default:int{ 2 } //index of default pattern
Patterns{
"h:mm:ss a z",
"h:mm:ss a z",
"h:mm:ss a",
"h:mm a",
}
}
DateTimeFormat{
"{1} {0}"
}
}
}
zoneStrings {
Default{ "America/Los_Angeles" }
"America/Los_Angeles"{
"PST",
"Pacific Standard Time",
"PST",
"Pacific Daylight Time",
"PDT",
"Los Angeles",
}
"America/Denver"{
"MST",
"Mountain Standard Time",
"MST",
"Mountain Daylight Time",
"MDT",
"Denver",
}
"America/Phoenix"{
"PNT",
"Mountain Standard Time",
"MST",
"Mountain Standard Time",
"MST",
"Phoenix",
}
"America/Chicago"{
"CST",
"Central Standard Time",
"CST",
"Central Daylight Time",
"CDT",
"Chicago",
}
"America/New_York"{
"EST",
"Eastern Standard Time",
"EST",
"Eastern Daylight Time",
"EDT",
"New York",
}
"America/Indianapolis"{
"IET",
"Eastern Standard Time",
"EST",
"Eastern Standard Time",
"EST",
"Indianapolis",
}
"America/Puerto_Rico"{
"PRT",
"Atlantic Standard Time",
"PRT",
"Atlantic Standard Time",
"PRT",
"Puerto Rico",
}
"America/Honolulu"{
"HST",
"Hawaii Standard Time",
"HST",
"Hawaii Standard Time",
"HST",
"Honolulu",
}
"America/Anchorage"{
"AST",
"Alaska Standard Time",
"AST",
"Alaska Daylight Time",
"ADT",
"Anchorage",
}
}
NumberElements {
".",
",",
";",
"%",
"0",
"#",
"+", // Add + sign
"-",
"E",
"\u2030",
"\u221E",
"\uFFFD",
}
NumberPatterns {
Default:int {0}
Patterns{
"#,##0.###;-#,##0.###", // decimal pattern
"\u00A4 #,##0.00;-\u00A4 #,##0.00", // currency pattern
"#,##0%", // percent pattern
"#E0", // scientific pattern
}
}
Currencies{
Default{"Neutral"}
Neutral{
"\u00A4", // currency symbol
"XXX", // International Currency symbol
"Currency Symbol", // Display Name
"", // Decimal separator; if not present, the currency separator from NumberElements will be used
"", // Grouping separator; if not present, the currency separator from NumberElements will be used
}
}
Collations{
Default{"Neutral"}
Neutral{
Version { "0.0" }
Sequence { "" }
}
}
//------------------------------------------------------------
// Rule Based Number Format Support
//------------------------------------------------------------
/*
* Default used to be English (US) rules, but now default just formats
* like DecimalFormat. The former default rules are now the _en rules.
*/
SpelloutRules {
"=#,##0.######=;\n"
}
OrdinalRules {
"=#,##0=;\n"
}
DurationRules {
"=#,##0=;\n"
}
}
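The example above relies on two conventions: a service table names its preferred type in a Default element, and an array can only be overridden as a whole, whereas table members are inherited one by one. The sketch below illustrates both, assuming bundles are modeled as nested dictionaries and lists; the child locale data and the names resolve and service_resource are hypothetical.

```python
ROOT = {
    "Calendars": {
        "Default": "Gregorian",
        "Gregorian": {
            "AmPmMarkers": ["AM", "PM"],
            "Eras": ["BC", "AD"],
        },
    },
}

# Hypothetical child locale: it replaces one array and nothing else.
CHILD = {
    "Calendars": {
        "Gregorian": {"AmPmMarkers": ["a.m.", "p.m."]},
    },
}


def resolve(bundles, path):
    """Return the first match for a '/'-separated path; bundles are ordered child first."""
    for bundle in bundles:
        node = bundle
        for key in path.split("/"):
            if not isinstance(node, dict) or key not in node:
                break  # not defined here; fall back to the next bundle
            node = node[key]
        else:
            return node
    raise KeyError(path)


def service_resource(bundles, service, item):
    """Read the service's Default type, then fetch the item of that type."""
    type_name = resolve(bundles, service + "/Default")
    return resolve(bundles, service + "/" + type_name + "/" + item)


chain = [CHILD, ROOT]
print(service_resource(chain, "Calendars", "AmPmMarkers"))  # ['a.m.', 'p.m.']
print(service_resource(chain, "Calendars", "Eras"))         # ['BC', 'AD']
```

The child never states a default calendar, so the Default element is inherited from the root, and its AmPmMarkers array replaces the root array in full rather than element by element.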