2001-07-07
In a number of places our current locale lookup is clumsy and forces us to create artificial locales and duplicate data, which leads to maintenance problems. Examples are Euro Currency Formats, Traditional Spanish Sort, and Phonebook German Sort. There undoubtedly will be others.
For example, every Spanish locale could need a traditional Spanish sort. That means having variants of at least the following locales:
Argentina, Bolivia, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Spain, Spanish, United States, Uruguay, Venezuela
Here is what we have currently for Spanish, for example:
    es {
        Version { "1.0" }
        CollationElements {
            Version { "1.0" }
            Override { "FALSE" }
            Sequence { "& N < n\u0303, N\u0303 " }
        }
        ...
    }
    es__TRADITIONAL {
        Version { "1.0" }
        CollationElements {
            Version { "1.0" }
            Override { "FALSE" }
            Sequence { "& N < n\u0303, N\u0303"
                       "&C < ch <<< Ch <<< CH"
                       "&l < ll <<< Ll <<< LL" }
        }
        LocaleID { "040a" }
    }
The problem is, we can't have locales that match all the Microsoft ones without needlessly duplicating _TRADITIONAL for all Spanish locales.
Here is an idea for changing this. It is pretty rough so far, but I thought I would get it down on paper (or electrons) for comment.
Instead of putting each of these in a separate resource bundle (RB), we modify the lookup process and the RB format. The RBs could look something like:
    es {
        Version { "1.0" }
        CollationElements {
            Version { "1.0" }
            Override { "FALSE" }
            Sequence { "& N < n\u0303, N\u0303 " }
        }
        CollationElements_TRADITIONAL {
            Version { "1.0" }
            Override { "FALSE" }
            Sequence { "& N < n\u0303, N\u0303"
                       "&C < ch <<< Ch <<< CH"
                       "&l < ll <<< Ll <<< LL" }
        }
        ...
    }
Suppose our current locale was Spanish, Bolivia, Traditional Sort, Euro Currency (just to have a complicated example). This would be specified with "es_BO/EURO/TRADITIONAL". What we would do is look up in the following order:
| Resource Bundle | Tag in Resource Bundle |
|---|---|
| es_BO_EURO_TRADITIONAL | CollationElements |
| es_BO | CollationElements_EURO_TRADITIONAL |
| es | CollationElements_EURO_TRADITIONAL |
| es_BO | CollationElements_EURO |
| es | CollationElements_EURO |
| es_BO | CollationElements_TRADITIONAL |
| es | CollationElements_TRADITIONAL |
| es_BO | CollationElements |
| es | CollationElements |
| default locale... | CollationElements |
Notice a few things. Variants on the same level would be separated in the input by slash, not "_", and do not form a tree structure. Instead, they are treated as tags, and they take precedence over the normal tree structure. That is, we take CollationElements_TRADITIONAL in Spanish in preference to the plain CollationElements in es_BO.
Once we find a CollationElements, we cache the result, so the next time we look for "es_BO/EURO/TRADITIONAL", we get it immediately. This is also an extreme case: typically there would be fewer lookups.
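The search order in the table above can be sketched roughly as follows. This is only an illustration, not ICU code: `Candidate`, `localeChain`, and `lookupOrder` are hypothetical names, and the rule for ordering variant subsets (earlier variants in the request outrank later ones) is an assumption inferred from the single example in the table. The final fallback to the default locale and the result cache are left out.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One step of the search: which bundle to open and which tag to read from it.
using Candidate = std::pair<std::string, std::string>;

// Split "es_BO" into the fallback chain {"es_BO", "es"}.
std::vector<std::string> localeChain(std::string locale) {
    std::vector<std::string> chain;
    while (!locale.empty()) {
        chain.push_back(locale);
        std::size_t cut = locale.rfind('_');
        locale = (cut == std::string::npos) ? std::string() : locale.substr(0, cut);
    }
    return chain;
}

// Hypothetical helper (not an ICU API): list (bundle, tag) pairs in search order.
std::vector<Candidate> lookupOrder(const std::string& locale,
                                   const std::vector<std::string>& variants,
                                   const std::string& tag) {
    std::vector<Candidate> order;
    const std::vector<std::string> chain = localeChain(locale);

    // A dedicated bundle that names every variant is tried first.
    if (!variants.empty()) {
        std::string bundle = locale;
        for (const std::string& v : variants) bundle += "_" + v;
        order.emplace_back(bundle, tag);
    }

    // Variant subsets, from all variants down to none. Assumed ordering:
    // the first variant in the request is the most significant bit.
    const std::size_t n = variants.size();
    for (std::size_t mask = (std::size_t{1} << n); mask-- > 0;) {
        std::string suffixed = tag;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (std::size_t{1} << (n - 1 - i))) suffixed += "_" + variants[i];
        }
        // Tags take precedence over the tree: walk the whole locale chain
        // for this tag before moving on to a less specific tag.
        for (const std::string& bundle : chain) order.emplace_back(bundle, suffixed);
    }
    return order;
}
```

Calling `lookupOrder("es_BO", {"EURO", "TRADITIONAL"}, "CollationElements")` yields the first nine rows of the table, in the same order. A caller would probe each candidate in turn and stop at the first hit.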
A resource bundle could also pull in data from another bundle instead of duplicating it. The RB could look something like:
    es_BO_TRADITIONAL {
        Version { "1.0" }
        CollationElements <include es__TRADITIONAL>
        LocaleID { "040a" }
    }
If we find an <include XXX> statement, then we redirect to look up the resource bundle chain from XXX. Note: we would want some check to prevent circularity!
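One possible shape for the circularity check is to remember every bundle visited while following includes and give up on a repeat. The sketch below is a simplification under stated assumptions: `Entry`, `Bundles`, and `resolve` are hypothetical names, bundles are modeled as an in-memory map, and an include is resolved only in the named bundle rather than along that bundle's full fallback chain.

```cpp
#include <map>
#include <set>
#include <string>

// A tag either holds data directly or redirects to another bundle.
struct Entry {
    bool isInclude;    // true when the entry is "<include XXX>"
    std::string text;  // the data itself, or the name of the bundle to include
};

// Hypothetical in-memory stand-in for compiled bundles: bundle -> tag -> entry.
using Bundles = std::map<std::string, std::map<std::string, Entry>>;

// Follow <include> redirects for `tag`, starting at `bundle`.
// Returns false if the tag is missing or the includes form a cycle;
// on success the data is written to `out`.
bool resolve(const Bundles& bundles, std::string bundle,
             const std::string& tag, std::string& out) {
    std::set<std::string> visited;
    while (visited.insert(bundle).second) {  // insert reports false on a repeat visit
        auto b = bundles.find(bundle);
        if (b == bundles.end()) return false;
        auto t = b->second.find(tag);
        if (t == b->second.end()) return false;
        if (!t->second.isInclude) {
            out = t->second.text;
            return true;
        }
        bundle = t->second.text;             // redirect and keep going
    }
    return false;                            // circular include detected
}
```

The visited set costs one entry per redirect, which should be negligible since include chains are expected to be short.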
Right now, it is not easy for someone to find out whether a particular service is a fallback and, if it is, which locale it is actually appropriate for. For ResourceBundle itself, we note whether or not it was a fallback, with U_USING_FALLBACK_ERROR or U_USING_DEFAULT_ERROR, and you can find out the bundle that was used with ResourceBundle::getLocale(). But if you open a service (like a collator, etc.), you can't do that.
To fix this, I suggest we add a static getInstanceType(locale) to every class that has a static createInstance(). It returns the most specific Locale for which the createInstance() is appropriate. For example, we would have the following:
myLocale = Collator::getInstanceType(requestedLocale);
Here is an example, with notes below:
| Resource Bundle | Status |
|---|---|
| en_US_CALIFORNIA | doesn't exist |
| en_US | exists, doesn't contain CollationElements, is default locale |
| en | exists, contains CollationElements |
| zz_WW | doesn't exist |
| zz | doesn't exist |
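To make the idea concrete, here is a rough sketch of how such a query might walk the data in the table above. It is a free function rather than the proposed static member, and everything in it is an assumption for illustration: the `Installed` map stands in for real resource bundles, `parentOf` is a hypothetical helper, and falling back to the default locale's chain after the requested chain is exhausted is the sketch's choice, not something the proposal spells out.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for installed data: bundle name -> tags it defines.
using Installed = std::map<std::string, std::set<std::string>>;

// Drop the last "_"-separated field: "en_US" -> "en", "en" -> "".
std::string parentOf(const std::string& locale) {
    std::size_t cut = locale.rfind('_');
    return cut == std::string::npos ? std::string() : locale.substr(0, cut);
}

// Return the most specific locale whose bundle actually defines `tag`,
// searching the requested chain first and then the default locale's chain.
// An empty result means nothing was found.
std::string getInstanceType(const Installed& data, const std::string& requested,
                            const std::string& defaultLocale, const std::string& tag) {
    const std::string starts[] = {requested, defaultLocale};
    for (const std::string& start : starts) {
        for (std::string loc = start; !loc.empty(); loc = parentOf(loc)) {
            auto it = data.find(loc);
            // A bundle that exists but lacks the tag is skipped, like a missing one.
            if (it != data.end() && it->second.count(tag) != 0) return loc;
        }
    }
    return std::string();
}
```

With the table's data and en_US as the default locale, this sketch reports en for a request of en_US_CALIFORNIA, and also en for zz_WW, since neither zz_WW nor zz exists and the default chain is searched next.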
Instead of calling a static method on the class, we could call an instance method on the object. That has two disadvantages:
(a) someone has to create an object in order to determine what it would be appropriate for. For iterating over lists to find out information, that is an extra burden.
(b) we would have to add storage to each object.