Cumulative Fallback and Fallback Detection

Draft

2001-07-07

There are a number of places where our current Locale lookup is clumsy and forces us to create artificial locales and duplicate data, leading to maintenance problems. Examples are Euro Currency Formats, Traditional Spanish Sort, and Phonebook German Sort. There undoubtedly will be others.

For example, every Spanish locale could need a traditional Spanish sort. That means having variants of at least the following locales:

Argentina, Bolivia, Chile, Colombia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Spain, Spanish, United States, Uruguay, Venezuela

Here is what we currently have for Spanish:

es {
    Version { "1.0" }
    CollationElements { 
        Version { "1.0" }
        Override { "FALSE" }
        Sequence { "& N < n\u0303, N\u0303 " }
    }
...
}
es__TRADITIONAL {
    Version { "1.0" }
    CollationElements { 
        Version { "1.0" }
        Override { "FALSE" }
        Sequence { "& N < n\u0303, N\u0303"
        "&C < ch <<< Ch <<< CH"
        "&l < ll <<< Ll <<< LL"  }
    }
    LocaleID { "040a" }
}

The problem is, we can't have locales that match all the Microsoft ones without needlessly duplicating _TRADITIONAL for all Spanish locales.

Cumulative Approach

Here is an idea for changing this. It is pretty rough so far, but I thought I would get it down on paper (or electrons) for comment.

Instead of having these all in separate resource bundles, we modify the lookup process and the resource bundle (RB) format. The RBs could look something like:

es {
    Version { "1.0" }
    CollationElements { 
        Version { "1.0" }
        Override { "FALSE" }
        Sequence { "& N < n\u0303, N\u0303 " }
    }
    CollationElements_TRADITIONAL { 
        Version { "1.0" }
        Override { "FALSE" }
        Sequence { "& N < n\u0303, N\u0303"
        "&C < ch <<< Ch <<< CH"
        "&l < ll <<< Ll <<< LL"  }
    }
...
}

Suppose our current locale was Spanish, Bolivia, Traditional Sort, Euro Currency (just to have a complicated example). This would be specified with "es_BO/EURO/TRADITIONAL". What we would do is look up in the following order:

Resource Bundle           Tag in Resource Bundle
es_BO_EURO_TRADITIONAL    CollationElements
es_BO                     CollationElements_EURO_TRADITIONAL
es                        CollationElements_EURO_TRADITIONAL
es_BO                     CollationElements_EURO
es                        CollationElements_EURO
es_BO                     CollationElements_TRADITIONAL
es                        CollationElements_TRADITIONAL
es_BO                     CollationElements
es                        CollationElements
default locale...         CollationElements

Notice a few things. Variants on the same level would be separated in the input by slash, not "_", and do not form a tree structure. Instead, they are treated as tags, and they take precedence over the normal tree structure. That is, we take CollationElements_TRADITIONAL in Spanish in preference to the plain CollationElements in es_BO.
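The search order in the table above can be sketched in code. This is only an illustration, not an existing ICU API: lookupOrder, LookupStep, and the defaultLocale parameter are hypothetical names. It also assumes that variant subsets are tried in the order the table shows for two variants (all variants, then the first alone, then the second alone, then none); how this should generalize to three or more variants is an open question. Only the first bundle of the default locale's own chain is emitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One step of the search: which bundle to open and which tag to read from it.
using LookupStep = std::pair<std::string, std::string>;

// Split s on a single-character delimiter.
static std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos) {
            parts.push_back(s.substr(start));
            return parts;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// Hypothetical helper: build the ordered search list for a request such as
// "es_BO/EURO/TRADITIONAL" and a tag such as "CollationElements".
std::vector<LookupStep> lookupOrder(const std::string& request,
                                    const std::string& tag,
                                    const std::string& defaultLocale) {
    assert(!request.empty());
    const std::vector<std::string> parts = split(request, '/');
    const std::string base = parts[0];  // e.g. "es_BO"
    const std::vector<std::string> variants(parts.begin() + 1, parts.end());

    // Locale chain from most to least specific: "es_BO", then "es".
    std::vector<std::string> chain;
    for (std::string loc = base;;) {
        chain.push_back(loc);
        std::size_t cut = loc.rfind('_');
        if (cut == std::string::npos) break;
        loc.erase(cut);
    }

    std::vector<LookupStep> steps;

    // First, a dedicated bundle whose name carries every variant.
    if (!variants.empty()) {
        std::string full = base;
        for (const std::string& v : variants) full += "_" + v;
        steps.emplace_back(full, tag);
    }

    // Then every subset of the variants as a tag suffix (assumed order: see
    // the lead-in). Tags take precedence over the locale tree, so the whole
    // chain is walked for one suffix before moving on to the next suffix.
    const std::size_t n = variants.size();
    for (std::size_t mask = (std::size_t{1} << n); mask-- > 0;) {
        std::string suffixed = tag;
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (std::size_t{1} << (n - 1 - i))) suffixed += "_" + variants[i];
        }
        for (const std::string& loc : chain) steps.emplace_back(loc, suffixed);
    }

    // Finally, the plain tag in the default locale.
    steps.emplace_back(defaultLocale, tag);
    return steps;
}
```

Calling lookupOrder("es_BO/EURO/TRADITIONAL", "CollationElements", "en_US") yields ten steps in the same order as the table, with "en_US" standing in for the default locale.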

Once we find a CollationElements, we cache the result, so the next time we look for "es_BO/EURO/TRADITIONAL", we get it immediately. This is also an extreme case: typically there would be fewer lookups.
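The caching step might look roughly like the following. LookupCache and its Resolver callback are hypothetical names used only for this sketch. The resolver stands in for whatever performs the full cumulative search, and the cached value is shown as a plain string rather than real collation data.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper that remembers the outcome of a full cumulative search.
class LookupCache {
public:
    // Performs the (expensive) search for a request and tag.
    using Resolver = std::function<std::string(const std::string& request,
                                               const std::string& tag)>;

    explicit LookupCache(Resolver resolver) : resolver_(std::move(resolver)) {
        assert(resolver_ != nullptr);
    }

    // Returns the cached result if there is one; otherwise runs the full
    // search once and stores what it found under the request/tag pair.
    const std::string& find(const std::string& request, const std::string& tag) {
        const std::string key = request + "|" + tag;
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, resolver_(request, tag)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Resolver resolver_;
    std::unordered_map<std::string, std::string> cache_;
};
```

With this shape, a second request for "es_BO/EURO/TRADITIONAL" never reaches the resolver. A real implementation would also have to decide when cached entries become stale, which this sketch ignores.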

Alternatives

Fallback Detection

Right now, it is not easy for someone to find out whether a particular service is a fallback and, if it is, which locale it is actually appropriate for. For Resource Bundle itself, we note whether or not it was a fallback, with U_USING_FALLBACK_ERROR or U_USING_DEFAULT_ERROR, and you can find out the bundle that was used with ResourceBundle::getLocale(). But if you open a service (such as a collator), you can't do that.

To fix this, I suggest we add a static getInstanceType(locale) to every class that has a static createInstance(). It returns the most specific Locale for which the createInstance() is appropriate. For example, we would have the following:

myLocale = Collator::getInstanceType(requestedLocale);

Here is an example, with notes below:

en_US_CALIFORNIA    doesn't exist
en_US               exists, doesn't contain CollationElements, is default locale
en                  exists, contains CollationElements
zz_WW               doesn't exist
zz                  doesn't exist
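To make the intent concrete, here is a rough sketch of how such a lookup could behave on the data above. It is a free function over an in-memory table, not the proposed Collator API: BundleTable and the extra parameters are hypothetical. It assumes that the requested locale's chain is searched first and the default locale's chain second, and that an empty string signals that nothing was found. The results it produces for the example are a consequence of those assumptions, not something stated in this draft.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Stand-in for the installed data: bundle name -> tags that bundle contains.
// A locale that is absent from the map "doesn't exist".
using BundleTable = std::map<std::string, std::set<std::string>>;

// Hypothetical free-function version of the proposed static getInstanceType():
// returns the most specific locale whose bundle actually supplies `tag`,
// or an empty string if no bundle in either chain does.
std::string getInstanceType(const BundleTable& bundles,
                            const std::string& requested,
                            const std::string& defaultLocale,
                            const std::string& tag) {
    assert(!tag.empty());
    // Assumed order: the requested locale's chain, then the default's chain.
    const std::string starts[] = {requested, defaultLocale};
    for (const std::string& start : starts) {
        for (std::string loc = start;;) {
            auto it = bundles.find(loc);
            if (it != bundles.end() && it->second.count(tag) > 0) return loc;
            // Strip the last "_XX" component to move up the locale tree.
            std::size_t cut = loc.rfind('_');
            if (cut == std::string::npos) break;
            loc.erase(cut);
        }
    }
    return std::string();
}
```

Under these assumptions, a request for en_US_CALIFORNIA resolves to en, because en is the first bundle in its chain that contains CollationElements. A request for zz_WW also resolves to en, but only by way of the default locale en_US.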

Alternatives

Instead of calling a static method on the class, we could call an instance method on the object. That has two disadvantages:

(a) someone has to create an object in order to determine what it would be appropriate for. For iterating over lists to find out information, that is an extra burden.

(b) we would have to add storage to each object.