emoji sort order in the DUCET #773
Discussion in CLDR/ICU design meeting 20240506: Options
TODO(markus): Talk with ESR, see if it would be acceptable to use a simplified emoji sort order without the
Possible simplified sort order. I did this manually, for discussion, so it may not be 100% right. In the end, I also kept some ZWJ sequence contractions for things like lime, broken link, etc., assuming that we can support a small-ish number of them. (Will need some work in the sifter tool.) Once we agree on an approach, we will need to modify the generator code and get the real thing. For trying this out, either build an ICU RuleBasedCollator for the rules, or paste them into the "Append rules" box of the ICU Collation Demo.
Goals for this issue:
The DUCET could in principle sort symbols arbitrarily, for example by code point. However, it defines a bespoke sort order:
https://www.unicode.org/charts/collation/chart_General-Symbol.html
The DUCET sort order of emoji generally does not group similar emoji together unless they have adjacent code points.
At least one Unicode member organization has bug reports about the sort order of emoji.
UTS51 has long defined a grouping and sort order for emoji:
& [before 1]\uFDD1€
FDD1 20AC; [0D 8A 02, 05, 05] # CURRENCY first primary
CLDR has long included a collation tailoring for this (see above), but it is hard to use.
CLDR has ticket CLDR-10745 “Merge emoji into CLDR root”. If the emoji sort order were built into the default sort order, then it would be always available.
We want the DUCET and CLDR root default sort orders to be the same.
If we agree to move the UTS51 emoji sort order into both default sort orders, then the cleanest way to do so is to modify the DUCET input data file. The code that parses this file and outputs the actual sort order file would also need changes, so that it can handle whatever this requires that it does not already handle.
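To illustrate the grouping problem described in the goals, here is a minimal Python sketch that contrasts plain code point order with a grouped order. It is not DUCET or CLDR data: `GROUPED_ORDER` is a hand-written, hypothetical four-emoji excerpt chosen only to show the effect, and the helper names are made up for this example.

```python
# Hypothetical excerpt of a grouped emoji order (faces first, then a vehicle).
# This is an assumption for illustration, not the real UTS #51 ordering data.
GROUPED_ORDER = ["😀", "🤣", "😂", "🚀"]
RANK = {emoji: index for index, emoji in enumerate(GROUPED_ORDER)}


def code_points(text):
    """Return the code points of a string as a list, usable as a sort key."""
    return [ord(ch) for ch in text]


def sort_by_code_point(items):
    """Sort strings purely by their code points."""
    return sorted(items, key=code_points)


def sort_by_group(items):
    """Sort by position in GROUPED_ORDER; unlisted strings go last by code point."""
    return sorted(items, key=lambda s: (RANK.get(s, len(RANK)), code_points(s)))


sample = ["🚀", "🤣", "😀", "😂"]
print("code point order:", " ".join(sort_by_code_point(sample)))
print("grouped order:   ", " ".join(sort_by_group(sample)))
```

In code point order, U+1F923 🤣 lands after U+1F680 🚀, far from the other faces around U+1F600. The grouped order keeps the faces together, which is the kind of behavior a grouped emoji sort order is meant to provide.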