[FEEDBACK] Simpler formulation of Pattern Selection #898

macchiati · 2024-10-03T22:04:40Z

I think the algorithm for variant selection is much too complicated. That is, I think we can structure it in a way that gets the same results, but is not as complicated to explain — and matches a simpler and more efficient implementation that (a) doesn’t involve sorting, (b) is single-pass, and (c) can be implemented to have a fast exit.

This was sparked by the discussion around "resolved value" being needed in pattern selection. The 'dot' notation is used for convenience here, but needn't be in the fleshed-out text.

This algorithm avoids the complications involved in sorting, and allows for a single pass to find the best pattern. As in the sorting algorithm, there may be multiple key-lists that are "as good" as one another, and in that case the first of that group is chosen.

Changes from the initial version are in italics — and the Optimizations section is all new.

Definitions

Define a selector-list = 1*(s selector), as in the match-statement production
Define a key-list = key *(s key), as in the variant production

The Pattern Selection process depends on two capabilities of selectors:

selector.match(key) returns a value in {fail, ok}
selector.compare(key1, key2) returns a value in {worse, same, better}

compare(key1, key2) checks whether key1 is better than, the same as, or worse than key2.
It is only ever called if key1 and key2 are both ok.

Using these, list versions are defined in a natural way (see below for details):

selector-list.match(key-list)
selector-list.compare(key-list1, key-list2)

Determining which of a message's patterns is formatted

(where there are selectors)

Let bestVariant be undefined.
For each variant in the list:
- Let match be selector-list.match(variant.key-list)
- If match = fail, continue the for-loop
- Else if bestVariant is undefined, let bestVariant be variant
- Else if the selector-list.compare(variant.key-list, bestVariant.key-list) = better, then let bestVariant be variant
- Continue loop
If the loop terminated, return bestVariant
- // We are guaranteed to have one, since there is a key-list with all *’s

Determining selector-list.match(key-list)

Let result be ok
For each selector, key in the selector-list, key-list: (i.e., taking the i-th element of each list)
- If key = "*", continue loop
- Let value be selector.match(toNfc(key))
- If value = fail, return fail
- Else continue loop
If the loop terminated, return result.

In other words, the result is fail if any selector.match(key) value = fail, else ok.

Determining selector-list.compare(key-list1, key-list2)

For each selector, key1, key2 in the selector-list, key-list1, key-list2: (i.e., taking the i-th element of each list)
- Let result be selector.compare(key1, key2)
- If result ≠ same, return result
- Else continue loop
If the loop terminated, return same.
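
The three procedures above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: RankedSelector is a made-up selector (it matches the keys in a rank table, lower rank being better), variants are modelled as (key-list, pattern) pairs, and "*" is handled at the list level and treated as worse than any matching literal key. Keys are normalized to NFC once, up front, so that match and compare see the same strings.

```python
import unicodedata

FAIL, OK = "fail", "ok"
WORSE, SAME, BETTER = "worse", "same", "better"


class RankedSelector:
    """Hypothetical selector: matches the keys in `ranks`; lower rank is better."""

    def __init__(self, ranks):
        self.ranks = ranks

    def match(self, key):
        return OK if key in self.ranks else FAIL

    def compare(self, key1, key2):
        # Is key1 better than, the same as, or worse than key2?
        r1, r2 = self.ranks[key1], self.ranks[key2]
        return BETTER if r1 < r2 else WORSE if r1 > r2 else SAME


def list_match(selectors, keys):
    for selector, key in zip(selectors, keys):
        if key == "*":
            continue  # the catch-all key always matches
        if selector.match(key) == FAIL:
            return FAIL
    return OK


def list_compare(selectors, keys1, keys2):
    for selector, key1, key2 in zip(selectors, keys1, keys2):
        if key1 == "*" or key2 == "*":
            # Assumption: "*" is worse than any matching literal key.
            result = SAME if key1 == key2 else WORSE if key1 == "*" else BETTER
        else:
            result = selector.compare(key1, key2)
        if result != SAME:
            return result
    return SAME


def select(selectors, variants):
    """Single pass over (key-list, pattern) pairs; the first of equal bests wins."""
    best = None
    for raw_keys, pattern in variants:
        keys = tuple(k if k == "*" else unicodedata.normalize("NFC", k)
                     for k in raw_keys)
        if list_match(selectors, keys) == FAIL:
            continue
        if best is None or list_compare(selectors, keys, best[0]) == BETTER:
            best = (keys, pattern)
    return best[1] if best is not None else None
```

Because a later variant replaces bestVariant only when it compares as strictly better, ties go to the earlier variant without any sorting.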

Optimizations (Optional)

There are various optimizations that have the same results, but that can improve the performance, sometimes quite significantly.

Best Value

One of them is to modify selector.match(key) to return an additional value, best. This value allows for early termination: when a key-list is found where every key is best (meaning no other key is better, though other keys could be the same), the selection process can stop immediately. If no key-list is best (an odd but possible case), the algorithm behaves as before. That involves the following small changes (in italics):

Definitions

selector.match(key) returns a value in {fail, ok, best}

Determining which of a message's patterns is formatted

If match = fail, continue the for-loop
Else if match = best, return the variant
Else if bestVariant is undefined, let bestVariant be variant

Determining selector-list.match(key-list)

Let result be best
…
- If key = "*", let result be ok, and continue loop
  …
- Else if value is ok, let result be ok
- Else continue loop

In other words, the result is fail if any selector.match(key) value = fail, else best if every selector.match(key) value = best, else ok.

Example

In the following set of variants, checking for the best value can eliminate (on average) half of the key-list checks.

.match $v1 $v2
a a {{…}}
a b {{…}}
a c {{…}}
a * {{…}}
b a {{…}}
b b {{…}}
b c {{…}}
b * {{…}}
c a {{…}}
c b {{…}}
c c {{…}}
* * {{…}}
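
As a rough illustration of the early exit on the variant table above, here is a Python sketch that counts how many key-lists are checked before a result is returned. The RankedSelector class, the boolean better() helper, and the modelling of each variant as a (key-list, pattern) pair are assumptions made for the example, not part of the proposal.

```python
FAIL, OK, BEST = "fail", "ok", "best"


class RankedSelector:
    """Hypothetical selector: matches the keys in `ranks`; lower rank is better."""

    def __init__(self, ranks):
        self.ranks = ranks
        self.top = min(ranks.values(), default=None)

    def match(self, key):
        if key not in self.ranks:
            return FAIL
        return BEST if self.ranks[key] == self.top else OK

    def better(self, key1, key2):
        return self.ranks[key1] < self.ranks[key2]


def list_match(selectors, keys):
    result = BEST
    for selector, key in zip(selectors, keys):
        if key == "*":
            result = OK  # a catch-all key can never make the key-list "best"
            continue
        value = selector.match(key)
        if value == FAIL:
            return FAIL
        if value == OK:
            result = OK
    return result


def is_better(selectors, keys1, keys2):
    for selector, key1, key2 in zip(selectors, keys1, keys2):
        if key1 == key2:
            continue
        if key1 == "*":
            return False
        if key2 == "*":
            return True
        if selector.better(key1, key2):
            return True
        if selector.better(key2, key1):
            return False
    return False


def select(selectors, variants):
    """Return (pattern, number of key-lists checked)."""
    best, checked = None, 0
    for keys, pattern in variants:
        checked += 1
        match = list_match(selectors, keys)
        if match == FAIL:
            continue
        if match == BEST:
            return pattern, checked  # fast exit: nothing later can be better
        if best is None or is_better(selectors, keys, best[0]):
            best = (keys, pattern)
    return (best[1] if best is not None else None), checked


# The twelve key-lists from the example; the pattern is just the key text.
ROWS = ["a a", "a b", "a c", "a *", "b a", "b b", "b c", "b *",
        "c a", "c b", "c c", "* *"]
VARIANTS = [(tuple(row.split()), row) for row in ROWS]
```

With both selectors matching only "b", select stops at the sixth key-list; with selectors that match nothing, all twelve are checked and "* *" is returned.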

Reducing function calls

The selector.compare(key1, key2) is only ever called where key2 does not have any fail values.
Thus an implementation need only call one function, selector.compare, if that function is enhanced to have values:
{fail, worse, same, better, best}

Let compare-value be selector-list.compare(variant.key-list, bestVariant.key-list)
If compare-value = fail, continue the for-loop
Else if compare-value = best, return the variant
Else if bestVariant is undefined, let bestVariant be variant
Else if compare-value = better, then let bestVariant be variant

CURRENT TEXT

https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#pattern-selection
...

To determine which variant best matches a given set of inputs,
each selector is used in turn to order and filter the list of variants.

Each variant with a key that does not match its corresponding selector
is omitted from the list of variants.
The remaining variants are sorted according to the selector's key-ordering preference.
Earlier selectors in the matcher's list of selectors have a higher priority than later ones.

When all of the selectors have been processed,
the earliest-sorted variant in the remaining list of variants is selected.

This selection method is defined in more detail below.
An implementation MAY use any pattern selection method,
as long as its observable behavior matches the results of the method defined here.

Resolve Selectors

First, resolve the values of each selector:

Let res be a new empty list of resolved values that support selection.
For each selector sel, in source order,
1. Let rv be the resolved value of sel.
2. If selection is supported for rv:
  1. Append rv as the last element of the list res.
3. Else:
  1. Let nomatch be a resolved value for which selection always fails.
  2. Append nomatch as the last element of the list res.
  3. Emit a Bad Selector error.

The form of the resolved values is determined by each implementation,
along with the manner of determining their support for selection.

Resolve Preferences

Next, using res, resolve the preferential order for all message keys:

Let pref be a new empty list of lists of strings.
For each index i in res:
1. Let keys be a new empty list of strings.
2. For each variant var of the message:
  1. Let key be the var key at position i.
  2. If key is not the catch-all key '*':
    1. Assert that key is a literal.
    2. Let ks be the resolved value of key in Unicode Normalization Form C.
    3. Append ks as the last element of the list keys.
3. Let rv be the resolved value at index i of res.
4. Let matches be the result of calling the method MatchSelectorKeys(rv, keys)
5. Append matches as the last element of the list pref.

The method MatchSelectorKeys is determined by the implementation.
It takes as arguments a resolved selector value rv and a list of string keys keys,
and returns a list of string keys in preferential order.
The returned list MUST contain only unique elements of the input list keys.
The returned list MAY be empty.
The most-preferred key is first,
with each successive key appearing in order by decreasing preference.

The resolved value of each key MUST be in Unicode Normalization Form C ("NFC"),
even if the literal for the key is not.

If calling MatchSelectorKeys encounters any error,
a Bad Selector error is emitted
and an empty list is returned.
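
To make the MatchSelectorKeys contract concrete, here is a toy Python version for an integer selector value. The plural rule in it (1 maps to "one", everything else to "other") and the preference for an exact numeric key over a category key are assumptions chosen for the example; they are not taken from any real function's definition.

```python
def match_selector_keys(rv, keys):
    """Toy MatchSelectorKeys: return the matching keys, most preferred first.

    rv is assumed to be an integer; keys is a list of NFC strings.
    """
    exact = str(rv)
    category = "one" if rv == 1 else "other"  # hypothetical plural rule
    ordered = []
    for candidate in (exact, category):
        # Only unique elements of the input list may be returned.
        if candidate in keys and candidate not in ordered:
            ordered.append(candidate)
    return ordered
```

For rv = 1 and keys ["one", "other", "1", "one"], this returns ["1", "one"]; for rv = 5 and keys ["one"], it returns an empty list, so only catch-all variants can match at that position.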

Filter Variants

Then, using the preferential key orders pref,
filter the list of variants to the ones that match with some preference:

Let vars be a new empty list of variants.
For each variant var of the message:
1. For each index i in pref:
  1. Let key be the var key at position i.
  2. If key is the catch-all key '*':
    1. Continue the inner loop on pref.
  3. Assert that key is a literal.
  4. Let ks be the resolved value of key.
  5. Let matches be the list of strings at index i of pref.
  6. If matches includes ks:
    1. Continue the inner loop on pref.
  7. Else:
    1. Continue the outer loop on message variants.
2. Append var as the last element of the list vars.
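
The filtering step can be sketched in Python as below, assuming each variant is modelled as a (keys, pattern) pair and pref is the list of preference lists built in the previous step. These data shapes are illustrative only.

```python
import unicodedata


def filter_variants(variants, pref):
    """Keep variants whose every key is "*" or appears in the matching pref list."""
    kept = []
    for keys, pattern in variants:
        for key, matches in zip(keys, pref):
            if key == "*":
                continue  # the catch-all key matches with any selector
            ks = unicodedata.normalize("NFC", key)
            if ks not in matches:
                break  # this variant is dropped
        else:
            kept.append((keys, pattern))
    return kept
```

The for/else construct plays the role of "continue the outer loop": the else branch runs only when no key caused a break.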

Sort Variants

Finally, sort the list of variants vars and select the pattern:

Let sortable be a new empty list of (integer, variant) tuples.
For each variant var of vars:
1. Let tuple be a new tuple (-1, var).
2. Append tuple as the last element of the list sortable.
Let len be the integer count of items in pref.
Let i be len - 1.
While i >= 0:
1. Let matches be the list of strings at index i of pref.
2. Let minpref be the integer count of items in matches.
3. For each tuple tuple of sortable:
  1. Let matchpref be an integer with the value minpref.
  2. Let key be the tuple variant key at position i.
  3. If key is not the catch-all key '*':
    1. Assert that key is a literal.
    2. Let ks be the resolved value of key.
    3. Let matchpref be the integer position of ks in matches.
  4. Set the tuple integer value as matchpref.
4. Set sortable to be the result of calling the method SortVariants(sortable).
5. Set i to be i - 1.
Let var be the variant element of the first element of sortable.
Select the pattern of var.

SortVariants is a method whose single argument is
a list of (integer, variant) tuples.
It returns a list of (integer, variant) tuples.
Any implementation of SortVariants is acceptable
as long as it satisfies the following requirements:

Let sortable be an arbitrary list of (integer, variant) tuples.
Let sorted be SortVariants(sortable).
sorted is the result of sorting sortable using the following comparator:
1. (i1, v1) <= (i2, v2) if and only if i1 <= i2.
The sort is stable (pairs of tuples from sortable that are equal
in their first element have the same relative order in sorted).
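
A minimal Python sketch of the sorting step follows. It assumes the same illustrative shapes as above (variants as (keys, pattern) pairs whose keys are already in NFC and have already passed the filter step) and relies on Python's sorted being stable, which is what SortVariants requires.

```python
def select_pattern(filtered, pref):
    """Sort by each selector from last to first, then take the first variant."""
    sortable = [(-1, var) for var in filtered]
    for i in range(len(pref) - 1, -1, -1):
        matches = pref[i]
        minpref = len(matches)  # rank given to the catch-all key
        rescored = []
        for _, var in sortable:
            key = var[0][i]
            matchpref = minpref if key == "*" else matches.index(key)
            rescored.append((matchpref, var))
        # A stable sort keeps the order established by later selectors on ties.
        sortable = sorted(rescored, key=lambda t: t[0])
    return sortable[0][1][1]
```

Sorting on the last selector first and the first selector last is what gives earlier selectors the higher priority.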


eemeli · 2024-10-04T04:59:39Z

Based on a first look and consideration, this formulation of the selection algorithm should give the same results as the current one, but with a few caveats (in no particular order):

1. The inclusion of a best result for selector.match(key) is an unnecessary complication to the spec algorithm. It would be valid for an implementation to provide that optimization, but we don't need to care about early results in the spec text.
2. The * keys need to be handled directly within the selector-list.match(key-list) and selector-list.compare(key-list1, key-list2) methods rather than being passed to the user-defined methods. That's the only way we can guarantee their behaviour, as well as simplifying the inputs to user code to always be only strings.
3. The bit about parsing key values as NFC needs to be retained.
4. We don't need to amend the ABNF to account for these changes. The selector-list and key-list values contain resolved values rather than syntax values, so they'll need to be constructed as a part of the algorithm in any case.

I'd be very happy to review a PR replacing our current text with this, provided that the above concerns are accounted for.

macchiati · 2024-10-04T18:12:15Z

I don't want to stress the system, given the pending deadlines. I think the important thing is to have a clause that stresses that the current algorithm doesn't have to be followed exactly, the only requirement is that the same results obtain as the current text.


macchiati · 2024-10-04T21:12:56Z

That being said, I'll look at your comments and update the source doc.


eemeli · 2024-10-05T06:22:53Z

We do already now include this:

message-format-wg/spec/formatting.md, lines 474 to 475 in 22707c7:

    An implementation MAY use any pattern selection method,
    as long as its observable behavior matches the results of the method defined here.

macchiati · 2024-10-05T14:43:37Z

Perfect, thanks!


aphillips · 2024-10-07T20:01:12Z

Is this a dupe of #715?

eemeli · 2024-10-07T20:16:38Z

The earlier issue should perhaps be closed; this is a better formulation.

macchiati · 2024-10-07T20:16:54Z

I think it is a better exposition than #715. Maybe close #715 as a dupe of this.


macchiati · 2024-10-07T21:59:21Z

@eemeli I made some changes to address your comments. Please take a look.

eemeli · 2024-10-08T08:20:39Z

Thank you, it's better! Still a couple of things:

The selector-list values need to be resolved values, as they need access to the formatting context.
It would be best for selector.compare(key1, key2) to be able to take string values as arguments, and for the * to be handled in selector-list.compare(). That would match the current algorithm, and allow for * and |*| keys to be treated differently, as they should be.

eemeli · 2024-10-08T08:24:56Z

Another thought: The only thing that's done with the result of selector.compare(key1, key2) is to check if it's better or not. So the worse and same values could be collapsed into one, so that the user-defined function can return just a boolean value.

macchiati added the Preview-Feedback Feedback gathered during the technical preview label Oct 3, 2024

macchiati changed the title ~~[FEEDBACK]~~ [FEEDBACK] Simpler formulation of Pattern Selection Oct 3, 2024

macchiati mentioned this issue Oct 3, 2024

Add Resolved Values and Function Handler sections to formatting #728

Merged

aphillips added the formatting label Oct 4, 2024

aphillips mentioned this issue Oct 7, 2024

Simpler description of the matching algorithm #715

Closed
