[FEEDBACK] Simpler formulation of Pattern Selection #898
Comments
On Thu, Oct 3, 2024, 22:00 Eemeli Aro wrote:
Based on a first look and consideration, this formulation of the selection algorithm should give the same results as the current one, but with a few caveats (in no particular order):
1. The inclusion of a *best* result for *selector.match(key)* is an unnecessary complication to the spec algorithm. It would be valid for an implementation to provide that optimization, but we don't need to care about early results in the spec text.
2. The * keys need to be handled directly within the *selector-list.match(key-list)* and *selector-list.compare(key-list1, key-list2)* methods rather than being passed to the user-defined methods. That's the only way we can guarantee their behaviour, as well as simplifying the inputs to user code to always be only strings.
3. The bit about parsing key values as NFC needs to be retained.
4. We don't need to amend the ABNF to account for these changes. The *selector-list* and *key-list* values contain resolved values rather than syntax values, so they'll need to be constructed as a part of the algorithm in any case.
I'd be very happy to review a PR replacing our current text with this, provided that the above concerns are accounted for. |
On Fri, Oct 4, 2024 at 11:11 AM Mark Davis Ⓤ wrote:
I don't want to stress the system, given the pending deadlines.
I think the important thing is to have a clause that stresses that the current algorithm doesn't have to be followed exactly; the only requirement is that the same results obtain as the current text.
|
That being said, I'll look at your comments and update the source doc.
|
On Fri, Oct 4, 2024, 23:23 Eemeli Aro wrote:
We do already now include this: message-format-wg/spec/formatting.md, lines 474 to 475 in 22707c7
https://github.com/unicode-org/message-format-wg/blob/22707c778374ad4e2c9116efc805cedc295ef985/spec/formatting.md?plain=1#L474-L475
|
Perfect, thanks!
|
On Mon, Oct 7, 2024 at 1:01 PM Addison Phillips wrote:
Is this a dupe of #715? |
The earlier issue should perhaps be closed; this is a better formulation. |
@eemeli I made some changes to address your comments. Please take a look. |
Thank you, it's better! Still a couple of things:
|
Another thought: The only thing that's done with the result of selector.compare(key1, key2) is to check if it's better or not. So the worse and same values could be collapsed into one, and the user-defined function could return just a boolean value. |
I think the algorithm for variant selection is much too complicated. That is, I think we can structure it in a way that gets the same results, but is not as complicated to explain — and matches a simpler and more efficient implementation that (a) doesn’t involve sorting, (b) is single-pass, and (c) can be implemented to have a fast exit.
This was sparked by the discussion around "resolved value" being needed in pattern selection. The 'dot' notation is used for convenience here, but needn't be in the fleshed-out text.
This algorithm avoids the complications involved in sorting, and allows for a single pass to find the best pattern. As in the sorting algorithm, there may be multiple key-lists that are "as good" as one another, and in that case the first of that group is chosen.
Changes from the initial version are in italics — and the Optimizations section is all new.
Definitions
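selector-list = 1*(s selector), as in the match-statement production
key-list = key *(s key), as in the variant production
The Pattern Selection process depends on two capabilities of selectors:
1. selector.match(key), whose result is ok or fail
2. selector.compare(key1, key2), whose result is better, same, or worse
compare() checks whether key1 is better than, the same as, or worse than key2.
It is only ever called if key1 and key2 are both ok.
Using these, list versions are defined in a natural way (see below for details):
1. selector-list.match(key-list)
2. selector-list.compare(key-list1, key-list2)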
Determining which of a message's patterns is formatted
(where there are selectors)
Determining selector-list.match(key-list)
In other words, the result is fail if any selector.match(key) value = fail, else ok.
Determining selector-list.compare(key-list1, key-list2)
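The single-pass selection described above can be sketched roughly as follows. This is an illustration in Python, not proposed spec text. The `Selector` protocol, the toy `ExactSelector`, the `(keys, pattern)` shape of a variant, and the integer return of `compare` are all hypothetical. The lexicographic reading of the list comparison (the earliest selector that sees a difference decides, as in the current text's priority rule) is an assumption, as is treating `*` as worse than any matching literal. The `*` key is handled in the list functions rather than passed to the selector, per the review comments.

```python
from typing import Optional, Protocol, Sequence, Tuple

CATCH_ALL = "*"

class Selector(Protocol):
    # Hypothetical interface standing in for the two selector capabilities.
    def match(self, key: str) -> bool: ...                # True = ok, False = fail
    def compare(self, key1: str, key2: str) -> int: ...   # >0 better, 0 same, <0 worse

class ExactSelector:
    """Toy selector: only a key equal to the selector's value matches."""
    def __init__(self, value: str) -> None:
        self.value = value
    def match(self, key: str) -> bool:
        return key == self.value
    def compare(self, key1: str, key2: str) -> int:
        return 0  # all matching keys are equally good for this toy selector

def list_match(selectors: Sequence[Selector], keys: Sequence[str]) -> bool:
    # fail if any selector.match(key) fails; '*' always matches
    return all(k == CATCH_ALL or s.match(k) for s, k in zip(selectors, keys))

def list_compare(selectors: Sequence[Selector],
                 keys1: Sequence[str], keys2: Sequence[str]) -> int:
    # Assumed lexicographic order: the first selector with a difference decides.
    for s, k1, k2 in zip(selectors, keys1, keys2):
        if k1 == k2:
            continue
        if k1 == CATCH_ALL:
            return -1
        if k2 == CATCH_ALL:
            return 1
        c = s.compare(k1, k2)
        if c != 0:
            return c
    return 0

def select_pattern(selectors: Sequence[Selector],
                   variants: Sequence[Tuple[Sequence[str], str]]) -> Optional[str]:
    best = None
    for keys, pattern in variants:  # single pass, no sorting
        if not list_match(selectors, keys):
            continue
        # Strictly better only, so the first of several equally good key-lists wins.
        if best is None or list_compare(selectors, keys, best[0]) > 0:
            best = (keys, pattern)
    return None if best is None else best[1]
```

Because a candidate replaces the current best only when it is strictly better, ties resolve to the earliest variant in source order without any explicit sort.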
Optimizations (Optional)
There are various optimizations that have the same results, but that can improve the performance, sometimes quite significantly.
Best Value
One of them is to modify selector.match(key) to return an additional value, best. This value allows for early termination: when a key-list is found where every key is best (meaning no other key is better, though other keys could be the same), the selection process can terminate. If there is no best value (an odd but possible case), the algorithm behaves as before. That involves the following small changes (in italics):
Definitions
Determining which of a message's patterns is formatted
Determining selector-list.match(key-list)
…
…
In other words, the result is fail if any selector.match(key) value = fail, else best if every selector.match(key) value = best, else ok.
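A rough Python sketch of the three-valued list match and the early exit it permits. The `Match` enum, the `match_keys` callback (returning one result per key) and the `is_better` callback are hypothetical names introduced for illustration only. The sketch also returns how many key-lists were examined, purely to make the early exit visible.

```python
from enum import Enum

class Match(Enum):
    FAIL = 0
    OK = 1
    BEST = 2

def list_match(results):
    """Combine per-key results: fail if any fails, best if every one is best, else ok."""
    if any(r is Match.FAIL for r in results):
        return Match.FAIL
    if all(r is Match.BEST for r in results):
        return Match.BEST
    return Match.OK

def select(variants, match_keys, is_better):
    """variants: list of (keys, pattern) in source order.

    match_keys(keys) -> list of Match, one per key (hypothetical callback).
    is_better(keys1, keys2) -> bool (hypothetical callback).
    Returns (pattern or None, number of key-lists examined).
    """
    best = None
    checked = 0
    for keys, pattern in variants:
        checked += 1
        m = list_match(match_keys(keys))
        if m is Match.FAIL:
            continue
        if best is None or is_better(keys, best[0]):
            best = (keys, pattern)
        if m is Match.BEST:
            break  # no later key-list can be better, so stop here
    return (None if best is None else best[1]), checked
```

If no selector ever reports best, the loop simply runs to the end and behaves like the version without the optimization.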
Example
Checking for the best value can eliminate (on average) half of the key-list checks in the following set of variants.
Reducing function calls
The selector.compare(key1, key2) is only ever called where key2 does not have any fail values.
Thus an implementation need only call one function, selector.compare, if that function is enhanced to have values:
{fails, worse, same, better, best}
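One possible reading of the five-valued function, sketched in Python. Everything here is an assumption made for illustration: the `Result` enum, the string return values of the two underlying functions, and the choice that key1 is the new candidate while key2 is a key already known not to fail. The wrapper only shows how the five values could relate to the separate match and compare results; a real implementation would compute them in one step.

```python
from enum import Enum

class Result(Enum):
    FAILS = "fails"
    WORSE = "worse"
    SAME = "same"
    BETTER = "better"
    BEST = "best"

def make_combined_compare(match, compare):
    """Fold a hypothetical match(key) and compare(key1, key2) into one call.

    match(key) is assumed to return "fail", "ok" or "best".
    compare(key1, key2) is assumed to return "worse", "same" or "better".
    """
    def combined(key1, key2):
        m = match(key1)
        if m == "fail":
            return Result.FAILS
        if m == "best":
            return Result.BEST
        # key1 matched without being best: report how it ranks against key2.
        return Result(compare(key1, key2))
    return combined
```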
CURRENT TEXT
https://github.com/unicode-org/message-format-wg/blob/main/spec/formatting.md#pattern-selection
...
To determine which variant best matches a given set of inputs,
each selector is used in turn to order and filter the list of variants.
Each variant with a key that does not match its corresponding selector
is omitted from the list of variants.
The remaining variants are sorted according to the selector's key-ordering preference.
Earlier selectors in the matcher's list of selectors have a higher priority than later ones.
When all of the selectors have been processed,
the earliest-sorted variant in the remaining list of variants is selected.
This selection method is defined in more detail below.
An implementation MAY use any pattern selection method,
as long as its observable behavior matches the results of the method defined here.
Resolve Selectors
First, resolve the values of each selector:
1. Let res be a new empty list of resolved values that support selection.
2. For each selector sel, in source order,
   1. Let rv be the resolved value of sel.
   2. If selection is supported for rv:
      1. Append rv as the last element of the list res.
   3. Else:
      1. Let nomatch be a resolved value for which selection always fails.
      2. Append nomatch as the last element of the list res.
The form of the resolved values is determined by each implementation,
along with the manner of determining their support for selection.
Resolve Preferences
Next, using res, resolve the preferential order for all message keys:
1. Let pref be a new empty list of lists of strings.
2. For each index i in res:
   1. Let keys be a new empty list of strings.
   2. For each variant var of the message:
      1. Let key be the var key at position i.
      2. If key is not the catch-all key '*':
         1. Assert that key is a literal.
         2. Let ks be the resolved value of key in Unicode Normalization Form C.
         3. Append ks as the last element of the list keys.
   3. Let rv be the resolved value at index i of res.
   4. Let matches be the result of calling the method MatchSelectorKeys(rv, keys)
   5. Append matches as the last element of the list pref.
The method MatchSelectorKeys is determined by the implementation.
It takes as arguments a resolved selector value rv and a list of string keys keys,
and returns a list of string keys in preferential order.
The returned list MUST contain only unique elements of the input list keys.
The returned list MAY be empty.
The most-preferred key is first,
with each successive key appearing in order by decreasing preference.
The resolved value of each key MUST be in Unicode Normalization Form C ("NFC"),
even if the literal for the key is not.
If calling MatchSelectorKeys encounters any error,
a Bad Selector error is emitted
and an empty list is returned.
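As an illustration of the MatchSelectorKeys contract, here is a toy Python sketch for a plural-like selector. The `PluralValue` class, its fields, and the rule that an exact numeric key is preferred over the category key are assumptions made for the example and are not taken from any function's specified behaviour. The sketch returns only keys present in the input, without duplicates, most preferred first.

```python
import unicodedata
from dataclasses import dataclass
from typing import List

@dataclass
class PluralValue:
    """Hypothetical resolved selector value: a number plus its plural category."""
    number: int
    category: str  # e.g. "one" or "other"; assumed to be computed elsewhere

def normalize_key(literal: str) -> str:
    # Key values are compared in Unicode Normalization Form C.
    return unicodedata.normalize("NFC", literal)

def match_selector_keys(rv: PluralValue, keys: List[str]) -> List[str]:
    """Return the matching keys from `keys`, most preferred first, without duplicates."""
    preferred: List[str] = []
    # Assumed preference: exact number first, then the plural category.
    for candidate in (str(rv.number), rv.category):
        if candidate in keys and candidate not in preferred:
            preferred.append(candidate)
    return preferred
```

For example, `match_selector_keys(PluralValue(1, "one"), ["other", "one", "1"])` yields the list with "1" before "one", and a value whose keys are all absent from the input yields an empty list.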
Filter Variants
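Then, using the preferential key orders pref,
filter the list of variants to the ones that match with some preference:
1. Let vars be a new empty list of variants.
2. For each variant var of the message:
   1. For each index i in pref:
      1. Let key be the var key at position i.
      2. If key is the catch-all key '*':
         1. Continue the inner loop on pref.
      3. Assert that key is a literal.
      4. Let ks be the resolved value of key.
      5. Let matches be the list of strings at index i of pref.
      6. If matches includes ks:
         1. Continue the inner loop on pref.
      7. Else:
         1. Continue the outer loop on message variants.
   2. Append var as the last element of the list vars.
Sort Variants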
Finally, sort the list of variants vars and select the pattern:
1. Let sortable be a new empty list of (integer, variant) tuples.
2. For each variant var of vars:
   1. Let tuple be a new tuple (-1, var).
   2. Append tuple as the last element of the list sortable.
3. Let len be the integer count of items in pref.
4. Let i be len - 1.
5. While i >= 0:
   1. Let matches be the list of strings at index i of pref.
   2. Let minpref be the integer count of items in matches.
   3. For each tuple tuple of sortable:
      1. Let matchpref be an integer with the value minpref.
      2. Let key be the tuple variant key at position i.
      3. If key is not the catch-all key '*':
         1. Assert that key is a literal.
         2. Let ks be the resolved value of key.
         3. Let matchpref be the integer position of ks in matches.
      4. Set the tuple integer value as matchpref.
   4. Set sortable to be the result of calling the method SortVariants(sortable).
   5. Set i to be i - 1.
6. Let var be the variant element of the first element of sortable.
7. Select the pattern of var.
SortVariants is a method whose single argument is
a list of (integer, variant) tuples.
It returns a list of (integer, variant) tuples.
Any implementation of SortVariants is acceptable
as long as it satisfies the following requirements:
1. Let sortable be an arbitrary list of (integer, variant) tuples.
2. Let sorted be SortVariants(sortable).
3. sorted is the result of sorting sortable using the following comparator:
   1. (i1, v1) <= (i2, v2) if and only if i1 <= i2.
4. The sort is stable (pairs of tuples from sortable that are equal
   in their first element have the same relative order in sorted).
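For comparison with the single-pass proposal, the current filter-and-sort method can be condensed into a short Python sketch. The `(keys, pattern)` variant shape, the function names, and the `match_selector_keys` callback are hypothetical stand-ins, and the sketch assumes the selectors have already been resolved into `res`. It relies on Python's built-in `sorted()`, which is stable, to play the role of SortVariants.

```python
import unicodedata

CATCH_ALL = "*"

def nfc(s):
    return unicodedata.normalize("NFC", s)

def select_variant(res, variants, match_selector_keys):
    """res: resolved selector values, one per selector.

    variants: list of (keys, pattern) in source order.
    match_selector_keys(rv, keys) -> keys in preferential order (hypothetical callback).
    """
    # Resolve Preferences: collect the literal keys per selector position.
    pref = []
    for i, rv in enumerate(res):
        keys = [nfc(v[0][i]) for v in variants if v[0][i] != CATCH_ALL]
        pref.append(match_selector_keys(rv, keys))

    # Filter Variants: keep variants whose every key is '*' or a preferred key.
    remaining = [
        v for v in variants
        if all(k == CATCH_ALL or nfc(k) in pref[i] for i, k in enumerate(v[0]))
    ]

    # Sort Variants: stable sort from the last selector to the first,
    # so that earlier selectors end up with the higher priority.
    for i in reversed(range(len(pref))):
        matches = pref[i]

        def rank(v):
            k = v[0][i]
            # '*' sorts after every matching literal key.
            return len(matches) if k == CATCH_ALL else matches.index(nfc(k))

        remaining = sorted(remaining, key=rank)  # sorted() is stable

    return remaining[0][1] if remaining else None
```

Sorting once per selector is what the single-pass formulation aims to avoid; both are expected to pick the same variant.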