[FEEDBACK] Message Format Unquoted Literals #724
Labels: Future (deferred for future standardization); Preview-Feedback (feedback gathered during the technical preview); resolve-candidate (this issue appears to have been answered or resolved, and may be closed soon); syntax (issues related to MF syntax)
Summary
Consider relaxing constraints on literals, after v45
Background
Right now, unquoted literals are fairly narrowly constrained by
message.abnf; here are the relevant lines:
Reason for reconsidering
However, for functions outside of the standard registry, this forces
many natural literals to use quotes. Here is an example from a function
that would handle MF1’s choice format:
The natural literals to use would be intervals, which use the characters
[, (, ), and ] for ranges. (The choice format would require some recasting,
because it depends on the ordering of variants; it currently uses >.) So
that would require
Many Unicode symbols are included by XML’s NT-NCName (about 6,000
currently), while many are excluded (about 2,600 currently). But these
are literals, not identifiers, and identifiers are what name is
intended for. Expanding beyond identifier usage lets functions
avoid requiring quotes in many cases. It also allows us to dispense
with the special formulation for number-literal.
The literals for number, date, etc. could be specified elsewhere; they
wouldn’t have to be in the ABNF.
That would allow various registries to have more sophisticated
literals without requiring quoting, and without privileging the
structured literals that we know about now.
Requirements
So, what restrictions on characters for a broadened definition of
unquoted literals would be required by a revised ABNF?
1. No ‘}’, because it would make .local $x = {literal} fail.
2. No ‘|’, because an initial one would conflict with quoting; it is best to forbid it anywhere in an unquoted literal to prevent confusion.
3. No ‘{’. Not strictly required, but excluded for clarity wherever it is used.
4. None of the big blocks of ‘strange’ code points that XML forbids: controls, surrogates, private-use, noncharacters.
   - These are all immutable (Unicode Character Encoding Stability).
   - This also disallows the noncharacters that XML didn’t know about yet, before the noncharacter property was made immutable.
5. No whitespace, since variant uses that for separators between keys.
   - This could be done by just disallowing the “s” production characters, but that could be very confusing: {a b} looks too much like two items (the space there is U+00A0 NO-BREAK SPACE). So the restriction should be broadened to the Unicode White_Space characters.
   - Unicode White_Space is not guaranteed immutable, but it has not changed for over a decade. In any case, we would derive the code points as of now, so everything would be stable into the future.
6. (Any others?)
Not coincidentally, 2-3 are the characters in the reserved-escape
production.
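To make the requirements above concrete, here is a minimal Python sketch of a character predicate for the broadened unquoted literal. It is an illustration, not the proposed ABNF. The names `is_unquoted_char` and `is_unquoted_literal` are hypothetical. "Controls" is assumed to mean general category Cc. The White_Space set is hard-coded as a snapshot, in the spirit of "derive the code points as of now". Rejecting the empty string is also an assumption.

```python
import unicodedata

# Syntax characters ruled out by the first three requirements.
SYNTAX_CHARS = frozenset("{|}")

# Snapshot of the Unicode White_Space property (25 code points).
# Assumption: frozen here rather than looked up, so the set cannot drift.
WHITE_SPACE = frozenset([
    *range(0x0009, 0x000E), 0x0020, 0x0085, 0x00A0, 0x1680,
    *range(0x2000, 0x200B), 0x2028, 0x2029, 0x202F, 0x205F, 0x3000,
])


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unquoted_char(ch: str) -> bool:
    """Hypothetical check for one character of a broadened unquoted literal."""
    cp = ord(ch)
    if ch in SYNTAX_CHARS or cp in WHITE_SPACE or is_noncharacter(cp):
        return False
    # Assumption: controls = Cc, surrogates = Cs, private use = Co.
    return unicodedata.category(ch) not in {"Cc", "Cs", "Co"}


def is_unquoted_literal(text: str) -> bool:
    """Assumption: a literal needs at least one character."""
    return bool(text) and all(is_unquoted_char(ch) for ch in text)
```

Under these assumptions, interval-style literals such as `[0,1)` pass without quoting, while `a\u00a0b` (with a NO-BREAK SPACE) and `a|b` are rejected.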
Detailed Proposal
This would result in the following change:
OLD
NEW
Needed to avoid syntax conflicts
Whitespace
Controls
Surrogates
Private Use
Noncharacters