CLDR semantic datetime skeleton spec is nearly ready and MF2 should use it #866

Open
sffc opened this issue Aug 22, 2024 · 12 comments
Labels: LDML 47 (LDML 47 Release (Stable)), registry

Comments

@sffc
Member

sffc commented Aug 22, 2024

The default registry still lists "field options" as being valid configuration for MF2.

https://github.com/unicode-org/message-format-wg/blob/main/exploration/default-registry-and-mf1-compatibility.md#field-options

As an implementer, I am troubled by this requirement creeping into MF2. It is well established that "field options" are filled with footguns and edge cases. Old-style datetime skeletons not only require that implementations like ICU4X ship larger code and data, they often do not encourage i18n best practices.

I have been working on a specification called "semantic skeleta" to solve these issues. We have been discussing it almost every week in the CLDR Design WG, which meets on Mondays at 10am (unfortunately the same time slot as the MF WG meeting). Now that the spec draft is nearly complete, I wanted to share it here.

https://docs.google.com/document/d/1dmMk_XODm3DGe84GmMVw7O6a7oIs5yojDXm_25CcMbw/edit#heading=h.d9pp2vm43mob

This is not the first time I've raised this issue, but previously, semantic skeleta were only a design doc. I am posting this issue now to raise awareness about the upcoming technology preview in UTS 35, which I think MF2 should embrace.

@aphillips
Member

@sffc Thanks for the update.

Please note that the document you linked is a design document. There is specification text in the registry.md document that also uses these various options.

The existing skeleton support in ICU did not make the cut for LDML45 and isn't supported, AFAIK, in MF2 except via "option bags". I look forward to seeing more of semantic skeletons. The exact way that we integrate such support in MF2 is at a critical juncture. If possible, it would be good to avoid creating sets of options that are deprecated shortly afterwards, but required for conformance. At the same time, we might be just a little ahead of being able to offer the new stuff. I look forward to a discussion here.

@sffc
Member Author

sffc commented Aug 26, 2024

Supporting "options bags of individual field lengths" puts the same requirements on implementations such as ICU4X. In particular, it is highly customizable, meaning that implementations need to ship the DateTimePatternGenerator and all the code and data required for it. Semantic skeleta, on the other hand, are a strict subset designed to represent classical skeletons that "actually make sense", a small enough set that implementations can pre-compute the patterns ahead of time. So, requiring "options bags of individual field lengths" directly harms implementations (currently ICU4X but likely more in the future), and by extension clients of those implementations.

(I haven't brought this to ICU4X-TC for a formal recommendation but personally I think this rises to the level of "a concern that must be resolved during tech preview")

@eemeli
Collaborator

eemeli commented Aug 26, 2024

@sffc Could you share a couple of examples of what a semantic skeleton value could look like when used as an MF2 option value? It's not immediately obvious to me from the linked doc what they look like.

@sffc
Member Author

sffc commented Aug 26, 2024

The spec defines the schema, not a specific interface, but for MessageFormat 2.0, an interface could look something like:

{$someDate :datetime fieldSet=[year, month, day] length=medium}

ICU4X is planning to use all-caps identifiers for the field sets, which MF2 could also choose to adopt (if that happened, we'd probably put them into the semantic skeleton spec)

{$someDate :datetime fieldSet=YMD length=medium}

Please note with semantic skeleta that not all field sets are well-defined. If you request a field set [year, hour], that is considered a syntax error.

@aphillips
Member

> Please note with semantic skeleta that not all field sets are well-defined. If you request a field set [year, hour], that is considered a syntax error.

Is "well-defined" a conformance term (the way we use valid and well-formed in say BCP47)?

I thought at one point there were enumerated names for the well-defined field sets, such as YearMonth etc. with the idea being that only useful ones would be defined.

@sffc
Member Author

sffc commented Aug 26, 2024

An implementation should reject something like fieldSet=[year, hour] length=medium in order to be conformant, if that's what you're asking. That's a good call-out that I'll make sure gets into the semantic skeleta spec.

Yes, the spec lists out the field sets that are well-defined.
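
As a rough illustration of that conformance rule, the sketch below checks a requested field set against an allow-list and rejects anything else. The field names, the contents of `WELL_DEFINED`, and the function names are all hypothetical; the normative list of field sets is the one in the semantic skeleta spec.

```typescript
// Hypothetical field names, for illustration only.
type Field = "year" | "month" | "day" | "hour" | "minute";

// Canonical order, so that [day, year, month] and [year, month, day] compare equal.
const ORDER: readonly Field[] = ["year", "month", "day", "hour", "minute"];

// Assumed subset of well-defined field sets; the real list is defined by the spec.
const WELL_DEFINED: readonly string[] = [
  "year",
  "year,month",
  "year,month,day",
  "month,day",
  "hour,minute",
];

// Deduplicate and sort the fields, then join them into a lookup key.
function canonicalKey(fields: readonly Field[]): string {
  return fields
    .filter((f, i) => fields.indexOf(f) === i)
    .sort((a, b) => ORDER.indexOf(a) - ORDER.indexOf(b))
    .join(",");
}

// Return the canonicalized field set, or throw if it is not on the allow-list.
function validateFieldSet(fields: readonly Field[]): Field[] {
  const key = canonicalKey(fields);
  if (WELL_DEFINED.indexOf(key) === -1) {
    throw new SyntaxError(`Field set [${fields.join(", ")}] is not well-defined`);
  }
  return key.split(",") as Field[];
}
```

With this allow-list, `validateFieldSet(["year", "month", "day"])` succeeds, while `validateFieldSet(["year", "hour"])` throws a `SyntaxError`.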

@mihnita
Collaborator

mihnita commented Aug 28, 2024

Note that rejecting the options bag and taking in semantic skeletons means that the ICU4C and ICU4J up-to-date implementations of MF2 will have to wait until the semantic skeletons are also implemented in ICU4C and ICU4J.

I am not saying we should / should not do it.

Just saying that we would probably have to move all the option-bag behavior we have now to a namespace (draft) so that people have something to test with.

Any feedback we get in that space would not be as relevant.

And many might wait for adoption until the next release of ICU.


Also, not supporting option bags means that MF2 would not align with the current ECMAScript style for DateFormat.

@sffc
Member Author

sffc commented Aug 28, 2024

> Note that rejecting the options bag and taking in semantic skeletons means that the ICU4C and ICU4J up-to-date implementations of MF2 will have to wait until the semantic skeletons are also implemented in ICU4C and ICU4J.

I want to emphasize that semantic skeleta are designed to be implemented on top of a library that implements classical skeleta. In ICU4X, there are about 100 lines of code that sit between the semantic skeleton API and the classical skeleton API.
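
To give a feel for how thin such a layer can be, here is a minimal sketch that turns a field set plus a length into a classical skeleton string. It is not the ICU4X code; the type names, the three field sets shown, and the length-to-month-symbol table are assumptions for illustration, and the normative mapping is the one in the semantic skeleton spec.

```typescript
// Hypothetical subset of field sets and lengths, for illustration only.
type FieldSet = "YMD" | "YM" | "MD";
type Length = "long" | "medium" | "short";

// Assumed month symbol per length; not the normative mapping from the spec.
const MONTH_SYMBOL: Record<Length, string> = {
  long: "MMMM",
  medium: "MMM",
  short: "M",
};

// Build a classical skeleton by emitting one symbol group per requested field.
function toClassicalSkeleton(fieldSet: FieldSet, length: Length): string {
  let skeleton = "";
  if (fieldSet.indexOf("Y") !== -1) skeleton += "y";
  if (fieldSet.indexOf("M") !== -1) skeleton += MONTH_SYMBOL[length];
  if (fieldSet.indexOf("D") !== -1) skeleton += "d";
  return skeleton;
}
```

Under these assumptions, `toClassicalSkeleton("YMD", "medium")` yields `"yMMMd"`, which an existing classical-skeleton implementation could then resolve to a pattern as it does today.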

@eemeli
Collaborator

eemeli commented Aug 29, 2024

Thus far, the option sets for :number and :datetime have been kept as subsets of the options available in the JS Intl formatters. Departing from that approach is something that ought to be discussed also in TC39 TG2. Is there any intent of proposing semantic skeletons for adoption in Intl.DateTimeFormat?

@sffc
Member Author

sffc commented Aug 29, 2024

Semantic skeleta are designed to be implemented on top of classical skeleta, which includes ECMAScript-style options bags. In other words, semantic skeleta are a subset of Intl.DateTimeFormat with a facelift.
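
One way to picture the "subset with a facelift" relationship is a small adapter that expands a field set and length into an ordinary `Intl.DateTimeFormat` options bag. The sketch below is hypothetical: the `FieldSet` and `Length` types and the length-to-month-style choice are assumptions for illustration, not the mapping defined by the spec.

```typescript
// Hypothetical subset of field sets and lengths, for illustration only.
type FieldSet = "YMD" | "YM" | "MD";
type Length = "long" | "medium" | "short";

// Expand a semantic field set into an ECMAScript-style options bag.
function toIntlOptions(fieldSet: FieldSet, length: Length): Intl.DateTimeFormatOptions {
  // Assumed correspondence between semantic length and month style.
  const month: Intl.DateTimeFormatOptions["month"] =
    length === "long" ? "long" : length === "medium" ? "short" : "numeric";

  const options: Intl.DateTimeFormatOptions = {};
  if (fieldSet.indexOf("Y") !== -1) options.year = "numeric";
  if (fieldSet.indexOf("M") !== -1) options.month = month;
  if (fieldSet.indexOf("D") !== -1) options.day = "numeric";
  return options;
}

// The resulting bag is passed to the existing formatter unchanged.
const formatter = new Intl.DateTimeFormat("en", toIntlOptions("YMD", "medium"));
```

Because callers can only name a field set, combinations such as year plus hour never reach `Intl.DateTimeFormat` in this design.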

aphillips added the Future (Deferred for future standardization) label on Sep 9, 2024
@aphillips
Member

I am tagging this as "Future" because it will not meet the cutoff for LDML46. It will still be considered prior to exiting Tech Preview, which is expected in the 2024 calendar year.

aphillips added the LDML46.1 (MF2.0 Draft Candidate) label and removed the Future (Deferred for future standardization) label on Sep 10, 2024
@sffc
Member Author

sffc commented Sep 11, 2024

The semantic skeleton spec technical preview was just approved for CLDR 46. unicode-org/cldr#4031

Please note the section defining how to map from a semantic skeleton to a classical skeleton.

aphillips added the LDML 47 (LDML 47 Release (Stable)) label and removed the LDML46.1 (MF2.0 Draft Candidate) label on Nov 18, 2024