ICU-22794 MF2: De-duplicate C++ and Java test data #3050
Force-pushed from 0071651 to 3503550.
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
Maybe this was discussed before, but for example ICU tarballs in https://github.com/unicode-org/icu/releases/tag/release-75-1 do not have this format. Perhaps we should have a situation where the Then the code could check
#else
srcDataDir = ".." U_FILE_SEP_STRING ".." U_FILE_SEP_STRING ".." U_FILE_SEP_STRING ".." U_FILE_SEP_STRING "testdata" U_FILE_SEP_STRING;
FILE *f = fopen(".." U_FILE_SEP_STRING ".." U_FILE_SEP_STRING "test" U_FILE_SEP_STRING
                "testdata" U_FILE_SEP_STRING "rbbitst.txt",
did you move rbbitst.txt in this PR? I think this should perhaps be a different file?
I think it's correct as-is, but it's a weird way to check if a directory exists (I copied that from existing code). In 0ef54a5 I changed it to call `stat()` on the `testdata` directory itself instead.
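For readers following along, checking a directory with `stat()` could look roughly like the sketch below. This is not the code from 0ef54a5; the helper name `directoryExists` is hypothetical, and the sketch assumes a platform that provides `<sys/stat.h>` with `S_IFMT` and `S_IFDIR`.

```cpp
#include <string>
#include <sys/stat.h>

// Hypothetical helper (not taken from the PR): returns true only if
// `path` exists and is a directory.
static bool directoryExists(const std::string& path) {
    struct stat info;
    if (stat(path.c_str(), &info) != 0) {
        // stat() failed: the path is missing or not accessible.
        return false;
    }
    // Mask out the file-type bits and compare against "directory".
    return (info.st_mode & S_IFMT) == S_IFDIR;
}
```

Compared with opening a known file inside the directory, this tests the directory itself, so it does not depend on any particular test file staying in place.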
checkCondition(cp == '|', "Invalid escape sequence, only \"\\|\" is valid here");
result.appendCodePoint('|');
boolean isValidEscape = cp == '|' || cp == '\\' || cp == '{' || cp == '}';
checkCondition(isValidEscape, "Invalid escape sequence inside quoted literal");
This seems to validate the approach of shared test data!
Path icuTestdataDir = filePath.resolve("../../../../../../../../../../../testdata/message2/").normalize();
return icuTestdataDir.resolve(json);
Ideally should test and give the user some feedback if it doesn't work. Also, see comment about packaging when using a non-repo work area.
Done in c53f203. Re: using a non-repo work area, see #3050 (comment)
testdata/message2/README.txt
Outdated
Tests in the `spec/` subdirectory are taken from https://github.com/unicode-org/message-format-wg/blob/main/test
and need to be manually updated if the contents change upstream.
Suggested change:
Tests in the `spec/` subdirectory are taken from https://github.com/unicode-org/message-format-wg/blob/main/test
and need to be manually updated if the contents change upstream, but see https://unicode-org.atlassian.net/browse/ICU-22812 for making that automatic.
wouldn't it be from CLDR not mfwg?
Sort of -- if MFWG changes the spec tests, then CLDR would need to change to reflect those changes (for non-spec tests, CLDR is the source of truth). Clarified this in 1993b61.
In ICU4C, 0ef54a5 should do this. I'm not sure how to do the same for ICU4J, though. I got as far as finding the
I was able to get it to work by editing the
though I don't know if that breaks anything. However, I'm not sure how to run the tests in the source .jar file once it's been created.
Looks like the build failures are timeouts.
that has to do with needing to use an older java to build.
I'm not sure, but I can also try to test/fix this in the PR #3053
You have to unpack the jar and then compile everything.
seems to be in the right direction, need to verify with ICU-TC whether sys/stat.h is OK for inclusion.
@catamorphism OK, well all tests pass on ICU4C (verified they are loading the files from the common area) and on ICU4J they fail with this (which at least shows that the test data is being read from its location!)
Do any of the .jar files actually contain tests? I noticed that the
No need, I changed it to check for a text file.
What command are you running to get that output? It's weird because the string
LGTM. Thanks for refactoring test data & getting ICU4J tests to use the entire shared test data set including fixing up ICU4J unit test code accordingly.
…rectory

Modify ICU4C and ICU4J test readers to handle all tests
Add `ignoreJava` and `ignoreCpp` properties to tests where needed

Includes parser bug fixes:
ICU4J: require a complex-body after declarations
ICU4J: Correctly parse the complex body after an unsupported statement
ICU4J: Handle date params in tests and remove default params for tests
ICU4J: Handle decimal params in tests
ICU4J: Require whitespace before variable/literal in reserved annotation
ICU4J: Require whitespace between options
ICU4J: Require a variable-expression in an .input declaration
ICU4J: don't require space between last key and pattern in variant
ICU4J: don't require space between selectors
ICU4J: allow whitespace after '=' in option
ICU4J: parse escape sequences in quoted literals according to grammar
ICU4J: allow whitespace within markup after attributes list
Force-pushed from a7e238e to dd6d64c.
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot
I think this is ready to go
thank you for the good work!
some questions...
# Copy top-level testdata directory so it's a sibling of the source/ directory
cp -R $(ICU4CTOP)/../testdata $(DISTY_TMP)/icu
Why are we copying testdata files?
Is it because we distribute ICU4C .zip/.jar files rooted in icu4c/ rather than the one-higher repo root?
If so, shouldn't we change that instead? E.g., start with 76 to package up from the repo root, but for ICU4C exclude the icu4j/ and tools/ folders (and probably some more stuff)?
@@ -1679,6 +1679,61 @@ const char *IntlTest::getSourceTestData(UErrorCode& /*err*/) {
    return srcDataDir;
}

static bool fileExists(const char* fileName) {
    // Test for `srcDataDir` existing by checking for `srcDataDir`/message2/valid-tests.json
This seems strangely, overly specific for this rather generic function.
Did this want to be a comment on or in getSharedTestData()?
static bool fileExists(const char* fileName) {
    // Test for `srcDataDir` existing by checking for `srcDataDir`/message2/valid-tests.json
    U_ASSERT(fileName != nullptr);
    FILE *f = fopen(fileName, "r");
Can/should we use some modern-ish C++ library function for "file exists"?
// First, check the top level of the source directory,
// in case we're in a source tarball
Same question as for ICU4C: Should we start to package up ICU4J zip/jar/tar files from repo root, except omit unnecessary folders?
This PR moves all the .json files for data-driven ICU4C and ICU4J tests into a single top-level directory, `icu/testdata/`.
I did my best to remove duplicate tests.
This involved fixing several ICU4J parser bugs, almost all involving whitespace handling.
Some tests still had to be ignored in either Java or C++; in those cases I filed ICU tickets.
Checklist