Add couple of sanity checks, improve e2e test usability and fix minor bugs #43

pradh · 2021-08-27T15:16:40Z

New checks:

Existence checks for the unit and measurementMethod of SVObs
Ensure DCID refs contain only valid characters, in a backward-compatible way.
Expect refs for units
Ensure name / dcid match in Prop/Class

(Some of the above should also address #36, though existence-checks would already report errors, in a round-about way.)
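As a rough illustration of the DCID character check, here is a minimal sketch. It assumes the allowed set is exactly the one quoted in the McfChecker review thread (a-z A-Z 0-9 _ & + - % / . )( ); the class and method names are hypothetical and are not taken from this PR.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the real check in McfChecker may be structured differently.
class DcidRefCheck {
  // Assumed allowed set: a-z A-Z 0-9 _ & + - % / . ) (
  // The hyphen is escaped so it is a literal, not a range operator.
  private static final Pattern ALLOWED = Pattern.compile("[a-zA-Z0-9_&+\\-%/.()]+");

  // Returns true only if the reference is non-empty and every character is allowed.
  static boolean hasValidChars(String ref) {
    return ref != null && ALLOWED.matcher(ref).matches();
  }
}
```

Under this sketch, a ref such as geoId/06 passes, while a ref containing a space or a question mark is rejected.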

While rolling this out, I found it very hard to update the expectations for the e2e tests, so I made two changes:

Instead of asserting on the first error in map diff, print all relevant info and then assert.
Add a mode to these tests where we can actually generate the expectation. (The counter maps aren't really ordered in json, but that might be okay for now.)
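The golden-file mode can be pictured roughly as below. This is only a sketch under assumptions: the property name goldenFilesPrefix comes from the GenMcfTest comment in this PR, but the class, the method names, and the way the flag is consumed are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a "compare or regenerate" golden-file helper.
class GoldenFileSketch {
  // Assumption: golden mode is on whenever -DgoldenFilesPrefix=... is passed to the JVM.
  static boolean goldenModeEnabled() {
    String prefix = System.getProperty("goldenFilesPrefix");
    return prefix != null && !prefix.isEmpty();
  }

  // In regenerate mode, overwrite the golden file with the actual output and report success.
  // Otherwise, compare the golden file's contents with the actual output.
  static boolean matchesOrRegenerates(Path goldenFile, String actual, boolean regenerate) {
    try {
      if (regenerate) {
        Files.createDirectories(goldenFile.toAbsolutePath().getParent());
        Files.write(goldenFile, actual.getBytes(StandardCharsets.UTF_8));
        return true;
      }
      String expected = new String(Files.readAllBytes(goldenFile), StandardCharsets.UTF_8);
      return expected.equals(actual);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test could call matchesOrRegenerates(path, output, goldenModeEnabled()) and assert on the result, so the same test body serves both for checking and for refreshing expectations.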

Fix a couple of bugs in parser:

When StringUtil.SplitStructuredLineWithEscapes returns failure, we shouldn't throw an exception. (See the added McfParserTest unit tests.)
Allow merging empty graphs into an empty McfGraph. This is useful for testing empty files.
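The first fix follows a common pattern: treat a failed split as a reportable error rather than an exception. The sketch below illustrates that pattern only. The splitter is a simplified, hypothetical stand-in (commas outside double quotes), not the actual StringUtil.SplitStructuredLineWithEscapes, whose signature is not shown in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "record an error and continue" instead of throwing.
class SplitFailureSketch {
  // Simplified stand-in splitter: splits on commas outside double quotes.
  // Returns false (failure) if a quote is left unterminated.
  static boolean split(String line, List<String> out) {
    StringBuilder cur = new StringBuilder();
    boolean inQuote = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == '\\' && inQuote && i + 1 < line.length()) {
        cur.append(c).append(line.charAt(++i)); // keep the escaped character as-is
      } else if (c == '"') {
        inQuote = !inQuote;
        cur.append(c);
      } else if (c == ',' && !inQuote) {
        out.add(cur.toString().trim());
        cur.setLength(0);
      } else {
        cur.append(c);
      }
    }
    if (inQuote) return false;
    out.add(cur.toString().trim());
    return true;
  }

  // On failure, the caller logs an error entry and returns no values instead of throwing.
  static List<String> parseValues(String line, List<String> errors) {
    List<String> values = new ArrayList<>();
    if (!split(line, values)) {
      errors.add("Malformed value list: " + line);
      return new ArrayList<>();
    }
    return values;
  }
}
```

With this shape, a malformed line shows up as an entry in the error list, and parsing of the remaining input can proceed.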

pradh · 2021-08-27T15:22:27Z

tool/src/test/resources/org/datacommons/tool/genmcf/successtmcf/output/generated.mcf

@@ -55,34 +55,34 @@ typeOf: dcid:City
 Node: dcid:CH-SH
 dcid: "CH-SH"
 name: "Shanghai"
-location:
+location: 


Just noticing... filed #44 for this

beets · 2021-08-27T16:47:57Z

tool/src/test/java/org/datacommons/tool/GenMcfTest.java

+            TestUtil.readStringFromPath(expectedReportPath),
+            TestUtil.readStringFromPath(actualReportPath));
+        assertEquals(
+            org.datacommons.util.TestUtil.mcfFromFile(expectedGeneratedFilePath.toString()),


Is just using TestUtil sufficient (rather than the full package path)?

pradh · 2021-08-27T16:58:05Z

There's a TestUtil in this directory too.

beets · 2021-08-27T16:53:51Z

tool/src/test/java/org/datacommons/tool/TestUtil.java

-        actualLog.getCounterSet().getCountersMap());
-    assertMapsAreEqual(
-        "Level Summary", expectedLog.getLevelSummaryMap(), actualLog.getLevelSummaryMap());
-    assertTrue(actualLog.getEntriesList().containsAll(expectedLog.getEntriesList()));


have you tried this pattern? it might give better error messages (there's also an option to ignore ordering).

assertThat(actualRecordProtos)
    .displayingDiffsPairedBy(Record::getId)
    .containsExactlyElementsIn(expectedRecords);

pradh · 2021-08-27T17:44:09Z

That is an awesome tip! Though I didn't use this specifically, I finally found a way to do "expect" and got rid of the custom function.

chejennifer

thanks for the changes!

chejennifer · 2021-08-27T16:47:59Z

tool/src/test/java/org/datacommons/tool/GenMcfTest.java

@@ -33,15 +33,24 @@
 // directory, add an input directory and an output directory. In the input directory, put the test
 // files you want to run the lint tool against. In the output directory, put a report.json file with
 // the expected report output.
+//
+// These tests can be run in a mode to produce golden files, as below:
+//    mvn -DgoldenFilesPrefix=$PWD/tool/src/test/resources/org/datacommons/tool test


should there be a space like "mvn -D goldenFilesPrefix..."?

pradh · 2021-08-27T17:42:50Z

I think no space is needed; see https://maven.apache.org/surefire/maven-surefire-plugin/examples/single-test.html

chejennifer · 2021-08-27T17:01:42Z

util/src/main/java/org/datacommons/util/McfChecker.java

@@ -38,6 +40,14 @@
      Set.of(Vocabulary.DOMAIN_INCLUDES, Vocabulary.RANGE_INCLUDES);
  private final Set<String> PROP_REFS_IN_PROP =
      Set.of(Vocabulary.NAME, Vocabulary.LABEL, Vocabulary.DCID, Vocabulary.SUB_PROPERTY_OF);
+
+  // Includes: a-z A-Z 0-9 _ & + - % / . )(


Do we want to allow "&" in a dcid? When I did an import, I noticed that enums that represent a combination of multiple enums use "&" to concatenate them, and having "&" in a dcid gave me problems.

I believe we already include "&", but I was hoping we could keep special URL characters out of our dcids (so no [&?= #]%+). We do use /, but we are able to handle this on the clients.

pradh · 2021-08-27T17:52:06Z

Unfortunately, &, + and % show up in our refs/dcids today.

I've filed #45 to track this. Perhaps we could disallow them in the import tool, with a special mode to run backward-compatibly, or something?

Since this is long-pending and important, let's switch the discussion to #45, and I'll submit this PR :)

pradh

Thanks for the review!

pradh added 6 commits August 26, 2021 19:25

Implement additional checks

8452b6d

Improve tests

b4068c7

Improve tests

e4af2b6

Improve tests

f973b6e

Improve tests

dd6a997

Improve tests

18eea44

pradh requested review from chejennifer and beets August 27, 2021 15:16

pradh commented Aug 27, 2021

beets approved these changes Aug 27, 2021

chejennifer reviewed Aug 27, 2021

chejennifer approved these changes Aug 27, 2021

Address PR comments

aab8a39

pradh commented Aug 27, 2021

pradh mentioned this pull request Aug 27, 2021

Transition out of using special chars in DCIDs #45

Open

pradh merged commit 6199f84 into datacommonsorg:master Aug 27, 2021
