Implement basic full-text-search #1359

justinfagnani · 2022-10-08T18:47:59Z

This implements full-text search by:

Stemming a number of fields and storing the results in an array field in CustomElement documents
Stemming the user's text query
Retrieving elements with a collectionGroup query across the customElement collections and filtering with an array-contains-any operator

I'm using https://github.com/NaturalNode/natural for stemming. It looks well maintained and thorough compared to other libraries.

The fields currently indexed are:

package description
element summary
element description
The element name split on -, keeping only segments longer than 3 characters (to get button from sl-button, etc.).
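
For readers skimming the thread, the indexing step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `IndexableElement`, `buildSearchTerms`, and `toyStem` are hypothetical names, and the stemmer is passed in as a function so the sketch has no dependencies (the PR itself uses natural for stemming).

```typescript
// Hypothetical shape; the real CustomElement documents have more fields.
interface IndexableElement {
  packageDescription?: string;
  summary?: string;
  description?: string;
  tagName: string;
}

// A stemmer maps a word to its stem. Injected here as an assumption so the
// sketch stays dependency-free.
type Stemmer = (word: string) => string;

const buildSearchTerms = (el: IndexableElement, stem: Stemmer): string[] => {
  const text = [el.packageDescription, el.summary, el.description]
    .filter((s): s is string => s !== undefined)
    .join(' ');
  // Lowercase and split on anything that is not a letter or digit.
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  // Keep only the longer tag name segments, e.g. "button" from "sl-button".
  const tagParts = el.tagName.split('-').filter((s) => s.length > 3);
  const stemmed = words.concat(tagParts).map(stem);
  // De-duplicate while preserving first-seen order.
  return stemmed.filter((t, i) => stemmed.indexOf(t) === i);
};

// Toy stemmer for illustration only: strips a single trailing "s".
const toyStem: Stemmer = (w) =>
  w.length > 3 && w.charAt(w.length - 1) === 's' ? w.slice(0, -1) : w;
```

The resulting array is what would be stored in the array field on each element document and matched against the stemmed query terms.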

Builds on #1364. We need to fix #1351 to ensure that queries are confined to a single namespace.

aomarks

Nice! I can see us moving to something e.g. Lucene based, or a separate hosted service in the future -- but I like how simple this implementation is and am happy to see how far it can take us (especially after adding some ranking).

Main comment is for more test coverage :D

aomarks · 2022-10-14T01:15:59Z

packages/catalog-api/src/lib/schema.graphql

-    elementName: String!
-    tag: String = "latest"
-  ): CustomElement
+  elements(query: String, limit: Int): [CustomElement!]!


Do we perhaps also want a strict lookup like the element API we had before, or will we instead support search prefixes like package:foo element:bar?

I think that the package() query gives us what we had with element() - and we can make some of the fields like customElements into queries if we want that.

I think this query should grow to add both search operators and possibly subqueries/fields that get you from element to package here.

One consideration is how to implement structured + text search. If we move to a search service, do we run both Firestore and search service queries and mix the results? If a search service offers structured fields, do we delegate all searches to it?
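
To make the search-prefix idea concrete, here is one way a query string like package:foo element:bar could be split into structured filters and free text. Nothing like this exists in the PR; `ParsedQuery`, `KNOWN_OPERATORS`, and `parseQuery` are hypothetical names, and the operator set is just the two mentioned above.

```typescript
interface ParsedQuery {
  // Structured filters, e.g. {package: 'foo', element: 'bar'}.
  filters: {[key: string]: string};
  // Remaining free-text terms, to be stemmed and searched as before.
  text: string[];
}

// Assumption: only these two operators are recognized.
const KNOWN_OPERATORS = ['package', 'element'];

const parseQuery = (query: string): ParsedQuery => {
  const result: ParsedQuery = {filters: {}, text: []};
  const tokens = query.split(/\s+/).filter((t) => t.length > 0);
  for (const token of tokens) {
    const colon = token.indexOf(':');
    const key = colon > 0 ? token.slice(0, colon) : '';
    const value = colon > 0 ? token.slice(colon + 1) : '';
    if (KNOWN_OPERATORS.indexOf(key) !== -1 && value.length > 0) {
      result.filters[key] = value;
    } else {
      // Unknown prefixes (or a bare "package:") fall back to free text.
      result.text.push(token);
    }
  }
  return result;
};
```

With this split, the filters could map to structured Firestore conditions while the free text goes through stemming, which keeps the question of a separate search service open.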

packages/catalog-server/src/lib/catalog.ts

aomarks · 2022-10-14T01:17:45Z

packages/catalog-server/src/lib/firestore/firestore-repository.ts

          // Update the PackageVersion doc
-          await t.update(versionRef, {
+          // We remove the converter to fix the types:


Not sure I understand "remove the converter" here, since it's actually adding "withConverter" -- seems like the opposite? :)

.withConverter(null) removes the packageVersionConverter that's added in getPackageVersionCollectionRef.

aomarks · 2022-10-14T01:19:36Z

packages/catalog-server/src/lib/firestore/firestore-repository.ts

+      const tagName = c.export.name;
+      // Grab longer tag name parts for searching. We want "button" from
+      // md-button, etc.
+      const tagNameParts = tagName.split('-').filter((s) => s.length > 3);


Maybe even >=2? Thinking of md- which I heard might be the new prefix for mwc?

Yeah, I did > 3 to try to exclude the prefixes - I wanted to try to capture words here, not the prefixes. This is very debatable, I just threw this in, but that was my thinking.
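
A small sketch of the trade-off being discussed, with the threshold pulled out as a parameter. `tagNameParts` as a standalone function with a `minLength` argument is an assumption for illustration; the PR inlines the filter.

```typescript
// Split a tag name on "-" and keep segments of at least `minLength` chars.
const tagNameParts = (tagName: string, minLength: number): string[] =>
  tagName.split('-').filter((s) => s.length >= minLength);

// `s.length > 3` in the PR is equivalent to a minimum length of 4.
const strict = tagNameParts('md-button', 4); // ['button']
const loose = tagNameParts('md-button', 2); // ['md', 'button']
const shortWords = tagNameParts('md-tab-bar', 4); // []
```

The last case shows the cost of the stricter threshold: short real words such as "tab" and "bar" are dropped along with the prefix.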

aomarks · 2022-10-14T01:24:32Z

packages/catalog-server/src/lib/firestore/firestore-repository.ts

+      if (queryTerms.length > 10) {
+        queryTerms.length = 10;
+      }
+      dbQuery = dbQuery.where('searchTerms', 'array-contains-any', queryTerms);


We discussed this offline, but for the record: it sounds like the searchTerms field is indexed automatically, and Firestore supports indexed array-contains-any lookups.

Yep, and there are no errors in the emulator.
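
For reference, the matching semantics of the snippet above can be modelled in memory like this. It is only a stand-in for the Firestore query, and `MAX_QUERY_TERMS`, `capTerms`, and `arrayContainsAny` are hypothetical names; the cap of 10 is taken from the diff, presumably to stay within the number of values array-contains-any accepted at the time.

```typescript
const MAX_QUERY_TERMS = 10;

// Non-mutating version of the `queryTerms.length = 10` truncation.
const capTerms = (terms: string[]): string[] =>
  terms.slice(0, MAX_QUERY_TERMS);

// In-memory model of `where(field, 'array-contains-any', queryTerms)`:
// a document matches if it contains at least one of the query terms.
const arrayContainsAny = (
  docTerms: string[],
  queryTerms: string[]
): boolean => queryTerms.some((t) => docTerms.indexOf(t) !== -1);
```

Because a single shared term is enough to match, multi-word queries behave as OR rather than AND, and any terms past the tenth are silently ignored.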

packages/catalog-server/src/lib/graphql.ts

aomarks · 2022-10-14T01:26:40Z

packages/catalog-server/src/lib/firestore/firestore-repository.ts

+    limit,
+  }: {
+    query?: string;
+    limit?: number;


Can limit be non-optional here, since the same 25 default is assigned earlier in the call stack? Or maybe factored out to a constant? (Just thinking if we want to change the default, it would be easy to accidentally only do it in one place).

It actually seems fine for it to differ. The GraphQL one is what matters most. This is just a fallback in case someone calls the repo directly without a limit. Required is fine too, though in a JS world I'd still usually check before launching a non-limited query :)
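
One way to get both properties discussed here (a single place to change the default, plus a runtime guard against an unlimited query) is a shared constant and a small resolver. This is a sketch only: `DEFAULT_ELEMENTS_LIMIT` and `resolveLimit` are hypothetical names, and 25 is the default mentioned above.

```typescript
// Shared by the GraphQL resolver and the repository fallback.
const DEFAULT_ELEMENTS_LIMIT = 25;

// Returns a usable positive integer limit, falling back to the default when
// the caller passed nothing, a non-finite number, or a value below 1.
const resolveLimit = (limit?: number): number => {
  const n = limit === undefined ? NaN : Math.floor(limit);
  return isFinite(n) && n > 0 ? n : DEFAULT_ELEMENTS_LIMIT;
};
```

The repository method could then keep `limit` optional in its signature while never issuing an unbounded query.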

aomarks · 2022-10-14T01:27:07Z

packages/catalog-server/src/test/lib/catalog_test.ts

@@ -25,6 +25,13 @@ const testPackage1Path = fileURLToPath(
  new URL('../test-packages/test-1/', import.meta.url)
 );

+// A set of import, fetch, search tests that use the same data
+const TEST_SEQUENCE_ONE = 'test-data-1';


Slightly confused by the term "sequence" here, it's actually a partition name? Or is sequence some technical firestore term?

It's my term... I mean a sequence of tests that rely on each other. I really want some kind of test nesting.

aomarks · 2022-10-14T01:36:59Z

packages/catalog-server/src/test/lib/catalog_test.ts

@@ -84,6 +91,37 @@ test('A second import does nothing', async () => {
  assert.equal(importResult.problems, undefined);
 });

+test('Full text search', async () => {


(I thought I sent this comment with the review, but must have lost it somehow).

Feels like we could do with some more test coverage here. Maybe a test rig/utility function that would make it easier to concisely write the scenarios? Some cases it feels like we should have:

Search by tag name term

Search by package term

Search by entire package name

Search by entire tag name

Search by description term

Tests for the various stemmings/other normalization we expect to be happening -- case sensitivity, removing punctuation, plural normalization, etc.

A search with both a matching and non-matching term (to confirm it's OR not AND matching)

Checking that the limit is respected

No matching results

No elements in the index at all

Agree we should definitely have more coverage. I added a few more tests for now.
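
As a rough idea of what the suggested test rig could look like, here is a table-driven helper. It is a sketch under assumptions, not something in this PR: `SearchCase`, `Search`, `sameMembers`, and `runSearchCases` are hypothetical names, the search function is assumed to return tag names, and the example cases use made-up data.

```typescript
interface SearchCase {
  name: string;
  query: string;
  // Tag names expected in the results, in any order.
  expected: string[];
}

// Hypothetical search entry point returning matching tag names.
type Search = (query: string) => Promise<string[]>;

// Order-insensitive comparison, since results are not ranked yet.
const sameMembers = (a: string[], b: string[]): boolean =>
  a.slice().sort().join('\n') === b.slice().sort().join('\n');

// Runs every case and returns one message per failing case.
const runSearchCases = async (
  search: Search,
  cases: SearchCase[]
): Promise<string[]> => {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = await search(c.query);
    if (!sameMembers(actual, c.expected)) {
      failures.push(`${c.name}: got [${actual.join(', ')}]`);
    }
  }
  return failures;
};

// Example scenarios (illustrative data only).
const exampleCases: SearchCase[] = [
  {name: 'tag name term', query: 'button', expected: ['sl-button']},
  {name: 'no matching results', query: 'zzz', expected: []},
];
```

Each scenario in the list above would become one row, and a single test could assert that the returned failures array is empty.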

justinfagnani requested a review from aomarks October 8, 2022 18:47

justinfagnani force-pushed the elements-query branch from 6f6ba2a to b8277a9 Compare October 12, 2022 19:46

Add 'isLatest' field to reduce array-contains uses

3e1eb52

justinfagnani force-pushed the elements-query branch from b8277a9 to 07ad0bf Compare October 13, 2022 07:12

Implement basic full-text-search

9a80564

justinfagnani force-pushed the elements-query branch from 07ad0bf to 9a80564 Compare October 13, 2022 15:35

justinfagnani changed the base branch from main to latest-field October 13, 2022 15:35

justinfagnani changed the title from "WIP: prototype full-text-search" to "Implement basic full-text-search" Oct 13, 2022

justinfagnani marked this pull request as ready for review October 13, 2022 15:41

Base automatically changed from latest-field to main October 13, 2022 17:04

aomarks requested changes Oct 14, 2022

justinfagnani added 3 commits October 14, 2022 10:22

Address feedback

0897748

Merge branch 'main' into elements-query

da9845a

Add more tests

7589b7d

justinfagnani requested a review from aomarks October 14, 2022 21:21

aomarks approved these changes Oct 15, 2022

Remove console log

7b80145

justinfagnani merged commit 3abc31d into main Oct 15, 2022

justinfagnani deleted the elements-query branch October 15, 2022 00:25
