-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement basic full-text-search #1359
Conversation
6f6ba2a
to
b8277a9
Compare
b8277a9
to
07ad0bf
Compare
07ad0bf
to
9a80564
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I can see us moving to something e.g. Lucene based, or a separate hosted service in the future -- but I like how simple this implementation is and am happy to see how far it can take us (especially after adding some ranking).
Main comment is for more test coverage :D
elementName: String! | ||
tag: String = "latest" | ||
): CustomElement | ||
elements(query: String, limit: Int): [CustomElement!]! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we perhaps also want a strict lookup like the element
API we had before, or will we instead support search prefixes like package:foo element:bar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the package()
query gives us what we had with element()
- and we can make some of the fields like customElements
into queries if we want that.
I think this query should grow to add both search operators and possibly subqueries/fields that get you from element to package here.
One consideration is how to implement structured + text search. If we move to use a search service do we do both Firestore + search service queries and mix them? If a search service offer structured fields do we delegate all searches to it?
// Update the PackageVersion doc | ||
await t.update(versionRef, { | ||
// We remove the converter to fix the types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand "remove the converter" here, since it's actually adding "withConverter" -- seems like the opposite? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.withConverter(null)
removes the packageVersionConverter
that's added in getPackageVersionCollectionRef
.
packages/catalog-server/src/lib/firestore/firestore-repository.ts
Outdated
Show resolved
Hide resolved
const tagName = c.export.name; | ||
// Grab longer tag name parts for searching. We want "button" from | ||
// md-button, etc. | ||
const tagNameParts = tagName.split('-').filter((s) => s.length > 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe even >=2
? Thinking of md-
which I heard might be the new prefix for mwc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did > 3 to try to exclude the prefixes - I wanted to try to capture words here, not the prefixes. This is very debatable, I just threw this in, but that was my thinking.
if (queryTerms.length > 10) { | ||
queryTerms.length = 10; | ||
} | ||
dbQuery = dbQuery.where('searchTerms', 'array-contains-any', queryTerms); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline, but just for the record here -- it sounds like the searchTerms
field is automatically indexed, and firestore supports indexed array-contains-any
lookups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, and there are no errors in the emulator.
limit, | ||
}: { | ||
query?: string; | ||
limit?: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can limit be non-optional here, since the same 25
default is assigned earlier in the call stack? Or maybe factored out to a constant? (Just thinking if we want to change the default, it would be easy to accidentally only do it in one place).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually seems fine for it to differ. The graphql one is what matters most. This is just a fallback incase someone called the repo directly without a limit. Required is fine too, though in a JS world I'd still usually check before launching a non-limited query :)
@@ -25,6 +25,13 @@ const testPackage1Path = fileURLToPath( | |||
new URL('../test-packages/test-1/', import.meta.url) | |||
); | |||
|
|||
// A set of import, fetch, search tests that use the same data | |||
const TEST_SEQUENCE_ONE = 'test-data-1'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly confused by the term "sequence" here, it's actually a partition name? Or is sequence some technical firestore term?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my term... I'm meaning a sequence of tests that rely on each other. I really want some kind of test nesting.
@@ -84,6 +91,37 @@ test('A second import does nothing', async () => { | |||
assert.equal(importResult.problems, undefined); | |||
}); | |||
|
|||
test('Full text search', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I thought I sent this comment with the review, but must have lost it somehow).
Feels like we could do with some more test coverage here. Maybe a test rig/utility function that would make it easier to concisely write the scenarios? Some cases it feels like we should have:
- Search by tag name term
- Search by package term
- Search by entire package name
- Search by entire tag name
- Search by description term
- Tests for the various stemmings/other normalization we expect to be happening -- case sensitivity, removing punctuation, plural normalization, etc.
- A search with both a matching and non-matching term (to confirm it's OR not AND matching)
- Checking that the limit is respected
- No matching results
- No elements in the index at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree we should definitely have more coverage. I added a few more tests for now.
This implements full-text search by:
customElement
collections and filtering with anarray-contains-any
operatorI'm using https://github.com/NaturalNode/natural for stemming. It looks well maintained and thorough compared to other libraries.
The fields currently indexed are:
-
, using segments more than 3 characters long (to getbutton
fromsl-button
, etc).Builds on #1364. We need to fix #1351 to ensure that queries are confined to a single namespace.