Implement basic full-text-search #1359
Changes from 3 commits
3e1eb52
9a80564
0897748
da9845a
7589b7d
7b80145
@@ -290,13 +290,12 @@ export class FirestoreRepository implements Repository {
   }

   async writeCustomElements(
-    packageVersionMetadata: Version,
+    {name: packageName, version, description}: Version,
     customElements: CustomElementInfo[],
     distTags: string[],
     author: string
   ): Promise<void> {
     // Store custom elements data in subcollection
-    const {name: packageName, version, description} = packageVersionMetadata;
     const versionRef = this.getPackageVersionRef(packageName, version);
     const customElementsRef = versionRef.collection('customElements');
     const isLatest = distTags.includes('latest');
@@ -326,6 +325,10 @@ export class FirestoreRepository implements Repository {
         ...descriptionStems,
         ...summaryStems,
         ...tagNameParts,
+        tagName,
+        // TODO (justinfagnani): tokenizing the package name is temporary
+        // until we don't tokenize the *entire* query
+        ...natural.PorterStemmer.tokenizeAndStem(packageName),
       ]),
     ];

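For illustration, a minimal TypeScript sketch of assembling a de-duplicated searchTerms array like the one built in this hunk. It assumes a hypothetical buildSearchTerms helper and SearchableElement shape, treats tagNameParts as the hyphen-separated pieces of the tag name, and uses a plain lowercase word splitter in place of natural.PorterStemmer.tokenizeAndStem, so unlike the real code it does no stemming or stop-word removal.

```typescript
// Hypothetical input shape; the repository derives these values from CustomElementInfo.
interface SearchableElement {
  tagName: string;
  description: string;
  summary: string;
  packageName: string;
}

// Simplified stand-in for natural.PorterStemmer.tokenizeAndStem (an assumption):
// lowercases and splits on runs of non-alphanumeric characters, but does NOT stem.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Builds a de-duplicated term array suitable for an array-contains-any query.
function buildSearchTerms(el: SearchableElement): string[] {
  // Assumption: tag name parts are the hyphen-separated pieces, e.g. my-button -> my, button.
  const tagNameParts = el.tagName.split('-');
  return Array.from(
    new Set([
      ...tokenize(el.description),
      ...tokenize(el.summary),
      ...tagNameParts,
      el.tagName, // the full tag name, as added in this hunk
      ...tokenize(el.packageName), // temporary package-name tokens, per the TODO
    ])
  );
}

const terms = buildSearchTerms({
  tagName: 'my-button',
  description: 'A button element',
  summary: 'Clickable button',
  packageName: '@example/my-button',
});
console.log(terms);
```

In this sketch a scoped package name such as @example/my-button contributes example, my and button rather than the raw string, and the Set drops the repeated button terms.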
@@ -508,6 +511,7 @@ export class FirestoreRepository implements Repository {
       if (queryTerms.length > 10) {
         queryTerms.length = 10;
Why limit at 10?

That's the Firestore limit (https://cloud.google.com/firestore/docs/query-data/queries#array-contains-any). I should make a TODO/issue: if we keep this approach, we want to sort the query terms by inverse frequency so we keep the most important terms. Then 10 seems like enough. For longer queries I think you get better results with n-gram searches, but that's stretching my knowledge and probably another reason to use a service :)
       }
       console.log('queryTerms', queryTerms);

justinfagnani marked this conversation as resolved.

       dbQuery = dbQuery.where('searchTerms', 'array-contains-any', queryTerms);
We discussed offline, but just for the record here: it sounds like the

Yep, and there are no errors in the emulator.
     }

justinfagnani marked this conversation as resolved.
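The reply in the first thread of this hunk suggests ranking query terms by inverse frequency before capping them, so the cap drops the least informative terms instead of whatever happens to come last. A minimal sketch, assuming a hypothetical per-term document-frequency map (this PR does not maintain one) and the limit of 10 cited in that thread; selectQueryTerms is a hypothetical helper, not part of the repository.

```typescript
// Limit cited in the review thread for Firestore's array-contains-any operator.
const MAX_QUERY_TERMS = 10;

// Hypothetical: number of indexed elements whose searchTerms contain each term.
// Maintaining these counts is an assumption; the PR does not store them.
type DocFrequency = ReadonlyMap<string, number>;

// Keeps the rarest (highest inverse-document-frequency) terms instead of
// truncating in query order, as `queryTerms.length = 10` does.
function selectQueryTerms(
  queryTerms: string[],
  docFrequency: DocFrequency,
  totalDocs: number
): string[] {
  const idf = (term: string): number =>
    Math.log(totalDocs / (docFrequency.get(term) ?? 1));
  return Array.from(new Set(queryTerms))
    // Terms that no element contains cannot match, so they would waste a slot.
    .filter((term) => (docFrequency.get(term) ?? 0) > 0)
    .sort((a, b) => idf(b) - idf(a))
    .slice(0, MAX_QUERY_TERMS);
}

const df = new Map<string, number>([
  ['button', 40],
  ['element', 90],
  ['material', 5],
  ['ripple', 2],
]);
const chosen = selectQueryTerms(['material', 'button', 'element', 'ripple', 'foo'], df, 100);
console.log(chosen); // rarest first: ripple, material, button, element
```

The design choice is to spend the limited slots on rare terms, since a common term like element matches most documents and adds little to the filter.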
Just an intuition, no action needed. Feels like we might want to weight tag name parts a little more highly than description terms?
I think weighting/relevance might be one reason to push us to use a search service rather quickly. I don't think we have a natural way of getting Firestore to consider some kind of weighting in our query; we just get a result or not for the array-contains-any operator. Sorting by array fields isn't very useful. We can get Firestore to sort by simple fields like a "ranking" field. To do a relevance ordering we'd have to take all the results, iterate, apply something like tf-idf, and sort on that. That means our Firestore result size will have to be bigger than our requested query size, because we might have high-relevance results later in the Firestore result set.
It's plausible we could continue down this path and keep it simple, but it also seems like it might get complicated real quick.
Good point, it didn't occur to me that we'd need the ranking to happen on the database side, and presumably you couldn't write a query like that (pretty sure you could with something like Postgres, but with Firebase I doubt it). I'm pretty convinced we'll need a standalone search service, then, based on Lucene or similar (a service we run, or a third-party hosted one).
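To make the cost described in this thread concrete, a minimal sketch of client-side relevance ordering: over-fetch candidates from the array-contains-any query, score each with a field-weighted tf-idf that weights tag name parts above description terms, and keep the top results. The Candidate shape, FIELD_WEIGHTS values, and rankByTfIdf helper are hypothetical, and document frequencies are computed over the fetched candidates only, which is an approximation.

```typescript
// Hypothetical shape of one candidate returned by the array-contains-any query.
interface Candidate {
  tagName: string;
  fields: {tagNameParts: string[]; descriptionTerms: string[]};
}

// Assumed, illustrative weights: tag-name matches count more than description matches.
const FIELD_WEIGHTS = {tagNameParts: 3, descriptionTerms: 1} as const;

// Ranks candidates by field-weighted tf-idf and returns the top `limit`.
// `candidates` should be larger than `limit` (over-fetched), because relevant
// results may appear anywhere in the unordered Firestore result set.
function rankByTfIdf(candidates: Candidate[], queryTerms: string[], limit: number): Candidate[] {
  const n = candidates.length;
  // Document frequency over the candidate set only.
  const docFreq = new Map<string, number>();
  for (const c of candidates) {
    const unique = new Set(c.fields.tagNameParts.concat(c.fields.descriptionTerms));
    unique.forEach((term) => docFreq.set(term, (docFreq.get(term) ?? 0) + 1));
  }
  // Raw term frequency of `term` within one field.
  const countIn = (terms: string[], term: string): number =>
    terms.filter((t) => t === term).length;
  const score = (c: Candidate): number => {
    let total = 0;
    for (const term of queryTerms) {
      const df = docFreq.get(term);
      if (!df) continue; // term appears in no candidate
      const idf = Math.log(1 + n / df);
      total += FIELD_WEIGHTS.tagNameParts * countIn(c.fields.tagNameParts, term) * idf;
      total += FIELD_WEIGHTS.descriptionTerms * countIn(c.fields.descriptionTerms, term) * idf;
    }
    return total;
  };
  return candidates
    .map((c) => ({c, s: score(c)}))
    .sort((a, b) => b.s - a.s)
    .slice(0, limit)
    .map(({c}) => c);
}

const results = rankByTfIdf(
  [
    {tagName: 'md-ripple', fields: {tagNameParts: ['md', 'ripple'], descriptionTerms: ['ink', 'effect']}},
    {tagName: 'x-button', fields: {tagNameParts: ['x', 'button'], descriptionTerms: ['button', 'with', 'ripple']}},
    {tagName: 'y-card', fields: {tagNameParts: ['y', 'card'], descriptionTerms: ['card', 'layout']}},
  ],
  ['ripple'],
  2
);
// md-ripple ranks first (tag-name match), then x-button (description match).
console.log(results.map((r) => r.tagName));
```

Because the score can only be computed after fetching, the query has to return more rows than the page size, which is the complication raised above; a Lucene-based engine instead computes relevance scores inside the index.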