Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

how does filtering work in lantern #321

Open
lukebuehler opened this issue Jun 11, 2024 · 5 comments
Open

how does filtering work in lantern #321

lukebuehler opened this issue Jun 11, 2024 · 5 comments

Comments

@lukebuehler
Copy link

I'm evaluating lantern and would love to see a small example and documentation on how filtering works in lantern.

Could be simply something like this:
https://github.com/pgvector/pgvector/?tab=readme-ov-file#filtering

Copy link

greptile-apps bot commented Jun 11, 2024

Filtering in Lantern can be done using standard SQL WHERE clauses. Here is a small example:

CREATE TABLE small_world (id integer, vector real[3]);
INSERT INTO small_world (id, vector) VALUES (0, '{0,0,0}'), (1, '{0,0,1}');

-- Create an index on the vector column
CREATE INDEX ON small_world USING lantern_hnsw (vector);

-- Filter with a WHERE clause
SELECT id, vector FROM small_world WHERE id = 1;

For more details, refer to the README.md file.

References

/sql
/src/hooks
/test/sql
/README.md

Ask Greptile

@dqii
Copy link
Contributor

dqii commented Jun 11, 2024

Hi @lukebuehler ! The example above does do filtering on id = 1.
There's another example here in the docs: https://lantern.dev/docs/develop/query
which does filtering on published_at < 2010
Does that help? Let me know if you were looking for something else!

@Ngalstyan4
Copy link
Contributor

Hi @lukebuehler,

The filtering example from the pgvector docs you linked above will work with no changes for lantern indexes as well!
That is - the index will be consulted for nearest vectors and the vectors will be filtered after the fact with the given SQL predicate.

If the filter filters out a large portion of the rows, vector searches with pgvector will often return no results.
Unlike pgvector, lantern will continue searching and returning results from the index until the given LIMIT is reached.
Currently, this is done by recursively searching for more and more elements, but I just opened a PR (#322 ) that makes this filtered search more efficient by loading elements from the vector index in a streaming fashion.

Let me know if you have any other questions!

@lukebuehler
Copy link
Author

This is helpful thanks! Good to know that you are using a recursive post-filter in lantern. I missed the query example for filtering.

In pgvector, if you have an index on the where column, the data is pre-filtered for exact search, and you can also create multiple partial indexes on a column which you can then select with a where clause. However, that only works with =, not with other comparisons. So It's nice that you do recursive search for non-equality predicates!

As for partial indexes, does lantern support it? It would be helpful for multi-tenanting.

Just my 2c: I often look for filtering documentation for various vector dbs, and they are often not very clear. I think a doc page that explains the basics conditions when it pre-filters, post-filters, uses a partial index (or whatever your feature are), etc is really helpful. LanceDB has a decent page.

@Ngalstyan4
Copy link
Contributor

As for partial indexes, does lantern support it? It would be helpful for multi-tenanting.

Partial indexes are supported, exactly as in btree or pgvector indexes! Here is an example from our tests

Thanks for the feedback on the docs! Will improve them before closing this issue.
Let me know if you have any other questions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants