Releases: meilisearch/arroy
v0.5.0
New features
Binary quantization by @irevoire in #82
Binary quantization lets you index up to 10 times more items for the same amount of disk space.
The drawback is that it reduces relevancy when querying documents.
The more dimensions your dataset has, the less relevancy is impacted. After extensive benchmarking, we recommend using it if:
- You have (or plan to have) more than 100_000 items in your database
- Your items have more than 1400 dimensions
To use the feature, simply change the `Distance` provided when opening a `Writer` and a `Reader` by adding `BinaryQuantized` to it. For example, `Euclidean` becomes `BinaryQuantizedEuclidean`.
Warning
Enabling binary quantization is a destructive operation. Once enabled, all your vectors will be modified to contain only `-1` and `1`, and you won't be able to get your original vectors back.
Finally, binary quantization has not been implemented for the dot-product distance.
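The sketch below illustrates the idea behind binary quantization in std-only Rust. It is not arroy's actual implementation. Each component is reduced to its sign and packed into one bit, so a 1400-dimension `f32` vector (5600 bytes) fits in 22 `u64` words (176 bytes). The function names are hypothetical, and mapping `0.0` to `-1` is an assumption of this sketch. Decoding yields only `-1.0` and `1.0`, which is why the operation cannot be undone. The Euclidean distance between two such vectors reduces to a Hamming distance.

```rust
/// Packs the sign of each component into one bit: bit `i % 64` of word `i / 64`
/// is 1 when component `i` is strictly positive (assumption: zero counts as -1).
fn quantize(vector: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (vector.len() + 63) / 64];
    for (i, &x) in vector.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Expands the bits back to -1.0 / 1.0; the original magnitudes are lost.
fn dequantize(words: &[u64], dimensions: usize) -> Vec<f32> {
    (0..dimensions)
        .map(|i| if (words[i / 64] >> (i % 64)) & 1 == 1 { 1.0 } else { -1.0 })
        .collect()
}

/// Squared Euclidean distance between two decoded {-1, 1} vectors.
/// Each differing component contributes (1 - (-1))^2 = 4, so the result is
/// 4 * Hamming distance and needs no floating-point math.
/// Padding bits are zero in both inputs, so they never contribute.
fn binary_squared_euclidean(a: &[u64], b: &[u64]) -> u32 {
    let hamming: u32 = a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum();
    4 * hamming
}
```

For example, quantizing `[0.3, -1.2, 0.0, 2.5]` and decoding it gives `[1.0, -1.0, -1.0, 1.0]`.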
Accept a function to abort the indexing process by @irevoire in #86
If you ever wanted to stop arroy before it finishes an indexing process, this is for you.
You can now provide a closure that arroy will call from time to time. If it returns `true`, arroy will stop as quickly as possible and return the new error: `BuildCancelled`.
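The following hypothetical build loop illustrates the pattern; it does not reproduce arroy's internals. The loop polls a caller-supplied closure between units of work and bails out with an error when the closure returns `true`. `build_trees` and `BuildError` are invented names. Only the `BuildCancelled` variant name comes from the release notes.

```rust
/// Hypothetical error type; only the `BuildCancelled` name comes from the release notes.
#[derive(Debug, PartialEq)]
enum BuildError {
    BuildCancelled,
}

/// Hypothetical build loop: `cancel` is polled before each tree, and the
/// build stops early with `BuildCancelled` as soon as it returns `true`.
/// Returns the number of trees built when it runs to completion.
fn build_trees(n_trees: usize, cancel: impl Fn() -> bool) -> Result<usize, BuildError> {
    let mut built = 0;
    for _ in 0..n_trees {
        if cancel() {
            return Err(BuildError::BuildCancelled);
        }
        // Placeholder for the real work of building one tree.
        built += 1;
    }
    Ok(built)
}
```

In a real application, the closure would typically read an `AtomicBool` that another thread sets, e.g. `build_trees(10, || stop.load(Ordering::Relaxed))`.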
Breaking
Rename the angular distance to cosine distance by @irevoire in #94
This is both API-breaking and DB-breaking, which means you'll have to re-import all your vectors into arroy by hand after upgrading.
Since "cosine" is the more common name, we decided to rename `Angular` and `BinaryQuantizedAngular` to `Cosine` and `BinaryQuantizedCosine`.
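For reference, here is the textbook cosine distance, `1 - (a · b) / (|a| |b|)`, as a std-only Rust sketch. It is illustrative only. arroy may compute or normalize its `Cosine` distance differently internally.

```rust
/// Textbook cosine distance: 1 - cos(theta), with cos(theta) = (a . b) / (|a| |b|).
/// Returns `None` when either vector has zero norm, since the angle is undefined.
/// Panics if the two vectors have different lengths.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimensions");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

The distance ranges from `0.0` for vectors pointing the same way, through `1.0` for orthogonal vectors, to `2.0` for opposite directions.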
Use builder pattern for the configuration by @irevoire in #96
This is API-breaking.
Since the APIs to query vectors and build databases were getting more and more optional parameters, we decided to switch to a builder pattern. This should make them easier to use and let us add new configuration options in the future without breaking the API.
Now, instead of writing:
```rust
let results = reader.nns_by_item(&rtxn, item_id, n_results, search_k, None)?.unwrap();
```
you would write:
```rust
let results = reader.nns(n_results).search_k(search_k).by_item(&rtxn, item_id)?.unwrap();
```
The same goes for the `build` method. Instead of writing:
```rust
writer.build(&mut wtxn, &mut rng, None)?;
```
you now write:
```rust
writer.builder(&mut rng).build(&mut wtxn)?;
```
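The minimal std-only sketch below shows why a builder lets new options be added without breaking callers. The required parameter goes to the entry point. Optional ones get chained setters with defaults, and a terminal method runs the query. `Reader`, `QueryBuilder`, and the brute-force search are hypothetical stand-ins, not arroy's real types or signatures; arroy's actual reader searches trees stored in LMDB.

```rust
/// Hypothetical stand-in for a reader: keeps raw vectors in memory and
/// answers queries by brute force instead of walking trees.
struct Reader {
    items: Vec<Vec<f32>>,
}

/// Builder holding the required `count` plus optional settings with defaults.
struct QueryBuilder<'r> {
    reader: &'r Reader,
    count: usize,
    search_k: Option<usize>,
}

impl Reader {
    /// Entry point: only the required parameter is passed here.
    fn nns(&self, count: usize) -> QueryBuilder<'_> {
        QueryBuilder { reader: self, count, search_k: None }
    }
}

impl QueryBuilder<'_> {
    /// Optional setter; adding another one later does not break existing callers.
    fn search_k(mut self, search_k: usize) -> Self {
        self.search_k = Some(search_k);
        self
    }

    /// Terminal method: returns the ids of the `count` closest items to `item`
    /// (squared Euclidean distance), or `None` if `item` does not exist.
    /// In this toy, `search_k` caps how many candidates are scanned and
    /// defaults to scanning every item.
    fn by_item(self, item: usize) -> Option<Vec<usize>> {
        let query = self.reader.items.get(item)?;
        let limit = self.search_k.unwrap_or(self.reader.items.len());
        let mut scored: Vec<(usize, f32)> = self
            .reader
            .items
            .iter()
            .take(limit)
            .enumerate()
            .map(|(id, v)| {
                let d: f32 = query.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum();
                (id, d)
            })
            .collect();
        scored.sort_by(|a, b| a.1.total_cmp(&b.1));
        scored.truncate(self.count);
        Some(scored.into_iter().map(|(id, _)| id).collect())
    }
}
```

Consuming `self` in each setter keeps the whole query a single chained expression. arroy's real builder may use different receivers and parameter types.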
Maintenance
- Make warnings output errors in the CI by @irevoire in #97
- Reorganize the `NodeId` to make appending vectors work in more cases, and add a test by @irevoire in #98
- Store the list of updated IDs directly in LMDB instead of a roaring bitmap to improve vector insertion performance by @irevoire in #99
- Increase the arroy version for the next release by @irevoire in #100
Full Changelog: v0.4.0...v0.5.0
v0.4.0
v0.3.1
v0.3.0
What's Changed
- Stops returning a result in `Writer::new` by @irevoire in #64
- Add a test to check if `Writer::clear` works by @Kerollmops in #66
Full Changelog: v0.2.0...v0.3.0
v0.2.0
A lot has been implemented since the last release. To sum up the most important changes: arroy now comes with multi-threading and incremental indexing, which officially makes it faster than annoy by a good margin 🎉
A lot of work has also gone into making sure the generated trees are valid and working.
The whole list of changes
- Ensure the right distance is used when querying a database by @Kerollmops in #28
- Cleanup Documentation by @Kerollmops in #30
- Improve the `Reader` and `Writer` creation by @Kerollmops in #27
- Split the import from the searches in the movies example by @irevoire in #25
- Add a way to generate graphs and to get stats about your database by @irevoire in #34
- Use roaring bitmaps to store descendants by @irevoire in #37
- Search filtering by @irevoire in #38
- Build the trees in parallel by @Kerollmops in #32
- Introduce the `Reader/Writer::is_empty` and `contains_item` methods by @Kerollmops in #43
- Fix the Angular distance and Windows by @Kerollmops in #46
- Make sure we always build at least one tree by @dureuill in #49
- Incremental indexing by @irevoire in #41
- Update LICENSE by @curquiza in #50
- Re-use deleted node IDs in incremental mode by @irevoire in #56
- Add a fuzzer by @irevoire in #57
- Improve the tmp node deletion by @irevoire in #62
New Contributors
Full Changelog: v0.1.0...v0.2.0