Releases: meilisearch/arroy

v0.5.0

01 Oct 14:08
629d1d1

New features

Binary quantization by @irevoire in #82

Binary quantization lets you index up to 10 times more items in the same amount of disk space.
The drawback is that it reduces relevancy when querying documents.
The more dimensions your dataset has, the less relevancy is affected. After extensive benchmarking, we recommend using binary quantization if:

  • You have (or plan to have) more than 100_000 items in your database
  • Your items have more than 1400 dimensions

To use the feature, change the Distance provided when opening a Writer and a Reader to its BinaryQuantized variant.
For example, Euclidean becomes BinaryQuantizedEuclidean.

Warning

Enabling binary quantization is a destructive operation. Once enabled, all your vectors are modified to contain only -1 and 1, and you will not be able to recover the original vectors.

Finally, binary quantization has not been implemented for the dot-product distance.
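
To illustrate why the operation is destructive, here is a minimal sketch of sign-based quantization. The `quantize` and `dequantize` functions are hypothetical and are not part of arroy's API; the rule used here (strictly positive components become 1, everything else becomes -1) and the bit-packed layout are assumptions, and arroy's actual on-disk format may differ.

```rust
/// Hypothetical helper: keep one bit per dimension.
/// Assumption: strictly positive components map to 1, everything else to -1.
fn quantize(vector: &[f32]) -> Vec<u8> {
    // One byte holds eight dimensions, rounded up.
    let mut bits = vec![0u8; (vector.len() + 7) / 8];
    for (i, &x) in vector.iter().enumerate() {
        if x > 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}

/// Hypothetical helper: expand the packed bits back into -1.0 / 1.0 values.
/// The original magnitudes are gone; only the signs survive.
fn dequantize(bits: &[u8], dimensions: usize) -> Vec<f32> {
    (0..dimensions)
        .map(|i| if bits[i / 8] & (1 << (i % 8)) != 0 { 1.0 } else { -1.0 })
        .collect()
}
```

In this sketch, a 9-dimensional vector of f32 values (36 bytes) is reduced to 2 bytes, and reading it back yields only -1.0 and 1.0.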

Accept a function to abort the indexing process by @irevoire in #86

If you ever wanted to stop arroy before it finishes an indexing process, this is for you.
You can now provide a closure that arroy calls periodically; if it returns true, arroy stops as quickly as possible and returns the new BuildCancelled error.
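
The pattern can be sketched as follows. This is not arroy's code: `build_with_cancel`, `BuildError` and `example` are hypothetical stand-ins that only show how a polled closure can interrupt a long-running loop. See the arroy documentation for the actual method that accepts the closure.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical error type mirroring the `BuildCancelled` error named above.
#[derive(Debug, PartialEq)]
enum BuildError {
    BuildCancelled,
}

/// Hypothetical stand-in for an indexing loop: it handles `items` one by one
/// and polls `cancel` before each step.
fn build_with_cancel(items: &[u32], cancel: impl Fn() -> bool) -> Result<usize, BuildError> {
    let mut processed = 0;
    for _item in items {
        if cancel() {
            // Stop as soon as the caller asks for it.
            return Err(BuildError::BuildCancelled);
        }
        processed += 1;
    }
    Ok(processed)
}

/// Usage: a shared flag that another thread could flip to request cancellation.
fn example() -> Result<usize, BuildError> {
    let stop = AtomicBool::new(false);
    stop.store(true, Ordering::Relaxed); // simulate a cancellation request
    build_with_cancel(&[1, 2, 3], || stop.load(Ordering::Relaxed))
}
```

An atomic flag is a convenient choice for the closure because it can be set from another thread, such as a signal handler or a task scheduler, without locking.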

Breaking

Rename the angular distance to cosine distance by @irevoire in #94

This is both API-breaking and DB-breaking, which means you'll have to re-import all your vectors by hand in arroy after upgrading.
Since "cosine" is the more common name, we renamed Angular and BinaryQuantizedAngular to Cosine and BinaryQuantizedCosine.

Use builder pattern for the configuration by @irevoire in #96

This is API-breaking.

Since the APIs to query vectors and build databases were accumulating more and more optional parameters, we switched to a builder pattern that should make them easier to use and let us add new configuration options without breaking the API in the future.

Now, instead of writing:

let results = reader.nns_by_item(&rtxn, item_id, n_results, search_k, None)?.unwrap();

You would instead write:

let results = reader.nns(n_results).search_k(search_k).by_item(&rtxn, item_id)?.unwrap();

The same goes for the build method. Instead of writing:

writer.build(&mut wtxn, &mut rng, None)?;

You would write:

writer.builder(&mut rng).build(&mut wtxn)?;

Maintenance

  • Make the warning output errors in the ci by @irevoire in #97
  • Reorganize the NodeId to make the appending of vectors work in more cases and add a test by @irevoire in #98
  • Store the list of updated IDs directly in LMDB instead of a roaring bitmap to increase the vector insertion performances by @irevoire in #99
  • Increase the arroy version for the next release by @irevoire in #100

Full Changelog: v0.4.0...v0.5.0

v0.4.0

01 Oct 14:07
973f093

What's Changed

  • Fix panic at search when the index is empty by @irevoire in #76
  • Panic at search when the trees need a rebuild by @irevoire in #77
  • Add a method on reader to get all the item ids by @irevoire in #78
  • Fix lints and update dependencies by @irevoire in #79

Full Changelog: v0.3.1...v0.4.0

v0.3.1

16 May 16:16
7d65b82

v0.3.0

29 Feb 11:11
19e0a07

Full Changelog: v0.2.0...v0.3.0

v0.2.0

16 Jan 15:08
2ff1567

A lot has been implemented since the last release. To sum up the most important changes: Arroy now comes with multi-threading and incremental indexing. That officially makes it faster than annoy by a good margin 🎉

A lot of work also went into making sure the generated trees are valid and work correctly.

Full Changelog: v0.1.0...v0.2.0

v0.1.0

16 Jan 15:06
8b55f3e

Initial release yay! 🎉

See the README for everything that was implemented.