Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] rust-agent-qbg #2751

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

[Draft] rust-agent-qbg #2751

wants to merge 1 commit into from

Conversation

datelier
Copy link
Contributor

@datelier datelier commented Nov 19, 2024

Description

Related Issue

Versions

  • Vald Version: v1.7.14
  • Go Version: v1.23.3
  • Rust Version: v1.82.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.2
  • Helm Version: v3.16.2
  • NGT Version: v2.3.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

  • New Features
    • Introduced a new library for managing Quantized Binary Graph (QBG) construction and indexing.
    • Added functionality for creating and manipulating properties and indices related to QBG, including robust parameter management and lifecycle operations.
  • Bug Fixes
    • Enhanced error handling in index operations.
  • Documentation
    • Added comprehensive documentation for new classes and methods related to QBG.
  • Tests
    • Implemented tests to ensure the functionality of creating, appending, and searching indices.

Copy link
Contributor

coderabbitai bot commented Nov 19, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The changes in this pull request introduce a new library, qbg, into the Rust project by modifying several Cargo.toml files and adding new source files. The workspace Cargo.toml is updated to include the qbg library, while the Cargo.toml for the algorithm library adds a dependency on qbg. A new Cargo.toml file is created for the qbg library, specifying its dependencies and metadata. Additionally, new source files (build.rs, input.cpp, input.h, lib.rs) are added to implement functionality related to QBG construction and indexing.

Changes

File Path Change Summary
rust/Cargo.toml Added new workspace member: "libs/algorithms/qbg"
rust/libs/algorithm/Cargo.toml Added new dependency: qbg = { version = "0.1.0", path = "../algorithms/qbg" }
rust/libs/algorithms/qbg/Cargo.toml Created new package with metadata, version, edition, and dependencies (anyhow, cxx, etc.)
rust/libs/algorithms/qbg/build.rs Added build script for integrating C++ code using cxx, specifying source files and compiler flags.
rust/libs/algorithms/qbg/src/input.cpp Introduced Property and Index classes with methods for QBG management and error handling.
rust/libs/algorithms/qbg/src/input.h Declared Property and Index classes and their associated functions.
rust/libs/algorithms/qbg/src/lib.rs Added FFI module for Property and Index, including various function signatures and test cases.

Possibly related PRs

Suggested labels

size/XXXL, actions/backport/release/v1.7

Suggested reviewers

  • vankichi
  • kmrmt

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vdaas-ci
Copy link
Collaborator

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🧹 Outside diff range and nitpick comments (10)
rust/libs/algorithm/Cargo.toml (1)

25-25: Documentation needed for QBG integration

Since this is a draft PR introducing a new algorithm component, please provide:

  1. Description of what QBG is and its purpose
  2. Integration requirements and dependencies
  3. Any performance implications or considerations

Consider adding these details to:

  • PR description
  • README in the QBG module
  • Code documentation
rust/libs/algorithms/qbg/src/input.h (2)

32-39: Add parameter names to set_qbg_construction_parameters for clarity

Including parameter names in the function declaration enhances readability and helps developers understand the purpose of each parameter.


49-65: Include parameter names in set_qbg_build_parameters function

Providing parameter names in the function declaration improves code comprehension and maintainability.

rust/libs/algorithms/qbg/src/lib.rs (5)

59-62: Inconsistent Integer Types in set_hierarchical_clustering_init_mode

The function set_hierarchical_clustering_init_mode uses i16 for the parameter hierarchical_clustering_init_mode, whereas similar parameters in other functions use i32. Consider using consistent integer types to avoid potential issues.


63-65: Type Mismatch in set_number_of_second_objects

The parameter number_of_second_objects in set_number_of_second_objects is of type u32, while other similar parameters use usize. For consistency and to prevent potential bugs, consider using usize here as well.


109-130: Use Meaningful Values in Test Property Setters

In the test test_qbg, many property setters are assigned the value 1. Using more realistic and diverse values can improve the effectiveness of the test by covering a wider range of scenarios.


184-185: Simplify Return Statements in Test Functions

The lines return Ok(()); can be simplified to just Ok(()) for brevity and idiomatic Rust code.

Also applies to: 216-217


84-85: Clarify Parameter Naming and Types in new_index and new_prebuilt_index

The new_index function takes a Pin<&mut Property> as the second parameter named p, while new_prebuilt_index takes a bool named p. This overloading of p with different types may cause confusion. Consider renaming the boolean parameter to something more descriptive, like prebuilt, for clarity.

rust/libs/algorithms/qbg/src/input.cpp (2)

315-328: Missing Error Handling in Factory Functions

The factory functions new_index and new_prebuilt_index do not handle exceptions that may be thrown during object creation, potentially leading to unhandled exceptions.

Consider using exception handling to capture and report any errors during object creation:

std::unique_ptr<Index> new_index(const rust::String &path, Property &p)
{
    try
    {
        return std::make_unique<Index>(path, p);
    }
    catch (const std::exception &e)
    {
        // Handle exception, possibly log the error
        std::cerr << "Error creating Index: " << e.what() << std::endl;
        return nullptr;
    }
}

227-238: Use of const char* Instead of std::string in open_index

Passing cpath.c_str() may lead to issues if cpath goes out of scope before c_str() is used.

Since cpath is a local variable, it's safe here, but consider passing std::string directly if the underlying function accepts it, to reduce unnecessary conversions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3d51a12 and d6e423f.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
🔇 Additional comments (4)
rust/Cargo.toml (2)

24-24: LGTM! The workspace member addition follows project conventions.

The new qbg algorithm library is properly placed alongside other algorithm implementations (ngt, faiss).


24-24: Verify qbg library setup

Let's verify that the qbg library directory exists and has the required Cargo.toml configuration.

✅ Verification successful

QBG library setup is properly configured

The verification shows that the qbg library is correctly set up with:

  • Directory structure present at rust/libs/algorithms/qbg/
  • Required files: Cargo.toml, build.rs, and src/ directory
  • Proper Cargo.toml configuration with:
    • Package metadata
    • Required dependencies (anyhow, cxx)
    • Build dependencies (cxx-build, miette)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify qbg library structure and configuration

# Check if the directory exists
if [ ! -d "rust/libs/algorithms/qbg" ]; then
    echo "Error: qbg directory does not exist at expected path"
    exit 1
fi

# Check for Cargo.toml and essential files
echo "Checking qbg library structure:"
ls -la rust/libs/algorithms/qbg/

# Display Cargo.toml contents if it exists
if [ -f "rust/libs/algorithms/qbg/Cargo.toml" ]; then
    echo -e "\nQBG Cargo.toml contents:"
    cat rust/libs/algorithms/qbg/Cargo.toml
else
    echo "Error: Cargo.toml not found in qbg directory"
fi

Length of output: 1461

rust/libs/algorithm/Cargo.toml (1)

25-25: Verify QBG integration path and dependencies

The dependency path looks consistent with other algorithm dependencies, but let's verify the integration.

✅ Verification successful

QBG dependency path and integration verified

The QBG integration appears to be properly structured:

  • QBG directory exists at the correct path (rust/libs/algorithms/qbg)
  • The library implements a C++ bridge for QBG index construction and search
  • Proper license headers and version metadata are present
  • No direct usage in algorithm lib yet, which is expected for a draft PR
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify QBG integration

# Check if qbg directory exists
echo "Checking qbg directory..."
if [ -d "rust/libs/algorithms/qbg" ]; then
    echo "✓ QBG directory exists"
else
    echo "✗ QBG directory not found"
fi

# Check for other potential integration points
echo -e "\nChecking for other QBG references..."
rg -l "qbg" --type rust

Length of output: 336


Script:

#!/bin/bash
# Let's check the content of lib.rs to understand the integration points
echo "=== Content of QBG lib.rs ==="
cat rust/libs/algorithms/qbg/src/lib.rs

echo -e "\n=== Checking Cargo.toml for metadata and description ==="
cat rust/libs/algorithms/qbg/Cargo.toml

echo -e "\n=== Checking for any usage in algorithm lib ==="
rg "qbg" rust/libs/algorithm/src/ -A 2 -B 2

Length of output: 9834

rust/libs/algorithms/qbg/src/input.cpp (1)

273-273: Incorrect Use of v.length() in append

The use of v.length() may be incorrect if v does not have a length() method or if it does not return the expected size.

Verify that v.length() correctly returns the number of elements in the slice. If not, use v.size() or adjust accordingly.

Run the following script to check for the correct method to retrieve the size of rust::Slice:

Ensure that the correct method is consistently used across the codebase.

✅ Verification successful

Let's search for actual usage patterns of rust::Slice in the codebase to understand how it's being used and what methods are available. I've also included a search for the specific vector construction pattern to see the surrounding context.


Use of v.length() is correct for rust::Slice

Looking at the codebase usage, v.length() is the correct method for rust::Slice. This is evidenced by its usage in qbg/src/input.cpp and supported by the fact that rust::Slice is a Rust-C++ bridge type where the Rust convention for getting collection size is length(). The code correctly constructs a vector from the slice's iterators and uses length() to get its size.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for the correct method to get size from rust::Slice in the codebase.

# Search for usage of rust::Slice methods
rg --type-add 'cpp' --files-with-matches 'rust::Slice' | xargs grep -E 'rust::Slice.*\.(length\(\)|size\(\))' -A 2

Length of output: 172


Script:

#!/bin/bash
# Search for rust::Slice usage patterns in cpp files
rg "rust::Slice" -A 3 -g "*.cpp"

# Also search for the specific line context
rg "std::vector<float>\(.*\.begin\(\),.*\.end\(\)\)" -g "*.cpp" -A 2

Length of output: 2993

rust/libs/algorithms/qbg/src/input.h Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.h Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.h Show resolved Hide resolved
rust/libs/algorithms/qbg/src/lib.rs Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/lib.rs Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
Copy link

cloudflare-workers-and-pages bot commented Nov 19, 2024

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: c7fce8b
Status: ✅  Deploy successful!
Preview URL: https://e54683ce.vald.pages.dev
Branch Preview URL: https://feature-agent-rust-agent-qbg.vald.pages.dev

View logs

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 19, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (2)
rust/libs/algorithms/qbg/src/input.cpp (1)

31-34: Add const correctness to getter methods.

The getter methods should be marked as const since they don't modify the object's state.

-QBGConstructionParameters *Property::get_qbg_construction_parameters()
+QBGConstructionParameters *Property::get_qbg_construction_parameters() const
 {
     return qbg_construction_parameters;
 }

-QBGBuildParameters *Property::get_qbg_build_parameters()
+QBGBuildParameters *Property::get_qbg_build_parameters() const
 {
     return qbg_build_parameters;
 }

Also applies to: 95-98

rust/libs/algorithms/qbg/src/lib.rs (1)

362-443: Add error case testing

The test only covers the happy path. Add tests for error cases:

  • Invalid dimensions
  • Empty vectors
  • Out of bounds K values
  • Invalid file paths

Add error case tests:

#[test]
fn test_ffi_qbg_errors() -> Result<()> {
    let path = "invalid/path".to_string();
    let mut p = Property::new();
    p.init_qbg_construction_parameters();
    
    // Test invalid path
    assert!(Index::new(&path, &mut p).is_err());
    
    // Test invalid dimensions
    p.set_dimension(0);
    assert!(Index::new(&"index".to_string(), &mut p).is_err());
    
    Ok(())
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d6e423f and 6de8a6a.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • rust/Cargo.toml
  • rust/libs/algorithm/Cargo.toml
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
  • rust/libs/algorithms/qbg/src/input.h
🔇 Additional comments (2)
rust/libs/algorithms/qbg/src/input.cpp (2)

230-233: Uncomment and implement the destructor.

The destructor is currently commented out, which could lead to resource leaks. The index should be properly closed when the object is destroyed.


335-348: LGTM! Well-implemented factory functions.

The factory functions make good use of modern C++ practices with std::unique_ptr for automatic memory management.

Comment on lines +60 to +95
void Property::set_extended_dimension(rust::usize extended_dimension)
{
qbg_construction_parameters->extended_dimension = extended_dimension;
}

void Property::set_dimension(rust::usize dimension)
{
qbg_construction_parameters->dimension = dimension;
}

void Property::set_number_of_subvectors(rust::usize number_of_subvectors)
{
qbg_construction_parameters->number_of_subvectors = number_of_subvectors;
}

void Property::set_number_of_blobs(rust::usize number_of_blobs)
{
qbg_construction_parameters->number_of_blobs = number_of_blobs;
}

void Property::set_internal_data_type(rust::i32 internal_data_type)
{
qbg_construction_parameters->internal_data_type = internal_data_type;
}

void Property::set_data_type(rust::i32 data_type)
{
qbg_construction_parameters->data_type = data_type;
}

void Property::set_distance_type(rust::i32 distance_type)
{
qbg_construction_parameters->distance_type = distance_type;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add parameter validation in setter methods.

The setter methods for construction parameters should validate their input values to ensure they are within acceptable ranges. For example, dimension and number_of_subvectors should be positive values.

Example implementation for one setter:

 void Property::set_dimension(rust::usize dimension)
 {
+    if (dimension == 0) {
+        throw std::invalid_argument("dimension must be positive");
+    }
     qbg_construction_parameters->dimension = dimension;
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 299 to 356
void Index::search(
rust::Slice<const rust::f32> v,
rust::usize k,
rust::i32 *ids,
rust::f32 *distances)
{
QBGQuery query;
qbg_initialize_query(&query);
std::vector<float> vec(v.begin(), v.end());
query.query = vec.data();

NGTError err = ngt_create_error_object();
NGTObjectDistances results = ngt_create_empty_results(err);
bool ok = qbg_search_index(index, query, results, err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
qbg_destroy_results(results);
ngt_destroy_error_object(err);
return;
}

size_t rsize = qbg_get_result_size(results, err);
size_t limit = std::min(k, rsize);
for (size_t i = 0; i < limit; i++)
{
NGTObjectDistance obj = qbg_get_result(results, i, err);
ids[i] = obj.id;
distances[i] = obj.distance;
}

qbg_destroy_results(results);
ngt_destroy_error_object(err);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Consider thread safety in search method.

The search method uses shared resources (index) without synchronization. If this method is called from multiple threads, it could lead to race conditions.

Consider adding mutex protection:

+#include <mutex>
+
 class Index {
+private:
+    std::mutex search_mutex;
+
 public:
     void search(
         rust::Slice<const rust::f32> v,
         rust::usize k,
         rust::i32 *ids,
         rust::f32 *distances)
     {
+        std::lock_guard<std::mutex> lock(search_mutex);
         // ... rest of the method
     }
 };

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 284 to 305
rust::i32 Index::append(rust::Slice<const rust::f32> v)
{
NGTError err = ngt_create_error_object();
std::vector<float> vec(v.begin(), v.end());
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
if (id == 0)
{
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
return 0;
}
ngt_destroy_error_object(err);
return id;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling in append method.

The method returns 0 on error, which could be confused with a valid ID. Consider using exceptions or a more explicit error handling mechanism.

-rust::i32 Index::append(rust::Slice<const rust::f32> v)
+rust::i32 Index::append(rust::Slice<const rust::f32> v)
 {
     NGTError err = ngt_create_error_object();
     std::vector<float> vec(v.begin(), v.end());
     unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
     if (id == 0)
     {
-        std::cerr << ngt_get_error_string(err) << std::endl;
+        std::string error_msg = ngt_get_error_string(err);
         ngt_destroy_error_object(err);
-        return 0;
+        throw std::runtime_error("Failed to append object: " + error_msg);
     }
     ngt_destroy_error_object(err);
     return id;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rust::i32 Index::append(rust::Slice<const rust::f32> v)
{
NGTError err = ngt_create_error_object();
std::vector<float> vec(v.begin(), v.end());
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
if (id == 0)
{
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
return 0;
}
ngt_destroy_error_object(err);
return id;
}
rust::i32 Index::append(rust::Slice<const rust::f32> v)
{
NGTError err = ngt_create_error_object();
std::vector<float> vec(v.begin(), v.end());
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
if (id == 0)
{
std::string error_msg = ngt_get_error_string(err);
ngt_destroy_error_object(err);
throw std::runtime_error("Failed to append object: " + error_msg);
}
ngt_destroy_error_object(err);
return id;
}

Comment on lines 330 to 342
pub fn open_index(&mut self, path: &String, prebuilt: bool) {
self.inner.pin_mut().open_index(path, prebuilt);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for index operations

Methods like open_index, build_index, and save_index don't return Results, making it impossible to handle failures gracefully.

Add consistent error handling:

-    pub fn open_index(&mut self, path: &String, prebuilt: bool) {
-        self.inner.pin_mut().open_index(path, prebuilt);
+    pub fn open_index(&mut self, path: &String, prebuilt: bool) -> Result<(), cxx::Exception> {
+        self.inner.pin_mut().open_index(path, prebuilt)?;
+        Ok(())
     }

-    pub fn build_index(&mut self, path: &String, p: &mut property::Property) {
-        self.inner.pin_mut().build_index(path, p.get_property());
+    pub fn build_index(&mut self, path: &String, p: &mut property::Property) -> Result<(), cxx::Exception> {
+        self.inner.pin_mut().build_index(path, p.get_property())?;
+        Ok(())
     }

Also applies to: 334-336, 338-340

Comment on lines 91 to 97
unsafe fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
ids: *mut i32,
distances: *mut f32,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add safety documentation for unsafe search function

The unsafe search function uses raw pointers without clear documentation about:

  • Required buffer sizes for ids and distances
  • Alignment requirements
  • Ownership and lifetime guarantees
  • Valid parameter ranges

Add comprehensive safety documentation:

+        /// # Safety
+        /// 
+        /// Caller must ensure:
+        /// - `ids` points to a buffer of size >= k
+        /// - `distances` points to a buffer of size >= k
+        /// - Both buffers are properly aligned
+        /// - Buffers remain valid for the duration of the call
+        /// - k > 0 and k <= number of indexed vectors
         unsafe fn search(
             self: Pin<&mut Index>,
             v: &[f32],
             k: usize,
             ids: *mut i32,
             distances: *mut f32,
         );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
unsafe fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
ids: *mut i32,
distances: *mut f32,
);
/// # Safety
///
/// Caller must ensure:
/// - `ids` points to a buffer of size >= k
/// - `distances` points to a buffer of size >= k
/// - Both buffers are properly aligned
/// - Buffers remain valid for the duration of the call
/// - k > 0 and k <= number of indexed vectors
unsafe fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
ids: *mut i32,
distances: *mut f32,
);

Comment on lines 124 to 150
pub fn set_qbg_construction_parameters(
&mut self,
extended_dimension: usize,
dimension: usize,
number_of_subvectors: usize,
number_of_blobs: usize,
internal_data_type: i32,
data_type: i32,
distance_type: i32,
) {
self.inner
.pin_mut()
.set_extended_dimension(extended_dimension);
self.inner.pin_mut().set_dimension(dimension);
self.inner
.pin_mut()
.set_number_of_subvectors(number_of_subvectors);
self.inner.pin_mut().set_number_of_blobs(number_of_blobs);
self.inner
.pin_mut()
.set_internal_data_type(internal_data_type);
self.inner.pin_mut().set_data_type(data_type);
self.inner.pin_mut().set_distance_type(distance_type);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation for construction parameters

The set_qbg_construction_parameters method accepts parameters without validation. Consider adding checks for:

  • Non-zero dimensions
  • Valid relationships between dimensions and subvectors
  • Valid enum values for data types and distance types

Add validation:

     pub fn set_qbg_construction_parameters(
         &mut self,
         extended_dimension: usize,
         dimension: usize,
         number_of_subvectors: usize,
         number_of_blobs: usize,
         internal_data_type: i32,
         data_type: i32,
         distance_type: i32,
     ) {
+        if dimension == 0 {
+            panic!("dimension must be greater than 0");
+        }
+        if number_of_subvectors == 0 || number_of_subvectors > dimension {
+            panic!("invalid number_of_subvectors");
+        }
+        // Add more validation as needed
         self.inner
             .pin_mut()
             .set_extended_dimension(extended_dimension);

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 423 to 528
// Search
let mut ids: Vec<i32> = vec![0; K];
let mut distances: Vec<f32> = vec![0.0; K];
let mut vec = vec![0.0; DIMENSION];
for i in 0..DIMENSION {
vec[i] = i as f32
}
unsafe {
index.pin_mut().search(
vec.as_slice(),
K,
&mut ids[0] as *mut i32,
&mut distances[0] as *mut f32,
)
};
println!("ids:\n{:?}", ids);
println!("distances:\n{:?}", distances);
index.pin_mut().close_index();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance search result validation in tests

The test prints search results without verifying their correctness. Add assertions to validate:

  • Result ordering (by distance)
  • Distance values are non-negative and properly ordered
  • IDs are within valid range

Add validation:

         unsafe {
             index.pin_mut().search(
                 vec.as_slice(),
                 K,
                 &mut ids[0] as *mut i32,
                 &mut distances[0] as *mut f32,
             )
         };
-        println!("ids:\n{:?}", ids);
-        println!("distances:\n{:?}", distances);
+        // Validate results
+        assert!(ids.iter().all(|&id| id >= 0 && id < 100), "Invalid IDs in results");
+        assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Distances not properly ordered");
+        assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Search
let mut ids: Vec<i32> = vec![0; K];
let mut distances: Vec<f32> = vec![0.0; K];
let mut vec = vec![0.0; DIMENSION];
for i in 0..DIMENSION {
vec[i] = i as f32
}
unsafe {
index.pin_mut().search(
vec.as_slice(),
K,
&mut ids[0] as *mut i32,
&mut distances[0] as *mut f32,
)
};
println!("ids:\n{:?}", ids);
println!("distances:\n{:?}", distances);
index.pin_mut().close_index();
// Search
let mut ids: Vec<i32> = vec![0; K];
let mut distances: Vec<f32> = vec![0.0; K];
let mut vec = vec![0.0; DIMENSION];
for i in 0..DIMENSION {
vec[i] = i as f32
}
unsafe {
index.pin_mut().search(
vec.as_slice(),
K,
&mut ids[0] as *mut i32,
&mut distances[0] as *mut f32,
)
};
// Validate results
assert!(ids.iter().all(|&id| id >= 0 && id < 100), "Invalid IDs in results");
assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Distances not properly ordered");
assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found");
index.pin_mut().close_index();

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6de8a6a and 6e5aa96.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • rust/Cargo.toml
  • rust/libs/algorithm/Cargo.toml
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
  • rust/libs/algorithms/qbg/src/input.h
  • rust/libs/algorithms/qbg/src/lib.rs
🔇 Additional comments (1)
rust/libs/algorithms/qbg/src/input.cpp (1)

335-348: LGTM! Well-designed factory functions.

The factory functions are well-implemented:

  • Use of std::unique_ptr ensures proper memory management
  • Clear and concise interface for object creation
  • Follow modern C++ practices

Comment on lines +41 to +60
void Property::set_qbg_construction_parameters(
rust::usize extended_dimension,
rust::usize dimension,
rust::usize number_of_subvectors,
rust::usize number_of_blobs,
rust::i32 internal_data_type,
rust::i32 data_type,
rust::i32 distance_type)
{
qbg_initialize_construction_parameters(qbg_construction_parameters);
qbg_construction_parameters->extended_dimension = extended_dimension;
qbg_construction_parameters->dimension = dimension;
qbg_construction_parameters->number_of_subvectors = number_of_subvectors;
qbg_construction_parameters->number_of_blobs = number_of_blobs;
qbg_construction_parameters->internal_data_type = internal_data_type;
qbg_construction_parameters->data_type = data_type;
qbg_construction_parameters->distance_type = distance_type;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add comprehensive parameter validation in bulk parameter setters.

The bulk parameter setters set_qbg_construction_parameters and set_qbg_build_parameters should validate their input parameters before setting them. This helps catch invalid configurations early.

Example implementation for set_qbg_construction_parameters:

 void Property::set_qbg_construction_parameters(
     rust::usize extended_dimension,
     rust::usize dimension,
     rust::usize number_of_subvectors,
     rust::usize number_of_blobs,
     rust::i32 internal_data_type,
     rust::i32 data_type,
     rust::i32 distance_type)
 {
+    if (dimension == 0) {
+        throw std::invalid_argument("dimension must be positive");
+    }
+    if (number_of_subvectors == 0) {
+        throw std::invalid_argument("number_of_subvectors must be positive");
+    }
+    if (number_of_blobs == 0) {
+        throw std::invalid_argument("number_of_blobs must be positive");
+    }
+    if (extended_dimension < dimension) {
+        throw std::invalid_argument("extended_dimension must be >= dimension");
+    }
     qbg_initialize_construction_parameters(qbg_construction_parameters);
     // ... rest of the method
 }

Also applies to: 105-136

Comment on lines +230 to +236
Index::~Index()
{
// close_index();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Uncomment and implement the destructor.

The destructor is currently commented out, which will lead to resource leaks. The index needs to be properly closed when the object is destroyed.

Apply this fix:

 Index::~Index()
 {
-    // close_index();
+    close_index();
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Index::~Index()
{
// close_index();
}
Index::~Index()
{
close_index();
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)
rust/libs/algorithms/qbg/src/input.h (2)

23-30: Add const correctness to getter methods.

The getter methods should be marked as const since they don't modify the object's state.

Apply this diff to improve const correctness:

-    rust::u32 get_id() { return id; }
-    rust::f32 get_distance() { return distance; }
+    rust::u32 get_id() const { return id; }
+    rust::f32 get_distance() const { return distance; }

59-75: Improve parameter documentation for set_qbg_build_parameters.

The method has many parameters with only partial documentation. Consider adding detailed parameter descriptions or using a parameter struct for better organization.

Example documentation format:

/**
 * Sets the QBG build parameters
 * @param init_mode Hierarchical clustering initialization mode
 * @param first_objects Number of objects for first level
 * ...
 */
void set_qbg_build_parameters(...);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6e5aa96 and de84cf0.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • rust/Cargo.toml
  • rust/libs/algorithm/Cargo.toml
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
  • rust/libs/algorithms/qbg/src/lib.rs
🔇 Additional comments (5)
rust/libs/algorithms/qbg/src/input.h (4)

1-22: LGTM! License and includes are properly set up.

The license header is complete, and all necessary includes are present.


94-94: Replace void *index with a specific pointer type.

Using void * reduces type safety and can lead to errors.


112-114: LGTM! Factory functions are well-designed.

The use of std::unique_ptr for factory functions follows modern C++ best practices and ensures proper resource management.


1-114: Verify Rust FFI integration and usage patterns.

Let's ensure the Rust side correctly uses this C++ interface.

✅ Verification successful

FFI integration is properly implemented and safe

The C++ interface is correctly integrated with Rust through CXX bridge with proper memory management:

  • Uses cxx::bridge for safe FFI bindings
  • Proper use of UniquePtr for C++ object lifetime management
  • Safe wrapper implementations in Rust modules (property and index)
  • Comprehensive test coverage validating the integration
  • Type-safe conversions between Rust and C++ types
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Rust FFI usage patterns

# Look for Rust FFI bindings
rg -t rust "extern \"C\"" rust/libs/algorithms/qbg/src/

# Look for cxx bridge usage
rg -t rust "#\[cxx::bridge\]" rust/libs/algorithms/qbg/src/

# Check for potential memory management patterns
rg -t rust "Box::into_raw|from_raw" rust/libs/algorithms/qbg/src/

Length of output: 236


Script:

#!/bin/bash
# Let's examine the Rust bridge implementation and usage patterns

# Check the Rust lib.rs content to see the bridge implementation
cat rust/libs/algorithms/qbg/src/lib.rs

# Look for any other Rust files that might use this interface
fd -e rs . rust/libs/algorithms/qbg/src/

# Look for tests that might show usage patterns
fd -e rs . rust/libs/algorithms/qbg/tests/

Length of output: 20764

rust/libs/algorithms/qbg/src/input.cpp (1)

333-346: LGTM! Well-implemented factory functions

The factory functions properly use smart pointers and modern C++ practices for memory management.

Comment on lines +34 to +35
QBGConstructionParameters *qbg_construction_parameters;
QBGBuildParameters *qbg_build_parameters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Use smart pointers instead of raw pointers.

Raw pointers are error-prone and make memory management difficult. Consider using std::unique_ptr for better memory safety.

Apply this diff:

-    QBGConstructionParameters *qbg_construction_parameters;
-    QBGBuildParameters *qbg_build_parameters;
+    std::unique_ptr<QBGConstructionParameters> qbg_construction_parameters;
+    std::unique_ptr<QBGBuildParameters> qbg_build_parameters;

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +153 to +158
void Property::set_number_of_second_objects(rust::u32 number_of_second_objects)
{
qbg_build_parameters->number_of_second_objects = number_of_second_objects;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential integer overflow risk in number_of_second_objects parameter type

The method accepts rust::u32 while other similar methods use rust::usize. This inconsistency could lead to integer overflow issues on 64-bit systems when handling large datasets.

-void Property::set_number_of_second_objects(rust::u32 number_of_second_objects)
+void Property::set_number_of_second_objects(rust::usize number_of_second_objects)
 {
     qbg_build_parameters->number_of_second_objects = number_of_second_objects;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void Property::set_number_of_second_objects(rust::u32 number_of_second_objects)
{
qbg_build_parameters->number_of_second_objects = number_of_second_objects;
}
void Property::set_number_of_second_objects(rust::usize number_of_second_objects)
{
qbg_build_parameters->number_of_second_objects = number_of_second_objects;
}

Comment on lines 250 to 268
void Index::build_index(const rust::String &path, Property &p)
{
NGTError err = ngt_create_error_object();
std::string cpath(path);
bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
return;
}
ngt_destroy_error_object(err);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve error propagation in build_index

The method silently fails on error, making it difficult for callers to handle failures appropriately. Consider throwing an exception or returning a status code.

-void Index::build_index(const rust::String &path, Property &p)
+bool Index::build_index(const rust::String &path, Property &p)
 {
     NGTError err = ngt_create_error_object();
     std::string cpath(path);
     bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err);
     if (!ok)
     {
         std::cerr << "Error: " << __func__ << std::endl;
         std::cerr << ngt_get_error_string(err) << std::endl;
         ngt_destroy_error_object(err);
-        return;
+        return false;
     }
     ngt_destroy_error_object(err);
+    return true;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
void Index::build_index(const rust::String &path, Property &p)
{
NGTError err = ngt_create_error_object();
std::string cpath(path);
bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
return;
}
ngt_destroy_error_object(err);
}
bool Index::build_index(const rust::String &path, Property &p)
{
NGTError err = ngt_create_error_object();
std::string cpath(path);
bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
return false;
}
ngt_destroy_error_object(err);
return true;
}

Comment on lines 299 to 356
std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k)
{
QBGQuery query;
qbg_initialize_query(&query);
std::vector<float> vec(v.begin(), v.end());
query.query = vec.data();

NGTError err = ngt_create_error_object();
NGTObjectDistances results = ngt_create_empty_results(err);
bool ok = qbg_search_index(index, query, results, err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
qbg_destroy_results(results);
ngt_destroy_error_object(err);
return nullptr;
}

size_t rsize = qbg_get_result_size(results, err);
size_t limit = std::min(k, rsize);
std::vector<SearchResult> searchResults;
for (size_t i = 0; i < limit; i++)
{
NGTObjectDistance obj = qbg_get_result(results, i, err);
searchResults.push_back(SearchResult{obj.id, obj.distance});
}

qbg_destroy_results(results);
ngt_destroy_error_object(err);

return std::make_unique<std::vector<SearchResult>>(searchResults);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential resource leak in search method

The error handling in the search method could leak resources if qbg_get_result fails. Consider using RAII or ensuring cleanup in all error paths.

 std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k)
 {
     QBGQuery query;
     qbg_initialize_query(&query);
     std::vector<float> vec(v.begin(), v.end());
     query.query = vec.data();

     NGTError err = ngt_create_error_object();
     NGTObjectDistances results = ngt_create_empty_results(err);
+    // Use RAII to ensure cleanup
+    struct ResultsGuard {
+        NGTObjectDistances results;
+        NGTError err;
+        ~ResultsGuard() {
+            qbg_destroy_results(results);
+            ngt_destroy_error_object(err);
+        }
+    } guard{results, err};

     bool ok = qbg_search_index(index, query, results, err);
     if (!ok)
     {
         std::cerr << "Error: " << __func__ << std::endl;
         std::cerr << ngt_get_error_string(err) << std::endl;
-        qbg_destroy_results(results);
-        ngt_destroy_error_object(err);
         return nullptr;
     }

-    size_t rsize = qbg_get_result_size(results, err);
+    size_t rsize = qbg_get_result_size(guard.results, err);
     size_t limit = std::min(k, rsize);
     std::vector<SearchResult> searchResults;
     for (size_t i = 0; i < limit; i++)
     {
-        NGTObjectDistance obj = qbg_get_result(results, i, err);
+        NGTObjectDistance obj = qbg_get_result(guard.results, i, guard.err);
         searchResults.push_back(SearchResult{obj.id, obj.distance});
     }

-    qbg_destroy_results(results);
-    ngt_destroy_error_object(err);

     return std::make_unique<std::vector<SearchResult>>(searchResults);
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k)
{
QBGQuery query;
qbg_initialize_query(&query);
std::vector<float> vec(v.begin(), v.end());
query.query = vec.data();
NGTError err = ngt_create_error_object();
NGTObjectDistances results = ngt_create_empty_results(err);
bool ok = qbg_search_index(index, query, results, err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
qbg_destroy_results(results);
ngt_destroy_error_object(err);
return nullptr;
}
size_t rsize = qbg_get_result_size(results, err);
size_t limit = std::min(k, rsize);
std::vector<SearchResult> searchResults;
for (size_t i = 0; i < limit; i++)
{
NGTObjectDistance obj = qbg_get_result(results, i, err);
searchResults.push_back(SearchResult{obj.id, obj.distance});
}
qbg_destroy_results(results);
ngt_destroy_error_object(err);
return std::make_unique<std::vector<SearchResult>>(searchResults);
}
std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k)
{
QBGQuery query;
qbg_initialize_query(&query);
std::vector<float> vec(v.begin(), v.end());
query.query = vec.data();
NGTError err = ngt_create_error_object();
NGTObjectDistances results = ngt_create_empty_results(err);
// Use RAII to ensure cleanup
struct ResultsGuard {
NGTObjectDistances results;
NGTError err;
~ResultsGuard() {
qbg_destroy_results(results);
ngt_destroy_error_object(err);
}
} guard{results, err};
bool ok = qbg_search_index(index, query, results, err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
return nullptr;
}
size_t rsize = qbg_get_result_size(guard.results, err);
size_t limit = std::min(k, rsize);
std::vector<SearchResult> searchResults;
for (size_t i = 0; i < limit; i++)
{
NGTObjectDistance obj = qbg_get_result(guard.results, i, guard.err);
searchResults.push_back(SearchResult{obj.id, obj.distance});
}
return std::make_unique<std::vector<SearchResult>>(searchResults);
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
rust/libs/algorithms/qbg/src/lib.rs (1)

96-102: Add safety documentation for the search function.

The search function handles raw data and should include documentation about:

  • Valid parameter ranges (k, radius, epsilon)
  • Input vector requirements (size, alignment)
  • Performance implications

Add documentation:

/// Searches the index for nearest neighbors.
///
/// # Arguments
/// * `v` - Query vector. Must have the same dimension as indexed vectors
/// * `k` - Number of nearest neighbors to return. Must be > 0
/// * `radius` - Search radius. Use 0.0 for unlimited radius
/// * `epsilon` - Error tolerance for approximate search
///
/// # Returns
/// Vector of (id, distance) pairs for the k nearest neighbors
fn search(
    self: Pin<&mut Index>,
    v: &[f32],
    k: usize,
    radius: f32,
    epsilon: f32,
) -> UniquePtr<CxxVector<SearchResult>>;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9abde97 and cd17301.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • rust/Cargo.toml
  • rust/libs/algorithm/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/src/input.h
🔇 Additional comments (4)
rust/libs/algorithms/qbg/src/input.cpp (3)

19-31: LGTM! Constructor and destructor properly manage memory.

The Property class correctly handles memory allocation in the constructor and cleanup in the destructor.


233-236: ⚠️ Potential issue

Uncomment and implement the destructor.

The destructor is commented out, which could lead to resource leaks. The index needs to be properly closed when the object is destroyed.

 Index::~Index()
 {
-    // close_index();
+    close_index();
 }

Likely invalid or redundant comment.


155-158: ⚠️ Potential issue

Fix type inconsistency in number_of_second_objects parameter.

The method uses rust::u32 while other similar methods use rust::usize. This inconsistency could lead to integer overflow issues on 64-bit systems when handling large datasets.

-void Property::set_number_of_second_objects(rust::u32 number_of_second_objects)
+void Property::set_number_of_second_objects(rust::usize number_of_second_objects)
 {
     qbg_build_parameters->number_of_second_objects = number_of_second_objects;
 }

Likely invalid or redundant comment.

rust/libs/algorithms/qbg/src/lib.rs (1)

461-475: 🛠️ Refactor suggestion

Add assertions to verify search results.

The test prints search results without verifying their correctness. Add assertions to validate:

  • Result ordering (by distance)
  • Distance values are non-negative
  • Number of results matches k
         let distances: Vec<f32> = search_results
             .pin_mut()
             .into_iter()
             .map(|s| s.get_distance())
             .collect();
-        println!("ids:\n{:?}", ids);
-        println!("distances:\n{:?}", distances);
+        assert_eq!(ids.len(), K, "Wrong number of results");
+        assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found");
+        assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Results not ordered by distance");

Likely invalid or redundant comment.

Comment on lines +129 to +150
pub fn set_qbg_construction_parameters(
&mut self,
extended_dimension: usize,
dimension: usize,
number_of_subvectors: usize,
number_of_blobs: usize,
internal_data_type: i32,
data_type: i32,
distance_type: i32,
) {
self.inner.pin_mut().set_qbg_construction_parameters(
extended_dimension,
dimension,
number_of_subvectors,
number_of_blobs,
internal_data_type,
data_type,
distance_type,
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation for construction parameters.

The set_qbg_construction_parameters method should validate inputs before passing them to C++.

Add validation:

pub fn set_qbg_construction_parameters(
    &mut self,
    extended_dimension: usize,
    dimension: usize,
    number_of_subvectors: usize,
    number_of_blobs: usize,
    internal_data_type: i32,
    data_type: i32,
    distance_type: i32,
) {
    if dimension == 0 {
        panic!("dimension must be positive");
    }
    if number_of_subvectors == 0 || number_of_subvectors > dimension {
        panic!("invalid number_of_subvectors");
    }
    if extended_dimension < dimension {
        panic!("extended_dimension must be >= dimension");
    }
    self.inner.pin_mut().set_qbg_construction_parameters(
        extended_dimension,
        dimension,
        number_of_subvectors,
        number_of_blobs,
        internal_data_type,
        data_type,
        distance_type,
    )
}

Comment on lines 321 to 330
pub fn new(path: &String, p: &mut property::Property) -> Result<Self, cxx::Exception> {
let inner = match ffi::new_index(path, p.get_property()) {
Ok(inner) => inner,
Err(e) => panic!("{}", e),
};
Ok(Index { inner })
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling in new method.

Using panic! for error handling is not idiomatic Rust. Consider propagating the error instead.

-        let inner = match ffi::new_index(path, p.get_property()) {
-            Ok(inner) => inner,
-            Err(e) => panic!("{}", e),
-        };
+        let inner = ffi::new_index(path, p.get_property())?;
         Ok(Index { inner })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn new(path: &String, p: &mut property::Property) -> Result<Self, cxx::Exception> {
let inner = match ffi::new_index(path, p.get_property()) {
Ok(inner) => inner,
Err(e) => panic!("{}", e),
};
Ok(Index { inner })
}
pub fn new(path: &String, p: &mut property::Property) -> Result<Self, cxx::Exception> {
let inner = ffi::new_index(path, p.get_property())?;
Ok(Index { inner })
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
rust/libs/algorithms/qbg/src/lib.rs (1)

400-720: Enhance test coverage and assertions

The test implementations primarily verify basic functionality but lack comprehensive assertions and error case testing.

Add the following improvements to tests:

  1. Verify search result ordering:
// In test_index()
let search_results = index.search(vec.as_slice(), K, RADIUS, EPSILON)?;
assert!(
    search_results.windows(2).all(|w| w[0].1 <= w[1].1),
    "Search results should be ordered by distance"
);
  1. Add error case testing:
#[test]
fn test_index_error_cases() -> Result<()> {
    let mut index = Index::new("nonexistent_path", &mut Property::new())?;
    assert!(index.get_object(999999).is_err(), "Should error on invalid ID");
    Ok(())
}
  1. Test parameter validation:
#[test]
#[should_panic(expected = "dimension must be positive")]
fn test_invalid_dimension() {
    let mut p = Property::new();
    p.set_qbg_construction_parameters(1, 0, 1, 1, 1, 1, 1);
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between cd17301 and c7fce8b.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • rust/Cargo.toml
  • rust/libs/algorithm/Cargo.toml
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
  • rust/libs/algorithms/qbg/src/input.h
👮 Files not reviewed due to content moderation or server errors (1)
  • rust/libs/algorithms/qbg/src/input.cpp
🔇 Additional comments (1)
rust/libs/algorithms/qbg/src/lib.rs (1)

131-150: 🛠️ Refactor suggestion

Add input validation for construction parameters

The set_qbg_construction_parameters method accepts parameters without validation, which could lead to undefined behavior in the C++ code.

 pub fn set_qbg_construction_parameters(
     &mut self,
     extended_dimension: usize,
     dimension: usize,
     number_of_subvectors: usize,
     number_of_blobs: usize,
     internal_data_type: i32,
     data_type: i32,
     distance_type: i32,
 ) {
+    if dimension == 0 {
+        panic!("dimension must be positive");
+    }
+    if number_of_subvectors == 0 || number_of_subvectors > dimension {
+        panic!("invalid number_of_subvectors");
+    }
+    if extended_dimension < dimension {
+        panic!("extended_dimension must be >= dimension");
+    }
     self.inner.pin_mut().set_qbg_construction_parameters(
         extended_dimension,
         dimension,
         number_of_subvectors,
         number_of_blobs,
         internal_data_type,
         data_type,
         distance_type,
     )
 }

Likely invalid or redundant comment.

Comment on lines +18 to +105
unsafe extern "C++" {
include!("qbg/src/input.h");

type Property;
fn new_property() -> UniquePtr<Property>;
fn init_qbg_construction_parameters(self: Pin<&mut Property>);
fn set_qbg_construction_parameters(
self: Pin<&mut Property>,
extended_dimension: usize,
dimension: usize,
number_of_subvectors: usize,
number_of_blobs: usize,
internal_data_type: i32,
data_type: i32,
distance_type: i32,
);
fn set_extended_dimension(self: Pin<&mut Property>, extended_dimension: usize);
fn set_dimension(self: Pin<&mut Property>, dimension: usize);
fn set_number_of_subvectors(self: Pin<&mut Property>, number_of_subvectors: usize);
fn set_number_of_blobs(self: Pin<&mut Property>, number_of_blobs: usize);
fn set_internal_data_type(self: Pin<&mut Property>, internal_data_type: i32);
fn set_data_type(self: Pin<&mut Property>, data_type: i32);
fn set_distance_type(self: Pin<&mut Property>, distance_type: i32);
fn init_qbg_build_parameters(self: Pin<&mut Property>);
fn set_qbg_build_parameters(
self: Pin<&mut Property>,
hierarchical_clustering_init_mode: i32,
number_of_first_objects: usize,
number_of_first_clusters: usize,
number_of_second_objects: usize,
number_of_second_clusters: usize,
number_of_third_clusters: usize,
number_of_objects: usize,
number_of_subvectors: usize,
optimization_clustering_init_mode: i32,
rotation_iteration: usize,
subvector_iteration: usize,
number_of_matrices: usize,
rotation: bool,
repositioning: bool,
);
fn set_hierarchical_clustering_init_mode(
self: Pin<&mut Property>,
hierarchical_clustering_init_mode: i16,
);
fn set_number_of_first_objects(self: Pin<&mut Property>, number_of_first_objects: usize);
fn set_number_of_first_clusters(self: Pin<&mut Property>, number_of_first_clusters: usize);
fn set_number_of_second_objects(self: Pin<&mut Property>, number_of_second_objects: u32);
fn set_number_of_second_clusters(
self: Pin<&mut Property>,
number_of_second_clusters: usize,
);
fn set_number_of_third_clusters(self: Pin<&mut Property>, number_of_third_clusters: usize);
fn set_number_of_objects(self: Pin<&mut Property>, number_of_objects: usize);
fn set_number_of_subvectors_for_bp(self: Pin<&mut Property>, number_of_subvectors: usize);
fn set_optimization_clustering_init_mode(
self: Pin<&mut Property>,
optimization_clustering_init_mode: i32,
);
fn set_rotation_iteration(self: Pin<&mut Property>, rotation_iteration: usize);
fn set_subvector_iteration(self: Pin<&mut Property>, subvector_iteration: usize);
fn set_number_of_matrices(self: Pin<&mut Property>, number_of_matrices: usize);
fn set_rotation(self: Pin<&mut Property>, rotation: bool);
fn set_repositioning(self: Pin<&mut Property>, repositioning: bool);

type SearchResult;
fn get_id(self: Pin<&mut SearchResult>) -> u32;
fn get_distance(self: Pin<&mut SearchResult>) -> f32;

type Index;
fn new_index(path: &String, p: Pin<&mut Property>) -> Result<UniquePtr<Index>>;
fn new_prebuilt_index(path: &String, p: bool) -> Result<UniquePtr<Index>>;
fn open_index(self: Pin<&mut Index>, path: &String, prebuilt: bool) -> Result<()>;
fn build_index(self: Pin<&mut Index>, path: &String, p: Pin<&mut Property>) -> Result<()>;
fn save_index(self: Pin<&mut Index>) -> Result<()>;
fn close_index(self: Pin<&mut Index>);
fn append(self: Pin<&mut Index>, v: &[f32]) -> Result<i32>;
fn remove(self: Pin<&mut Index>, id: usize) -> Result<()>;
fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
radius: f32,
epsilon: f32,
) -> UniquePtr<CxxVector<SearchResult>>;
fn get_object(self: Pin<&mut Index>, id: usize) -> Result<*mut f32>;
fn get_dimension(self: Pin<&mut Index>) -> Result<usize>;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add safety documentation for FFI interface

The FFI interface uses unsafe code and raw pointers but lacks documentation about safety requirements. This is particularly important for functions like get_object that return raw pointers.

Add safety documentation for FFI functions, especially:

/// # Safety
///
/// Caller must ensure:
/// - The index is properly initialized
/// - The id is valid and within bounds
/// - The returned pointer must not outlive the Index instance
/// - The dimension returned by get_dimension() matches the actual object size
unsafe extern "C++" {
    // ... existing declarations ...
}

Comment on lines +383 to +392
pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> {
let dim = match self.inner.pin_mut().get_dimension() {
Ok(dim) => dim,
Err(e) => return Err(e),
};
match self.inner.pin_mut().get_object(id) {
Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }),
Err(e) => Err(e),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Document safety requirements for unsafe code

The get_object method uses unsafe code to create a slice from a raw pointer without clear documentation of safety requirements.

 pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> {
+    // SAFETY: We ensure safety by:
+    // 1. Getting the dimension first to ensure valid size
+    // 2. Checking that the pointer is non-null (done by Result)
+    // 3. The slice lifetime is tied to self, ensuring the underlying data remains valid
     let dim = match self.inner.pin_mut().get_dimension() {
         Ok(dim) => dim,
         Err(e) => return Err(e),
     };
     match self.inner.pin_mut().get_object(id) {
         Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }),
         Err(e) => Err(e),
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> {
let dim = match self.inner.pin_mut().get_dimension() {
Ok(dim) => dim,
Err(e) => return Err(e),
};
match self.inner.pin_mut().get_object(id) {
Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }),
Err(e) => Err(e),
}
}
pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> {
// SAFETY: We ensure safety by:
// 1. Getting the dimension first to ensure valid size
// 2. Checking that the pointer is non-null (done by Result)
// 3. The slice lifetime is tied to self, ensuring the underlying data remains valid
let dim = match self.inner.pin_mut().get_dimension() {
Ok(dim) => dim,
Err(e) => return Err(e),
};
match self.inner.pin_mut().get_object(id) {
Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }),
Err(e) => Err(e),
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants