-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] rust-agent-qbg #2751
base: main
Are you sure you want to change the base?
[Draft] rust-agent-qbg #2751
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThe changes in this pull request introduce a new library, Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
[CHATOPS:HELP] ChatOps commands.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 15
🧹 Outside diff range and nitpick comments (10)
rust/libs/algorithm/Cargo.toml (1)
25-25
: Documentation needed for QBG integrationSince this is a draft PR introducing a new algorithm component, please provide:
- Description of what QBG is and its purpose
- Integration requirements and dependencies
- Any performance implications or considerations
Consider adding these details to:
- PR description
- README in the QBG module
- Code documentation
rust/libs/algorithms/qbg/src/input.h (2)
32-39
: Add parameter names toset_qbg_construction_parameters
for clarityIncluding parameter names in the function declaration enhances readability and helps developers understand the purpose of each parameter.
49-65
: Include parameter names inset_qbg_build_parameters
functionProviding parameter names in the function declaration improves code comprehension and maintainability.
rust/libs/algorithms/qbg/src/lib.rs (5)
59-62
: Inconsistent Integer Types inset_hierarchical_clustering_init_mode
The function
set_hierarchical_clustering_init_mode
usesi16
for the parameterhierarchical_clustering_init_mode
, whereas similar parameters in other functions usei32
. Consider using consistent integer types to avoid potential issues.
63-65
: Type Mismatch inset_number_of_second_objects
The parameter
number_of_second_objects
inset_number_of_second_objects
is of typeu32
, while other similar parameters useusize
. For consistency and to prevent potential bugs, consider usingusize
here as well.
109-130
: Use Meaningful Values in Test Property SettersIn the test
test_qbg
, many property setters are assigned the value1
. Using more realistic and diverse values can improve the effectiveness of the test by covering a wider range of scenarios.
184-185
: Simplify Return Statements in Test FunctionsThe lines
return Ok(());
can be simplified to justOk(())
for brevity and idiomatic Rust code.Also applies to: 216-217
84-85
: Clarify Parameter Naming and Types innew_index
andnew_prebuilt_index
The
new_index
function takes aPin<&mut Property>
as the second parameter namedp
, whilenew_prebuilt_index
takes abool
namedp
. This overloading ofp
with different types may cause confusion. Consider renaming the boolean parameter to something more descriptive, likeprebuilt
, for clarity.rust/libs/algorithms/qbg/src/input.cpp (2)
315-328
: Missing Error Handling in Factory FunctionsThe factory functions
new_index
andnew_prebuilt_index
do not handle exceptions that may be thrown during object creation, potentially leading to unhandled exceptions.Consider using exception handling to capture and report any errors during object creation:
std::unique_ptr<Index> new_index(const rust::String &path, Property &p) { try { return std::make_unique<Index>(path, p); } catch (const std::exception &e) { // Handle exception, possibly log the error std::cerr << "Error creating Index: " << e.what() << std::endl; return nullptr; } }
227-238
: Use ofconst char*
Instead ofstd::string
inopen_index
Passing
cpath.c_str()
may lead to issues ifcpath
goes out of scope beforec_str()
is used.Since
cpath
is a local variable, it's safe here, but consider passingstd::string
directly if the underlying function accepts it, to reduce unnecessary conversions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
rust/Cargo.toml
(1 hunks)rust/libs/algorithm/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/build.rs
(1 hunks)rust/libs/algorithms/qbg/src/input.cpp
(1 hunks)rust/libs/algorithms/qbg/src/input.h
(1 hunks)rust/libs/algorithms/qbg/src/lib.rs
(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- rust/libs/algorithms/qbg/Cargo.toml
- rust/libs/algorithms/qbg/build.rs
🔇 Additional comments (4)
rust/Cargo.toml (2)
24-24
: LGTM! The workspace member addition follows project conventions.
The new qbg algorithm library is properly placed alongside other algorithm implementations (ngt, faiss).
24-24
: Verify qbg library setup
Let's verify that the qbg library directory exists and has the required Cargo.toml configuration.
✅ Verification successful
QBG library setup is properly configured
The verification shows that the qbg library is correctly set up with:
- Directory structure present at
rust/libs/algorithms/qbg/
- Required files:
Cargo.toml
,build.rs
, andsrc/
directory - Proper Cargo.toml configuration with:
- Package metadata
- Required dependencies (anyhow, cxx)
- Build dependencies (cxx-build, miette)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify qbg library structure and configuration
# Check if the directory exists
if [ ! -d "rust/libs/algorithms/qbg" ]; then
echo "Error: qbg directory does not exist at expected path"
exit 1
fi
# Check for Cargo.toml and essential files
echo "Checking qbg library structure:"
ls -la rust/libs/algorithms/qbg/
# Display Cargo.toml contents if it exists
if [ -f "rust/libs/algorithms/qbg/Cargo.toml" ]; then
echo -e "\nQBG Cargo.toml contents:"
cat rust/libs/algorithms/qbg/Cargo.toml
else
echo "Error: Cargo.toml not found in qbg directory"
fi
Length of output: 1461
rust/libs/algorithm/Cargo.toml (1)
25-25
: Verify QBG integration path and dependencies
The dependency path looks consistent with other algorithm dependencies, but let's verify the integration.
✅ Verification successful
QBG dependency path and integration verified
The QBG integration appears to be properly structured:
- QBG directory exists at the correct path (
rust/libs/algorithms/qbg
) - The library implements a C++ bridge for QBG index construction and search
- Proper license headers and version metadata are present
- No direct usage in algorithm lib yet, which is expected for a draft PR
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify QBG integration
# Check if qbg directory exists
echo "Checking qbg directory..."
if [ -d "rust/libs/algorithms/qbg" ]; then
echo "✓ QBG directory exists"
else
echo "✗ QBG directory not found"
fi
# Check for other potential integration points
echo -e "\nChecking for other QBG references..."
rg -l "qbg" --type rust
Length of output: 336
Script:
#!/bin/bash
# Let's check the content of lib.rs to understand the integration points
echo "=== Content of QBG lib.rs ==="
cat rust/libs/algorithms/qbg/src/lib.rs
echo -e "\n=== Checking Cargo.toml for metadata and description ==="
cat rust/libs/algorithms/qbg/Cargo.toml
echo -e "\n=== Checking for any usage in algorithm lib ==="
rg "qbg" rust/libs/algorithm/src/ -A 2 -B 2
Length of output: 9834
rust/libs/algorithms/qbg/src/input.cpp (1)
273-273
: Incorrect Use of v.length()
in append
The use of v.length()
may be incorrect if v
does not have a length()
method or if it does not return the expected size.
Verify that v.length()
correctly returns the number of elements in the slice. If not, use v.size()
or adjust accordingly.
Run the following script to check for the correct method to retrieve the size of rust::Slice
:
Ensure that the correct method is consistently used across the codebase.
✅ Verification successful
Let's search for actual usage patterns of rust::Slice in the codebase to understand how it's being used and what methods are available. I've also included a search for the specific vector construction pattern to see the surrounding context.
Use of v.length()
is correct for rust::Slice
Looking at the codebase usage, v.length()
is the correct method for rust::Slice
. This is evidenced by its usage in qbg/src/input.cpp
and supported by the fact that rust::Slice
is a Rust-C++ bridge type where the Rust convention for getting collection size is length()
. The code correctly constructs a vector from the slice's iterators and uses length()
to get its size.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for the correct method to get size from rust::Slice in the codebase.
# Search for usage of rust::Slice methods
rg --type-add 'cpp' --files-with-matches 'rust::Slice' | xargs grep -E 'rust::Slice.*\.(length\(\)|size\(\))' -A 2
Length of output: 172
Script:
#!/bin/bash
# Search for rust::Slice usage patterns in cpp files
rg "rust::Slice" -A 3 -g "*.cpp"
# Also search for the specific line context
rg "std::vector<float>\(.*\.begin\(\),.*\.end\(\)\)" -g "*.cpp" -A 2
Length of output: 2993
d6e423f
to
e70ea8d
Compare
Deploying vald with Cloudflare Pages
|
e70ea8d
to
6de8a6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (2)
rust/libs/algorithms/qbg/src/input.cpp (1)
31-34
: Add const correctness to getter methods.The getter methods should be marked as const since they don't modify the object's state.
-QBGConstructionParameters *Property::get_qbg_construction_parameters() +QBGConstructionParameters *Property::get_qbg_construction_parameters() const { return qbg_construction_parameters; } -QBGBuildParameters *Property::get_qbg_build_parameters() +QBGBuildParameters *Property::get_qbg_build_parameters() const { return qbg_build_parameters; }Also applies to: 95-98
rust/libs/algorithms/qbg/src/lib.rs (1)
362-443
: Add error case testingThe test only covers the happy path. Add tests for error cases:
- Invalid dimensions
- Empty vectors
- Out of bounds K values
- Invalid file paths
Add error case tests:
#[test] fn test_ffi_qbg_errors() -> Result<()> { let path = "invalid/path".to_string(); let mut p = Property::new(); p.init_qbg_construction_parameters(); // Test invalid path assert!(Index::new(&path, &mut p).is_err()); // Test invalid dimensions p.set_dimension(0); assert!(Index::new(&"index".to_string(), &mut p).is_err()); Ok(()) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
rust/Cargo.toml
(1 hunks)rust/libs/algorithm/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/build.rs
(1 hunks)rust/libs/algorithms/qbg/src/input.cpp
(1 hunks)rust/libs/algorithms/qbg/src/input.h
(1 hunks)rust/libs/algorithms/qbg/src/lib.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- rust/Cargo.toml
- rust/libs/algorithm/Cargo.toml
- rust/libs/algorithms/qbg/Cargo.toml
- rust/libs/algorithms/qbg/build.rs
- rust/libs/algorithms/qbg/src/input.h
🔇 Additional comments (2)
rust/libs/algorithms/qbg/src/input.cpp (2)
230-233
: Uncomment and implement the destructor.
The destructor is currently commented out, which could lead to resource leaks. The index should be properly closed when the object is destroyed.
335-348
: LGTM! Well-implemented factory functions.
The factory functions make good use of modern C++ practices with std::unique_ptr for automatic memory management.
void Property::set_extended_dimension(rust::usize extended_dimension) | ||
{ | ||
qbg_construction_parameters->extended_dimension = extended_dimension; | ||
} | ||
|
||
void Property::set_dimension(rust::usize dimension) | ||
{ | ||
qbg_construction_parameters->dimension = dimension; | ||
} | ||
|
||
void Property::set_number_of_subvectors(rust::usize number_of_subvectors) | ||
{ | ||
qbg_construction_parameters->number_of_subvectors = number_of_subvectors; | ||
} | ||
|
||
void Property::set_number_of_blobs(rust::usize number_of_blobs) | ||
{ | ||
qbg_construction_parameters->number_of_blobs = number_of_blobs; | ||
} | ||
|
||
void Property::set_internal_data_type(rust::i32 internal_data_type) | ||
{ | ||
qbg_construction_parameters->internal_data_type = internal_data_type; | ||
} | ||
|
||
void Property::set_data_type(rust::i32 data_type) | ||
{ | ||
qbg_construction_parameters->data_type = data_type; | ||
} | ||
|
||
void Property::set_distance_type(rust::i32 distance_type) | ||
{ | ||
qbg_construction_parameters->distance_type = distance_type; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add parameter validation in setter methods.
The setter methods for construction parameters should validate their input values to ensure they are within acceptable ranges. For example, dimension and number_of_subvectors should be positive values.
Example implementation for one setter:
void Property::set_dimension(rust::usize dimension)
{
+ if (dimension == 0) {
+ throw std::invalid_argument("dimension must be positive");
+ }
qbg_construction_parameters->dimension = dimension;
}
Committable suggestion skipped: line range outside the PR's diff.
void Index::search( | ||
rust::Slice<const rust::f32> v, | ||
rust::usize k, | ||
rust::i32 *ids, | ||
rust::f32 *distances) | ||
{ | ||
QBGQuery query; | ||
qbg_initialize_query(&query); | ||
std::vector<float> vec(v.begin(), v.end()); | ||
query.query = vec.data(); | ||
|
||
NGTError err = ngt_create_error_object(); | ||
NGTObjectDistances results = ngt_create_empty_results(err); | ||
bool ok = qbg_search_index(index, query, results, err); | ||
if (!ok) | ||
{ | ||
std::cerr << "Error: " << __func__ << std::endl; | ||
std::cerr << ngt_get_error_string(err) << std::endl; | ||
qbg_destroy_results(results); | ||
ngt_destroy_error_object(err); | ||
return; | ||
} | ||
|
||
size_t rsize = qbg_get_result_size(results, err); | ||
size_t limit = std::min(k, rsize); | ||
for (size_t i = 0; i < limit; i++) | ||
{ | ||
NGTObjectDistance obj = qbg_get_result(results, i, err); | ||
ids[i] = obj.id; | ||
distances[i] = obj.distance; | ||
} | ||
|
||
qbg_destroy_results(results); | ||
ngt_destroy_error_object(err); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider thread safety in search method.
The search method uses shared resources (index
) without synchronization. If this method is called from multiple threads, it could lead to race conditions.
Consider adding mutex protection:
+#include <mutex>
+
class Index {
+private:
+ std::mutex search_mutex;
+
public:
void search(
rust::Slice<const rust::f32> v,
rust::usize k,
rust::i32 *ids,
rust::f32 *distances)
{
+ std::lock_guard<std::mutex> lock(search_mutex);
// ... rest of the method
}
};
Committable suggestion skipped: line range outside the PR's diff.
rust::i32 Index::append(rust::Slice<const rust::f32> v) | ||
{ | ||
NGTError err = ngt_create_error_object(); | ||
std::vector<float> vec(v.begin(), v.end()); | ||
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err); | ||
if (id == 0) | ||
{ | ||
std::cerr << ngt_get_error_string(err) << std::endl; | ||
ngt_destroy_error_object(err); | ||
return 0; | ||
} | ||
ngt_destroy_error_object(err); | ||
return id; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling in append method.
The method returns 0 on error, which could be confused with a valid ID. Consider using exceptions or a more explicit error handling mechanism.
-rust::i32 Index::append(rust::Slice<const rust::f32> v)
+rust::i32 Index::append(rust::Slice<const rust::f32> v)
{
NGTError err = ngt_create_error_object();
std::vector<float> vec(v.begin(), v.end());
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
if (id == 0)
{
- std::cerr << ngt_get_error_string(err) << std::endl;
+ std::string error_msg = ngt_get_error_string(err);
ngt_destroy_error_object(err);
- return 0;
+ throw std::runtime_error("Failed to append object: " + error_msg);
}
ngt_destroy_error_object(err);
return id;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
rust::i32 Index::append(rust::Slice<const rust::f32> v) | |
{ | |
NGTError err = ngt_create_error_object(); | |
std::vector<float> vec(v.begin(), v.end()); | |
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err); | |
if (id == 0) | |
{ | |
std::cerr << ngt_get_error_string(err) << std::endl; | |
ngt_destroy_error_object(err); | |
return 0; | |
} | |
ngt_destroy_error_object(err); | |
return id; | |
} | |
rust::i32 Index::append(rust::Slice<const rust::f32> v) | |
{ | |
NGTError err = ngt_create_error_object(); | |
std::vector<float> vec(v.begin(), v.end()); | |
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err); | |
if (id == 0) | |
{ | |
std::string error_msg = ngt_get_error_string(err); | |
ngt_destroy_error_object(err); | |
throw std::runtime_error("Failed to append object: " + error_msg); | |
} | |
ngt_destroy_error_object(err); | |
return id; | |
} |
rust/libs/algorithms/qbg/src/lib.rs
Outdated
pub fn open_index(&mut self, path: &String, prebuilt: bool) { | ||
self.inner.pin_mut().open_index(path, prebuilt); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for index operations
Methods like open_index
, build_index
, and save_index
don't return Results, making it impossible to handle failures gracefully.
Add consistent error handling:
- pub fn open_index(&mut self, path: &String, prebuilt: bool) {
- self.inner.pin_mut().open_index(path, prebuilt);
+ pub fn open_index(&mut self, path: &String, prebuilt: bool) -> Result<(), cxx::Exception> {
+ self.inner.pin_mut().open_index(path, prebuilt)?;
+ Ok(())
}
- pub fn build_index(&mut self, path: &String, p: &mut property::Property) {
- self.inner.pin_mut().build_index(path, p.get_property());
+ pub fn build_index(&mut self, path: &String, p: &mut property::Property) -> Result<(), cxx::Exception> {
+ self.inner.pin_mut().build_index(path, p.get_property())?;
+ Ok(())
}
Also applies to: 334-336, 338-340
rust/libs/algorithms/qbg/src/lib.rs
Outdated
unsafe fn search( | ||
self: Pin<&mut Index>, | ||
v: &[f32], | ||
k: usize, | ||
ids: *mut i32, | ||
distances: *mut f32, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add safety documentation for unsafe search function
The unsafe search function uses raw pointers without clear documentation about:
- Required buffer sizes for
ids
anddistances
- Alignment requirements
- Ownership and lifetime guarantees
- Valid parameter ranges
Add comprehensive safety documentation:
+ /// # Safety
+ ///
+ /// Caller must ensure:
+ /// - `ids` points to a buffer of size >= k
+ /// - `distances` points to a buffer of size >= k
+ /// - Both buffers are properly aligned
+ /// - Buffers remain valid for the duration of the call
+ /// - k > 0 and k <= number of indexed vectors
unsafe fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
ids: *mut i32,
distances: *mut f32,
);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
unsafe fn search( | |
self: Pin<&mut Index>, | |
v: &[f32], | |
k: usize, | |
ids: *mut i32, | |
distances: *mut f32, | |
); | |
/// # Safety | |
/// | |
/// Caller must ensure: | |
/// - `ids` points to a buffer of size >= k | |
/// - `distances` points to a buffer of size >= k | |
/// - Both buffers are properly aligned | |
/// - Buffers remain valid for the duration of the call | |
/// - k > 0 and k <= number of indexed vectors | |
unsafe fn search( | |
self: Pin<&mut Index>, | |
v: &[f32], | |
k: usize, | |
ids: *mut i32, | |
distances: *mut f32, | |
); |
rust/libs/algorithms/qbg/src/lib.rs
Outdated
pub fn set_qbg_construction_parameters( | ||
&mut self, | ||
extended_dimension: usize, | ||
dimension: usize, | ||
number_of_subvectors: usize, | ||
number_of_blobs: usize, | ||
internal_data_type: i32, | ||
data_type: i32, | ||
distance_type: i32, | ||
) { | ||
self.inner | ||
.pin_mut() | ||
.set_extended_dimension(extended_dimension); | ||
self.inner.pin_mut().set_dimension(dimension); | ||
self.inner | ||
.pin_mut() | ||
.set_number_of_subvectors(number_of_subvectors); | ||
self.inner.pin_mut().set_number_of_blobs(number_of_blobs); | ||
self.inner | ||
.pin_mut() | ||
.set_internal_data_type(internal_data_type); | ||
self.inner.pin_mut().set_data_type(data_type); | ||
self.inner.pin_mut().set_distance_type(distance_type); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add input validation for construction parameters
The set_qbg_construction_parameters
method accepts parameters without validation. Consider adding checks for:
- Non-zero dimensions
- Valid relationships between dimensions and subvectors
- Valid enum values for data types and distance types
Add validation:
pub fn set_qbg_construction_parameters(
&mut self,
extended_dimension: usize,
dimension: usize,
number_of_subvectors: usize,
number_of_blobs: usize,
internal_data_type: i32,
data_type: i32,
distance_type: i32,
) {
+ if dimension == 0 {
+ panic!("dimension must be greater than 0");
+ }
+ if number_of_subvectors == 0 || number_of_subvectors > dimension {
+ panic!("invalid number_of_subvectors");
+ }
+ // Add more validation as needed
self.inner
.pin_mut()
.set_extended_dimension(extended_dimension);
Committable suggestion skipped: line range outside the PR's diff.
rust/libs/algorithms/qbg/src/lib.rs
Outdated
// Search | ||
let mut ids: Vec<i32> = vec![0; K]; | ||
let mut distances: Vec<f32> = vec![0.0; K]; | ||
let mut vec = vec![0.0; DIMENSION]; | ||
for i in 0..DIMENSION { | ||
vec[i] = i as f32 | ||
} | ||
unsafe { | ||
index.pin_mut().search( | ||
vec.as_slice(), | ||
K, | ||
&mut ids[0] as *mut i32, | ||
&mut distances[0] as *mut f32, | ||
) | ||
}; | ||
println!("ids:\n{:?}", ids); | ||
println!("distances:\n{:?}", distances); | ||
index.pin_mut().close_index(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance search result validation in tests
The test prints search results without verifying their correctness. Add assertions to validate:
- Result ordering (by distance)
- Distance values are non-negative and properly ordered
- IDs are within valid range
Add validation:
unsafe {
index.pin_mut().search(
vec.as_slice(),
K,
&mut ids[0] as *mut i32,
&mut distances[0] as *mut f32,
)
};
- println!("ids:\n{:?}", ids);
- println!("distances:\n{:?}", distances);
+ // Validate results
+ assert!(ids.iter().all(|&id| id >= 0 && id < 100), "Invalid IDs in results");
+ assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Distances not properly ordered");
+ assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found");
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Search | |
let mut ids: Vec<i32> = vec![0; K]; | |
let mut distances: Vec<f32> = vec![0.0; K]; | |
let mut vec = vec![0.0; DIMENSION]; | |
for i in 0..DIMENSION { | |
vec[i] = i as f32 | |
} | |
unsafe { | |
index.pin_mut().search( | |
vec.as_slice(), | |
K, | |
&mut ids[0] as *mut i32, | |
&mut distances[0] as *mut f32, | |
) | |
}; | |
println!("ids:\n{:?}", ids); | |
println!("distances:\n{:?}", distances); | |
index.pin_mut().close_index(); | |
// Search | |
let mut ids: Vec<i32> = vec![0; K]; | |
let mut distances: Vec<f32> = vec![0.0; K]; | |
let mut vec = vec![0.0; DIMENSION]; | |
for i in 0..DIMENSION { | |
vec[i] = i as f32 | |
} | |
unsafe { | |
index.pin_mut().search( | |
vec.as_slice(), | |
K, | |
&mut ids[0] as *mut i32, | |
&mut distances[0] as *mut f32, | |
) | |
}; | |
// Validate results | |
assert!(ids.iter().all(|&id| id >= 0 && id < 100), "Invalid IDs in results"); | |
assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Distances not properly ordered"); | |
assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found"); | |
index.pin_mut().close_index(); |
6de8a6a
to
6e5aa96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
rust/Cargo.toml
(1 hunks)rust/libs/algorithm/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/build.rs
(1 hunks)rust/libs/algorithms/qbg/src/input.cpp
(1 hunks)rust/libs/algorithms/qbg/src/input.h
(1 hunks)rust/libs/algorithms/qbg/src/lib.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- rust/Cargo.toml
- rust/libs/algorithm/Cargo.toml
- rust/libs/algorithms/qbg/Cargo.toml
- rust/libs/algorithms/qbg/build.rs
- rust/libs/algorithms/qbg/src/input.h
- rust/libs/algorithms/qbg/src/lib.rs
🔇 Additional comments (1)
rust/libs/algorithms/qbg/src/input.cpp (1)
335-348
: LGTM! Well-designed factory functions.
The factory functions are well-implemented:
- Use of
std::unique_ptr
ensures proper memory management - Clear and concise interface for object creation
- Follow modern C++ practices
void Property::set_qbg_construction_parameters( | ||
rust::usize extended_dimension, | ||
rust::usize dimension, | ||
rust::usize number_of_subvectors, | ||
rust::usize number_of_blobs, | ||
rust::i32 internal_data_type, | ||
rust::i32 data_type, | ||
rust::i32 distance_type) | ||
{ | ||
qbg_initialize_construction_parameters(qbg_construction_parameters); | ||
qbg_construction_parameters->extended_dimension = extended_dimension; | ||
qbg_construction_parameters->dimension = dimension; | ||
qbg_construction_parameters->number_of_subvectors = number_of_subvectors; | ||
qbg_construction_parameters->number_of_blobs = number_of_blobs; | ||
qbg_construction_parameters->internal_data_type = internal_data_type; | ||
qbg_construction_parameters->data_type = data_type; | ||
qbg_construction_parameters->distance_type = distance_type; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add comprehensive parameter validation in bulk parameter setters.
The bulk parameter setters set_qbg_construction_parameters
and set_qbg_build_parameters
should validate their input parameters before setting them. This helps catch invalid configurations early.
Example implementation for set_qbg_construction_parameters
:
void Property::set_qbg_construction_parameters(
rust::usize extended_dimension,
rust::usize dimension,
rust::usize number_of_subvectors,
rust::usize number_of_blobs,
rust::i32 internal_data_type,
rust::i32 data_type,
rust::i32 distance_type)
{
+ if (dimension == 0) {
+ throw std::invalid_argument("dimension must be positive");
+ }
+ if (number_of_subvectors == 0) {
+ throw std::invalid_argument("number_of_subvectors must be positive");
+ }
+ if (number_of_blobs == 0) {
+ throw std::invalid_argument("number_of_blobs must be positive");
+ }
+ if (extended_dimension < dimension) {
+ throw std::invalid_argument("extended_dimension must be >= dimension");
+ }
qbg_initialize_construction_parameters(qbg_construction_parameters);
// ... rest of the method
}
Also applies to: 105-136
Index::~Index() | ||
{ | ||
// close_index(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncomment and implement the destructor.
The destructor is currently commented out, which will lead to resource leaks. The index needs to be properly closed when the object is destroyed.
Apply this fix:
Index::~Index()
{
- // close_index();
+ close_index();
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Index::~Index() | |
{ | |
// close_index(); | |
} | |
Index::~Index() | |
{ | |
close_index(); | |
} |
6e5aa96
to
de84cf0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
rust/libs/algorithms/qbg/src/input.h (2)
23-30
: Add const correctness to getter methods.The getter methods should be marked as const since they don't modify the object's state.
Apply this diff to improve const correctness:
- rust::u32 get_id() { return id; } - rust::f32 get_distance() { return distance; } + rust::u32 get_id() const { return id; } + rust::f32 get_distance() const { return distance; }
59-75
: Improve parameter documentation for set_qbg_build_parameters.The method has many parameters with only partial documentation. Consider adding detailed parameter descriptions or using a parameter struct for better organization.
Example documentation format:
/** * Sets the QBG build parameters * @param init_mode Hierarchical clustering initialization mode * @param first_objects Number of objects for first level * ... */ void set_qbg_build_parameters(...);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
rust/Cargo.toml
(1 hunks)rust/libs/algorithm/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/build.rs
(1 hunks)rust/libs/algorithms/qbg/src/input.cpp
(1 hunks)rust/libs/algorithms/qbg/src/input.h
(1 hunks)rust/libs/algorithms/qbg/src/lib.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- rust/Cargo.toml
- rust/libs/algorithm/Cargo.toml
- rust/libs/algorithms/qbg/Cargo.toml
- rust/libs/algorithms/qbg/build.rs
- rust/libs/algorithms/qbg/src/lib.rs
🔇 Additional comments (5)
rust/libs/algorithms/qbg/src/input.h (4)
1-22
: LGTM! License and includes are properly set up.
The license header is complete, and all necessary includes are present.
94-94
: Replace void *index
with a specific pointer type.
Using void *
reduces type safety and can lead to errors.
112-114
: LGTM! Factory functions are well-designed.
The use of std::unique_ptr
for factory functions follows modern C++ best practices and ensures proper resource management.
1-114
: Verify Rust FFI integration and usage patterns.
Let's ensure the Rust side correctly uses this C++ interface.
✅ Verification successful
FFI integration is properly implemented and safe
The C++ interface is correctly integrated with Rust through CXX bridge with proper memory management:
- Uses
cxx::bridge
for safe FFI bindings - Proper use of
UniquePtr
for C++ object lifetime management - Safe wrapper implementations in Rust modules (
property
andindex
) - Comprehensive test coverage validating the integration
- Type-safe conversions between Rust and C++ types
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check Rust FFI usage patterns
# Look for Rust FFI bindings
rg -t rust "extern \"C\"" rust/libs/algorithms/qbg/src/
# Look for cxx bridge usage
rg -t rust "#\[cxx::bridge\]" rust/libs/algorithms/qbg/src/
# Check for potential memory management patterns
rg -t rust "Box::into_raw|from_raw" rust/libs/algorithms/qbg/src/
Length of output: 236
Script:
#!/bin/bash
# Let's examine the Rust bridge implementation and usage patterns
# Check the Rust lib.rs content to see the bridge implementation
cat rust/libs/algorithms/qbg/src/lib.rs
# Look for any other Rust files that might use this interface
fd -e rs . rust/libs/algorithms/qbg/src/
# Look for tests that might show usage patterns
fd -e rs . rust/libs/algorithms/qbg/tests/
Length of output: 20764
rust/libs/algorithms/qbg/src/input.cpp (1)
333-346
: LGTM! Well-implemented factory functions
The factory functions properly use smart pointers and modern C++ practices for memory management.
QBGConstructionParameters *qbg_construction_parameters; | ||
QBGBuildParameters *qbg_build_parameters; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use smart pointers instead of raw pointers.
Raw pointers are error-prone and make memory management difficult. Consider using std::unique_ptr
for better memory safety.
Apply this diff:
- QBGConstructionParameters *qbg_construction_parameters;
- QBGBuildParameters *qbg_build_parameters;
+ std::unique_ptr<QBGConstructionParameters> qbg_construction_parameters;
+ std::unique_ptr<QBGBuildParameters> qbg_build_parameters;
Committable suggestion skipped: line range outside the PR's diff.
void Property::set_number_of_second_objects(rust::u32 number_of_second_objects) | ||
{ | ||
qbg_build_parameters->number_of_second_objects = number_of_second_objects; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential integer overflow risk in number_of_second_objects parameter type
The method accepts rust::u32
while other similar methods use rust::usize
. This inconsistency could lead to integer overflow issues on 64-bit systems when handling large datasets.
-void Property::set_number_of_second_objects(rust::u32 number_of_second_objects)
+void Property::set_number_of_second_objects(rust::usize number_of_second_objects)
{
qbg_build_parameters->number_of_second_objects = number_of_second_objects;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
void Property::set_number_of_second_objects(rust::u32 number_of_second_objects) | |
{ | |
qbg_build_parameters->number_of_second_objects = number_of_second_objects; | |
} | |
void Property::set_number_of_second_objects(rust::usize number_of_second_objects) | |
{ | |
qbg_build_parameters->number_of_second_objects = number_of_second_objects; | |
} |
void Index::build_index(const rust::String &path, Property &p) | ||
{ | ||
NGTError err = ngt_create_error_object(); | ||
std::string cpath(path); | ||
bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err); | ||
if (!ok) | ||
{ | ||
std::cerr << "Error: " << __func__ << std::endl; | ||
std::cerr << ngt_get_error_string(err) << std::endl; | ||
ngt_destroy_error_object(err); | ||
return; | ||
} | ||
ngt_destroy_error_object(err); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve error propagation in build_index
The method silently fails on error, making it difficult for callers to handle failures appropriately. Consider throwing an exception or returning a status code.
-void Index::build_index(const rust::String &path, Property &p)
+bool Index::build_index(const rust::String &path, Property &p)
{
NGTError err = ngt_create_error_object();
std::string cpath(path);
bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
- return;
+ return false;
}
ngt_destroy_error_object(err);
+ return true;
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
void Index::build_index(const rust::String &path, Property &p) | |
{ | |
NGTError err = ngt_create_error_object(); | |
std::string cpath(path); | |
bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err); | |
if (!ok) | |
{ | |
std::cerr << "Error: " << __func__ << std::endl; | |
std::cerr << ngt_get_error_string(err) << std::endl; | |
ngt_destroy_error_object(err); | |
return; | |
} | |
ngt_destroy_error_object(err); | |
} | |
bool Index::build_index(const rust::String &path, Property &p) | |
{ | |
NGTError err = ngt_create_error_object(); | |
std::string cpath(path); | |
bool ok = qbg_build_index(cpath.c_str(), p.get_qbg_build_parameters(), err); | |
if (!ok) | |
{ | |
std::cerr << "Error: " << __func__ << std::endl; | |
std::cerr << ngt_get_error_string(err) << std::endl; | |
ngt_destroy_error_object(err); | |
return false; | |
} | |
ngt_destroy_error_object(err); | |
return true; | |
} |
std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k) | ||
{ | ||
QBGQuery query; | ||
qbg_initialize_query(&query); | ||
std::vector<float> vec(v.begin(), v.end()); | ||
query.query = vec.data(); | ||
|
||
NGTError err = ngt_create_error_object(); | ||
NGTObjectDistances results = ngt_create_empty_results(err); | ||
bool ok = qbg_search_index(index, query, results, err); | ||
if (!ok) | ||
{ | ||
std::cerr << "Error: " << __func__ << std::endl; | ||
std::cerr << ngt_get_error_string(err) << std::endl; | ||
qbg_destroy_results(results); | ||
ngt_destroy_error_object(err); | ||
return nullptr; | ||
} | ||
|
||
size_t rsize = qbg_get_result_size(results, err); | ||
size_t limit = std::min(k, rsize); | ||
std::vector<SearchResult> searchResults; | ||
for (size_t i = 0; i < limit; i++) | ||
{ | ||
NGTObjectDistance obj = qbg_get_result(results, i, err); | ||
searchResults.push_back(SearchResult{obj.id, obj.distance}); | ||
} | ||
|
||
qbg_destroy_results(results); | ||
ngt_destroy_error_object(err); | ||
|
||
return std::make_unique<std::vector<SearchResult>>(searchResults); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential resource leak in search method
The error handling in the search method could leak resources if qbg_get_result
fails. Consider using RAII or ensuring cleanup in all error paths.
std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k)
{
QBGQuery query;
qbg_initialize_query(&query);
std::vector<float> vec(v.begin(), v.end());
query.query = vec.data();
NGTError err = ngt_create_error_object();
NGTObjectDistances results = ngt_create_empty_results(err);
+ // Use RAII to ensure cleanup
+ struct ResultsGuard {
+ NGTObjectDistances results;
+ NGTError err;
+ ~ResultsGuard() {
+ qbg_destroy_results(results);
+ ngt_destroy_error_object(err);
+ }
+ } guard{results, err};
bool ok = qbg_search_index(index, query, results, err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
- qbg_destroy_results(results);
- ngt_destroy_error_object(err);
return nullptr;
}
- size_t rsize = qbg_get_result_size(results, err);
+ size_t rsize = qbg_get_result_size(guard.results, err);
size_t limit = std::min(k, rsize);
std::vector<SearchResult> searchResults;
for (size_t i = 0; i < limit; i++)
{
- NGTObjectDistance obj = qbg_get_result(results, i, err);
+ NGTObjectDistance obj = qbg_get_result(guard.results, i, guard.err);
searchResults.push_back(SearchResult{obj.id, obj.distance});
}
- qbg_destroy_results(results);
- ngt_destroy_error_object(err);
return std::make_unique<std::vector<SearchResult>>(searchResults);
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k) | |
{ | |
QBGQuery query; | |
qbg_initialize_query(&query); | |
std::vector<float> vec(v.begin(), v.end()); | |
query.query = vec.data(); | |
NGTError err = ngt_create_error_object(); | |
NGTObjectDistances results = ngt_create_empty_results(err); | |
bool ok = qbg_search_index(index, query, results, err); | |
if (!ok) | |
{ | |
std::cerr << "Error: " << __func__ << std::endl; | |
std::cerr << ngt_get_error_string(err) << std::endl; | |
qbg_destroy_results(results); | |
ngt_destroy_error_object(err); | |
return nullptr; | |
} | |
size_t rsize = qbg_get_result_size(results, err); | |
size_t limit = std::min(k, rsize); | |
std::vector<SearchResult> searchResults; | |
for (size_t i = 0; i < limit; i++) | |
{ | |
NGTObjectDistance obj = qbg_get_result(results, i, err); | |
searchResults.push_back(SearchResult{obj.id, obj.distance}); | |
} | |
qbg_destroy_results(results); | |
ngt_destroy_error_object(err); | |
return std::make_unique<std::vector<SearchResult>>(searchResults); | |
} | |
std::unique_ptr<std::vector<SearchResult>> Index::search(rust::Slice<const rust::f32> v, rust::usize k) | |
{ | |
QBGQuery query; | |
qbg_initialize_query(&query); | |
std::vector<float> vec(v.begin(), v.end()); | |
query.query = vec.data(); | |
NGTError err = ngt_create_error_object(); | |
NGTObjectDistances results = ngt_create_empty_results(err); | |
// Use RAII to ensure cleanup | |
struct ResultsGuard { | |
NGTObjectDistances results; | |
NGTError err; | |
~ResultsGuard() { | |
qbg_destroy_results(results); | |
ngt_destroy_error_object(err); | |
} | |
} guard{results, err}; | |
bool ok = qbg_search_index(index, query, results, err); | |
if (!ok) | |
{ | |
std::cerr << "Error: " << __func__ << std::endl; | |
std::cerr << ngt_get_error_string(err) << std::endl; | |
return nullptr; | |
} | |
size_t rsize = qbg_get_result_size(guard.results, err); | |
size_t limit = std::min(k, rsize); | |
std::vector<SearchResult> searchResults; | |
for (size_t i = 0; i < limit; i++) | |
{ | |
NGTObjectDistance obj = qbg_get_result(guard.results, i, guard.err); | |
searchResults.push_back(SearchResult{obj.id, obj.distance}); | |
} | |
return std::make_unique<std::vector<SearchResult>>(searchResults); | |
} |
9abde97
to
cd17301
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
rust/libs/algorithms/qbg/src/lib.rs (1)
96-102
: Add safety documentation for the search function.The search function handles raw data and should include documentation about:
- Valid parameter ranges (k, radius, epsilon)
- Input vector requirements (size, alignment)
- Performance implications
Add documentation:
/// Searches the index for nearest neighbors. /// /// # Arguments /// * `v` - Query vector. Must have the same dimension as indexed vectors /// * `k` - Number of nearest neighbors to return. Must be > 0 /// * `radius` - Search radius. Use 0.0 for unlimited radius /// * `epsilon` - Error tolerance for approximate search /// /// # Returns /// Vector of (id, distance) pairs for the k nearest neighbors fn search( self: Pin<&mut Index>, v: &[f32], k: usize, radius: f32, epsilon: f32, ) -> UniquePtr<CxxVector<SearchResult>>;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
rust/Cargo.toml
(1 hunks)rust/libs/algorithm/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/build.rs
(1 hunks)rust/libs/algorithms/qbg/src/input.cpp
(1 hunks)rust/libs/algorithms/qbg/src/input.h
(1 hunks)rust/libs/algorithms/qbg/src/lib.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- rust/Cargo.toml
- rust/libs/algorithm/Cargo.toml
- rust/libs/algorithms/qbg/build.rs
- rust/libs/algorithms/qbg/Cargo.toml
- rust/libs/algorithms/qbg/src/input.h
🔇 Additional comments (4)
rust/libs/algorithms/qbg/src/input.cpp (3)
19-31
: LGTM! Constructor and destructor properly manage memory.
The Property class correctly handles memory allocation in the constructor and cleanup in the destructor.
233-236
:
Uncomment and implement the destructor.
The destructor is commented out, which could lead to resource leaks. The index needs to be properly closed when the object is destroyed.
Index::~Index()
{
- // close_index();
+ close_index();
}
Likely invalid or redundant comment.
155-158
:
Fix type inconsistency in number_of_second_objects parameter.
The method uses rust::u32
while other similar methods use rust::usize
. This inconsistency could lead to integer overflow issues on 64-bit systems when handling large datasets.
-void Property::set_number_of_second_objects(rust::u32 number_of_second_objects)
+void Property::set_number_of_second_objects(rust::usize number_of_second_objects)
{
qbg_build_parameters->number_of_second_objects = number_of_second_objects;
}
Likely invalid or redundant comment.
rust/libs/algorithms/qbg/src/lib.rs (1)
461-475
: 🛠️ Refactor suggestion
Add assertions to verify search results.
The test prints search results without verifying their correctness. Add assertions to validate:
- Result ordering (by distance)
- Distance values are non-negative
- Number of results matches k
let distances: Vec<f32> = search_results
.pin_mut()
.into_iter()
.map(|s| s.get_distance())
.collect();
- println!("ids:\n{:?}", ids);
- println!("distances:\n{:?}", distances);
+ assert_eq!(ids.len(), K, "Wrong number of results");
+ assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found");
+ assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Results not ordered by distance");
Likely invalid or redundant comment.
pub fn set_qbg_construction_parameters( | ||
&mut self, | ||
extended_dimension: usize, | ||
dimension: usize, | ||
number_of_subvectors: usize, | ||
number_of_blobs: usize, | ||
internal_data_type: i32, | ||
data_type: i32, | ||
distance_type: i32, | ||
) { | ||
self.inner.pin_mut().set_qbg_construction_parameters( | ||
extended_dimension, | ||
dimension, | ||
number_of_subvectors, | ||
number_of_blobs, | ||
internal_data_type, | ||
data_type, | ||
distance_type, | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add input validation for construction parameters.
The set_qbg_construction_parameters method should validate inputs before passing them to C++.
Add validation:
pub fn set_qbg_construction_parameters(
&mut self,
extended_dimension: usize,
dimension: usize,
number_of_subvectors: usize,
number_of_blobs: usize,
internal_data_type: i32,
data_type: i32,
distance_type: i32,
) {
if dimension == 0 {
panic!("dimension must be positive");
}
if number_of_subvectors == 0 || number_of_subvectors > dimension {
panic!("invalid number_of_subvectors");
}
if extended_dimension < dimension {
panic!("extended_dimension must be >= dimension");
}
self.inner.pin_mut().set_qbg_construction_parameters(
extended_dimension,
dimension,
number_of_subvectors,
number_of_blobs,
internal_data_type,
data_type,
distance_type,
)
}
rust/libs/algorithms/qbg/src/lib.rs
Outdated
pub fn new(path: &String, p: &mut property::Property) -> Result<Self, cxx::Exception> { | ||
let inner = match ffi::new_index(path, p.get_property()) { | ||
Ok(inner) => inner, | ||
Err(e) => panic!("{}", e), | ||
}; | ||
Ok(Index { inner }) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling in new method.
Using panic! for error handling is not idiomatic Rust. Consider propagating the error instead.
- let inner = match ffi::new_index(path, p.get_property()) {
- Ok(inner) => inner,
- Err(e) => panic!("{}", e),
- };
+ let inner = ffi::new_index(path, p.get_property())?;
Ok(Index { inner })
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pub fn new(path: &String, p: &mut property::Property) -> Result<Self, cxx::Exception> { | |
let inner = match ffi::new_index(path, p.get_property()) { | |
Ok(inner) => inner, | |
Err(e) => panic!("{}", e), | |
}; | |
Ok(Index { inner }) | |
} | |
pub fn new(path: &String, p: &mut property::Property) -> Result<Self, cxx::Exception> { | |
let inner = ffi::new_index(path, p.get_property())?; | |
Ok(Index { inner }) | |
} |
cd17301
to
c7fce8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
rust/libs/algorithms/qbg/src/lib.rs (1)
400-720
: Enhance test coverage and assertionsThe test implementations primarily verify basic functionality but lack comprehensive assertions and error case testing.
Add the following improvements to tests:
- Verify search result ordering:
// In test_index() let search_results = index.search(vec.as_slice(), K, RADIUS, EPSILON)?; assert!( search_results.windows(2).all(|w| w[0].1 <= w[1].1), "Search results should be ordered by distance" );
- Add error case testing:
#[test] fn test_index_error_cases() -> Result<()> { let mut index = Index::new("nonexistent_path", &mut Property::new())?; assert!(index.get_object(999999).is_err(), "Should error on invalid ID"); Ok(()) }
- Test parameter validation:
#[test] #[should_panic(expected = "dimension must be positive")] fn test_invalid_dimension() { let mut p = Property::new(); p.set_qbg_construction_parameters(1, 0, 1, 1, 1, 1, 1); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
rust/Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
rust/Cargo.toml
(1 hunks)rust/libs/algorithm/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/Cargo.toml
(1 hunks)rust/libs/algorithms/qbg/build.rs
(1 hunks)rust/libs/algorithms/qbg/src/input.cpp
(1 hunks)rust/libs/algorithms/qbg/src/input.h
(1 hunks)rust/libs/algorithms/qbg/src/lib.rs
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- rust/Cargo.toml
- rust/libs/algorithm/Cargo.toml
- rust/libs/algorithms/qbg/Cargo.toml
- rust/libs/algorithms/qbg/build.rs
- rust/libs/algorithms/qbg/src/input.h
👮 Files not reviewed due to content moderation or server errors (1)
- rust/libs/algorithms/qbg/src/input.cpp
🔇 Additional comments (1)
rust/libs/algorithms/qbg/src/lib.rs (1)
131-150
: 🛠️ Refactor suggestion
Add input validation for construction parameters
The set_qbg_construction_parameters
method accepts parameters without validation, which could lead to undefined behavior in the C++ code.
pub fn set_qbg_construction_parameters(
&mut self,
extended_dimension: usize,
dimension: usize,
number_of_subvectors: usize,
number_of_blobs: usize,
internal_data_type: i32,
data_type: i32,
distance_type: i32,
) {
+ if dimension == 0 {
+ panic!("dimension must be positive");
+ }
+ if number_of_subvectors == 0 || number_of_subvectors > dimension {
+ panic!("invalid number_of_subvectors");
+ }
+ if extended_dimension < dimension {
+ panic!("extended_dimension must be >= dimension");
+ }
self.inner.pin_mut().set_qbg_construction_parameters(
extended_dimension,
dimension,
number_of_subvectors,
number_of_blobs,
internal_data_type,
data_type,
distance_type,
)
}
Likely invalid or redundant comment.
unsafe extern "C++" { | ||
include!("qbg/src/input.h"); | ||
|
||
type Property; | ||
fn new_property() -> UniquePtr<Property>; | ||
fn init_qbg_construction_parameters(self: Pin<&mut Property>); | ||
fn set_qbg_construction_parameters( | ||
self: Pin<&mut Property>, | ||
extended_dimension: usize, | ||
dimension: usize, | ||
number_of_subvectors: usize, | ||
number_of_blobs: usize, | ||
internal_data_type: i32, | ||
data_type: i32, | ||
distance_type: i32, | ||
); | ||
fn set_extended_dimension(self: Pin<&mut Property>, extended_dimension: usize); | ||
fn set_dimension(self: Pin<&mut Property>, dimension: usize); | ||
fn set_number_of_subvectors(self: Pin<&mut Property>, number_of_subvectors: usize); | ||
fn set_number_of_blobs(self: Pin<&mut Property>, number_of_blobs: usize); | ||
fn set_internal_data_type(self: Pin<&mut Property>, internal_data_type: i32); | ||
fn set_data_type(self: Pin<&mut Property>, data_type: i32); | ||
fn set_distance_type(self: Pin<&mut Property>, distance_type: i32); | ||
fn init_qbg_build_parameters(self: Pin<&mut Property>); | ||
fn set_qbg_build_parameters( | ||
self: Pin<&mut Property>, | ||
hierarchical_clustering_init_mode: i32, | ||
number_of_first_objects: usize, | ||
number_of_first_clusters: usize, | ||
number_of_second_objects: usize, | ||
number_of_second_clusters: usize, | ||
number_of_third_clusters: usize, | ||
number_of_objects: usize, | ||
number_of_subvectors: usize, | ||
optimization_clustering_init_mode: i32, | ||
rotation_iteration: usize, | ||
subvector_iteration: usize, | ||
number_of_matrices: usize, | ||
rotation: bool, | ||
repositioning: bool, | ||
); | ||
fn set_hierarchical_clustering_init_mode( | ||
self: Pin<&mut Property>, | ||
hierarchical_clustering_init_mode: i16, | ||
); | ||
fn set_number_of_first_objects(self: Pin<&mut Property>, number_of_first_objects: usize); | ||
fn set_number_of_first_clusters(self: Pin<&mut Property>, number_of_first_clusters: usize); | ||
fn set_number_of_second_objects(self: Pin<&mut Property>, number_of_second_objects: u32); | ||
fn set_number_of_second_clusters( | ||
self: Pin<&mut Property>, | ||
number_of_second_clusters: usize, | ||
); | ||
fn set_number_of_third_clusters(self: Pin<&mut Property>, number_of_third_clusters: usize); | ||
fn set_number_of_objects(self: Pin<&mut Property>, number_of_objects: usize); | ||
fn set_number_of_subvectors_for_bp(self: Pin<&mut Property>, number_of_subvectors: usize); | ||
fn set_optimization_clustering_init_mode( | ||
self: Pin<&mut Property>, | ||
optimization_clustering_init_mode: i32, | ||
); | ||
fn set_rotation_iteration(self: Pin<&mut Property>, rotation_iteration: usize); | ||
fn set_subvector_iteration(self: Pin<&mut Property>, subvector_iteration: usize); | ||
fn set_number_of_matrices(self: Pin<&mut Property>, number_of_matrices: usize); | ||
fn set_rotation(self: Pin<&mut Property>, rotation: bool); | ||
fn set_repositioning(self: Pin<&mut Property>, repositioning: bool); | ||
|
||
type SearchResult; | ||
fn get_id(self: Pin<&mut SearchResult>) -> u32; | ||
fn get_distance(self: Pin<&mut SearchResult>) -> f32; | ||
|
||
type Index; | ||
fn new_index(path: &String, p: Pin<&mut Property>) -> Result<UniquePtr<Index>>; | ||
fn new_prebuilt_index(path: &String, p: bool) -> Result<UniquePtr<Index>>; | ||
fn open_index(self: Pin<&mut Index>, path: &String, prebuilt: bool) -> Result<()>; | ||
fn build_index(self: Pin<&mut Index>, path: &String, p: Pin<&mut Property>) -> Result<()>; | ||
fn save_index(self: Pin<&mut Index>) -> Result<()>; | ||
fn close_index(self: Pin<&mut Index>); | ||
fn append(self: Pin<&mut Index>, v: &[f32]) -> Result<i32>; | ||
fn remove(self: Pin<&mut Index>, id: usize) -> Result<()>; | ||
fn search( | ||
self: Pin<&mut Index>, | ||
v: &[f32], | ||
k: usize, | ||
radius: f32, | ||
epsilon: f32, | ||
) -> UniquePtr<CxxVector<SearchResult>>; | ||
fn get_object(self: Pin<&mut Index>, id: usize) -> Result<*mut f32>; | ||
fn get_dimension(self: Pin<&mut Index>) -> Result<usize>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add safety documentation for FFI interface
The FFI interface uses unsafe code and raw pointers but lacks documentation about safety requirements. This is particularly important for functions like get_object
that return raw pointers.
Add safety documentation for FFI functions, especially:
/// # Safety
///
/// Caller must ensure:
/// - The index is properly initialized
/// - The id is valid and within bounds
/// - The returned pointer must not outlive the Index instance
/// - The dimension returned by get_dimension() matches the actual object size
unsafe extern "C++" {
// ... existing declarations ...
}
pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> { | ||
let dim = match self.inner.pin_mut().get_dimension() { | ||
Ok(dim) => dim, | ||
Err(e) => return Err(e), | ||
}; | ||
match self.inner.pin_mut().get_object(id) { | ||
Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }), | ||
Err(e) => Err(e), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document safety requirements for unsafe code
The get_object
method uses unsafe code to create a slice from a raw pointer without clear documentation of safety requirements.
pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> {
+ // SAFETY: We ensure safety by:
+ // 1. Getting the dimension first to ensure valid size
+ // 2. Checking that the pointer is non-null (done by Result)
+ // 3. The slice lifetime is tied to self, ensuring the underlying data remains valid
let dim = match self.inner.pin_mut().get_dimension() {
Ok(dim) => dim,
Err(e) => return Err(e),
};
match self.inner.pin_mut().get_object(id) {
Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }),
Err(e) => Err(e),
}
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> { | |
let dim = match self.inner.pin_mut().get_dimension() { | |
Ok(dim) => dim, | |
Err(e) => return Err(e), | |
}; | |
match self.inner.pin_mut().get_object(id) { | |
Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }), | |
Err(e) => Err(e), | |
} | |
} | |
pub fn get_object(&mut self, id: usize) -> Result<&[f32], cxx::Exception> { | |
// SAFETY: We ensure safety by: | |
// 1. Getting the dimension first to ensure valid size | |
// 2. Checking that the pointer is non-null (done by Result) | |
// 3. The slice lifetime is tied to self, ensuring the underlying data remains valid | |
let dim = match self.inner.pin_mut().get_dimension() { | |
Ok(dim) => dim, | |
Err(e) => return Err(e), | |
}; | |
match self.inner.pin_mut().get_object(id) { | |
Ok(v) => Ok(unsafe { slice::from_raw_parts(v, dim) }), | |
Err(e) => Err(e), | |
} | |
} |
Description
Related Issue
Versions
Checklist
Special notes for your reviewer
Summary by CodeRabbit