Why use rfind_byte in line_buffer.rs?
#2342
I don't understand why this code uses rfind_byte:

    // Update our `last_lineterm` positions if we read one.
    if let Some(i) = newbytes.rfind_byte(self.config.lineterm) {
        self.last_lineterm = oldend + i + 1;
        return Ok(true);
    }

I added a failing test to line_buffer.rs:

    // line_buffer.rs
    #[test]
    fn buffer_limited_capacity4() {
        let bytes = "home\nr\nlisa\nmaggie";
        let mut linebuf = LineBufferBuilder::new()
            .capacity(1)
            .buffer_alloc(BufferAllocation::Error(6))
            .build();
        let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);

        assert!(rdr.fill().unwrap());
        assert_eq!(rdr.bstr(), "home\n");
        rdr.consume_all();

        assert!(rdr.fill().unwrap());
        assert_eq!(rdr.bstr(), "r\n");
        rdr.consume_all();

        assert!(rdr.fill().unwrap());
        assert_eq!(rdr.bstr(), "lisa\n");
        rdr.consume_all();

        // We have just enough space.
        assert!(rdr.fill().unwrap());
        assert_eq!(rdr.bstr(), "maggie");
        rdr.consume_all();

        assert!(!rdr.fill().unwrap());
    }

The test case fails with:
Replies: 1 comment
Judging by your failing test case, my guess is that you might be misunderstanding what a LineBuffer is. It is not a buffer for holding a single line, but rather a sequence of complete lines. That's what this comment is referring to:

ripgrep/crates/searcher/src/line_buffer.rs, lines 464 to 465 in af6b6c5

Namely, a LineBuffer guarantees that its buffer contains at least one complete line. How does the buffer determine whether it has a complete line or not? That only occurs when either a line terminator is seen or when EOF is reached.

But the key here is at least one complete line. The whole point of a line buffer is that it tries to collect as many lines as possible, but without actually finding the boundaries of each of those lines. Because finding the boundaries of every line will slow grep programs down quite a bit. Indeed, in most searches, ripgrep never looks for line boundaries outside of lines that match. (Even computing line numbers doesn't require it. Computing line numbers can be done faster by simply counting rather than also returning the offsets of every line.)

All we really need to do is read chunks from an underlying source, make sure each chunk contains only complete lines, and then hand it to the regex search code. Since reading bytes from a source might not fall precisely at a line boundary, we only return the internal buffer up to the last complete line that we read from the source. Discovering where that is merely requires a single reverse search to find the last line terminator in the buffer.

The remaining bytes (the bytes after the last line terminator) constitute what is likely an incomplete line. So once the search code is done with the buffer, it gets "rolled" such that the complete lines are discarded and the incomplete line is shuffled to the beginning. More bytes from the source are then read and appended to the incomplete line. (The incomplete line may become a complete line if the next …)

And so your test failing seems expected to me, since …

ripgrep/crates/searcher/src/line_buffer.rs, lines 172 to 176 in af6b6c5
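
To make the fill-and-roll cycle described above concrete, here is a minimal sketch of the idea. It is a toy model and not ripgrep's actual LineBuffer: the `ToyLineBuffer` type, its fields and the `chunk` parameter are hypothetical names made up for illustration, and it uses the standard library's `rposition` in place of `rfind_byte` so that it needs no external crates.

```rust
use std::io::{self, Read};

/// Toy model of a line buffer (hypothetical; NOT ripgrep's `LineBuffer`).
struct ToyLineBuffer {
    buf: Vec<u8>,
    /// Exclusive end of the last complete line held in `buf`.
    last_lineterm: usize,
    /// Assumed knob: how many bytes to request per read call.
    chunk: usize,
    lineterm: u8,
}

impl ToyLineBuffer {
    fn new(chunk: usize) -> Self {
        ToyLineBuffer { buf: Vec::new(), last_lineterm: 0, chunk, lineterm: b'\n' }
    }

    /// The complete lines that would be handed to the search code.
    fn buffer(&self) -> &[u8] {
        &self.buf[..self.last_lineterm]
    }

    /// Discard the complete lines; the trailing partial line moves to the front.
    fn roll(&mut self) {
        self.buf.drain(..self.last_lineterm);
        self.last_lineterm = 0;
    }

    /// Read until at least one complete line is buffered or EOF is reached.
    /// Returns Ok(false) once there is nothing left to hand out.
    fn fill<R: Read>(&mut self, rdr: &mut R) -> io::Result<bool> {
        self.roll();
        loop {
            let oldend = self.buf.len();
            self.buf.resize(oldend + self.chunk, 0);
            let n = rdr.read(&mut self.buf[oldend..])?;
            self.buf.truncate(oldend + n);
            if n == 0 {
                // EOF: whatever remains counts as the final line.
                self.last_lineterm = self.buf.len();
                return Ok(!self.buf.is_empty());
            }
            // A single reverse scan over the newly read bytes only.
            let newbytes = &self.buf[oldend..];
            if let Some(i) = newbytes.iter().rposition(|&b| b == self.lineterm) {
                self.last_lineterm = oldend + i + 1;
                return Ok(true);
            }
        }
    }
}
```

In this toy model, feeding `"home\nr\nlisa\nmaggie"` through a buffer with a 64-byte chunk yields `"home\nr\nlisa\n"` from the first `fill` and `"maggie"` from the second, because one read can bring in several lines and only the last terminator is located. With an 8-byte chunk the first `fill` yields `"home\nr\n"` instead. Either way, a caller cannot assume one line per `fill`, which is the assumption the test in the question makes.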