why using `rfind_byte` in line_buffer.rs #2342

baoti · 2022-10-31T14:38:06Z

I don't understand why rfind_byte is used in this code in line_buffer.rs.

// Update our `last_lineterm` positions if we read one.
if let Some(i) = newbytes.rfind_byte(self.config.lineterm) {
    self.last_lineterm = oldend + i + 1;
    return Ok(true);
}

I also added a failing test to line_buffer.rs:

// line_buffer.rs
#[test]
fn buffer_limited_capacity4() {
    let bytes = "home\nr\nlisa\nmaggie";
    let mut linebuf = LineBufferBuilder::new()
        .capacity(1)
        .buffer_alloc(BufferAllocation::Error(6))
        .build();
    let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);

    assert!(rdr.fill().unwrap());
    assert_eq!(rdr.bstr(), "home\n");
    rdr.consume_all();

    assert!(rdr.fill().unwrap());
    assert_eq!(rdr.bstr(), "r\n");
    rdr.consume_all();

    assert!(rdr.fill().unwrap());
    assert_eq!(rdr.bstr(), "lisa\n");
    rdr.consume_all();

    // We have just enough space.
    assert!(rdr.fill().unwrap());
    assert_eq!(rdr.bstr(), "maggie");
    rdr.consume_all();

    assert!(!rdr.fill().unwrap());
}

The test case fails with:

thread 'line_buffer::tests::buffer_limited_capacity4' panicked at 'assertion failed: (left == right)
left: "home\nr\n",
right: "home\n"', crates/searcher/src/line_buffer.rs:772:9

BurntSushi · 2022-10-31T15:47:24Z

Maintainer

Judging by your failing test case, my guess is that you might be misunderstanding what a LineBuffer is. It is not a buffer for holding a single line, but rather a sequence of complete lines. That's what this comment is referring to:

ripgrep/crates/searcher/src/line_buffer.rs

Lines 464 to 465 in af6b6c5

    
    // At this point, if we couldn't find a line terminator, then we
    // don't have a complete line. Therefore, we try to read more!

Namely, a LineBuffer guarantees that its buffer contains at least one complete line. How does the buffer determine whether it has a complete line or not? That only occurs when either a line terminator is seen or when EOF is reached.
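
As a rough illustration of that rule (not ripgrep's actual code), the check can be sketched with the standard library alone. The function name `has_complete_line` and the `at_eof` flag are hypothetical and exist only for this example.

```rust
/// Returns true if `buf` holds at least one complete line.
/// Hypothetical helper for illustration; not part of ripgrep.
fn has_complete_line(buf: &[u8], lineterm: u8, at_eof: bool) -> bool {
    // A line is complete once its terminator has been seen...
    if buf.contains(&lineterm) {
        return true;
    }
    // ...or once the source is exhausted and some bytes remain.
    at_eof && !buf.is_empty()
}

fn main() {
    // A terminator is present, so there is at least one complete line.
    assert!(has_complete_line(b"home\nr", b'\n', false));
    // No terminator and more input may follow: keep reading.
    assert!(!has_complete_line(b"maggie", b'\n', false));
    // No terminator, but EOF makes the trailing bytes a complete line.
    assert!(has_complete_line(b"maggie", b'\n', true));
}
```

This is why the last line in the question's test, "maggie", can still be returned even though it has no trailing newline: EOF completes it.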

But the key here is at least one complete line. The whole point of a line buffer is that it tries to collect as many lines as possible, but without actually finding the boundaries of each of those lines. Because finding the boundaries of every line will slow grep programs down quite a bit. Indeed, in most searches, ripgrep never looks for line boundaries outside of lines that match. (Even computing line numbers doesn't require it. Computing line numbers can be done faster by simply counting rather than also returning the offsets of every line.)

All we really need to do is read chunks from an underlying source, make sure it only contains complete lines and then hand it to the regex search code. Since reading bytes from a source might not fall precisely at a line boundary, we only return the internal buffer up to the last complete line that we read from the source. Discovering where that is merely requires a single reverse search to find the last line terminator in the buffer.

The remaining bytes (the bytes after the last line terminator) constitute what is likely an incomplete line. So once the search code is done with the buffer, it gets "rolled" such that the complete lines are discarded and the incomplete line is shuffled to the beginning. More bytes from the source are then read and appended to the incomplete line. (The incomplete line may become a complete line if the next read call returns EOF.)
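
A minimal sketch of the reverse search and the "roll" described above, under some loudly stated assumptions: the helper names `split_at_last_lineterm` and `roll` are hypothetical, the buffer is a plain `Vec<u8>`, and the standard library's `rposition` stands in for the `rfind_byte` call quoted in the question so the example has no dependencies. It is not how ripgrep's `LineBuffer` is actually implemented.

```rust
/// Splits `buf` at the last line terminator and returns
/// (complete lines, trailing incomplete line).
/// Hypothetical helper for illustration; not ripgrep's API.
fn split_at_last_lineterm(buf: &[u8], lineterm: u8) -> (&[u8], &[u8]) {
    // One reverse scan; no per-line boundaries are computed.
    match buf.iter().rposition(|&b| b == lineterm) {
        Some(i) => buf.split_at(i + 1),
        None => (&buf[..0], buf),
    }
}

/// "Rolls" the buffer: discards the first `consumed` bytes (the complete
/// lines) so the incomplete tail ends up at the front.
fn roll(buf: &mut Vec<u8>, consumed: usize) {
    buf.drain(..consumed);
}

fn main() {
    // Pretend one read returned three complete lines plus a partial one.
    let mut buf = b"home\nr\nlisa\nmag".to_vec();

    let (complete, partial) = split_at_last_lineterm(&buf, b'\n');
    assert_eq!(complete, &b"home\nr\nlisa\n"[..]);
    assert_eq!(partial, &b"mag"[..]);

    // After the searcher is done with `complete`, roll and read more.
    let consumed = complete.len();
    roll(&mut buf, consumed);
    buf.extend_from_slice(b"gie"); // bytes from the next read
    assert_eq!(buf, b"maggie".to_vec());
}
```

Note that the searcher receives all three lines in one slice. A forward search would hand back only "home\n" and force a separate scan for every line.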

And so your test failing seems expected to me, since home\nr\n is 7 bytes and you've configured the LineBuffer to be able to allocate an additional 6 bytes on top of the initial 1 byte capacity given. See:

ripgrep/crates/searcher/src/line_buffer.rs

Lines 172 to 176 in af6b6c5

    
    /// Note that this setting only applies to the amount of *additional*
    /// memory to allocate, beyond the capacity of the buffer. That means that
    /// a value of `0` is sensible, and in particular, will guarantee that a
    /// line buffer will never allocate additional memory beyond its initial
    /// capacity.
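
The arithmetic behind that observation can be checked directly. The sketch below only adds up the numbers from the test in the question. `total_budget` is a hypothetical helper, and the sketch assumes the buffer is allowed to grow to its full permitted size before the first fill returns.

```rust
/// Total bytes the buffer may hold: initial capacity plus the
/// additional allocation limit. Hypothetical helper for illustration.
fn total_budget(capacity: usize, additional: usize) -> usize {
    capacity + additional
}

fn main() {
    // Values from the test: .capacity(1) and BufferAllocation::Error(6).
    let budget = total_budget(1, 6);
    assert_eq!(budget, 7);

    // The first two lines fit exactly within that budget.
    let first_two_lines = "home\nr\n";
    assert_eq!(first_two_lines.len(), 7);
    assert!(first_two_lines.len() <= budget);

    // The last terminator in those 7 bytes is at index 6, so a reverse
    // search marks all 7 bytes as complete lines.
    let last = first_two_lines.as_bytes().iter().rposition(|&b| b == b'\n');
    assert_eq!(last, Some(6));
}
```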
