Commit

Add missing word
Based on context, I believe this word was omitted.
look authored Nov 19, 2024
1 parent 451b1bf commit 5db2618
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion crates/bpe/README.md
@@ -243,7 +243,7 @@ This type of algorithm is interesting for use cases where a certain token budget
This benchmark shows the runtime for the appending encoder when a text is encoded byte-by-byte.
For comparison we show the runtime of the backtracking encoder when it encodes the whole text at once.

-The benchmark measured the runtime of encoding of slices of lengths 10, 100, 1000, and 10000 from a random 20000 token original using the o200k token set.
+The benchmark measured the runtime of encoding of slices of lengths 10, 100, 1000, and 10000 from a random 20000 token original text using the o200k token set.

The graph below shows encoding runtime vs slice length.
The overall runtime of byte-by-byte incremental encoder for encoding the full text is comparable to the runtime of the backtracking encoder, with only a constant factor overhead.
