Thoughts about Nucleo? #3491
Replies: 1 comment 4 replies
-
Some initial thoughts after skimming through the page. It looks like a Rust implementation of the fzf algorithm with some tweaks. Nothing groundbreaking.
I wouldn't be surprised if a Rust implementation of the same algorithm is faster because you have more low-level control (e.g. no runtime GC) and thus more room for optimization. But does it really matter? At the current level of performance, any performance gain in the matching algorithm will be of marginal benefit to users, because fzf is already fast for most practical scenarios.
This statement is technically incorrect and misleading. The fzf algorithm is guaranteed to give you the optimal result according to its scoring model. It's a misuse of the term "optimal". Anyway, what they are saying is that they feel their scoring model gives more "natural", "intuitive" results, but the feeling of "intuitiveness" is subjective and we don't have an objective measure. And it can vary greatly depending on the context. There is no one-size-fits-all model that will always produce the most "intuitive" result for all types of input for all users with different expectations.
I didn't understand this part. What exactly are we doing? Is this what the author is testing? If that's the case, no, fzf doesn't:

```sh
$ (echo x__foo; echo xf_oo) | fzf -f 'xf foo' | head -1
x__foo
```
This part was particularly interesting. It took 2~3 seconds for fzf to finish the search, while the Helix editor took about 12 seconds. At the end of the day, this is what matters to end users, so for them, fzf is 5 times faster. This supports my point that improving the matching algorithm's performance won't yield significant benefits for users.

Also, the test was run against 3 million items. But how often do you use fzf with that many items? Most of the time you'll be filtering maybe a few thousand, or a few tens of thousands, of items, and in that realm the performance difference will shrink to a few milliseconds or a fraction of a millisecond, and it won't be noticeable anyway. So I believe my time is better spent in different areas of the program. If I'm going to work on performance, I'd like to make the initial loading faster. (Theoretically, fzf could do more work in the loading phase and make the subsequent filtering faster, but we don't do that because it results in a worse user experience.)
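The scaling argument can be sketched with a back-of-the-envelope measurement (this is a naive substring filter, not fzf's or Nucleo's matcher; the item format is made up for illustration): filtering is roughly linear in the number of items, so a speedup that saves whole seconds at 3 million items saves only milliseconds at a few thousand.

```python
import time

def naive_filter(items, query):
    """Count items containing the query substring (illustrative stand-in
    for a matcher's per-item work; real matchers do more per item)."""
    return sum(1 for s in items if query in s)

for n in (3_000, 3_000_000):
    t0 = time.perf_counter()
    # Generate items lazily so the 3-million case doesn't hold them in memory.
    hits = naive_filter((f"src/module_{i}/file_{i}.rs" for i in range(n)), "file_42")
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"{n:>9} items: {hits} hits in {elapsed_ms:.1f} ms")
```

On typical hardware the small run finishes in a few milliseconds, so even a matcher that were several times faster would shave off an amount of time no user could perceive.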
-
Hi
In their latest release, the Helix text editor shared some details about their underlying fuzzy-matching library: https://helix-editor.com/news/release-23-10-highlights/
Reading more about this project, called Nucleo, I found it refreshing that the README acknowledges fzf as a strong source of technical inspiration: https://github.com/helix-editor/nucleo
The authors of this library also claim that they managed to squeeze out even more performance.
I'm not well versed in the intricacies of the various algorithms in the design space of "fuzzy engines", but I just wonder if there may be a few learnings in the approach adopted by Nucleo that could be fed back into fzf?