Changing a stored value from 1 to 0 causes a suspicious performance decrease #9590
Strangely, replacing that constant zero with an equivalent calculation fixes the issue. Cranelift should constant-fold these instructions, so why do they affect machine code generation? 🤔 Replace line 8 with any of the following:

```wat
f64.const 0    f64.abs
f64.const 0    f64.const 0    f64.add
f64.const 42   f64.const 42   f64.sub
f64.const 777  f64.const 0    f64.mul
```
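To make the expectation concrete, here is a minimal Rust sketch of what folding these snippets should yield. It uses a hypothetical stack-based mini-IR (`Op`, `fold`), not Cranelift's actual folding machinery, which operates on CLIF rather than Wasm. Each of the four replacement sequences evaluates to +0.0, bit-for-bit identical to `f64.const 0`, so after folding they should in principle be indistinguishable from the original constant.

```rust
// Hypothetical mini-IR for a Wasm-style stack snippet. This is NOT
// Cranelift's data structure, just enough to show what folding produces.
#[derive(Clone, Copy, Debug)]
enum Op {
    Const(f64),
    Abs,
    Add,
    Sub,
    Mul,
}

/// Evaluate a straight-line sequence of constant ops at compile time,
/// as a constant folder would, returning the single resulting value.
fn fold(ops: &[Op]) -> Option<f64> {
    let mut stack: Vec<f64> = Vec::new();
    for op in ops {
        match *op {
            Op::Const(c) => stack.push(c),
            Op::Abs => {
                let a = stack.pop()?;
                stack.push(a.abs());
            }
            Op::Add | Op::Sub | Op::Mul => {
                // Wasm pops the right-hand operand first.
                let rhs = stack.pop()?;
                let lhs = stack.pop()?;
                stack.push(match *op {
                    Op::Add => lhs + rhs,
                    Op::Sub => lhs - rhs,
                    _ => lhs * rhs,
                });
            }
        }
    }
    // A well-formed snippet leaves exactly one value on the stack.
    if stack.len() == 1 { stack.pop() } else { None }
}

fn main() {
    let snippets: [&[Op]; 4] = [
        &[Op::Const(0.0), Op::Abs],
        &[Op::Const(0.0), Op::Const(0.0), Op::Add],
        &[Op::Const(42.0), Op::Const(42.0), Op::Sub],
        &[Op::Const(777.0), Op::Const(0.0), Op::Mul],
    ];
    for ops in snippets.iter() {
        let v = fold(ops).expect("well-formed snippet");
        // Every snippet folds to +0.0 (sign bit clear), same as `f64.const 0`.
        println!("{:?} -> {} (sign bit set: {})", ops, v, v.is_sign_negative());
    }
}
```

The sign check matters because -0.0 has a different bit pattern, so a fold producing it would not be equivalent to `f64.const 0`.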
This is happening because a register is being spilled, but I'm not sure why. Changing the value from 1 to 0 changes the generated code, which changes register allocation, which in turn causes the spill. So I think it's worth digging into why the spill happens in the 0 case, because I'm not sure what's going on. I've got this CLIF:
compiled with
where I've annotated the spill/reload slots. @cfallin, do you know how I could debug this further to see why the spill is being inserted? It seems to require some of the code at the beginning of the function (e.g. the call to another function), so it may be related to caller/callee-saved registers. I'm not sure, though, whether this comes down to ABI details or to register allocator behavior. If it's regalloc-related and not specific to floats, fixing it might lead to some nice wins on other benchmarks too.
Maybe. FWIW I have zero cycles to dedicate to this right now, so unfortunately you'll need to dig into RA2 (regalloc2) on this yourself. Maybe at some later time I'll be able to make another performance push. Sorry!
Hi @alexcrichton, I went through the code again, and here are my tentative thoughts about this bug:
In the spilling case, Cranelift uses the same value for multiple zero constants. In the non-spilling case, it uses two different ones. In the original
Generated CLIF after optimisations of @alexcrichton's example (spills): note how it uses only
Example that does not spill: note the additional
Generated CLIF after optimisations of the non-spilling example above: note how it uses
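One plausible mechanism, sketched here under stated assumptions and not confirmed in the thread: if identical constants are deduplicated into a single SSA value (hash-consing, as value-numbering passes typically do), one zero can end up live from an early use all the way to a late one, across the call. `ConstPool` and `live_spans` are hypothetical illustrations, not Cranelift's data structures, and the live range is approximated by the first and last use.

```rust
use std::collections::HashMap;

/// Hypothetical constant pool, NOT Cranelift's API: identical constants
/// are deduplicated into one SSA value number (hash-consing).
#[derive(Default)]
struct ConstPool {
    // Keyed on the raw bit pattern so +0.0 and -0.0 stay distinct.
    by_bits: HashMap<u64, usize>,
    next_value: usize,
}

impl ConstPool {
    /// Return the value number for `c`, reusing an existing one if an
    /// identical constant was already defined.
    fn f64const(&mut self, c: f64) -> usize {
        let bits = c.to_bits();
        if let Some(&v) = self.by_bits.get(&bits) {
            return v;
        }
        let v = self.next_value;
        self.next_value += 1;
        self.by_bits.insert(bits, v);
        v
    }
}

/// Approximate live span [first use, last use] of each value,
/// given (value, program point) use pairs.
fn live_spans(uses: &[(usize, usize)]) -> HashMap<usize, (usize, usize)> {
    let mut spans = HashMap::new();
    for &(v, pt) in uses {
        let span = spans.entry(v).or_insert((pt, pt));
        span.0 = span.0.min(pt);
        span.1 = span.1.max(pt);
    }
    spans
}

fn main() {
    let mut pool = ConstPool::default();
    // Two `f64.const 0` sites: one early in the function, one in the loop.
    let early = pool.f64const(0.0);
    let late = pool.f64const(0.0);
    println!("early = v{early}, late = v{late}"); // deduplicated: same number
    // Uses at point 2 (before a call at point 5) and at point 10 (in the loop).
    let spans = live_spans(&[(early, 2), (late, 10)]);
    let call_point = 5;
    for (v, (start, end)) in &spans {
        let crosses = *start < call_point && call_point < *end;
        println!("v{v}: live {start}..={end}, crosses call: {crosses}");
    }
}
```

With deduplication, both zeros share one value whose span (2 to 10) crosses the call; with two separate constants, each span would cover a single point and neither would cross it.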
Hi @alexcrichton @primoly, I also found that if I change the
Observations about the register spills and reloads
Here are the differences between their machine code:
@hungryzzz you might be right, yeah! I would have naively expected live range splitting in regalloc2 to kick in, though, so that the spill, if necessary, happened only around the call rather than throughout the loop. As for small differences: when it comes to regalloc heuristics, AFAIK it's expected that small changes to the input can cause subtle changes in the output, and with specific decisions like spilling here I'm not entirely surprised. That said, I don't fully understand everything happening here, so I can't answer with certainty (e.g. I don't know for certain why @primoly's changes resulted in different register allocation decisions).
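The difference between a spill that covers the whole loop and one confined to the call can be illustrated with a toy cost model. This is a minimal sketch with hypothetical names (`Event`, `Cost`, `spill_whole_range`, `split_around_calls`), not regalloc2's algorithm. It assumes a float value defined before a call and then read on every loop iteration, and no callee-saved float registers, as in the System V x86-64 ABI where all XMM registers are caller-saved; the thread does not say which target was used.

```rust
// Toy model with hypothetical names: NOT regalloc2's actual algorithm.
#[derive(Clone, Copy, Debug)]
enum Event {
    Def,  // the value is produced into a register
    Call, // a call that clobbers all caller-saved float registers
    Use,  // the value is read
}

#[derive(Debug, PartialEq)]
struct Cost {
    stores: usize,
    loads: usize,
}

/// Whole-range spill: one store after the def, then the value lives in its
/// stack slot, so every use (including each loop iteration) is a reload.
fn spill_whole_range(trace: &[Event]) -> Cost {
    let loads = trace.iter().filter(|e| matches!(**e, Event::Use)).count();
    Cost { stores: 1, loads }
}

/// Split at calls: keep the value in a register, store it before the first
/// call that clobbers it (an SSA value never changes, so one store suffices),
/// and reload only on the first use after each call.
fn split_around_calls(trace: &[Event]) -> Cost {
    let mut cost = Cost { stores: 0, loads: 0 };
    let mut in_reg = false;
    let mut stored = false;
    for e in trace {
        match *e {
            Event::Def => in_reg = true,
            Event::Call => {
                if in_reg && !stored {
                    cost.stores += 1;
                    stored = true;
                }
                in_reg = false; // caller-saved register clobbered by the call
            }
            Event::Use => {
                if !in_reg {
                    cost.loads += 1;
                    in_reg = true;
                }
            }
        }
    }
    cost
}

fn main() {
    // Def and call at the top of the function, then a hot loop reading the value.
    let iters = 1_000_000;
    let mut trace = vec![Event::Def, Event::Call];
    trace.extend(std::iter::repeat(Event::Use).take(iters));
    println!("whole-range spill: {:?}", spill_whole_range(&trace));
    println!("split around call: {:?}", split_around_calls(&trace));
}
```

Under these assumptions the whole-range spill pays one reload per iteration, while splitting pays a single reload after the call, which is why a spill landing inside the loop is so much more expensive than one confined to the call.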
The reason for the spill is the call to another function, which leads to Cranelift saving the value of the
Test Cases
case.zip
Steps to Reproduce
Hi, I ran the two attached cases (good.wasm and bad.wasm) in Wasmtime and WasmEdge (AOT) and measured the execution time of each with the time command.
Expected Results & Actual Results
For good.wasm, the execution times in the two runtimes are:
Wasmtime: 0.99 s
WasmEdge: 1.06 s
For bad.wasm, the execution times are:
Wasmtime: 6.57 s
WasmEdge: 1.05 s
The difference between the two cases is that the value stored on line 8 changes from 1 to 0. This slows Wasmtime down by about 5.6 s (roughly 6.6× slower) but has no negative effect on WasmEdge.
More observations & questions:
Wasmtime compiles the loop conditions in bad.wasm and good.wasm differently, but I don't understand why changing an instruction that seems unrelated to the loop can affect how the loop is compiled.
Versions and Environment