Improving wasmtime performance on illumos via mmapping /dev/null #9544
Comments
Thanks for writing this up and investigating! I really like the patch you have passing around
For a bit of history, all our VM APIs are a bit of a mishmash of organically evolved "abstractions" over time. The abstractions keep changing because we keep not being able to pin down exactly what we want. In that sense you're more than welcome to apply refactorings to shuffle things around (e.g. having one decommit instead of 3) and don't place too much weight on the current design.
What I might recommend for landing these changes, if you're up for it, is incremental progress towards the refactoring here. The memory-related bits are notoriously tricky in Wasmtime, so we ideally prefer to change only small bits at a time to ensure thorough review and everything. In that sense:
Those could all be separate PRs, but separate commits in the same PR is something I'd personally be ok with too. (would primarily prefer to not lump it all into one though). And if you feel there's better divisions of commits/PRs definitely feel free to slice/dice further too. How's that all sound? |
My worry with passing around &Mmap is just about the complexity of that growing out of hand. Do you have a sense of how difficult the refactoring would be? I'll give it a shot as time permits. (I would absolutely split this up into several PRs. I'm one of the biggest stacked PR proponents around :) ) |
Personally I thought sunshowers#1 looked pretty reasonable. I'm notoriously bad at judging ahead of time whether something will work out, so I'm happy to leave it to your discretion too as to whether the refactoring feels right/wrong. Any difficult parts I or others can help out with during review too. One thing I think is ok is to store |
In the PR I do store raw pointers to an Mmap in a couple of spots, because I found it a bit complex to change the APIs. I could probably replace those with Arc. I'll also try doing a more aggressive refactor (e.g. introducing lifetimes in a few places) now that you're on board with the general approach. Thanks! |
(I'm still a novice to the wasmtime code so please let me know if I've made any mistakes. Thanks!)
Motivation
In #9535 we added initial support for illumos. (Thanks!) As noted there, we did notice some performance issues, particularly around freeing memory at shutdown.
Some of my colleagues and I investigated the situation today, and we have a pretty good sense of what's going on (some of my colleagues will be writing up the related illumos issues at some point, but it's essentially that anonymous MAP_NORESERVE allocations of size N take up O(N) CPU, rather than apparently O(1) as on some other platforms. The constant factor is very small, but wasmtime allocates enough memory that it is noticeable.)
However, we've found that there's an alternative mmapping strategy that works well, bringing illumos perf to within the same ballpark as Linux (at least as far as the wast test suite goes). The alternative strategy is:
PROT_NONE mappings, rather than creating a large amount of anonymous memory, create a map to /dev/null.
I have a quick and dirty prototype that shows how /dev/null mapping might work, though it needs a lot of work to be made shippable. The prototype is at sunshowers#1 -- mmap.rs has the meaty interval tree logic if you're interested.
With this PR,
(Note that this strategy doesn't work on Linux -- mmapping /dev/null produces ENODEV. Naturally, if we go this route, this will be a platform-specific impl of mmapping on illumos, and possibly other systems where /dev/null-based mapping might be faster.)
Challenges
The most important implication of this approach is that all memory management must go through the mmap. This means that nothing outside the mmap code should call mmap, mmap_anonymous or mprotect directly.
This turns out to be a bit of a sticking point, sadly. Several parts of the wasmtime VM store pointers to base addresses and operate on them directly.
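To illustrate why every change has to be routed through the mapping, here is a toy model. It is not the prototype's code: `ModelMmap`, `Backing`, and `set_state` are hypothetical names, and it assumes the mapping has to remember which ranges are still the inaccessible reservation and which have been made accessible. No system calls are made; a real implementation would issue them inside `set_state`.

```rust
use std::collections::BTreeMap;

/// State of a byte range (hypothetical names, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backing {
    /// Still the inaccessible reservation.
    Reserved,
    /// Has been made accessible.
    Accessible,
}

/// Toy stand-in for an `Mmap` that records the state of every range.
/// Keys are range starts; each range runs to the next key (or to `len`).
struct ModelMmap {
    len: usize,
    ranges: BTreeMap<usize, Backing>,
}

impl ModelMmap {
    fn new(len: usize) -> Self {
        let mut ranges = BTreeMap::new();
        ranges.insert(0, Backing::Reserved);
        ModelMmap { len, ranges }
    }

    /// State of the byte at `offset`.
    fn state_at(&self, offset: usize) -> Backing {
        assert!(offset < self.len);
        // Key 0 is always present, so there is always a range at or before `offset`.
        *self.ranges.range(..=offset).next_back().unwrap().1
    }

    /// The single entry point for changing the state of `[start, end)`.
    /// A real implementation would perform the remap or protect call here.
    fn set_state(&mut self, start: usize, end: usize, state: Backing) {
        assert!(start <= end && end <= self.len);
        if start == end {
            return;
        }
        // Remember what follows `end` so that it survives the overwrite.
        let after = if end < self.len { Some(self.state_at(end)) } else { None };
        // Drop every range boundary that falls inside [start, end).
        let inside: Vec<usize> = self.ranges.range(start..end).map(|(k, _)| *k).collect();
        for k in inside {
            self.ranges.remove(&k);
        }
        self.ranges.insert(start, state);
        if let Some(a) = after {
            self.ranges.entry(end).or_insert(a);
        }
        // Merge with equal neighbours so the map stays minimal.
        if self.ranges.get(&end) == Some(&state) {
            self.ranges.remove(&end);
        }
        let prev = self.ranges.range(..start).next_back().map(|(_, s)| *s);
        if prev == Some(state) {
            self.ranges.remove(&start);
        }
    }
}
```

If some other component changed protections on a raw base address, this record would silently go stale, which is the reason for wanting a single owner of these operations.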
cow.rs calls expose_existing_mapping directly, which on Unix calls mprotect. With /dev/null-based mapping, that would no longer be possible -- that code must go through Mmap instead.
While some components that currently store addresses can be passed a &Mmap, this turns out to be more challenging for other parts -- particularly the RuntimeLinearMemory API, which is generic over both an owned MmapMemory and a logically-borrowed StaticMemory.
In my prototype I just decided to store a SendSyncPtr<Mmap> inside StaticMemory, with the hope/promise that the Mmap outlives the StaticMemory. I don't think that's hugely worse than storing a raw *mut u8 as we do today where there's the same implicit promise, but it is arguably not great.
I'll also say that in general, this change leads to some nice internal improvements. For example, there are currently at least three different implementations of decommitting memory in the kernel, and only one of them does MADV_DONTNEED on Linux. With my prototype, all of that lives in one spot.
Possible solutions
So given that we've established the benefits of changing out the style of mapping, at least on illumos -- and given the challenges I encountered, I think there are a few approaches we could take:
/dev/null-based mapping is also the officially recommended way to do this on illumos.
*mut u8, if it is too inconvenient to pass in a &Mmap, store a pointer to the Mmap instead. If I understand correctly, these spots all already have an implicit requirement that the Mmap outlives them, so this would probably not be worse than today. But it is new unsafe Rust.
&'map Mmap or similar. This seemed a bit daunting to me -- particularly the Box<dyn RuntimeLinearMemory> abstracting over both owned and borrowed memory mappings -- but as I said in the beginning, I'm a novice so it's probably a lot easier for an experienced wasmtime dev.
Mmap into an Arc and store (strong? weak?) refs to it where needed.
I'd love to hear y'all's thoughts on this!
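To make the Arc option a little more concrete, here is a minimal sketch using strong references. Everything in it is a simplified stand-in rather than wasmtime's actual types: `Mmap` is a plain struct with no real mapping behind it, `LinearMemory` is a one-method placeholder for RuntimeLinearMemory, and `OwnedMemory` / `SharedMemory` loosely play the roles of MmapMemory and StaticMemory.

```rust
use std::sync::Arc;

/// Stand-in for the real mapping: just a length, no system calls.
struct Mmap {
    len: usize,
}

/// Hypothetical, much-reduced placeholder for `RuntimeLinearMemory`.
trait LinearMemory {
    fn byte_size(&self) -> usize;
}

/// Owns its mapping outright (roughly the `MmapMemory` case).
struct OwnedMemory {
    mmap: Mmap,
}

/// Uses a slice of a larger mapping (roughly the `StaticMemory` case),
/// holding a strong reference so the mapping cannot be freed first.
struct SharedMemory {
    mmap: Arc<Mmap>,
    offset: usize,
    len: usize,
}

impl SharedMemory {
    fn new(mmap: Arc<Mmap>, offset: usize, len: usize) -> Self {
        // The slice must lie entirely within the shared mapping.
        assert!(offset.checked_add(len).map_or(false, |end| end <= mmap.len));
        SharedMemory { mmap, offset, len }
    }
}

impl LinearMemory for OwnedMemory {
    fn byte_size(&self) -> usize {
        self.mmap.len
    }
}

impl LinearMemory for SharedMemory {
    fn byte_size(&self) -> usize {
        self.len
    }
}

/// Builds one owned and one shared memory behind the same trait object type.
fn demo() -> Vec<Box<dyn LinearMemory>> {
    let pool = Arc::new(Mmap { len: 0x20000 });
    let shared = SharedMemory::new(Arc::clone(&pool), 0x10000, 0x10000);
    let owned = OwnedMemory { mmap: Mmap { len: 0x1000 } };
    let mut out: Vec<Box<dyn LinearMemory>> = Vec::new();
    out.push(Box::new(owned));
    out.push(Box::new(shared));
    // The local `pool` handle is dropped here; `shared` keeps the mapping alive.
    out
}
```

With strong references there is no lifetime parameter on the trait object and no new unsafe code, at the cost of a reference count. Weak references would instead need an upgrade (and a failure path) wherever the mapping is used.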