Proposed instruction format for reintroducing exnref #281

aheejin · 2023-09-06T17:59:39Z

It’s been a while since we posted #280 and did a presentation in the 8/1 CG meeting. I think this is time we kick off more concrete discussions on what to do for the next steps.

CG Meeting Schedule

We reserved two slots in September CG meeting:
On 9/12, we are going to make a brief announcement that we have @titzer as a ch-champion of the EH proposal, and let people know that discussions are taking place in Github. On 9/26, we will have a 15-min slot for the recap of our proposed option and some possible discussions. I hoped we could do it all on 9/12 but the slot had already been fully reserved. And we plan to do a vote on the actual switch at the October hybrid CG meeting (remote option is available).

Proposed Instruction Format

And we’d like to propose below as our endorsed option to reintroduce exnref.

try blocktype
catch i li
catch j lj
...
catch_all l
  instruction*
end

In the folded format, this can be also written as

(try (result ...) (catch $i $li) (catch $j $lj) ... (catch_all $l)
  ...
)

When an exception is thrown within instruction*, it is checked against each of the tags in order in the list of catches, and if it is caught by one of the catches, the control flow branches to the corresponding label with extracted values and the exception’s exnref. So if a tag contains a single i32, its target label’s result type will be a multivalue of (i32, exnref). The catch_all label’s return type, if present, is just an exnref.

So an example of a try with two tags and a catch_all would be:

block $li (result …, exnref)
  block $lj  (result …, exnref)
    block $l (result exnref)
      try blocktype (catch i li) (catch j lj) (catch_all l)
        instruction*
      end
    end
    handler code for catch_all
  end
  handler code for catch j
end
handler code for catch i

This is a variant of Option B we presented on 8/1. The difference is we bundle the catch labels with try, preceding the instructions that they guard, for ease of decoding. Because catches are with the try, we end the block with a normal end, as in blocks and loops.

This option doesn’t have the many-partitioned blocks of try-catch-...-catch_all anymore; instead now try is a block that additionally has attached catch and catch_all clauses. Note that we don’t need the (do) syntax in the text format anymore. Also in binary format, the handler list can be just a vector of immediates as in br_table, so we don’t need actual opcodes for catch and catch_all.

(Our presentation on 8/1 incorrectly included br_on_exn for Option B, which is not necessary because catch pushes extracted values along with an exnref. The slides have been fixed.)

rethrow will take an explicit exnref, not the label. The current rethrow instruction, which takes a label, creates semantic dependency on the lexical context which has been a challenge to some compiler code transformations.

throw instruction will remain the same.

try-delegate will be removed from the new spec because we can achieve the same semantics using exnref and branches.

We chose this option because several stakeholders we talked to expressed preference for it. The reasons include:

More in line with other Wasm regular control flow instructions, e.g., blocks and branches
The number of required opcodes are about half of the current proposal (no try-delegate, no need for catch and catch_all opcodes)
Simpler decoding from the VM side

I was able to talk with several VM and toolchain authors and they said the plan seemed doable.

@rossberg prototyped this option in https://github.com/WebAssembly/exception-handling/tree/exnref-1b branch. You can also see the diff in main...exnref-1b#diff-667145a59d9b08f9350bb76322feec7bbe56dda640a84dd5a9f0a65097ec452e.

Migration Plan

If we choose to adopt this option, the migration will progress cautiously given the circumstance that the current EH instructions have been already shipped to the web. The VMs will be supporting both current and the new instructions for the foreseeable future so apps already deployed won’t be disrupted. Emscripten + LLVM toolchain will support the new option under a flag first, and switch to it after it is stabilized. We also plan to provide a converter tool from the old binary to the new format to facilitate the use of the new instructions without the need for recompilation. We have a tentative plan, or more of a hope, for the deprecation of the current instructions, but this will only happen when we can be sure that the usage of the old instructions is low enough, which may turn out infeasible. Before any deprecation, we will maintain the documentation of the current instructions in the supplemental Phase 3 documentation.

The text was updated successfully, but these errors were encountered:

eqrion · 2023-09-15T20:28:51Z

@aheejin This looks good to me and matches my understanding.

Two small text format questions:

Can we give the new try a different name? This would make it easier to know which opcode to emit in text parsers. We could guess based on whether the try is immediately followed by catch's or instructions, but this wouldn't allow the text format to use the new opcode for empty try's. For my prototyping, I have used the try_table opcode as this instruction feels similar to a br_table instruction. But open to other options.
Instead of rethrow could we name the new instruction throw_ref? That would make the instruction a bit more future proof if we added instructions to allocate exceptions without throwing them. In that case, the rethrow would really be the first throw.

rossberg · 2023-09-15T20:47:14Z

Yes, throw_ref makes sense. Not sure I like try_table, however — since the intent is to deprecate the old instructions, I'd rather give those cumbersome (re)names?

kg · 2023-09-16T03:19:28Z

Yes, throw_ref makes sense. Not sure I like try_table, however — since the intent is to deprecate the old instructions, I'd rather give those cumbersome (re)names?

Wouldn't renaming the old opcodes mean that old text-format wasm no longer compiles and produces confusing errors? Vs retiring the old opcodes with their names intact, and using new names for the new ones that allows straightforward 'x is deprecated, use y' error messages. I am not sure how I feel about try_table as a name but the relationship between the new try and br_table does seem pretty clear to me.

Changing rethrow to throw_ref <exnref> vs the old label-based one seems nice.

eqrion · 2023-09-18T16:12:44Z

Yes, throw_ref makes sense. Not sure I like try_table, however — since the intent is to deprecate the old instructions, I'd rather give those cumbersome (re)names?

I agree that it would be nice to rename the existing instructions. Practically speaking though, that breaks a lot of text format tests that we have right now, and is very disruptive to other users of the text format. I think we could consider it in the future, but for now I'd prefer to have a unique name for the 'new thing'.

rossberg · 2023-09-19T11:52:46Z

IIUC, the only ambiguity is the case of a try with no handlers. But in that case, both interpretations also happen to be semantically equivalent. Would it not work to stick to try for both and reinterpret that one case to be the new form?

eqrion · 2023-09-19T15:17:15Z

Yeah, that is the only grammatical ambiguity. It would be difficult to do though in the parser I work with (bytecode-alliance/wasm-tools), where no expression AST is built up for while parsing. There we can only see if the next tokens is an 'immediate handler' (catch $tag $label) when deciding which kind of try opcode to emit. We could not defer that decision until after we've seen the whole block to see if there were legacy catch delimiters.

So we could interpret an empty try as the old opcode easily, but not as the new one.

But aside from the implementation details, I would feel uncomfortable diverging the text format from the binary encoding details here. Even if we want to remove these instructions eventually, in the meantime we need to be able to precisely emit both of them for tests (even if semantically equivalent). And also see exactly which one a binary is using.

eqrion · 2023-09-19T15:22:56Z

I suppose another option could be to disallow an empty try for the new instructions. It'd be trivial for tools to rewrite those to blocks.

eqrion · 2023-09-19T15:44:12Z

I added #282 for some details on the exnref type. Feedback appreciated.

rossberg · 2023-09-19T17:12:28Z

It would be difficult to do though in the parser

I see.

I suppose another option could be to disallow an empty try for the new instructions. It'd be trivial for tools to rewrite those to blocks.

Let's not introduce special cases. I'd rather go with try_table in that case. :)

eqrion · 2023-09-28T15:35:32Z

I have a working implementation in SpiderMonkey that I hope to land soon. I had to fill in some binary details and some small execution semantics. I tried to summarize them here. Let me know if I got anything wrong.

Types

exnref is encoded as SLEB128(-0x17) also known as 0x69
exnref is a new type hierarchy, no subtyping with other reference types.

Exception handling with GC proposal should define a bottom ref type of nullexnref and bottom heap type of noexn.

ToJSValue(v, 'exnref'):
  throw TypeError(); // as v128 does today

ToWebAssemblyValue(v, 'exnref')
  throw TypeError(); // as v128 does today

Instructions

Text format

throw_ref

try_table blocktype (catch | catch_ref $tag $label)* (catch_all | catch_all_ref $catch_all_label)?
  instr*
end

Binary format

throw_ref ::= 0x0A

catch_kind ::=
  0x0 => catch
  0x1 => catch_ref

catch ::=
  $kind:catch_kind $tag:u32 $label:u32 => ($kind $tag $label)

catch_all ::=
  0x0 => nil
  0x1 $label:u32 => (catch_all $label)
  0x2 $label:u32 => (catch_all_ref $label)

try_table ::=
  0x1F $ty: blocktype $c: vec(catch) $ca: catch_all
    instr*
  end
    =>
  try_table $ty $c $ca
    instr*
  end

Validation

throw_ref is valid for type:
  [exnref] -> [t*]
for any t* (stack polymorphic, like throw and unreachable)

try_table $blocktype ($catch_kind $tag $label)* ($catch_all_kind $catch_all_label)?
  * validates as (block $blocktype) would
  * every ($catch_kind $tag $label) must refer to a valid tag index and label index
    * for kind == catch, the label must match a target type of [tag-params...]
    * for kind == catch_ref, the label must match a target type of [tag-params..., exnref]
  * any (catch_all $catch_all_label) must refer to a valid label index
    * for kind == catch_all, the label must match a target type of []
    * for kind == catch_all_ref, the label must match a target type of [exnref]

Execution

  throw_ref will throw the received exnref value, trap if it is null.

  try_table will execute as (block $blocktype) would if there are no thrown
  exceptions. If an exception is thrown, its tag is checked against the list
  of catches in order to find a match. If there is a match, then execution
  continues in that label with the exception unpacked. If there is no match
  and a catch_all, execution continues in that label. Otherwise, the exception
  is propagated upward in the control stack.

titzer · 2023-09-28T15:43:06Z

Should we also consider a case that unpacks and drops the exnref, such as catch_drop? I've argued elsewhere that that would allow engines to elide the package allocation for such catching paths and thus avoid significant overhead.

These types and instructions are part of the exception handling rework tracked in (WebAssembly/exception-handling#281).

rossberg · 2023-09-28T16:57:28Z

That mostly seems to match what I defined here and implemented in the reference interpreter, except for the choice of opcodes. I don't feel strongly about those, but if you made different choices you probably won't be able to run the test suite binaries generated by the interpreter for now.

kg · 2023-09-28T17:40:10Z

Should we also consider a case that unpacks and drops the exnref, such as catch_drop? I've argued elsewhere that that would allow engines to elide the package allocation for such catching paths and thus avoid significant overhead.

fwiw, we would use this in our JIT and (if llvm exposed it) I think we might use it in our AOT compiler too. We use exceptions more like signals ('something happened, please unwind') than values containing information

These types and instructions are part of the exception handling rework tracked in (WebAssembly/exception-handling#281).

eqrion · 2023-09-28T18:43:28Z

That mostly seems to match what I defined here and implemented in the reference interpreter, except for the choice of opcodes. I don't feel strongly about those, but if you made different choices you probably won't be able to run the test suite binaries generated by the interpreter for now.

@rossberg

Oh, I completely forgot you implemented this already. I adopted your opcodes for throw_ref and try_table. It looks like your exnref type code overlaps with GC though, so I am keeping it at -0x17. I'll edit the above post too to be up to date.

Should we also consider a case that unpacks and drops the exnref, such as catch_drop? I've argued elsewhere that that would allow engines to elide the package allocation for such catching paths and thus avoid significant overhead.

@titzer

That would be fine with me. I think the simplest way to do that would be to add a flag in the binary for each of the catches for whether it's a catch_ref (which has the exnref) or plain catch (which does not). And the existing binary flag for catch_all could be a three-way for 'no catch_all' 'catch_all_ref' and 'catch_all'.

WDYT?

titzer · 2023-09-28T19:22:01Z

@eqrion A flag for each catch sounds fine, and the catch_all that drops sounds fine as well.

rossberg · 2023-09-29T06:11:33Z

Extra flag byte sounds fine to me and is extensible. (If we wanted to save the extra byte we could also use a hack and represent the labelidx (or the tagidx) as a signed LEB, and abuse the sign as the flag. But I don't think that's worth it.)

Oops, yes, you are right about the type opcode. I changed it to -0x17 as well, thanks.

titzer · 2023-09-29T16:59:01Z

Agree that flags byte is probably the best bet.

aheejin · 2023-09-29T20:17:08Z

About the catch_drop optimization, would it be effective when the whole program uses it, or it's worth it if only a part of the program uses it? In Emscripten (+ LLVM)'s C++ exception support, I'm not sure how many trys can use the optimized version.

try {
  ...
} catch (int i) {
}

This kind of code needs exnref because if the exception is not an int we should rethrow. Also all destructor calls, which consists of I guess more than 95% of try usage in most C++ programs, needs exnref because we need to rethrow the exception after running the destructors.

Which is to say, not sure how effectively we can use the catch_drop version in C++ support. It may be useful for other toolchains though.

eqrion · 2023-10-02T16:27:31Z

About the catch_drop optimization, would it be effective when the whole program uses it, or it's worth it if only a part of the program uses it? In Emscripten (+ LLVM)'s C++ exception support, I'm not sure how many trys can use the optimized version.

My understanding is this would only help in cases where the nearest enclosing try for a throw is a catch_drop. I agree, it seems not useful for C++, but possibly for others. If the binary size of the extra flag byte becomes an issue, we could maybe use Andreas suggestion (or some other idea) to minimize the cost for C++.

I updated the previous comment that documents what SM is targeting. I chose [catch/catch_ref] and [catch_all/catch_all_ref] instead of [_drop] terminology, as I think that matches the rest of the language design better. But it's not that important.

* wast: Add exnref, try_table, and throw_ref These types and instructions are part of the exception handling rework tracked in (WebAssembly/exception-handling#281). * Add catch_ref and catch_all_ref as catch variants that capture the exnref * Fix order of flag bytes

In Wasm EH, even if we set `-sASSERTION` or `-sEXCEPTION_STACK_TRACES`, if we rethrow an exception in the code, we lose the effect of that option because previously we called `__throw_exception_with_stack_trace` only in `__cxa_throw`: https://github.com/emscripten-core/emscripten/blob/9ce7020632aa6f7578c6e40e197b800a4dd7073f/system/lib/libcxxabi/src/cxa_exception.cpp#L294-L296 This adds the same mechanism to `__cxa_rethrow` (which is used for C++ `throw;`) and `__cxa_rethrow_primary_exception` (which is used for `std::rethrow_exception`). This does not solve the problem of losing stack traces _before_ the rethrowing. libc++abi's `__cxa_rethrow` and `__cxa_rethrow_primary_exception` are implemented as throwing a pointer in the same way we first throw it and they are not aware of any metadata. This happens even in the native platform with GDB; GDB's backtrace only shows stack traces after rethrowing. We may try to fix this later by passing `exnref` (WebAssembly/exception-handling#281) to the library, but this is likely to take some time. Partially fixes emscripten-core#20301.

rossberg · 2023-10-03T07:30:45Z

I'm fine with or without dropping versions. I don't think the extra byte is an issue, since catch clauses shouldn't exactly be frequent, and their size is certainly dominated by the block structure needed for the jump targets. Even if we decided not to have the dropping versions now, it'd probably be wise to reserve the extra byte.

In Wasm EH, even if we set `-sASSERTION` or `-sEXCEPTION_STACK_TRACES`, if we rethrow an exception in the code, we lose the effect of that option because previously we called `__throw_exception_with_stack_trace` only in `__cxa_throw`: https://github.com/emscripten-core/emscripten/blob/9ce7020632aa6f7578c6e40e197b800a4dd7073f/system/lib/libcxxabi/src/cxa_exception.cpp#L294-L296 This adds the same mechanism to `__cxa_rethrow` (which is used for C++ `throw;`) and `__cxa_rethrow_primary_exception` (which is used for `std::rethrow_exception`). This does not solve the problem of losing stack traces _before_ the rethrowing. libc++abi's `__cxa_rethrow` and `__cxa_rethrow_primary_exception` are implemented as throwing a pointer in the same way we first throw it and they are not aware of any metadata. This happens even in the native platform with GDB; GDB's backtrace only shows stack traces after rethrowing. We may try to fix this later by passing `exnref` (WebAssembly/exception-handling#281) to the library, but this is likely to take some time. Partially fixes #20301.

rossberg · 2023-10-12T15:36:14Z

The exnref-1b branch now has the latest instruction format (Option B').

rossberg · 2023-10-14T10:52:16Z

I also updated the spec document for Option B', see #283. That should check all the spec and interpreter boxes for moving Option B' to phase 4. Now it's up to implementations and toolchains. :)

keithw · 2023-10-14T16:58:54Z

From one runtime-implementer perspective, it would be helpful if we could

merge this to main so the testsuite repo can pick these changes up
update binary.wast to test the changes made to the binary format (e.g. ExnRefType/try_table/throw_ref and the encoding of catches)
have the "legacy" tests also visible in the testsuite repo somehow (for implementations that want to preserve support)

rossberg · 2023-10-14T17:15:39Z

@keithw, happy to merge this to main once the code review is done.

For the test suite, see #284.

Agreed regarding more binary tests, but somebody else should perhaps volunteer for that one. :) In fact, it'd be good to have more tests for this proposal in general.

keithw · 2023-10-14T17:49:42Z

In fact, it'd be good to have more tests for this proposal in general.

Agreed -- I hope we can find a way to merge #270 and #277 now (after appropriate review).

trcrsired · 2024-01-30T01:30:12Z

This situation is deteriorating rapidly. Your repeated changes have significantly disrupted the implementation of EH in wavm. My efforts to incorporate exception handling into a wasm virtual machine are being hampered by your changes. It's disappointing and frustrating to see the obstacles you're creating.

Also, you do not change llvm itself, how could i supposed to support this if you do not support LLVM?

conrad-watt · 2024-01-30T03:59:48Z

@trcrsired if you have concerns about the state of the EH proposal that go beyond the text/binary encoding, can we discuss them in this issue?

eqrion added a commit to eqrion/wasm-tools that referenced this issue Sep 28, 2023

wast: Add exnref, try_table, and throw_ref

e8a7c3f

These types and instructions are part of the exception handling rework tracked in (WebAssembly/exception-handling#281).

eqrion mentioned this issue Sep 28, 2023

wast: Add exnref, try_table, and throw_ref bytecodealliance/wasm-tools#1225

Merged

eqrion added a commit to eqrion/wasm-tools that referenced this issue Sep 28, 2023

wast: Add exnref, try_table, and throw_ref

53f978f

These types and instructions are part of the exception handling rework tracked in (WebAssembly/exception-handling#281).

eqrion added a commit to eqrion/wasm-tools that referenced this issue Sep 28, 2023

wast: Add exnref, try_table, and throw_ref

1ddd56d

These types and instructions are part of the exception handling rework tracked in (WebAssembly/exception-handling#281).

aheejin mentioned this issue Oct 2, 2023

[EH] Make Wasm EH rethrow print stack traces emscripten-core/emscripten#20372

Merged

conrad-watt mentioned this issue Jan 30, 2024

Reintroducing exnref #280

Open

hoodmane mentioned this issue Oct 21, 2024

Add support for wasm exception handling to Emscripten target rust-lang/rust#131830

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposed instruction format for reintroducing exnref #281

Proposed instruction format for reintroducing exnref #281

aheejin commented Sep 6, 2023 •

edited

Loading

eqrion commented Sep 15, 2023

rossberg commented Sep 15, 2023

kg commented Sep 16, 2023

eqrion commented Sep 18, 2023

rossberg commented Sep 19, 2023

eqrion commented Sep 19, 2023

eqrion commented Sep 19, 2023

eqrion commented Sep 19, 2023

rossberg commented Sep 19, 2023

eqrion commented Sep 28, 2023 •

edited

Loading

titzer commented Sep 28, 2023

rossberg commented Sep 28, 2023 •

edited

Loading

kg commented Sep 28, 2023

eqrion commented Sep 28, 2023

titzer commented Sep 28, 2023

rossberg commented Sep 29, 2023

titzer commented Sep 29, 2023

aheejin commented Sep 29, 2023

eqrion commented Oct 2, 2023

rossberg commented Oct 3, 2023

rossberg commented Oct 12, 2023

rossberg commented Oct 14, 2023

keithw commented Oct 14, 2023 •

edited

Loading

rossberg commented Oct 14, 2023

keithw commented Oct 14, 2023

trcrsired commented Jan 30, 2024 •

edited

Loading

conrad-watt commented Jan 30, 2024 •

edited

Loading

Proposed instruction format for reintroducing exnref #281

Proposed instruction format for reintroducing exnref #281

Comments

aheejin commented Sep 6, 2023 • edited Loading

CG Meeting Schedule

Proposed Instruction Format

Migration Plan

eqrion commented Sep 15, 2023

rossberg commented Sep 15, 2023

kg commented Sep 16, 2023

eqrion commented Sep 18, 2023

rossberg commented Sep 19, 2023

eqrion commented Sep 19, 2023

eqrion commented Sep 19, 2023

eqrion commented Sep 19, 2023

rossberg commented Sep 19, 2023

eqrion commented Sep 28, 2023 • edited Loading

Types

Instructions

Text format

Binary format

Validation

Execution

titzer commented Sep 28, 2023

rossberg commented Sep 28, 2023 • edited Loading

kg commented Sep 28, 2023

eqrion commented Sep 28, 2023

titzer commented Sep 28, 2023

rossberg commented Sep 29, 2023

titzer commented Sep 29, 2023

aheejin commented Sep 29, 2023

eqrion commented Oct 2, 2023

rossberg commented Oct 3, 2023

rossberg commented Oct 12, 2023

rossberg commented Oct 14, 2023

keithw commented Oct 14, 2023 • edited Loading

rossberg commented Oct 14, 2023

keithw commented Oct 14, 2023

trcrsired commented Jan 30, 2024 • edited Loading

conrad-watt commented Jan 30, 2024 • edited Loading

aheejin commented Sep 6, 2023 •

edited

Loading

eqrion commented Sep 28, 2023 •

edited

Loading

rossberg commented Sep 28, 2023 •

edited

Loading

keithw commented Oct 14, 2023 •

edited

Loading

trcrsired commented Jan 30, 2024 •

edited

Loading

conrad-watt commented Jan 30, 2024 •

edited

Loading