RTASM.cbi

#[c]RTASM - Runtime Assembler for C++

(C) Copyright 2005, 2012 Kristian Dupont Knudsen. MIT license -- see LICENSE.md for details
This file is most easily read with Code Browser: http://tibleiz.net/code-browser/

#[l]:README.md:README.md
#[l]:LICENSE.md:LICENSE.md


#[c]Code:
#[l]:	assembler.h:assembler.h
#[l]:	assembler.cpp:assembler.cpp
#[l]:	language.h:language.h

#[c]Test:
#[l]:	test.cpp:test.cpp
#[l]:	assembler_test.h:assembler_test.h
#[l]:	assembler_test.cpp:assembler_test.cpp
#[l]:	language_test.h:language_test.h
#[l]:	language_test.cpp:language_test.cpp

#[c]Precompiled headers:
#[l]:	stdafx.h:stdafx.h
#[l]:	stdafx.cpp:stdafx.cpp

#[c]Knowledge base:
#[of]:	A Summary of the 80486 Opcodes and Instructions
#[c]A Summary of the 80486 Opcodes and Instructions

Original source:
http://groups.google.dk/groups?selm=3gmn58%24rbk%40DGS.dgsys.com&output=gplain
 
[ Article crossposted from alt.lang.asm ]
[ Author was Mark Hopkins ]
[ Posted on 31 Jan 1995 01:17:40 -0600 ]

#[of]:(1) The 80x86 is an Octal Machine
(1) The 80x86 is an Octal Machine
   This is a follow-up and revision of an article posted in alt.lang.asm on
7-5-92 concerning the 80x86 instruction encoding.
 
   The only proper way to understand 80x86 coding is to realize that ALL 80x86
OPCODES ARE CODED IN OCTAL.  A byte has 3 octal digits, ranging from 000 to
377.  In fact, each octal group (000-077, 100-177, etc.) tends to encode a
specific variety of operation.  All of these are features inherited from the
8080/8085/Z80.
   For some reason absolutely everybody misses all of this, even the Intel
people who wrote the reference on the 8086 (and even the 8080).  The opcode
scheme outlined briefly below is expanded starting in the 80386, but
consistently with the overall scheme here.
 
   As an example to see how this works, the mov instructions in octal are:
 
210 xrm         mov Eb, Rb
211 xrm         mov Ew, Rw
212 xrm         mov Rb, Eb
213 xrm         mov Rw, Ew
214 xsm         mov Ew, SR
216 xsm         mov SR, Ew
 
The meanings of the octal digits (x, m, r, s) and their correspondence to the
operands (Eb, Ew, Rb, Rw, SR) are the following:
 
The digit r (0-7) encodes the register operand as follows:
REGISTER (r):                0   1   2   3   4   5   6   7
   Rb = Byte-sized register AL  CL  DL  BL  AH  CH  DH  BH
   Rw = Word-sized register AX  CX  DX  BX  SP  BP  SI  DI
 
The segment register digit s (0-7) encodes the segment register as follows:
SEGMENT REGISTER (s):      0   1   2   3   4   5   6   7
   SR = Segment register  ES  CS  SS  DS    <Reserved>
 
The digits x (0-3), and m (0-7) encode the address mode according to
the following scheme.  One or more bytes (labeled: Disp) may immediately
follow xrm as described below.
 
TABLE 1:     16-BIT ADDRESSING MODE (x, m):
   Eb = Address of byte-sized object in memory or register
   Ew = Address of word-sized object in memory or register
   Dw = Unsigned word
   Dc = Signed byte ("character"), range: -128 to +127 (decimal).
   Db = Unsigned byte
 
   x  m  Disp  Eb  Ew
   ------------------
   3  r        Rb  Rw
   0  6   Dw   DS:[Dw]
   0  m        Base:[0]   (except for xm = 06).
   1  m   Dc   Base:[Dc]
   2  m   Dw   Base:[Dw]
 
   x  0  Disp  DS:[BX + SI + Disp]
   x  1  Disp  DS:[BX + DI + Disp]
   x  2  Disp  SS:[BP + SI + Disp]
   x  3  Disp  SS:[BP + DI + Disp]
   x  4  Disp  DS:[SI + Disp]
   x  5  Disp  DS:[DI + Disp]
   x  6  Disp  SS:[BP + Disp]   (except for xm = 06)
   x  7  Disp  DS:[BX + Disp]
 
This expands into the following table:
 
TABLE 1a:     16-BIT ADDRESSING MODE (x, m) for the expansion impaired. :)
xm    Eb/Ew			xm Eb/Ew              			xm Eb/Ew              			xm Eb/Ew
00     DS:[BX + SI]	10 Dc DS:[BX + SI + Dc]	20 Dw DS:[BX + SI + Dw]	30 AL/AX
01     DS:[BX + DI]	11 Dc DS:[BX + DI + Dc]	21 Dw DS:[BX + DI + Dw]	31 CL/CX
02     SS:[BP + SI]	12 Dc SS:[BP + SI + Dc]	22 Dw SS:[BP + SI + Dw]	32 DL/DX
03     SS:[BP + DI]	13 Dc SS:[BP + DI + Dc]	23 Dw SS:[BP + DI + Dw]	33 BL/BX
04     DS:[SI]		14 Dc DS:[SI + Dc]			24 Dw DS:[SI + Dw]		34 AH/SP
05     DS:[DI]		15 Dc DS:[DI + Dc]			25 Dw DS:[DI + Dw]		35 CH/BP
06     Dw DS:[Dw]	16 Dc SS:[BP + Dc]			26 Dw SS:[BP + Dw]		36 DH/SI
07     DS:[BX]		17 Dc DS:[BX + Dc]			27 Dw DS:[BX + Dw]		37 BH/DI
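
As a rough illustration of TABLE 1, the sketch below splits a byte into its
three octal digits and renders the 16-bit memory operand chosen by (x, m).
The names OctalDigits, splitOctal and describeAddress16 are made up for this
example and are not part of RTASM.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The three octal digits of a byte: x (0-3), r (0-7), m (0-7).
struct OctalDigits {
    unsigned x, r, m;
};

// Splits a byte such as 0135 into x = 1, r = 3, m = 5.
inline OctalDigits splitOctal(std::uint8_t byte) {
    return { static_cast<unsigned>(byte >> 6),
             static_cast<unsigned>((byte >> 3) & 7),
             static_cast<unsigned>(byte & 7) };
}

// Describes the 16-bit memory operand selected by (x, m), following TABLE 1.
// Only x = 0, 1, 2 are memory modes; x = 3 names a register instead.
inline std::string describeAddress16(unsigned x, unsigned m) {
    assert(x <= 2 && m <= 7);
    static const char* const base[8] = {
        "DS:[BX + SI", "DS:[BX + DI", "SS:[BP + SI", "SS:[BP + DI",
        "DS:[SI",      "DS:[DI",      "SS:[BP",      "DS:[BX"
    };
    if (x == 0 && m == 6) return "DS:[Dw]";   // the direct-address exception
    std::string text = base[m];
    if (x == 1) text += " + Dc";              // signed byte displacement follows
    if (x == 2) text += " + Dw";              // word displacement follows
    return text + "]";
}
```

For instance, describeAddress16(1, 5) yields "DS:[DI + Dc]", matching entry 15
of TABLE 1a.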
 
Operands where x is 0, 1, or 2 are all pointers.  If the instruction is a WORD
instruction (211, 213, 214, 216 are), then this pointer addresses a
word-sized object.  The format of the object at the indicated address will
always be low-order byte first, and high-order byte second.  Otherwise the
instruction is a BYTE instruction (210, 212) and the pointer addresses a
byte-sized object at the indicated address.
 
The default segments (DS:, SS:) can be overridden with a segment prefix.  In
all cases it's understood that everything has the default segment DS, except
for the two stack/frame pointers (BP and SP) whose default segment is SS.
That will be explained below.
 
Modes where x = 1, or 2 will require displacement bytes (Dc or Dw) to follow
the opcode as explained above.
 
When x = 3, WORD sized instructions address the word registers (AX, CX, ...)
and the BYTE size instructions the byte registers (AL, CL, ...).
 
EXAMPLE 1: The instruction opcode: 210 135 375
   Here, xm = 15, and r = 3, so the operands are:
 
                            mov Eb, Rb
                               =>
                    mov byte ptr DS:[DI + Dc], BL
 
The displacement, Dc, is 375 (or fd in hexadecimal), which is the signed byte
-3.  So the instruction reads:
 
                    mov byte ptr DS:[DI - 3], BL
 
or just:
                         mov [DI - 3], BL
 
In C-like notation, the meaning of this operation would be:
 
                    ((byte *)DS) [DI - 3] = BL;
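
The step from the displacement byte 375 to "- 3" in EXAMPLE 1 can be sketched
as follows.  The helper names signedDisplacement and formatDisplaced are
hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Interprets a displacement byte (Dc) as a signed value in -128..+127.
inline int signedDisplacement(std::uint8_t dc) {
    return dc < 0x80 ? dc : static_cast<int>(dc) - 0x100;
}

// Renders "[base - 3]", "[base + 5]" or "[base]" from a base register
// name and a Dc byte.
inline std::string formatDisplaced(const std::string& base, std::uint8_t dc) {
    assert(!base.empty());
    const int d = signedDisplacement(dc);
    if (d == 0) return "[" + base + "]";
    return "[" + base + (d < 0 ? " - " : " + ")
               + std::to_string(d < 0 ? -d : d) + "]";
}
```

With these, formatDisplaced("DI", 0375) gives "[DI - 3]", the operand shown
in the example.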
 
EXAMPLE 2: The instruction opcode: 216 332
   Here, xm = 32, and s = 3, so the operands are:
 
                           mov SR, Ew
                               =>
                           mov DS, DX
 
A move to CS is not possible (because the far jump instruction already does
that) so that the opcode sequence:
 
                             216 x1m
 
is free to be used for encoding something else.
 
EXAMPLE 3: As an illustration of why it's better to think in octal, just look
           at the opcodes for the binary arithmetic instructions:
 
0P0 xrm         Op Eb, Rb
0P1 xrm         Op Ew, Rw
0P2 xrm         Op Rb, Eb
0P3 xrm         Op Rw, Ew
0P4 Db          Op AL, Db
0P5 Dw          Op AX, Dw
 
They all have the same form, with a single digit encoding the operator as
follows:
                  P     Op          P     Op
                  0    add          1     or
                  2    adc          3    sbb
                  4    and          5    sub
                  6    xor          7    cmp
 
That's a good fraction of your reference table right there.
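
A minimal sketch of how the 0P0-0P5 pattern turns into an opcode byte: the
middle octal digit is the operator P and the low digit picks the operand
form.  AluOp and aluOpcode are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Operator digit P, as listed in the table above.
enum class AluOp : unsigned {
    Add = 0, Or = 1, Adc = 2, Sbb = 3, And = 4, Sub = 5, Xor = 6, Cmp = 7
};

// Builds the opcode byte 0Pf, where the form digit f is:
//   0: Op Eb, Rb   1: Op Ew, Rw   2: Op Rb, Eb
//   3: Op Rw, Ew   4: Op AL, Db   5: Op AX, Dw
inline std::uint8_t aluOpcode(AluOp op, unsigned form) {
    assert(form <= 5);
    return static_cast<std::uint8_t>((static_cast<unsigned>(op) << 3) | form);
}
```

For example, aluOpcode(AluOp::Sub, 1) is 051 octal (29 hex), the opcode for
"sub Ew, Rw".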
 
EXAMPLE 4: The same mapping is used in the immediate to memory/register form
           of these operations:
 
200 xPm Db      Op Eb, Db
201 xPm Dw      Op Ew, Dw
203 xPm Dc      Op Ew, Dc
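
To make the 203 xPm Dc form concrete, here is a sketch that encodes
"Op Rw, Dc" for a word register (x = 3).  The function name encodeOpRegImm8
is an assumption of this example; memory operands are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes "Op Rw, Dc" via the 203 xPm Dc form with x = 3 (register operand).
// P selects the operator (0 = add ... 7 = cmp); m is the word register number.
inline std::vector<std::uint8_t> encodeOpRegImm8(unsigned P, unsigned m, int dc) {
    assert(P <= 7 && m <= 7);
    assert(dc >= -128 && dc <= 127);            // must fit in a signed byte
    return {
        0203,                                            // opcode
        static_cast<std::uint8_t>(0300 | (P << 3) | m),  // xPm with x = 3
        static_cast<std::uint8_t>(dc)                    // Dc
    };
}
```

encodeOpRegImm8(5, 4, 8) produces 203 354 010, which is "sub SP, 8".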
#[cf]
#[of]:(2) An Outline of 80x86 Instructions and Encoding
(2) An Outline of 80x86 Instructions and Encoding
   The authors of 8080 and 8086 references (including Intel's own references)
are apparently not aware of the octal nature of their own machines, and the
result is an almost grotesque complication and bungling up in the presentation
of something that is actually fairly simple.  Thus, people claim that it's
almost impossible to know 8086 binary by heart, whereas in fact I know most of
it by memory.  I'll straighten out the mess for you here.
 
   As alluded to above, instructions are encoded as follows:
 
                            op xrm Const
 
   where * op is a 1 or 2 byte opcode,
         * xrm (if present) constitutes 3 octal digits whose normal uses are:
               r = Register operand, xm = Memory or Register operand.
           It may be followed immediately by a "displacement" byte or word,
           depending solely on the digits x and m.
         * Const (if present) denotes a byte or word value whose presence and
           format depends solely on what op (and sometimes xrm) is.
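
The "op xrm" layout can be sketched as a byte-packing helper.  packXrm and
encodeMovRegReg16 are hypothetical names; the second function only covers
the register-to-register case (x = 3) of opcode 213.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Packs the octal digits x (0-3), r (0-7), m (0-7) into one byte.
inline std::uint8_t packXrm(unsigned x, unsigned r, unsigned m) {
    assert(x <= 3 && r <= 7 && m <= 7);
    return static_cast<std::uint8_t>((x << 6) | (r << 3) | m);
}

// Encodes "mov dst, src" between two word registers using 213 xrm
// (mov Rw, Ew) with x = 3, so that Ew names a register.
inline std::vector<std::uint8_t> encodeMovRegReg16(unsigned dst, unsigned src) {
    return { 0213, packXrm(3, dst, src) };
}
```

encodeMovRegReg16(3, 0) gives 213 330, i.e. "mov BX, AX".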
 
In some cases, the opcode itself may be separated out into octal digits, e.g.
 
                 0s6 = push (Segment Register #s).
 
   The one major exception to the coding scheme is the set of conditional code
operations.  Since there are 16 distinct conditional codes, they are
represented as a hexadecimal digit.  The conditional jump in octal ranges
from 160 to 177, which is 7x in hexadecimal, where x is a hex digit encoding
the jump's condition.  I'll represent them by the format: 160+CC.
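
As a sketch of the 160+CC format, the function below emits a short
conditional jump: the opcode byte followed by a signed byte offset.
encodeShortJcc is an illustrative name, and the caller is assumed to supply
the condition number (for example 4 for E/Z).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes a short conditional jump: opcode 160+CC (octal), i.e. 7x in hex,
// followed by the signed byte offset.
inline std::array<std::uint8_t, 2> encodeShortJcc(unsigned cc, int rel) {
    assert(cc <= 15);
    assert(rel >= -128 && rel <= 127);
    return {{ static_cast<std::uint8_t>(0160 + cc),
              static_cast<std::uint8_t>(rel) }};
}
```

encodeShortJcc(4, -2) yields the bytes 74 FE in hexadecimal.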
 
The register and address encoding was described above.  The '386 expands on
this a little with the addition of two segment registers:
 
SEGMENT REGISTER (s):      0   1   2   3   4   5   6   7
   SR = Segment register  ES  CS  SS  DS  FS  GS <Reserved>
 
In TABLE 1, note that the addresses encoded on modes 0m, 1m, 2m are the same
regardless of whether you're referring to Eb or Ew.  What distinguishes them
is the size of the object being pointed to and this can be explicitly
indicated in traditional '86 assemblers like the following examples:
 
                    byte ptr [BP]
                    word ptr [BX + DI]
 
As explained before, all addresses, except those involving BP refer to the
data segment, DS.  All the BP's refer to the stack segment, SS.  This
is about to be explained.
 
#[cf]
#[of]:(3) Segmentation and Registers
(3) Segmentation and Registers
   The 80x86 was designed with more or less specific uses for its registers.
In fact, the names are supposed to reflect their main uses:
 
                 AX (AH:AL) = Accumulator
                 BX (BH:BL) = Base Register
                 CX (CH:CL) = Counting Register
                 DX (DH:DL) = Data Register
 
    CS = Code Segment -- where constants and programs lie.
    DS = Data Segment -- where static variables lie.
    SS = Stack Segment -- where auto variables and function parameters lie.
         SP, BP = Stack and Frame Pointers, used to segment out the
                  local variables and function parameters.
    ES = Extra Segment -- used in combination with the index registers for
         string operations as follows:
         DS:[SI] -- points to the Source of the string operation.
         ES:[DI] -- points to the Destination of the string operation.
 
The typical setup for the stack is as follows:
 
      High Addresses       FUNCTION DEFINITION:    FUNCTION CALL:
      ...                  push BP                 push Parameters
      Parameters           mov BP, SP              call Function
      Return Address       sub SP, Locals
BP -> Old BP               ... function ...
      Local Variables      mov SP, BP
SP -> ...                  pop BP
      Low Addresses        ret Parameters
 
This dictates a certain protocol in calling functions with parameters and
returning from them, as shown above.  In fact, this is so much so that the
opening and closing sequences above have all been defined as single operations
starting with the 80286 so that the function definition above can be rewritten
as:
                           FUNCTION DEFINITION:
                           enter Locals, 0
                           ... function ...
                           leave
                           ret Parameters
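
For illustration, the 16-bit opening and closing sequences can be written
out as bytes using the encodings described in this article (12r/13r for
push/pop, 213 xrm for mov, 203 xPm Dc for sub).  The function names are
invented here, and the sketch assumes the locals fit in a signed byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Emits: push BP; mov BP, SP; sub SP, locals
inline std::vector<std::uint8_t> emitPrologue16(unsigned locals) {
    assert(locals <= 127);                  // keeps the Dc form of "sub" valid
    return {
        0125,                               // 12r: push Rw, r = 5 (BP)
        0213, 0354,                         // mov Rw, Ew: x = 3, r = 5 (BP), m = 4 (SP)
        0203, 0354,                         // Op Ew, Dc: x = 3, P = 5 (sub), m = 4 (SP)
        static_cast<std::uint8_t>(locals)   // Dc
    };
}

// Emits: mov SP, BP; pop BP
inline std::vector<std::uint8_t> emitEpilogue16() {
    return {
        0213, 0345,                         // mov Rw, Ew: x = 3, r = 4 (SP), m = 5 (BP)
        0135                                // 13r: pop Rw, r = 5 (BP)
    };
}
```

emitPrologue16(8) gives 125 213 354 203 354 010 in octal.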
 
#[cf]
#[of]:(4) Word and Address Size on the 80386 and Above
(4) Word and Address Size on the 80386 and Above
   Starting with the 80386, operations can be done with not just 16-bit words
but also 32 bit words.  Generally the same operation is defined for both sets
and context is used to determine which is which in the following two ways:
 
      * Which mode the machine is running in
        Protected mode -- both word sizes and address sizes are 32-bits
        Real & Virtual modes -- 16-bits.
      * The presence of certain prefixes to override either the default
        word size, address size or both on an instruction-by-instruction
        basis.
 
   (a) Word Size
   When the word size for the current operation is 32-bits, everything listed
above as "word" is interpreted as 32-bits, including registers.  The register
numbering corresponding to this word size is:
 
REGISTER (r):                  0   1   2   3   4   5   6   7
   Rb = Byte-sized register   AL  CL  DL  BL  AH  CH  DH  BH
   Rd = Dword-sized register EAX ECX EDX EBX ESP EBP ESI EDI
 
   (b) Address Size
   When the address size is switched to 32-bits, the address scheme listed in
TABLE 1 is altered in its entirety.
 
   TABLE 2: 32-BIT ADDRESSING MODE (x, m):      Encoding of scaled index SI:
   x  m    Disp  Eb  Ew                         si       SI
   --------------------                         ---------------
   0  5     Dw   DS:[Dw]                        s0    EAX * 2^s
   0  4 sir      [Rd + SI + 0]                  s1    ECX * 2^s
   1  4 sir Dc   [Rd + SI + Dc]                 s2    EDX * 2^s
   2  4 sir Dw   [Rd + SI + Dw]                 s3    EBX * 2^s
   0  r          [Rd + 0]  (except r = 4, 5)    04        0
   1  r     Dc   [Rd + Dc] (except r = 4)       s5    EBP * 2^s
   2  r     Dw   [Rd + Dw] (except r = 4)       s6    ESI * 2^s
   3  r          Rb  Rw                         s7    EDI * 2^s
 
The encodings si = 14, 24 and 34 remain undefined.
 
   This alteration is INDEPENDENT of the word size setting.  That means that
even the "Dw"'s, "Rw"'s in the chart above will vary in interpretation as
16-bit or 32-bit objects depending on the word size setting.  That leads to
4 possible combinations, not just 2.
 
EXAMPLE 5:  The opcode sequence 211 135 375
   This is the operation
                      mov Ew, Rw
where xm = 15, r = 3 and Disp = -3.  The 4 combinations are:
 
Addr-Size  Word-Size Operation
   16         16     mov word ptr [DI - 3], BX
   16         32     mov dword ptr [DI - 3], EBX
   32         16     mov word ptr [EBP - 3], BX
   32         32     mov dword ptr [EBP - 3], EBX
 
EXAMPLE 6: The opcode sequence 211 134 302 375 with 32-bit addressing.
   This is the move instruction where xm = 14 and r = 3.
 
                    mov Ew, [E]BX     ([E]BX since r = 3)
 
It uses the indexed register addressing.  The address, Ew, may be derived
as follows:
             x m  sir Disp      Ew                     Comments
             1 4  sir Dc   [Rd + SI + Dc]
             1 4  si2 375  [EDX + SI - 3]        (Rd = EDX for r = 2)
             1 4  302 375  [EDX + 8*EAX - 3]     (SI = 8*EAX for si = 30)
 
Therefore, this instruction represents one of the following:
 
         Word-Size     Operation: 211 134 302 375
            16         mov word ptr [EDX + 8*EAX - 3], BX
            32         mov dword ptr [EDX + 8*EAX - 3], EBX
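
The derivation in EXAMPLE 6 can be sketched as a small formatter for the sir
byte.  describeSir is a hypothetical name, and the sketch covers only the
"Rd + SI" part as laid out in TABLE 2; the displacement and any special
cases outside that table are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Formats the "Rd + SI" part of a 32-bit address from the sir byte.
inline std::string describeSir(std::uint8_t sir) {
    static const char* const reg[8] = {
        "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"
    };
    const unsigned s = sir >> 6;            // scale exponent
    const unsigned i = (sir >> 3) & 7;      // index register number
    const unsigned r = sir & 7;             // base register number
    assert(!(i == 4 && s != 0));            // si = 14, 24, 34 are undefined
    std::string text = reg[r];
    if (i != 4) {                           // si = 04 means "no index"
        text += " + ";
        if (s != 0) text += std::to_string(1u << s) + "*";
        text += reg[i];
    }
    return text;
}
```

describeSir(0302) returns "EDX + 8*EAX", as in the example.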
 
#[cf]
#[of]:(5) The Opcode Summary
(5) The Opcode Summary
   The chart below summarises all the machine instructions.  The following
abbreviations are used:
 
Registers:                 Immediate Data Constant:
   Rb (byte sized)         Db (byte sized)
   Rw (word sized)         Dw (word sized)
   Rd (dword sized)        Dc (signed byte)
 
Register/Memory Address:   Relative Code Address:
   Eb (byte sized)         Cb (byte sized)
   Ew (word sized)         Cw (word sized)
 
Memory Address:                 Code Address:
   Es (16 bit selector)         Af (32/48 bit absolute far code address)
   En (near 16/32 bit pointer)
   Ef (far 32/48 bit pointer)
   Ep (pointer to 6-byte object)
   Ea (generic address)
 
Processor Extensions:
* = 80186 extension
$ = 80286 extension
# = 80386 extension
@ = 80486 extension
 
The switch between 16 and 32 bit word size affects all operands labeled
Rw, Ew, Dw, Cw, En and even Af and Ef.  The latter two objects refer to
far code addresses which are 4 bytes when the word size is 16 bits, and
6 bytes otherwise.
 
The only such operands not actually affected by the word-size switch are
those whose size is a consequence of the operation's meaning.  These include
the following: RET, BOUND, ARPL, SMSW, LMSW, LAR and LSL.
 
The switch between 16 and 32 bit address size affects all the operands
labeled Eb, Ew, Es, En, Ef, Ep, and Ea.  Each of these is interpreted
according to the xm digits in the opcode according to either the 16-bit
address table described near the start of the article or the 32-bit address
table just described above.
 
NOTE: In the following presentation everything is in octal.
 
#[of]:ARITHMETIC & LOGIC
#[c]ARITHMETIC & LOGIC
#[c]------------------
#[c]Comments:
#[c]   * All of these operations affect all 6 arithmetic flags, except NOT (which
#[c]     affects no flags), and INC and DEC (which don't affect CF).
#[c]   * IMUL and MUL only affect CF and OF predictably.
#[c]   * IDIV and DIV affect no flags predictably.
#[c]   * AND, OR, XOR, and TEST all set CF and OF to 0 and alter AF unpredictably.
#[c]   * CMP and TEST have no effect on any operands.  They're used for setting
#[c]     flags.  CMP is used for doing relational operators (< > <= >= == !=), and
#[c]     TEST for doing bit-testing.
#[c]   * CMP and TEST can have their operands listed in either order.
#[c]
#[c]P Op			Description
#[c]0 ADD L, E		L += E
#[c]2 ADC L, E		L += E + CF
#[c]5 SUB L, E		L -= E
#[c]3 SBB L, E		L -= E + CF
#[c]7 CMP L, E		(void)(L - E)
#[c]1 OR L, E		L |= E
#[c]4 AND L, E		L &= E
#[c]6 XOR L, E		L ^= E
#[c]  0P0 xrm		Op Eb, Rb
#[c]  0P1 xrm		Op Ew, Rw
#[c]  0P2 xrm		Op Rb, Eb
#[c]  0P3 xrm		Op Rw, Ew
#[c]  0P4 Db		Op AL, Db
#[c]  0P5 Dw		Op AX, Dw
#[c]  200 xPm Db	Op Eb, Db
#[c]  201 xPm Dw	Op Ew, Dw
#[c]  203 xPm Dc	Op Ew, Dc
#[c] 
#[c]NOT L			L = ~L
#[c]  366 x2m		not Eb
#[c]  367 x2m		not Ew
#[c]
#[c]NEG L			L = -L
#[c]  366 x3m		neg Eb
#[c]  367 x3m		neg Ew
#[c] 
#[c]INC L			L++
#[c]  10r			inc Rw
#[c]  376 x0m		inc Eb
#[c]  377 x0m		inc Ew
#[c]
#[c]DEC L			L--
#[c]  11r			dec Rw
#[c]  376 x1m		dec Eb
#[c]  377 x1m		dec Ew
#[c] 
#[c]TEST L, E		(void)(L&E)
#[c]  204 xrm		test Rb, Eb
#[c]  205 xrm		test Rw, Ew
#[c]  250 Db		test AL, Db
#[c]  251 Dw		test AX, Dw
#[c]  366 x0m Db	test Eb, Db
#[c]  367 x0m Dw	test Ew, Dw
#[c] 
#[c]IMUL L, E, D		L = (signed)E*D
#[c]IMUL L, E		L = (signed)L*E
#[c]# 017 257 xrm      imul Rw, Ew
#[c]* 151 xrm Dw	imul Rw, Ew, Dw
#[c]* 153 xrm Dc	imul Rw, Ew, Dc
#[c] 
#[c]In the following operations:
#[c]         Operand Size   ACC'     ACC
#[c]              1         AX       AL
#[c]              2        DX:AX     AX
#[c]              4       EDX:EAX   EAX
#[c]P Op           Description
#[c]4 MUL E        ACC' = (unsigned) ACC*E
#[c]5 IMUL E       ACC' = (signed)   ACC*E
#[c]6 DIV E        ACC' = (unsigned) ACC%E : ACC/E
#[c]7 IDIV E       ACC' = (signed)   ACC%E : ACC/E
#[c]  366 xPm          Op Eb
#[c]  367 xPm          Op Ew
#[c] 
#[cf]
#[of]:SHIFTS & ROTATIONS
#[c]SHIFTS & ROTATIONS
#[c]------------------
#[c]Comments:
#[c]   * Where applicable, N is masked off by 0x1f.
#[c]   * For Rxx and Sxx, OF is predictably affected only when N is 1.
#[c]   * SHLD and SHRD affect all 6 arithmetic flags, but OF and AF unpredictably.
#[c]   * RxL: OF = (CF != high order bit of L) before shift
#[c]   * RxR: OF = (high order bit of L != next high order bit of L) before shift
#[c]   * SxL: OF = (CF != sign bit of L) after shift
#[c]   * SxR: OF = (sign bit of L) after shift
#[c]
#[c]P Op			Description
#[c]0 ROL			CF <- [<-<-<-] <- high order bit   Rotate
#[c]1 ROR			low order bit -> [->->->] -> CF
#[c]2 RCL			CF <- [<-<-<-] <- CF               Rotate Through CF
#[c]3 RCR			CF -> [->->->] -> CF
#[c]4 SHL			CF <- [<-<-<-] <- 0                Shift (unsigned)
#[c]5 SHR			0 -> [->->->] -> CF
#[c]4 SAL			CF <- [<-<-<-] <- 0                Shift (signed)
#[c]7 SAR			sign bit -> [->->->] -> CF
#[c]* 300 xPm Db	Op Eb, Db
#[c]* 301 xPm Db	Op Ew, Db
#[c]  320 xPm		Op Eb, 1
#[c]  321 xPm		Op Ew, 1
#[c]  322 xPm		Op Eb, CL
#[c]  323 xPm		Op Ew, CL
#[c] 
#[c]SHLD L, E, N	CF:L = L:E << N
#[c]SHRD L, E, N	L:CF = E:L >> N
#[c]# 017 244 xrm Db	shld Ew, Rw, Db
#[c]# 017 245 xrm		shld Ew, Rw, CL
#[c]# 017 254 xrm Db	shrd Ew, Rw, Db
#[c]# 017 255 xrm		shrd Ew, Rw, CL
#[c] 
#[cf]
#[of]:TYPE CONVERSIONS
#[c]TYPE CONVERSIONS
#[c]----------------
#[c][] Decimal Conversions
#[c]Comments:
#[c]   * DAA and DAS are used for adjusting the results of addition and subtraction
#[c]     respectively back to packed BCD format.  They will alter all 6 of the
#[c]     arithmetic flags, OF unpredictably.
#[c]   * AAA, AAS, AAD, and AAM are used for adjusting the results of the four
#[c]     basic arithmetic operations back to unpacked BCD format or ASCII format.
#[c]     However, AAD is used *before* a divide operation.  They too affect all
#[c]     6 of the arithmetic flags, but only AF and CF predictably (for AAA and
#[c]     AAS) or SF, ZF and PF (for AAD and AAM).
#[c]   * In the following, A0 stands for the lower 4 bits of AL and A1 the upper
#[c]     4 bits of AL.
#[c]   * The binary codes for AAM and AAD each consist of an opcode followed by
#[c]     a constant 10 (012 in octal).  It has been said that this "10" is
#[c]     actually a hidden parameter to a more general AAD and AAM operator,
#[c]     which can actually be used for any base other than 10.  Some processors
#[c]     will not allow AAD to be generalized in this way, however.  The reason it
#[c]     was left out in the open like this was supposedly because the original
#[c]     8086 design literally ran out of space to pack in the opcode.
#[c]
#[c]DAA			if (A0 > 9) AF = 1;   if (AF) AL += (0x10 - 10);
#[c]				if (A1 > 9) CF = 1;   if (CF) AL += (0x10 - 10)*0x10;
#[c]DAS			if (A0 > 9) AF = 1;   if (AF) AL -= (0x10 - 10);
#[c]				if (A1 > 9) CF = 1;   if (CF) AL -= (0x10 - 10)*0x10;
#[c]AAA			if (A0 > 9) AF = 1;  CF = AF;  if (CF) A0 += (0x10 - 10), AH++;
#[c]AAS			if (A0 > 9) AF = 1;  CF = AF;  if (CF) A0 -= (0x10 - 10), AH--;
#[c]AAM			AX = AL/10 : AL%10
#[c]AAD			AX = (10*AH + AL)%0x100
#[c]  047			daa
#[c]  057			das
#[c]  067			aaa
#[c]  077			aas
#[c]  324 012		aam
#[c]  325 012		aad
#[c] 
#[c][] Sign Conversions
#[c]Comments:
#[c]   * In converting from a shorter to longer operand size, sign conversion
#[c]     involves either taking the leading (sign) bit and replicating it leftward
#[c]     (conversion to signed), or placing zero's on the left (for conversion
#[c]     to unsigned).
#[c]     
#[c]MOVSX L, E		L = (signed)E
#[c]MOVZX L, E		L = (unsigned)E
#[c]# 017 276 xrm	movsx Rw, Eb
#[c]# 017 277 xrm	movsx Rw, Ew
#[c]# 017 266 xrm	movzx Rw, Eb
#[c]# 017 267 xrm	movzx Rw, Ew
#[c] 
#[c]CBW			AX = (signed)AL
#[c]CWDE			EAX = (signed)AX
#[c]CWD			DX:AX = (signed)AX
#[c]CDQ			EDX:EAX = (signed)EAX
#[c]  230			cbw  /  (#) cwde
#[c]  231			cwd  /  (#) cdq
#[c] 
#[c][] Byte Ordering
#[c]   * Used to convert between "little Endian" (Intel byte ordering) and "big
#[c]     Endian" (Motorola byte ordering).  Typical use: networking applications.
#[c]BSWAP L		L[0]:L[1]:L[2]:L[3] = L[3]:L[2]:L[1]:L[0]
#[c]@  017 31r		bswap Rd
#[c] 
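#[c]A minimal sketch of BSWAP: one function models the byte reversal on a
#[c]32-bit value, the other emits the 017 31r encoding.  Both names are
#[c]hypothetical and only illustrate the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Models BSWAP: reverses the four bytes of a 32-bit value.
inline std::uint32_t byteSwap32(std::uint32_t v) {
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

// Encodes "bswap Rd" as 017 31r, where r is the dword register number.
inline std::vector<std::uint8_t> encodeBswap(unsigned r) {
    assert(r <= 7);
    return { 017, static_cast<std::uint8_t>(0310 + r) };
}
```

#[c]For example, byteSwap32(0x11223344) is 0x44332211.
#[c] 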
#[c][] Table Lookup
#[c]XLATB			AL = [BX + AL]
#[c]   327			xlatb
#[c] 
#[cf]
#[of]:SEMAPHORES & SYNCHRONIZATION
#[c]SEMAPHORES & SYNCHRONIZATION
#[c]----------------------------
#[c]Comments:
#[c]   * All these operations affect all 6 arithmetic flags.  BT, BTS, BTR, BTC
#[c]     affect only CF predictably; and BSF and BSR affect only ZF predictably.
#[c]   * ACC is either AL, AX or EAX in CMPXCHG, depending on the operand size.
#[c]   * WAIT is used in the '486 to force a pending unmasked interrupt from the
#[c]     internal floating point processing unit.
#[c]   * LOCK is a prefix used in multi-CPU contexts to assure exclusive access to
#[c]     memory for the following two-step read & modify operations:
#[c]        (INC, DEC, NEG, NOT) Mem       (ADD, ADC, SUB, SBB) Mem, Src
#[c]        (BT, BTS, BTR, BTC) Mem, Src   (AND, XOR, OR) Mem, Src
#[c]              XCHG Reg, Mem                  XCHG Mem, Reg
#[c]     But XCHG automatically does its own LOCK so does not need to be prefixed.
#[c]
#[c]P Op				Description
#[c]4 BT L, N			CF = L.N;
#[c]5 BTS L, N			CF = L.N; L.N = 1;
#[c]6 BTR L, N			CF = L.N; L.N = 0;
#[c]7 BTC L, N			CF = L.N; L.N = !L.N;
#[c]#  017 2P3 xrm		Op Ew, Rw
#[c]#  017 272 xPm Db	Op Ew, Db
#[c] 
#[c]BSF L, E				ZF = !E;  if (!ZF) L = First 1-bit position in E; else L = ???
#[c]BSR L, E			ZF = !E;  if (!ZF) L = Last 1-bit position in E; else L = ???
#[c]#  017 274 xrm		bsf Rw, Ew
#[c]#  017 275 xrm		bsr Rw, Ew
#[c] 
#[c]CMPXCHG L, E		ZF = (ACC == L);  if (ZF) L = E; else ACC = L;
#[c]@  017 260 xrm		cmpxchg Eb, Rb
#[c]@  017 261 xrm		cmpxchg Ew, Rw
#[c]XADD L, L'     <L, L'> = <L + L', L>
#[c]@  017 300 xrm		xadd Eb, Rb
#[c]@  017 301 xrm		xadd Ew, Rw
#[c] 
#[c]NOP				Delay 1 cycle.
#[c]WAIT				Wait for coprocessor unit.
#[c]LOCK				Hardware memory bus semaphore.
#[c]HLT					Wait for a reset or interrupt.
#[c]   220				nop
#[c]   233				wait
#[c]   360				lock
#[c]   364				hlt
#[c] 
#[c]INT N				push [E]FLAGS, CS, [E]IP; TF = 0;
#[c]					if (the Nth entry in the IDT is an Interrupt Gate) IF = 0;
#[c]					jmp to the far address listed under the Nth entry in the IDT
#[c]INTO				if (OF) INT 4
#[c]IRET				if (NT) return to task listed under TSS.BackLink;
#[c]					else pop [E]IP, CS, [E]FLAGS;
#[c]   314				int 3
#[c]   315 Db			int Db
#[c]   316				into
#[c]   317				iret
#[c] 
#[cf]
#[of]:FLAGS
#[c]FLAGS
#[c]-----
#[c]Comments:
#[c]   * No flags are affected except the explicit moves to the FLAGS register:
#[c]     POPF[D] and SAHF, but SAHF only sets the arithmetic flags (except OF).
#[c]POPF		pop FLAGS
#[c]POPFD		pop EFLAGS
#[c]PUSHF		push FLAGS
#[c]PUSHFD		push EFLAGS
#[c]SAHF		FLAGS |= (AH & 0xd5)
#[c]LAHF		AH = FLAGS;
#[c]   234		pushf / (#) pushfd
#[c]   235		popf  / (#) popfd
#[c]   236		sahf
#[c]   237		lahf
#[c]CMC		CF = !CF
#[c]CLC		CF = 0
#[c]STC		CF = 1
#[c]CLI			IF = 0 (Interrupts off)
#[c]STI			IF = 1 (Interrupts on)
#[c]CLD		DF = 0 (Set string ops to increment)
#[c]STD		DF = 1 (Set string ops to decrement)
#[c]   365		cmc
#[c]   370		clc
#[c]   371		stc
#[c]   372		cli
#[c]   373		sti
#[c]   374		cld
#[c]   375		std
#[c] 
#[cf]
#[of]:CONDITIONAL OPERATIONS
#[c]CONDITIONAL OPERATIONS
#[c]----------------------
#[c](NOTE: The values listed for CC are in octal).
#[c] 
#[c]CC   Condition(s)	Definition			Descriptions
#[c]07   A  NBE			!CF && !ZF			x > y   x > 0  (unsigned)
#[c]03   AE NB			!CF					x >= y  x >= 0 (unsigned)
#[c]02   B  NAE			CF					x < y   x < 0  (unsigned)
#[c]06   BE NA			CF || ZF				x <= y  x <= 0 (unsigned)
#[c]17   G  NLE			SF == OF && !ZF	x > y   x > 0  (signed)
#[c]15   GE NL			SF == OF			x >= y  x >= 0 (signed)
#[c]14   L  NGE			SF != OF			x < y   x < 0  (signed)
#[c]16   LE NG			SF != OF || ZF		x <= y  x <= 0 (signed)
#[c]04   E  Z			ZF					x == y  x == 0
#[c]05   NE NZ			!ZF					x != y  x != 0
#[c]00   O				OF					Overflow (signed overflow)
#[c]01   NO				!OF					No overflow (signed overflow)
#[c]02   C				CF					Carry (unsigned overflow)
#[c]03   NC				!CF					No carry (unsigned overflow)
#[c]10   S				SF					(Negative) sign
#[c]11   NS				!SF					No (negative) sign
#[c]12   P  PE			PF					Parity [even]
#[c]13   NP PO			!PF					No parity (parity odd)
#[c]CC   cc				Cond.
#[c] 
#[c]Jcc Rel        if (Cond) EIP += Rel;
#[c]SETcc L        L = (Cond)? 1: 0;
#[c]#  017 200+CC Cw   jcc Cw
#[c]#  017 220+CC x0m  setcc Eb
#[c]   160+CC Cb       jcc Cb
#[c] 
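#[c]The table above can be read as eight base conditions (CC with its lowest
#[c]bit dropped), each negated when the lowest bit of CC is set.  The sketch
#[c]below relies on that reading; Flags and conditionHolds are names made up
#[c]for this example.

```cpp
#include <cassert>

// The flags that the condition codes look at.
struct Flags {
    bool cf, pf, zf, sf, of;
};

// Evaluates condition code cc (0-15, i.e. 00-17 octal) against the flags.
inline bool conditionHolds(unsigned cc, const Flags& f) {
    assert(cc <= 15);
    bool base = false;
    switch (cc >> 1) {
        case 0:  base = f.of;                     break;  // O
        case 1:  base = f.cf;                     break;  // B, C
        case 2:  base = f.zf;                     break;  // E, Z
        case 3:  base = f.cf || f.zf;             break;  // BE
        case 4:  base = f.sf;                     break;  // S
        case 5:  base = f.pf;                     break;  // P
        case 6:  base = f.sf != f.of;             break;  // L
        default: base = (f.sf != f.of) || f.zf;   break;  // LE
    }
    return (cc & 1) ? !base : base;   // odd codes are the negated forms
}
```

#[c]With only ZF set, conditionHolds(04, ...) is true (E) and
#[c]conditionHolds(07, ...) is false (A).
#[c] 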
#[cf]
#[of]:STACK OPERATIONS
#[c]STACK OPERATIONS
#[c]----------------
#[c]Comments:
#[c]   * PUSHA[D] uses the value SP had before the operation started.
#[c]   * POPA[D] doesn't actually affect [E]SP, which is why it's bracketed out.
#[c]   * POP CS is not allowed because it's already subsumed by the RET (far)
#[c]     operation.  Instead, 017 is used as a 2-byte operation prefix.
#[c]   * POP SS inhibits interrupts in order to allow [E]SP to be altered in the
#[c]     following operation -- for what should be obvious reasons.
#[c]PUSH E			SP -= sizeof E; SS:[SP] = E;
#[c]PUSHA			push AX, CX, DX, BX, SP, BP, SI, DI
#[c]PUSHAD		push EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
#[c]#  017 240		push FS
#[c]#  017 250		push GS
#[c]   0s6			push SR   (s = 0-3)
#[c]   12r			push Rw
#[c]*  140			pusha / (#) pushad
#[c]   150 Dw		push Dw
#[c]   152 Dc		push Dc
#[c]   377 x6m		push Ew
#[c]POP L			L = SS:[SP]; SP += sizeof L;
#[c]POPA			pop DI, SI, BP, (SP), BX, DX, CX, AX
#[c]POPAD			pop EDI, ESI, EBP, (ESP), EBX, EDX, ECX, EAX
#[c]#  017 241		pop FS
#[c]#  017 251		pop GS
#[c]   0s7			pop SR    (s = 0, 2-3)
#[c]   13r			pop Rw
#[c]*  141			popa  / (#) popad
#[c]   217 x0m		pop Ew
#[c] 
#[cf]
#[of]:TRANSFER OPERATIONS
#[c]TRANSFER OPERATIONS
#[c]-------------------
#[c]Comments:
#[c]   * XCHG can have its operands listed in either order.
#[c]   * MOV CS, ... is not allowed, since this is already subsumed by JMPs.
#[c]   * LCS ... is not allowed either for the same reason.
#[c]   * XCHG AX, AX is one and the same as NOP.
#[c]XCHG L, E			<L, E> = <E, L>
#[c]   206 xrm			xchg Rb, Eb
#[c]   207 xrm			xchg Rw, Ew
#[c]   22r				xchg AX, Rw  (r != 0)
#[c]MOV L, E			L = E;
#[c]   210 xrm			mov Eb, Rb
#[c]   211 xrm			mov Ew, Rw
#[c]   212 xrm			mov Rb, Eb
#[c]   213 xrm			mov Rw, Ew
#[c]   214 xsm			mov Es, SR   (s = 0-3,   (#) 4-5)
#[c]   216 xsm			mov SR, Es   (s = 0,2-3, (#) 4-5)
#[c]   240 Dw			mov AL, [Dw]
#[c]   241 Dw			mov AX, [Dw]
#[c]   242 Dw			mov [Dw], AL
#[c]   243 Dw			mov [Dw], AX
#[c]   26r Db			mov Rb, Db
#[c]   27r Dw			mov Rw, Dw
#[c]   306 x0m Db		mov Eb, Db
#[c]   307 x0m Dw		mov Ew, Dw
#[c]LEA L, An			L = &An;
#[c]   215 xrm			lea Rw, En  (x != 3)
#[c]LSeg L, Af			Seg:L = &Af;
#[c]#  017 262 xrm		lss Rw, Ef  (x != 3)
#[c]#  017 264 xrm		lfs Rw, Ef  (x != 3)
#[c]#  017 265 xrm		lgs Rw, Ef  (x != 3)
#[c]   304 xrm			les Rw, Ef  (x != 3)
#[c]   305 xrm			lds Rw, Ef  (x != 3)
#[c] 
#[cf]
#[of]:ADDRESSING
#[c]ADDRESSING
#[c]----------
#[c]Comments:
#[c]   * The current mode of the machine determines its default mode (16 or 32
#[c]     bits).
#[c]   * RAND: and ADDR: (not a standard name, since Intel has none) are
#[c]     prefixes that alter the default for the next instruction only.
#[c]   * RAND: changes the word size between 16 and 32 bits.
#[c]   * ADDR: changes the address size between 16 and 32 bits.
#[c]   * seg: cannot override the implied ES:[DI] operand in any string op,
#[c]     but can override the DS in the implied DS:[SI] operands there.
#[c]seg:			Segment override prefix
#[c]ADDR:			Address size toggle
#[c]RAND:			Operand size toggle
#[c]   046			ES:
#[c]   056			CS:
#[c]   066			SS:
#[c]   076			DS:
#[c]#  144			FS:
#[c]#  145			GS:
#[c]#  146			RAND:
#[c]#  147			ADDR:
#[c] 
#[cf]
#[of]:PORT I/O
#[c]PORT I/O
#[c]--------
#[c]Comments:
#[c]   * In protected mode the user of these operations must pass the I/O
#[c]     Privilege Level (IOPL) else they are blocked by an interrupt.
#[c]     This allows the Operating System to spool I/O devices in a
#[c]     multitasking system (since the OS handles interrupts) to avoid having
#[c]     processes all trying to use the same device at once.
#[c]IN ACC, Port	ACC = IO[Port]
#[c]   344 Db		in AL, Db
#[c]   345 Db		in AX, Db
#[c]   354			in AL, DX
#[c]   355			in AX, DX
#[c]OUT Port, ACC	IO[Port] = ACC
#[c]   346 Db		out Db, AL
#[c]   347 Db		out Db, AX
#[c]   356			out DX, AL
#[c]   357			out DX, AX
#[c] 
#[cf]
#[of]:STRING OPERATIONS
#[c]STRING OPERATIONS
#[c]-----------------
#[c]Comments:
#[c]   * In all these operations below, Src denotes DS:[ESI] and Dest ES:[EDI].
#[c]   * Dest cannot be overridden by a segment prefix, only Src.
#[c]   * The pointers (ESI, EDI) are bumped up (DF = 0) or down (DF = 1) after
#[c]     the operation by sizeof Operand.
#[c]   * ACC is either AL, AX or EAX depending on the operand size.
#[c]   * The flags altered are exactly those altered by the corresponding
#[c]     MOV, IN, OUT, or CMP operation (namely: only SCAS and CMPS alter the
#[c]     flags and in the same way as CMP) and these are therefore the only ones
#[c]     that can be prefixed by REP[N]E/REP[N]Z.
#[c]   * REP works with all string ops, but REP LODS doesn't do anything sensible.
#[c]INS					in Dest, DX
#[c]OUTS				out DX, Src
#[c]MOVS				mov Dest, Src
#[c]CMPS				cmp Src, Dest
#[c]STOS				mov Dest, ACC
#[c]LODS				mov ACC, Src
#[c]SCAS				cmp ACC, Dest
#[c]*  154				insb
#[c]*  155				insw  / (#) insd
#[c]*  156				outsb
#[c]*  157				outsw / (#) outsd
#[c]   244				movsb
#[c]   245				movsw / (#) movsd
#[c]   246				cmpsb
#[c]   247				cmpsw / (#) cmpsd
#[c]   252				stosb
#[c]   253				stosw / (#) stosd
#[c]   254				lodsb
#[c]   255				lodsw / (#) lodsd
#[c]   256				scasb
#[c]   257				scasw / (#) scasd
#[c]REP Op				while (CX != 0) { Op; CX--; }
#[c]REPE /REPZ Op		while (CX != 0) { Op; CX--; if (!ZF) break; }
#[c]REPNE/REPNZ Op	while (CX != 0) { Op; CX--; if (ZF) break; }
#[c]   362				repne / repnz
#[c]   363				rep   / repe / repz
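#[c] 
#[c]Illustration (not part of the original article): a small C++ model of
#[c]REPNE SCASB, showing that CX is tested before each iteration and ZF after
#[c]it.  The Cpu struct and function names are hypothetical, only ZF is
#[c]modelled, and 16-bit mode with DF = 0 is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Minimal model of the registers involved (hypothetical, for illustration).
struct Cpu {
    std::uint16_t cx = 0;      // repeat counter
    std::size_t   di = 0;      // offset into the destination buffer
    bool          zf = false;  // zero flag
};

// SCASB: cmp AL, Dest; then bump DI upwards (DF = 0 assumed).
inline void scasb(Cpu& cpu, std::uint8_t al, const std::uint8_t* dest) {
    cpu.zf = (al == dest[cpu.di]);
    ++cpu.di;
}

// REPNE SCASB: repeat while CX != 0, stopping once the operation sets ZF.
inline void repne_scasb(Cpu& cpu, std::uint8_t al, const std::uint8_t* dest) {
    while (cpu.cx != 0) {
        scasb(cpu, al, dest);
        --cpu.cx;
        if (cpu.zf) break;
    }
}
```

#[c]Scanning the five bytes of "hello" for 'l' with CX = 5 stops after the
#[c]third byte, leaving DI one past the match and CX = 2.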
#[c] 
#[cf]
#[of]:CONTROL FLOW
#[c]CONTROL FLOW
#[c]------------
#[c]Comments:
#[c]   * The distinction between near and far jumps/calls/returns is built right
#[c]     into the 8086 language, which pretty much forces you to explicitly
#[c]     declare a routine as "near" or "far" and be consistent about it.  The
#[c]     intended usage runs pretty much like C's static vs. global functions,
#[c]     with each C file being analogous to an 8086 segment.
#[c]   * The 8086 was specifically designed to be a Pascal (and PL/I) machine,
#[c]     though.  Intel wrongly assumed that one of these languages would become
#[c]     like C is now.  So the ENTER and LEAVE operators were added (and BOUND
#[c]     to do array bounds-checking).  The segmentation structure was intended
#[c]     to support these types of languages.
#[c]JCXZ Rel       if (!CX) IP += Rel;
#[c]JECXZ Rel      if (!ECX) IP += Rel;
#[c]LOOPcc Rel     if (--CX && cc) IP += Rel;
#[c]   340 Cb          loopnz Cb / loopne Cb
#[c]   341 Cb          loopz Cb  / loope Cb
#[c]   342 Cb          loop Cb
#[c]   343 Cb          jcxz Cb   / (#) jecxz Cb
#[c]JMP Rel        IP += Rel;
#[c]JMP FAR Af     CS:IP = Af;
#[c]CALL Rel       push IP;      IP += Rel;
#[c]CALL FAR Af    push CS, IP;  CS:IP = Af;
#[c]   232 Af          call far Af
#[c]   350 Cw          call Cw
#[c]   351 Cw          jmp Cw
#[c]   352 Af          jmp far Af
#[c]   353 Cb          jmp Cb
#[c]   377 x2m         call En
#[c]   377 x3m         call far Ef
#[c]   377 x4m         jmp En
#[c]   377 x5m         jmp far Ef
#[c]RET Params     pop IP;       SP += Params (default: Params = 0)
#[c]RET FAR Params pop IP, CS;   SP += Params (default: Params = 0)
#[c]   302 Dw          ret Dw
#[c]   303             ret
#[c]   312 Dw          ret far Dw
#[c]   313             ret far
#[c]ENTER Locs, N push EBP;
#[c]              (sub EBP, 4;  push [EBP]) N-1 times, if N > 0
#[c]              mov EBP, ESP
#[c]              (add EBP, 4*(N-1);  push EBP),       if N > 0
#[c]              sub ESP, Locs
#[c]LEAVE         mov ESP, EBP;   pop EBP
#[c]*  310 Dw Db       enter Dw, Db
#[c]*  311             leave
#[c] 
#[cf]
#[of]:SYSTEM CONTROL & MEMORY PROTECTION
#[c]SYSTEM CONTROL & MEMORY PROTECTION
#[c]----------------------------------
#[c]BOUND A, AA   if (A not in range AA[0]..AA[1]) INT 5
#[c]ARPL L, E     ZF = (L.RPL < E.RPL);
#[c]              if (ZF) L.RPL = E.RPL;
#[c]*  142 xrm         bound Rw, Ed
#[c]$  143 xrm         arpl Es, Rw
#[c] 
#[c]SLDT Sel      Sel = LDTR
#[c]STR Sel       Sel = TR
#[c]LLDT Sel      LDTR = Sel
#[c]LTR Sel       TR = Sel
#[c]VERR Sel      ZF = (Sel is accessible and has read-access)
#[c]VERW Sel      ZF = (Sel is accessible and has write-access)
#[c]LAR L, Sel    ZF = (Sel is accessible);
#[c]              if (ZF) L = the access rights of Sel's descriptor.
#[c]LSL L, Sel    ZF = (Sel is accessible);
#[c]              if (ZF) L = the segment limit of Sel's descriptor.
#[c]$  017 000 x0m     sldt Ew
#[c]$  017 000 x1m     str Ew
#[c]$  017 000 x2m     lldt Ew
#[c]$  017 000 x3m     ltr Ew
#[c]$  017 000 x4m     verr Ew
#[c]$  017 000 x5m     verw Ew
#[c]$  017 002 xrm     lar Rw, Ew
#[c]$  017 003 xrm     lsl Rw, Ew
#[c] 
#[c]SGDT Desc     Desc = GDTR
#[c]SIDT Desc     Desc = IDTR
#[c]LGDT Desc     GDTR = Desc
#[c]LIDT Desc     IDTR = Desc
#[c]$  017 001 x0m     sgdt Ep
#[c]$  017 001 x1m     sidt Ep
#[c]$  017 001 x2m     lgdt Ep
#[c]$  017 001 x3m     lidt Ep
#[c] 
#[c]SMSW L        L = MSW ... note that MSW is CR0 bits 0-15.
#[c]LMSW E        MSW = E
#[c]CLTS          MSW.3 = 0 ... clears the Task Switched flag.
#[c]$  017 001 x4m     smsw Ew
#[c]$  017 001 x6m     lmsw Ew
#[c]$  017 006         clts
#[c] 
#[c]INVD          Invalidate internal cache.
#[c]WBINVD        Invalidate internal cache, after writing it back.
#[c]INVLPG Ea     Invalidate Ea's page.
#[c]@  017 010         invd
#[c]@  017 011         wbinvd
#[c]@  017 001 x7m     invlpg Ea
#[c] 
#[c]MOV Reg, SysReg
#[c]MOV SysReg, Reg
#[c]#  017 040 3nr     mov Rd, CRn   (n = 0-3)
#[c]#  017 041 3nr     mov Rd, DRn   (n = 0-3, 6-7)
#[c]#  017 042 3nr     mov CRn, Rd   (n = 0, 2-3)
#[c]#  017 043 3nr     mov DRn, Rd   (n = 0-3, 6-7)
#[c]#  017 044 3nr     mov Rd, TRn   (n = 6-7)
#[c]#  017 046 3nr     mov TRn, Rd   (n = 6-7)
#[c] 
#[cf]
#[of]:CO-PROCESSOR ESCAPE SEQUENCE
#[c]CO-PROCESSOR ESCAPE SEQUENCE
#[c]----------------------------
#[c]Comments:
#[c]   * This escape sequence is intended to be used with an external co-processor
#[c]     with the most common application being the 80x87 floating point unit.
#[c]   * Starting in the 80486, the floating point unit was made internal to the
#[c]     processor.
#[c] 
#[c]ESC TL, Ea  Escape, operation TL, address mode Ea.
#[c]   33T xLm         esc TL Ea
#[c] 
#[c](6) Floating Point Operations
#[c]   The Floating Point unit consists of 8 internal registers arranged in a
#[c]circular stack, and the Control Word (CW), Status Word (SW) and Tag Word (TW)
#[c]registers.  The floating point stack registers all store data in Real80
#[c]format (described below).
#[c]   Operations are carried out on data in the following formats (low-order bits
#[c]on right):
#[c] 
#[c]   INTEGER: 16/32/64 bits (Int16, Int32, Int64)
#[c]   BCD: (BCD80)
#[c]         S 0000000  D D D D D D D D D D D D D D D D D D
#[c]           S = 1-bit sign (1 = negative, 0 = positive)
#[c]           D = 4-bit digit (encodes digits 0-9).
#[c]   FLOATING POINT: 32/64/80 bits (Real32, Real64, Real80)
#[c]         S Exponent Mantissa
#[c]           S = 1-bit sign (1 = negative, 0 = positive)
#[c]           Exponent = 8/11/15 bit biased exponent
#[c]           Mantissa = 23/52/64 bit binary fraction.
#[c]   The values of floating point numbers in each format are as follows:
#[c]        Real32: (-1)^S (1 + Mantissa/2^23) x 2^(Exponent - 127)
#[c]        Real64: (-1)^S (1 + Mantissa/2^52) x 2^(Exponent - 1023)
#[c]        Real80: (-1)^S      Mantissa/2^63  x 2^(Exponent - 16383)
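#[c] 
#[c]Illustration (not part of the original article): a minimal C++ sketch that
#[c]evaluates the Real32 formula above for a normal number.  The names
#[c]real32Value and bitsOf are hypothetical, and float is assumed to be an
#[c]IEEE 754 32-bit type.  Zero, denormals, infinities and NaNs are not handled.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Evaluates (-1)^S * (1 + Mantissa/2^23) * 2^(Exponent - 127)
// for the raw bits of a normal Real32 value.
inline double real32Value(std::uint32_t bits) {
    const std::uint32_t sign     = bits >> 31;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    const std::uint32_t mantissa = bits & 0x7FFFFFu;
    const double magnitude =
        std::ldexp(1.0 + mantissa / 8388608.0,  // 8388608 = 2^23
                   static_cast<int>(exponent) - 127);
    return sign ? -magnitude : magnitude;
}

// Reinterprets a float as its 32 raw bits (assumes IEEE 754 binary32).
inline std::uint32_t bitsOf(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits wide");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```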
#[c] 
#[c]The floating point formats do not cover all the logical combinations of binary
#[c]0's and 1's, and the remaining combinations are defined for special purposes:
#[c] 
#[c]         Sign  Exponent      Mantissa      Meaning
#[c]          S   0 0 0 ... 0   0 0 0 ... 0       0
#[c]          S   0 0 0 ... 0   ... 1 ...      DENORMAL (Infinitesimal)
#[c]          S   1 1 1 ... 1   0 0 0 ... 0    INFINITY
#[c]          S   1 1 1 ... 1   0 ... 1 ...    Signalling NaN (Not a Number)
#[c]          S   1 1 1 ... 1   1 ...          Quiet NaN
#[c] 
#[c]This is all IEEE standard format.  Quiet NaN's are set by the FP Unit to
#[c]indicate invalid operations.
#[c] 
#[c]Notation:
#[c]ST(n) -- the nth item below the stack top.
#[c]ST ----- ST(0), the stack top.
#[c]Int*, BCD*, Real* -- described above.
#[c] 
#[c]All Int*, BCD*, and Real* operands are stored in memory and are encoded in
#[c]the 80x86's current addressing mode (16 or 32 bit).  All opcodes are
#[c]listed in the format:
#[c] 
#[c]                  T L xm    for    8086 escape code 33T xLm
#[c] 
#[c]Since only memory addresses are used in the operations, that frees up all
#[c]the combinations xm where x = 3.  These are generally used to encode the
#[c]operations that do not involve memory addresses.  In the following
#[c]presentation where "xm" is listed generally, it is understood that x is not 3.
#[c] 
#[c]   The operations FENI, FDISI are specific to the 8087; FSETPM to the 80287
#[c]and FUCOM*, FPREM1, and the trig. operations FSIN, FCOS, FSINCOS are all
#[c]present only in the 80387 and after.
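#[c] 
#[c]Illustration (not part of the original article): a minimal C++ sketch that
#[c]turns a table entry written as "T L xm" into the two bytes of the escape
#[c]code 33T xLm.  The name escBytes is hypothetical, and any displacement
#[c]bytes required by a memory addressing mode are left out.

```cpp
#include <array>
#include <cstdint>

// Builds the two opcode bytes for an entry listed as "T L xm",
// i.e. the escape code 33T xLm. Each argument is one octal digit
// (x is limited to 0-3 because it occupies only two bits).
inline std::array<std::uint8_t, 2> escBytes(unsigned t, unsigned l,
                                            unsigned x, unsigned m) {
    const auto first  = static_cast<std::uint8_t>(0330u | (t & 7u));
    const auto second = static_cast<std::uint8_t>(((x & 3u) << 6) |
                                                  ((l & 7u) << 3) |
                                                  (m & 7u));
    return {first, second};
}
```

#[c]For example, an entry written as "1 5 30" becomes the bytes 331 350 in
#[c]octal, which is D9 E8 in hexadecimal.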
#[c] 
#[cf]
#[of]:DATA TRANSFER
#[c]DATA TRANSFER
#[c]-------------
#[c]Comments:
#[c]   * The following table is used:
#[c]      P    0    2    3
#[c]     F-OP fld  fst  fstp
#[c]     I-OP fild fist fistp
#[c]FLD Arg       ST = (Real80)Arg
#[c]FST Arg       Arg = (typeof Arg)ST
#[c]FSTP Arg      Arg = (typeof Arg)ST; pop();
#[c]FXCH Arg      Arg <--> ST, with appropriate type conversions.
#[c]   1 P xm          F-OP Real32
#[c]   3 P xm          I-OP Int32
#[c]   5 P xm          F-OP Real64
#[c]   7 P xm          I-OP Int16
#[c]   3 5 xm          fld Real80
#[c]   3 7 xm          fstp Real80
#[c]   7 4 xm          fbld BCD80
#[c]   7 5 xm          fild Int64
#[c]   7 6 xm          fbstp BCD80
#[c]   7 7 xm          fistp Int64
#[c]   1 0 3m          fld ST(m)
#[c]   1 1 3m          fxch ST(m)
#[c]   5 2 3m          fst ST(m)
#[c]   5 3 3m          fstp ST(m)
#[c] 
#[cf]
#[of]:COMPARISON
#[c]COMPARISON
#[c]----------
#[c]Comments:
#[c]   * The following table is used:
#[c]      P    2     3
#[c]     F-OP fcom  fcomp
#[c]     I-OP ficom ficomp
#[c]FCOM Arg      cmp ST, Arg
#[c]FCOMP Arg     cmp ST, Arg; pop();
#[c]   0 P xm          F-OP Real32
#[c]   2 P xm          I-OP Int32
#[c]   4 P xm          F-OP Real64
#[c]   6 P xm          I-OP Int16
#[c]   0 P 3m          F-OP ST(m)
#[c]FCOMPP        cmp ST, ST(1); pop(); pop();
#[c]   6 3 31          fcompp
#[c]FTST          cmp ST, 0.0
#[c]   1 4 34          ftst
#[c]FXAM          examine ST
#[c]   1 4 35          fxam
#[c]FUCOM Arg     unordered compare ST, Arg
#[c]FUCOMP Arg    unordered compare ST, Arg; pop();
#[c]FUCOMPP       unordered compare ST, ST(1); pop(); pop();
#[c]   5 4 3m          fucom ST(m)
#[c]   5 5 3m          fucomp ST(m)
#[c]   2 5 31          fucompp
#[c] 
#[cf]
#[of]:ARITHMETIC OPERATIONS
#[c]ARITHMETIC OPERATIONS
#[c]---------------------
#[c]Comments:
#[c]   * The following table is used:
#[c]      P    0      1      4      5      6       7
#[c]     F-OP fadd   fmul   fsub   fsubr  fdiv   fdivr
#[c]     I-OP fiadd  fimul  fisub  fisubr fidiv  fidivr
#[c]     P-OP faddp  fmulp  fsubp  fsubrp fdivp  fdivrp
#[c]   * Dest is ST and Src the listed operand except where noted below.
#[c]FADD Arg      Dest += Src
#[c]FSUB Arg      Dest -= Src
#[c]FSUBR Arg     Dest = Src - Dest
#[c]FMUL Arg      Dest *= Src
#[c]FDIV Arg      Dest /= Src
#[c]FDIVR Arg     Dest = Src/Dest
#[c]   0 P xm          F-OP Real32
#[c]   2 P xm          I-OP Int32
#[c]   4 P xm          F-OP Real64
#[c]   6 P xm          I-OP Int16
#[c]   0 P 3m          F-OP ST(m)
#[c]   4 P 3m          F-OP ST(m)   (Dest = ST(m), Src = ST)
#[c]   6 P 3m          P-OP ST(m)   (Dest = ST(m), Src = ST)
#[c] 
#[cf]
#[of]:CONSTANTS
#[c]CONSTANTS
#[c]---------
#[c]FLD1          ST = 1.0
#[c]FLDL2T        ST = log_2(10)
#[c]FLDL2E        ST = log_2(e)
#[c]FLDPI         ST = pi
#[c]FLDLG2        ST = log_10(2)
#[c]FLDLN2        ST = ln(2)
#[c]FLDZ          ST = 0.0
#[c]   1 5 30          fld1
#[c]   1 5 31          fldl2t
#[c]   1 5 32          fldl2e
#[c]   1 5 33          fldpi
#[c]   1 5 34          fldlg2
#[c]   1 5 35          fldln2
#[c]   1 5 36          fldz
#[c] 
#[cf]
#[of]:BUILT-IN FUNCTIONS
#[c]BUILT-IN FUNCTIONS
#[c]------------------
#[c]Comments:
#[c]   * The stack replacements entail pop()'s.
#[c]FCHS          ST = -ST
#[c]FABS          ST = |ST|
#[c]F2XM1         ST = 2^ST - 1
#[c]FYL2X         Replace the stack: ST(1), ST -> ST(1)*log_2(ST)
#[c]FPTAN         Replace the stack: ST -> tan(ST), 1.0
#[c]FPATAN        Replace the stack: ST(1), ST -> atan(ST(1)/ST)
#[c]FXTRACT       Replace the stack: ST -> exponent(ST), mantissa(ST)
#[c]FPREM1        ST = remainder(ST/ST(1)), IEEE consistent
#[c]FPREM         ST = remainder(ST/ST(1))
#[c]FYL2XP1       Replace the stack: ST(1), ST -> ST(1)*log_2(ST + 1)
#[c]FSQRT         ST = sqrt(ST)
#[c]FSINCOS       Replace the stack: ST -> sin(ST), cos(ST)
#[c]FRNDINT       ST = round(ST)
#[c]FSCALE        ST *= 2^(int)ST(1)
#[c]FSIN          ST = sin(ST)
#[c]FCOS          ST = cos(ST)
#[c]   1 4 30          fchs
#[c]   1 4 31          fabs
#[c]   1 6 30          f2xm1
#[c]   1 6 31          fyl2x
#[c]   1 6 32          fptan
#[c]   1 6 33          fpatan
#[c]   1 6 34          fxtract
#[c]   1 6 35          fprem1
#[c]   1 7 30          fprem
#[c]   1 7 31          fyl2xp1
#[c]   1 7 32          fsqrt
#[c]   1 7 33          fsincos
#[c]   1 7 34          frndint
#[c]   1 7 35          fscale
#[c]   1 7 36          fsin
#[c]   1 7 37          fcos
#[c] 
#[cf]
#[of]:CONTROL
#[c]CONTROL
#[c]-------
#[c]Comments:
#[c]   * The save and load operations for the environment and state are used
#[c]     primarily for multitasking applications where 2 or more processes are
#[c]     using the FP unit concurrently.
#[c]FNOP          Delay 1 cycle.
#[c]FLDENV Arg    Load FP environment from [Arg]
#[c]FLDCW Arg     CW = Arg
#[c]FSTENV Arg    Save FP environment to [Arg]
#[c]FSTCW Arg     Arg = CW
#[c]FDECSTP       TOP = (TOP - 1) mod 8
#[c]FINCSTP       TOP = (TOP + 1) mod 8
#[c]FENI          Enable interrupts (8087 only)
#[c]FDISI         Disable interrupts (8087 only)
#[c]FCLEX         Clear out FP exception flags
#[c]FINIT         Initialize FP registers
#[c]FSETPM        Enter Protected Mode (80287 only)
#[c]FFREE ST(m)   Mark register m as unused.
#[c]FRSTOR Arg    Restore FP state from [Arg]
#[c]FSAVE Arg     Save FP state to [Arg]
#[c]FSTSW Arg     Arg = SW
#[c]   1 2 30          fnop
#[c]   1 4 xm          fldenv Ea
#[c]   1 5 xm          fldcw Ea
#[c]   1 6 xm          fstenv Ea
#[c]   1 7 xm          fstcw Ea
#[c]   1 6 36          fdecstp
#[c]   1 6 37          fincstp
#[c]   3 4 30          feni
#[c]   3 4 31          fdisi
#[c]   3 4 32          fclex
#[c]   3 4 33          finit
#[c]   3 4 34          fsetpm
#[c]   5 0 3m          ffree ST(m)
#[c]   5 4 xm          frstor Ea
#[c]   5 6 xm          fsave Ea
#[c]   5 7 xm          fstsw Ea
#[c]   7 4 30          fstsw AX
#[cf]
#[cf]
#[cf]
#[of]:	Calling conventions on the x86 platform
#[c]Calling conventions on the x86 platform

Original source:
http://www.angelcode.com/dev/callconv/callconv.html

2005/02/13, Andreas Jönsson

This is a document that I wrote as research for the AngelCode Scripting Library. Since the library uses assembly to implement the interaction
between the script engine and the host application, I needed complete knowledge of how the calling conventions are implemented by
different compilers. To my surprise there were a lot more differences than I had initially thought. Most of the differences are related to C++ features,
which is understandable since there was no standard when the compilers were first written. Today there is a standard, but I believe
that it doesn't mention how calling conventions should be implemented, which leads to binary incompatibility between compilers even though
the source code is compatible.

The differences don't stop AngelScript from supporting each of the compilers, though supporting a new compiler may require a few changes in order
to follow the conventions it uses. As support for more compilers is added to AngelScript I will add those compilers to the article.

#[of]:	cdecl
cdecl

This calling convention is the default for C programs and also for global functions in C++ programs. Generally the function arguments are passed
on the stack in reverse order so that the callee can access them in the correct order. The caller is responsible for popping the arguments after
the function returns, which makes it possible to use ... (the ellipsis) to pass a variable number of arguments. Return values are returned in registers.

Visual C++ / Win32

    * Arguments are pushed on the stack in reverse order.
    * The caller pops arguments after return.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * Simple data structures with 8 bytes or less in size are returned in EAX:EDX.
    * Class objects that require special treatment by the exception handler are returned in memory. Classes with a defined constructor, destructor, or 
      overloaded assignment operator are examples of these.
    * Objects larger than 8 bytes are returned in memory.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, 
      and returns the pointer. The caller pops the hidden pointer together with the rest of the arguments. 

MinGW g++ / Win32

    * Arguments are pushed on the stack in reverse order.
    * The caller pops arguments after return.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * Objects with 8 bytes or less in size are returned in EAX:EDX.
    * Objects larger than 8 bytes are returned in memory.
    * Classes that have a destructor are returned in memory regardless of size.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, 
      and returns the pointer. The callee pops the hidden pointer from the stack when returning.
    * Classes that have a destructor are always passed by reference, even if the parameter is defined to be by value. 

GCC g++ / Linux

    * Arguments are pushed on the stack in reverse order.
    * The caller pops arguments after return.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * All structures and classes are returned in memory regardless of complexity or size.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, 
      and returns the pointer. The callee pops the hidden pointer from the stack when returning.
    * Classes that have a destructor are always passed by reference, even if the parameter is defined to be by value. 
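
As an illustration (not part of the original article), the cdecl rules for returning objects listed above can be summarised in a small C++ sketch.
All names here (Compiler, ReturnIn, ObjectType, cdeclObjectReturn) are hypothetical. The Visual C++ case treats the examples given above
(constructor, destructor, assignment operator) as the whole criterion, which is a simplification.

```cpp
enum class Compiler { VisualCpp, MinGW, GccLinux };
enum class ReturnIn { Registers, Memory };

// Hypothetical description of a class or struct used as a return type.
struct ObjectType {
    unsigned sizeInBytes;
    bool hasUserConstructor;
    bool hasDestructor;
    bool hasUserAssignment;
};

// Where a cdecl function returns an object, following the lists above.
inline ReturnIn cdeclObjectReturn(Compiler c, const ObjectType& t) {
    switch (c) {
        case Compiler::VisualCpp:
            if (t.hasUserConstructor || t.hasDestructor || t.hasUserAssignment ||
                t.sizeInBytes > 8) {
                return ReturnIn::Memory;
            }
            return ReturnIn::Registers;  // EAX or EAX:EDX
        case Compiler::MinGW:
            if (t.hasDestructor || t.sizeInBytes > 8) {
                return ReturnIn::Memory;
            }
            return ReturnIn::Registers;  // EAX or EAX:EDX
        case Compiler::GccLinux:
            return ReturnIn::Memory;     // all structures and classes
    }
    return ReturnIn::Memory;  // not reached for valid enum values
}
```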

#[cf]
#[of]:	stdcall
stdcall

stdcall is the calling convention used by the Win32 API. It is basically the same as the cdecl convention, with the difference that the callee is responsible
for popping the arguments from the stack. This makes the call slightly faster, but also prevents the use of the ... operator.

Visual C++ / Win32

    * Arguments are pushed on the stack in reverse order.
    * The callee pops arguments when returning.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * Simple data structures with 8 bytes or less in size are returned in EAX:EDX.
    * Class objects that require special treatment by the exception handler are returned in memory. Classes with a defined constructor, destructor, or overloaded 
      assignment operator are examples of these.
    * Objects larger than 8 bytes are returned in memory.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, 
       and returns the pointer. The callee pops the hidden pointer together with the rest of the arguments. 

MinGW g++ / Win32

    * Arguments are pushed on the stack in reverse order.
    * The callee pops arguments when returning.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * Objects with 8 bytes or less in size are returned in EAX:EDX.
    * Objects larger than 8 bytes are returned in memory.
    * Classes that have a destructor are returned in memory regardless of size.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, 
       and returns the pointer. The callee pops the hidden pointer from the stack together with the rest of the arguments.
    * Classes that have a destructor are always passed by reference, even if the parameter is defined to be by value. 

GCC g++ / Linux

    * Arguments are pushed on the stack in reverse order.
    * The callee pops arguments when returning.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * All structures and classes are returned in memory regardless of complexity and size.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, 
       and returns the pointer. The callee pops the hidden pointer from the stack together with the rest of the arguments.
    * Classes that have a destructor are always passed by reference, even if the parameter is defined to be by value. 
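
As an illustration (not part of the original article), the difference between caller and callee cleanup shows up in the near return instruction
that a runtime assembler would emit. The sketch below uses the octal opcodes 302 Dw (ret Dw) and 303 (ret) from the opcode summary earlier
in this file. The function name emitReturn is hypothetical and is not the RTASM API.

```cpp
#include <cstdint>
#include <vector>

// Near-return encoding for a function whose stack arguments occupy argBytes.
// calleePops = true models stdcall (ret Dw); false models cdecl (plain ret).
inline std::vector<std::uint8_t> emitReturn(bool calleePops,
                                            std::uint16_t argBytes) {
    if (calleePops && argBytes != 0) {
        return {0302,                                        // ret Dw
                static_cast<std::uint8_t>(argBytes & 0xFF),  // low byte
                static_cast<std::uint8_t>(argBytes >> 8)};   // high byte
    }
    return {0303};                                           // ret
}
```

For a stdcall function taking two 4-byte arguments this yields the bytes C2 08 00, while the cdecl version ends in a single C3 and leaves the cleanup to the caller.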

#[cf]
#[of]:	thiscall
thiscall

This calling convention was introduced with C++. The only sure thing about it is that arguments are pushed on the stack in reverse order and that the caller
passes the object pointer to the function in some way or other.

Visual C++ / Win32

    * Arguments are pushed on the stack in reverse order.
    * The object pointer is passed in ECX.
    * The callee pops arguments when returning.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * Simple data structures with 8 bytes or less in size are returned in EAX:EDX.
    * All classes and structures are returned in memory, regardless of size.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, 
      and returns the pointer. The callee pops the hidden pointer together with the rest of the arguments.
    * If the method takes a variable number of arguments, i.e. is declared with the ... operator, then the calling convention instead becomes that of cdecl.
      The object pointer is pushed on the stack as the first argument instead of passed in ECX, and all arguments are popped from the stack by the caller when the function returns. 

MinGW g++ / Win32

    * Arguments are pushed on the stack in reverse order.
    * The object pointer is pushed on the stack as the first parameter.
    * The caller pops arguments after return.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * Objects with 8 bytes or less in size are returned in EAX:EDX.
    * Objects larger than 8 bytes are returned in memory.
    * Classes that have a destructor are returned in memory regardless of size.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, and returns the 
      pointer. The callee pops the hidden pointer from the stack when returning.
    * Classes that have a destructor are always passed by reference, even if the parameter is defined to be by value. 

GCC g++ / Linux

    * Arguments are pushed on the stack in reverse order.
    * The object pointer is pushed on the stack as the first parameter.
    * The caller pops arguments after return.
    * Primitive data types, except floating point values, are returned in EAX or EAX:EDX depending on the size.
    * float and double are returned in fp0, i.e. the first floating point register.
    * All structures and classes are returned in memory regardless of complexity and size.
    * When a return is made in memory the caller passes a pointer to the memory location as the first parameter (hidden). The callee populates the memory, and returns the pointer. 
      The callee pops the hidden pointer from the stack when returning.
    * Classes that have a destructor are always passed by reference, even if the parameter is defined to be by value. 

#[cf]
#[of]:	fastcall
fastcall

This is a special calling convention that is designed for speed. It is rarely used, so I haven't studied it closely, but I understand that the first arguments are passed in registers while
the rest are pushed on the stack as normal.

Visual C++ / Win32

    * The first two arguments are passed in ECX and EDX. The rest are passed on the stack just like cdecl.

#[cf]
#[of]:	Further reading
Further reading

Not all of these articles are directly related to calling conventions, but they are still a worthy read for anyone interested in interacting with C++ programs on a truly low level.

    * Calling conventions
    * Microsoft Calling Conventions
    * PC Assembly Language
    * C++: Under the Hood
    * Member Function Pointers and the Fastest Possible C++ Delegates 

#[cf]
#[of]:	Revision history
Revision history

    * 2004/08/04 - Article created.
    * 2005/01/19 - Added information on GCC/g++ on Linux. Made a few corrections on the way MinGW/g++ handles classes with declared destructors.
    * 2005/02/13 - Added information about class methods with variable number of arguments for MSVC++. 
    
#[cf]
#[cf]
	http://code.google.com/p/corkami/wiki/x86oddities