From: mark@omnifest.uwm.edu (Mark Hopkins)
Newsgroups: alt.lang.asm
Subject: A Summary of the 80486 Opcodes and Instructions

(1) The 80x86 is an Octal Machine
   This is a follow-up and revision of an article posted in alt.lang.asm on
7-5-92 concerning the 80x86 instruction encoding.
   Some bugs were corrected on June 20th, 1997 by S.Klose (sven@devcon.net)
(minor bugs in 32-bit effective addresses and opcode typos)
 
   The only proper way to understand 80x86 coding is to realize that ALL 80x86
OPCODES ARE CODED IN OCTAL.  A byte has 3 octal digits, ranging from 000 to
377.  In fact, each octal group (000-077, 100-177, etc.) tends to encode a
specific variety of operation.	All of these are features inherited from the
8080/8085/Z80.
   For some reason absolutely everybody misses all of this, even the Intel
people who wrote the reference on the 8086 (and even the 8080).  The opcode
scheme outlined briefly below is expanded starting in the 80386, but
consistently with the overall scheme here.
 
   As an example to see how this works, the mov instructions in octal are:
 
210 xrm 	mov Eb, Rb
211 xrm 	mov Ew, Rw
212 xrm 	mov Rb, Eb
213 xrm 	mov Rw, Ew
214 xsm 	mov Ew, SR
216 xsm 	mov SR, Ew
 
The meanings of the octal digits (x, m, r, s) and their correspondence to the
operands (Eb, Ew, Rb, Rw, SR) are the following:
 
The digit r (0-7) encodes the register operand as follows:
REGISTER (r):		     0	 1   2	 3   4	 5   6	 7
   Rb = Byte-sized register AL	CL  DL	BL  AH	CH  DH	BH
   Rw = Word-sized register AX	CX  DX	BX  SP	BP  SI	DI
 
The segment register digit s (0-7) encodes the segment register as follows:
SEGMENT REGISTER (s):	   0   1   2   3   4   5   6   7
   SR = Segment register  ES  CS  SS  DS    <Reserved>
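
As a minimal sketch (Python, not part of the original article), here is how a
byte splits into its three octal digits.  The helper name octal_digits and the
list names RB and RW are made up for illustration only.

```python
RB = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]   # byte registers by digit r
RW = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]   # word registers by digit r

def octal_digits(byte):
    """Split a byte (000-377 octal) into its three octal digits."""
    if not 0 <= byte <= 0o377:
        raise ValueError("not a byte")
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7

x, r, m = octal_digits(0o135)      # the byte 135 (octal)
print(x, r, m, RB[r], RW[r])       # digits 1, 3, 5; register digit 3 is BL/BX
```

The top digit only ever ranges over 0-3, since a byte has just 8 bits.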
 
The digits x (0-3), and m (0-7) encode the address mode according to
the following scheme.  One or more bytes (labeled: Disp) may immediately
follow xrm as described below.
 
TABLE 1:     16-BIT ADDRESSING MODE (x, m):
   Eb = Address of byte-sized object in memory or register
   Ew = Address of word-sized object in memory or register
   Dw = Unsigned word
   Dc = Signed byte ("character"), range: -128 to +127 (decimal).
   Db = Unsigned byte
 
   x  m  Disp  Eb  Ew
   ------------------
   3  r        Rb  Rw
   0  6   Dw   DS:[Dw]
   0  m        Base:[0]   (except for xm = 06).
   1  m   Dc   Base:[Dc]
   2  m   Dw   Base:[Dw]
 
   x  0  Disp  DS:[BX + SI + Disp]
   x  1  Disp  DS:[BX + DI + Disp]
   x  2  Disp  SS:[BP + SI + Disp]
   x  3  Disp  SS:[BP + DI + Disp]
   x  4  Disp  DS:[SI + Disp]
   x  5  Disp  DS:[DI + Disp]
   x  6  Disp  SS:[BP + Disp]	(except for xm = 06)
   x  7  Disp  DS:[BX + Disp]
 
This expands into the following table:
 
TABLE 1a:     16-BIT ADDRESSING MODE (x, m) for the expansion impaired. :)
xm    Eb/Ew	    xm	  Eb/Ew 	     xm    Eb/Ew	      xm Eb/Ew
00    DS:[BX + SI]  10 Dc DS:[BX + SI + Dc]  20 Dw DS:[BX + SI + Dw]  30 AL/AX
01    DS:[BX + DI]  11 Dc DS:[BX + DI + Dc]  21 Dw DS:[BX + DI + Dw]  31 CL/CX
02    SS:[BP + SI]  12 Dc SS:[BP + SI + Dc]  22 Dw SS:[BP + SI + Dw]  32 DL/DX
03    SS:[BP + DI]  13 Dc SS:[BP + DI + Dc]  23 Dw SS:[BP + DI + Dw]  33 BL/BX
04    DS:[SI]	    14 Dc DS:[SI + Dc]	     24 Dw DS:[SI + Dw]       34 AH/SP
05    DS:[DI]	    15 Dc DS:[DI + Dc]	     25 Dw DS:[DI + Dw]       35 CH/BP
06 Dw DS:[Dw]	    16 Dc SS:[BP + Dc]	     26 Dw SS:[BP + Dw]       36 DH/SI
07    DS:[BX]	    17 Dc DS:[BX + Dc]	     27 Dw DS:[BX + Dw]       37 BH/DI
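
The table can also be generated rather than memorized.  The following Python
sketch is only an illustration: the function name ea16 and its parameters are
hypothetical, and the caller is assumed to have already read the displacement
(signed for Dc, unsigned for Dw).

```python
RB = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
RW = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
BASES = ["BX + SI", "BX + DI", "BP + SI", "BP + DI", "SI", "DI", "BP", "BX"]

def ea16(x, m, disp=0, word=False):
    """Render the Eb/Ew operand selected by digits x and m (16-bit addressing)."""
    if x == 3:                              # register operand
        return (RW if word else RB)[m]
    if x == 0 and m == 6:                   # the one exception: direct address
        return "DS:[0x%04X]" % disp
    seg = "SS" if "BP" in BASES[m] else "DS"    # BP-based modes default to SS
    if x == 0:
        return "%s:[%s]" % (seg, BASES[m])
    sign = "-" if disp < 0 else "+"
    return "%s:[%s %s %d]" % (seg, BASES[m], sign, abs(disp))

print(ea16(1, 5, -3))       # xm = 15 with Dc = -3
print(ea16(0, 2))           # xm = 02
```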
 
Operands where x is 0, 1, or 2 are all pointers.  If the instruction is a WORD
instruction (211, 213, 214, 216 are), then this pointer addresses a
word-sized object.  The format of the object at the indicated address will
always be low-order byte first, and high-order byte second.  Otherwise the
instruction is a BYTE instruction (210, 212) and the pointer addresses a
byte-sized object.
 
The default segments (DS:, SS:) can be overridden with a segment prefix.  In
all cases it's understood that everything has the default segment DS, except
for the two stack/frame pointers (BP and SP) whose default segment is SS.
That will be explained below.
 
Modes where x = 1, or 2 will require displacement bytes (Dc or Dw) to follow
the opcode as explained above.
 
When x = 3, WORD sized instructions address the word registers (AX, CX, ...)
and the BYTE size instructions the byte registers (AL, CL, ...).
 
EXAMPLE 1: The instruction opcode: 210 135 375
   Here, xm = 15, and r = 3, so the operands are:
 
			    mov Eb, Rb
			       =>
		    mov byte ptr DS:[DI + Dc], BL
 
The displacement, Dc, is 375 (or fd in hexadecimal), which is the signed byte
-3.  So the instruction reads:
 
		    mov byte ptr DS:[DI - 3], BL
 
or just:
			 mov [DI - 3], BL
 
In C-like notation, the meaning of this operation would be:
 
		    ((byte *)DS) [DI - 3] = BL;
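
The same steps can be followed mechanically.  A small Python sketch, with the
helper name signed_byte chosen here purely for illustration:

```python
def signed_byte(b):
    """Interpret a byte value 0..255 as a signed Dc in the range -128..127."""
    return b - 0o400 if b >= 0o200 else b

code = [0o210, 0o135, 0o375]            # the three bytes of EXAMPLE 1
xrm = code[1]
x, r, m = xrm >> 6, (xrm >> 3) & 7, xrm & 7
dc = signed_byte(code[2])
print(x, r, m, dc)                      # digits 1 3 5 and displacement -3
```

Bytes from 200 to 377 (octal) are the negative half of the range, which is why
375 reads as -3.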
 
EXAMPLE 2: The instruction opcode: 216 332
   Here, xm = 32, and s = 3, so the operands are:
 
			   mov SR, Ew
			       =>
			   mov DS, DX
 
A move to CS is not possible (because the far jump instruction already does
that) so that the opcode sequence:
 
			     216 x1m
 
is free to be used for encoding something else.
 
EXAMPLE 3: As an illustration of why it's better to think in octal, just look
	   at the opcodes for the binary arithmetic instructions:
 
0P0 xrm 	Op Eb, Rb
0P1 xrm 	Op Ew, Rw
0P2 xrm 	Op Rb, Eb
0P3 xrm 	Op Rw, Ew
0P4 Db		Op AL, Db
0P5 Dw		Op AX, Dw
 
They all have the same form, with a single digit encoding the operator as
follows:
		  P	Op	    P	  Op
		  0    add	    1	  or
		  2    adc	    3	 sbb
		  4    and	    5	 sub
		  6    xor	    7	 cmp
 
That's a good fraction of your reference table right there.
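
To show how little table is actually needed, here is a Python sketch that names
any opcode of the form 0P0 through 0P5.  The function name arith and the two
lists are assumptions made for this illustration.

```python
OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]     # indexed by P
FORMS = ["Eb, Rb", "Ew, Rw", "Rb, Eb", "Rw, Ew", "AL, Db", "AX, Dw"]

def arith(opcode):
    """Return 'op operands' for opcodes of the form 0P0..0P5, else None."""
    top, p, form = opcode >> 6, (opcode >> 3) & 7, opcode & 7
    if top != 0 or form > 5:
        return None                     # 0P6 and 0P7 encode other things
    return "%s %s" % (OPS[p], FORMS[form])

print(arith(0o053))     # P = 5, form 3
print(arith(0o074))     # P = 7, form 4
```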
 
EXAMPLE 4: The same mapping is used in the immediate to memory/register form
	   of these operations:
 
200 xPm Db	Op Eb, Db
201 xPm Dw	Op Ew, Dw
203 xPm Dc	Op Ew, Dc
 
(2) An Outline of 80x86 Instructions and Encoding
   The authors of 8080 and 8086 references (including Intel's own references)
are apparently not aware of the octal nature of their own machines, and the
result is an almost grotesque complication and bungling up in the presentation
of something that is actually fairly simple.  Thus, people claim that it's
almost impossible to know 8086 binary by heart, whereas in fact I know most of
it by memory.  I'll straighten out the mess for you here.
 
   As alluded to above, instructions are encoded as follows:
 
			    op xrm Const
 
   where * op is a 1 or 2 byte opcode,
	 * xrm (if present) constitutes 3 octal digits whose normal uses are:
	       r = Register operand, xm = Memory or Register operand.
	   It may be followed immediately by a "displacement" byte or word,
	   depending solely on the digits x and m.
	 * Const (if present) denotes a byte or word value whose presence and
	   format depends solely on what op (and sometimes xrm) is.
 
In some cases, the opcode itself may be separated out into octal digits, e.g.
 
		 0s6 = push (Segment Register #s).
 
   The one major exception to the coding scheme is the set of conditional
operations.  Since there are 16 distinct condition codes, each is
represented as a hexadecimal digit.  The conditional jump in octal ranges
from 160 to 177, which is 7x in hexadecimal, where x is a hex digit encoding
the jump's condition.  I'll represent them by the format: 160+CC.
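
A short Python sketch of the 160+CC format (the function name jcc_opcode is
hypothetical) shows the octal and hexadecimal views of the same byte:

```python
def jcc_opcode(cc):
    """Opcode byte of the short conditional jump with condition code cc (0-15)."""
    if not 0 <= cc <= 0xF:
        raise ValueError("condition code out of range")
    return 0o160 + cc

op = jcc_opcode(0x4)
print(oct(op), hex(op))     # the same byte written in octal and in hex
```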
 
The register and address encoding was described above.	The '386 expands on
this a little with the addition of two segment registers:
 
SEGMENT REGISTER (s):	   0   1   2   3   4   5   6   7
   SR = Segment register  ES  CS  SS  DS  FS  GS <Reserved>
 
In TABLE 1, note that the addresses encoded on modes 0m, 1m, 2m are the same
regardless of whether you're referring to Eb or Ew.  What distinguishes them
is the size of the object being pointed to and this can be explicitly
indicated in traditional '86 assemblers like the following examples:
 
		    byte ptr [BP]
		    word ptr [BX + DI]
 
As explained before, all addresses, except those involving BP refer to the
data segment, DS.  All the BP's refer to the stack segment, SS.  This
is about to be explained.
 
(3) Segmentation and Registers
   The 80x86 was designed with more or less specific uses for its registers.
In fact, the names are supposed to reflect their main uses:
 
		 AX (AH:AL) = Accumulator
		 BX (BH:BL) = Base Register
		 CX (CH:CL) = Counting Register
		 DX (DH:DL) = Data Register
 
    CS = Code Segment -- where constants and programs lie.
    DS = Data Segment -- where static variables lie.
    SS = Stack Segment -- where auto variables and function parameters lie.
	 SP, BP = Stack and Frame Pointers, used to segment out the
		  local variables and function parameters.
    ES = Extra Segment -- used in combination with the index registers for
	 string operations as follows:
	 DS:[SI] -- points to the Source of the string operation.
	 ES:[DI] -- points to the Destination of the string operation.
 
The typical setup for the stack is as follows:
 
      High Addresses	   FUNCTION DEFINITION:    FUNCTION CALL:
      ...		   push BP		   push Parameters
      Parameters	   mov BP, SP		   call Function
      Return Address	   sub SP, Locals
BP -> Old BP		   ... function ...
      Local Variables	   mov SP, BP
SP -> ...		   pop BP
      Low Addresses	   ret Parameters
 
This dictates a certain protocol for calling functions with parameters and
returning from them, as shown above.  The protocol is so common that the
opening and closing sequences above have been defined as single operations
starting with the 80186, so that the function definition above can be rewritten
as:
			   FUNCTION DEFINITION:
			   enter Locals, 0
			   ... function ...
			   leave
			   ret Parameters
 
(4) Word and Address Size on the 80386 and Above
   Starting with the 80386, operations can be done with not just 16-bit words
but also 32 bit words.	Generally the same operation is defined for both sets
and context is used to determine which is which in the following two ways:
 
      * Which mode the machine is running in
	Protected mode -- both word size and address size default to 16 or
			  32 bits, as set by the code segment's descriptor.
	Real & Virtual modes -- 16-bits.
      * The presence of certain prefixes to override either the default
	word size, address size or both on an instruction-by-instruction
	basis.
 
   (a) Word Size
   When the word size for the current operation is 32-bits, everything listed
above as "word" is interpreted as 32-bits, including registers.  The register
numbering corresponding to this word size is:
 
REGISTER (r):		       0   1   2   3   4   5   6   7
   Rb = Byte-sized register   AL  CL  DL  BL  AH  CH  DH  BH
   Rd = Dword-sized register EAX ECX EDX EBX ESP EBP ESI EDI
 
   (b) Address Size
   When the address size is switched to 32-bits, the address scheme listed in
TABLE 1 is altered in its entirety.
 
   TABLE 2: 32-BIT ADDRESSING MODE (x, m):	Encoding of scaled index SI:
   x  m    Disp  Eb  Ew 			si	 SI
   -------------------- 			---------------
   0  5     Dw	 DS:[Dw]			s0    EAX * 2^s
   0  4 sir	 [Rd + SI + 0]			s1    ECX * 2^s
   1  4 sir Dc	 [Rd + SI + Dc] 		s2    EDX * 2^s
   2  4 sir Dw	 [Rd + SI + Dw] 		s3    EBX * 2^s
   0  r 	 [Rd + 0]  (except r = 4, 5)	04	  0
   1  r     Dc	 [Rd + Dc] (except r = 4)	s5    EBP * 2^s
   2  r     Dw	 [Rd + Dw] (except r = 4)	s6    ESI * 2^s
   3  r 	 Rb  Rw 			s7    EDI * 2^s
 
The encodings si = 14, 24 and 34 remain undefined.  With x = 0, a base digit
r = 5 in sir drops Rd, and a Dw displacement follows instead.
 
   This alteration is INDEPENDENT of the word size setting.  That means that
the "Rw"'s in the chart above will still vary in interpretation as 16-bit or
32-bit objects depending on the word size setting (the displacement "Dw" is
always 32 bits here).  That leads to 4 possible combinations, not just 2.
 
EXAMPLE 5:  The opcode sequence 211 135 375
   This is the operation
		      mov Ew, Rw
where xm = 15, r = 3 and Disp = -3.  The 4 combinations are:
 
Addr-Size  Word-Size Operation
   16	      16     mov word ptr [DI - 3], BX
   16	      32     mov dword ptr [DI - 3], EBX
   32	      16     mov word ptr [EBP - 3], BX
   32	      32     mov dword ptr [EBP - 3], EBX
 
EXAMPLE 6: The opcode sequence 211 134 302 375 with 32-bit addressing.
   This is the move instruction where xm = 14 and r = 3.
 
		    mov Ew, [E]BX     ([E]BX since r = 3)
 
It uses the indexed register addressing.  The address, Ew, may be derived
as follows:
	     x m  sir Disp	Ew		       Comments
	     1 4  sir Dc   [EDX + SI + Dc]
	     1 4  si2 375  [EDX + SI - 3]	 (Rd = EDX for r = 2)
	     1 4  302 375  [EDX + 8*EAX - 3]	 (SI = 8*EAX for si = 30)
 
Therefore, this instruction represents one of the following:
 
	 Word-Size     Operation: 211 134 302 375
	    16	       mov word ptr [EDX + 8*EAX - 3], BX
	    32	       mov dword ptr [EDX + 8*EAX - 3], EBX
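
The derivation in EXAMPLE 6 can be sketched in Python as follows.  The function
name sir_address is made up for this illustration, and the sketch only covers
the x = 1 and x = 2 rows, where a displacement is always present.

```python
RD = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def sir_address(sir, disp):
    """Render [Rd + SI + disp] from the sir byte (x = 1 or 2 only)."""
    s, i, r = sir >> 6, (sir >> 3) & 7, sir & 7
    parts = [RD[r]]                          # r selects the base register Rd
    if i != 4:                               # index digit 4 means "no index"
        parts.append("%d*%s" % (1 << s, RD[i]))
    text = " + ".join(parts)
    if disp:
        text += " %s %d" % ("-" if disp < 0 else "+", abs(disp))
    return "[" + text + "]"

print(sir_address(0o302, -3))    # the sir byte and Dc from EXAMPLE 6
```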
 
(5) The Opcode Summary
   The chart below summarises all the machine instructions.  The following
abbreviations are used:
 
Registers:		   Immediate Data Constant:
   Rb (byte sized)	   Db (byte sized)
   Rw (word sized)	   Dw (word sized)
   Rd (dword sized)	   Dc (signed byte)
 
Register/Memory Address:   Relative Code Address:
   Eb (byte sized)	   Cb (byte sized)
   Ew (word sized)	   Cw (word sized)
 
Memory Address: 		Code Address:
   Es (16 bit selector) 	Af (32/48 bit absolute far code address)
   En (near 16/32 bit pointer)
   Ef (far 32/48 bit pointer)
   Ep (pointer to 6-byte object)
   Ea (generic address)
 
Processor Extensions:
* = 80186 extension
$ = 80286 extension
# = 80386 extension
@ = 80486 extension
 
The switch between 16 and 32 bit word size affects all operands labeled
Rw, Ew, Dw, Cw, En and even Af and Ef.  The latter two objects refer to
far code addresses which are 4 bytes when the word size is 16 bits, and
6 bytes otherwise.
 
The only such operands not actually affected by the word-size switch are
those whose size is a consequence of the operation's meaning.  These include
the following: RET, BOUND, ARPL, SMSW, LMSW, LAR and LSL.
 
The switch between 16 and 32 bit address size affects all the operands
labeled Eb, Ew, Es, En, Ef, Ep, and Ea.  Each of these is interpreted
according to the xm digits in the opcode, using either the 16-bit
address table described near the start of the article or the 32-bit address
table just described above.
 
NOTE: In the following presentation everything is in octal.
 
ARITHMETIC & LOGIC
------------------
Comments:
   * All of these operations affect all 6 arithmetic flags, except NOT (which
     affects no flags), and INC and DEC (which don't affect CF).
   * IMUL and MUL only affect CF and OF predictably.
   * IDIV and DIV affect no flags predictably.
   * AND, OR, XOR, and TEST all set CF and OF to 0 and alter AF unpredictably.
   * CMP and TEST have no effect on any operands.  They're used for setting
     flags.  CMP is used for doing relational operators (< > <= >= == !=), and
     TEST for doing bit-testing.
   * TEST can have its operands listed in either order.
P Op	       Description
0 ADD L, E     L += E
2 ADC L, E     L += E + CF
5 SUB L, E     L -= E
3 SBB L, E     L -= E + CF
7 CMP L, E     (void)(L - E)
1 OR L, E      L |= E
4 AND L, E     L &= E
6 XOR L, E     L ^= E
  0P0 xrm	   Op Eb, Rb
  0P1 xrm	   Op Ew, Rw
  0P2 xrm	   Op Rb, Eb
  0P3 xrm	   Op Rw, Ew
  0P4 Db	   Op AL, Db
  0P5 Dw	   Op AX, Dw
  200 xPm Db	   Op Eb, Db
  201 xPm Dw	   Op Ew, Dw
  203 xPm Dc	   Op Ew, Dc
 
NOT L	       L = ~L
  366 x2m	   not Eb
  367 x2m	   not Ew
NEG L	       L = -L
  366 x3m	   neg Eb
  367 x3m	   neg Ew
 
INC L	       L++
  10r		   inc Rw
  376 x0m	   inc Eb
  377 x0m	   inc Ew
DEC L	       L--
  11r		   dec Rw
  376 x1m	   dec Eb
  377 x1m	   dec Ew
 
TEST L, E      (void)(L&E)
  204 xrm	   test Rb, Eb
  205 xrm	   test Rw, Ew
  250 Db	   test AL, Db
  251 Dw	   test AX, Dw
  366 x0m Db	   test Eb, Db
  367 x0m Dw	   test Ew, Dw
 
IMUL L, E, D   L = (signed)E*D
IMUL L, E      L = (signed)L*E
# 017 257 xrm	   imul Rw, Ew
* 151 xrm Dw	   imul Rw, Ew, Dw
* 153 xrm Dc	   imul Rw, Ew, Dc
 
In the following operations:
	 Operand Size	ACC'     ACC
	      1 	AX	 AL
	      2        DX:AX	 AX
	      4       EDX:EAX	EAX
P Op	       Description
4 MUL E        ACC' = (unsigned) ACC*E
5 IMUL E       ACC' = (signed)   ACC*E
6 DIV E        ACC' = (unsigned) ACC'%E : ACC'/E
7 IDIV E       ACC' = (signed)   ACC'%E : ACC'/E
  366 xPm	   Op Eb
  367 xPm	   Op Ew
 
SHIFTS & ROTATIONS
------------------
Comments:
   * Where applicable, N is masked off by 0x1f.
   * For Rxx and Sxx, OF is predictably affected only when N is 1.
   * SHLD and SHRD affect all 6 arithmetic flags, but OF and AF unpredictably.
   * RxL: OF = (CF != high order bit of L) before shift
   * RxR: OF = (high order bit of L != next high order bit of L) before shift
   * SxL: OF = (CF != sign bit of L) after shift
   * SxR: OF = (sign bit of L) after shift
P Op	       Description
0 ROL	       CF <- [<-<-<-] <- high order bit   Rotate
1 ROR	       low order bit -> [->->->] -> CF
2 RCL	       CF <- [<-<-<-] <- CF		  Rotate Through CF
3 RCR	       CF -> [->->->] -> CF
4 SHL	       CF <- [<-<-<-] <- 0		  Shift (unsigned)
5 SHR		0 -> [->->->] -> CF
4 SAL	       CF <- [<-<-<-] <- 0		  Shift (signed)
7 SAR	       sign bit -> [->->->] -> CF
* 300 xPm Db	   Op Eb, Db
* 301 xPm Db	   Op Ew, Db
  320 xPm	   Op Eb, 1
  321 xPm	   Op Ew, 1
  322 xPm	   Op Eb, CL
  323 xPm	   Op Ew, CL
 
SHLD L, E, N   CF:L = L:E << N
SHRD L, E, N   L:CF = E:L >> N
# 017 244 xrm Db   shld Ew, Rw, Db
# 017 245 xrm	   shld Ew, Rw, CL
# 017 254 xrm Db   shrd Ew, Rw, Db
# 017 255 xrm	   shrd Ew, Rw, CL
 
TYPE CONVERSIONS
----------------
[] Decimal Conversions
Comments:
   * DAA and DAS are used for adjusting the results of addition and subtraction
     respectively back to packed BCD format.  They will alter all 6 of the
     arithmetic flags, OF unpredictably.
   * AAA, AAS, AAD, and AAM are used for adjusting the results of the four
     basic arithmetic operations back to unpacked BCD format or ASCII format.
     However, AAD is used *before* a divide operation.	They too affect all
     6 of the arithmetic flags, but only AF and CF predictably (for AAA and
     AAS) or SF, ZF and PF (for AAD and AAM).
   * In the following, A0 stands for the lower 4 bits of AL and A1 the upper
     4 bits of AL.
   * The binary codes for AAM and AAD each consist of an opcode followed by
     a constant 10 (012 in octal).  It has been said that this "10" is
     actually a hidden parameter to a more general AAD and AAM operator,
     which can actually be used for any base other than 10.  Some processors
     will not allow AAD to be generalized in this way, however.  The reason it
     was left out in the open like this was supposedly because the original
     8086 design literally ran out of space to pack in the opcode.
DAA	       if (A0 > 9) AF = 1;   if (AF) AL += (0x10 - 10);
	       if (A1 > 9) CF = 1;   if (CF) AL += (0x10 - 10)*0x10;
DAS	       if (A0 > 9) AF = 1;   if (AF) AL -= (0x10 - 10);
	       if (A1 > 9) CF = 1;   if (CF) AL -= (0x10 - 10)*0x10;
AAA	       if (A0 > 9) AF = 1;  CF = AF;  if (CF) A0 += (0x10 - 10), AH++;
AAS	       if (A0 > 9) AF = 1;  CF = AF;  if (CF) A0 -= (0x10 - 10), AH--;
AAM	       AX = AL/10 : AL%10
AAD	       AX = (10*AH + AL)%0x100
  047		   daa
  057		   das
  067		   aaa
  077		   aas
  324 012	   aam
  325 012	   aad
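
As a rough Python sketch of AAM and AAD (flags are ignored, and the function
names and the base parameter are assumptions made for illustration; base
stands in for the constant byte that follows the opcode):

```python
def aam(al, base=10):
    """AAM: split AL into AH = AL / base and AL = AL % base."""
    return al // base, al % base             # returns (AH, AL)

def aad(ah, al, base=10):
    """AAD: fold AH:AL into AL = (base*AH + AL) mod 256, and clear AH."""
    return 0, (base * ah + al) & 0xFF        # returns (AH, AL)

print(aam(57))       # binary 57 becomes unpacked digits 5 and 7
print(aad(5, 7))     # unpacked digits 5 and 7 become binary 57 again
```

The two operations are inverses of each other for values below base*base,
which is why AAD goes before a divide and AAM after a multiply.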
 
[] Sign Conversions
Comments:
   * In converting from a shorter to longer operand size, sign conversion
     involves either taking the leading (sign) bit and replicating it leftward
     (conversion to signed), or placing zero's on the left (for conversion
     to unsigned).
MOVSX L, E     L = (signed)E
MOVZX L, E     L = (unsigned)E
# 017 276 xrm	   movsx Rw, Eb
# 017 277 xrm	   movsx Rd, Ew
# 017 266 xrm	   movzx Rw, Eb
# 017 267 xrm	   movzx Rd, Ew
 
CBW	       AX = (signed)AL
CWDE	       EAX = (signed)AX
CWD	       DX:AX = (signed)AX
CDQ	       EDX:EAX = (signed)EAX
  230		   cbw	/  (#) cwde
  231		   cwd	/  (#) cdq
 
[] Byte Ordering
   * Used to convert between "little Endian" (Intel byte ordering) and "big
     Endian" (Motorola byte ordering).  Typical use: networking applications.
BSWAP L        L[0]:L[1]:L[2]:L[3] = L[3]:L[2]:L[1]:L[0]
@  017 31r	   bswap Rd
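
The effect of BSWAP on a 32-bit value can be sketched in Python like this (the
function name bswap is simply borrowed from the mnemonic):

```python
def bswap(value):
    """Reverse the order of the four bytes of a 32-bit value."""
    raw = (value & 0xFFFFFFFF).to_bytes(4, "little")   # bytes in Intel order
    return int.from_bytes(raw, "big")                  # reread in the other order

print(hex(bswap(0x12345678)))
```

Applying it twice gives back the original value.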
 
[] Table Lookup
XLATB	       AL = [BX + AL]
   327		   xlatb
 
SEMAPHORES & SYNCHRONIZATION
----------------------------
Comments:
   * All these operations affect all 6 arithmetic flags.  BT, BTS, BTR, BTC
     affect only CF predictably; and BSF and BSR affect only ZF predictably.
   * ACC is either AL, AX or EAX in CMPXCHG, depending on the operand size.
   * WAIT is used in the '486 to force any pending unmasked interrupt from the
     internal floating point unit to be handled.
   * LOCK is a prefix used in multi-CPU contexts to assure exclusive access to
     memory for the following two-step read & modify operations:
	(INC, DEC, NEG, NOT) Mem       (ADD, ADC, SUB, SBB) Mem, Src
	(BT, BTS, BTR, BTC) Mem, Src   (AND, XOR, OR) Mem, Src
	      XCHG Reg, Mem		     XCHG Mem, Reg
     But XCHG automatically does its own LOCK so does not need to be prefixed.
P Op	       Description
4 BT L, N      CF = L.N;
5 BTS L, N     CF = L.N; L.N = 1;
6 BTR L, N     CF = L.N; L.N = 0;
7 BTC L, N     CF = L.N; L.N = !L.N;
#  017 2P3 xrm	   Op Ew, Rw
#  017 272 xPm Db  Op Ew, Db
 
BSF L, E       ZF = !E;  if (!ZF) L = First 1-bit position in E; else L = ???
BSR L, E       ZF = !E;  if (!ZF) L = Last 1-bit position in E; else L = ???
#  017 274 xrm	   bsf Rw, Ew
#  017 275 xrm	   bsr Rw, Ew
 
CMPXCHG L, E   ZF = (ACC == L);  if (ZF) L = E; else ACC = L;
@  017 260 xrm	   cmpxchg Eb, Rb
@  017 261 xrm	   cmpxchg Ew, Rw
XADD L, L'     <L, L'> = <L + L', L>
@  017 300 xrm	   xadd Eb, Rb
@  017 301 xrm	   xadd Ew, Rw
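
The CMPXCHG and XADD descriptions above can be read as the following Python
sketch.  It models only the data movement, not the bus locking, and the
function names and the bits parameter are assumptions for illustration.

```python
def cmpxchg(acc, dest, src):
    """Return (ZF, ACC, L) after CMPXCHG L, E, where L = dest and E = src."""
    if acc == dest:
        return True, acc, src        # match: L takes the new value
    return False, dest, dest         # mismatch: ACC picks up L, L is unchanged

def xadd(dest, src, bits=16):
    """Return (L, L') after XADD L, L'."""
    mask = (1 << bits) - 1
    return (dest + src) & mask, dest # L gets the sum, L' gets the old L

print(cmpxchg(5, 5, 9))
print(cmpxchg(5, 7, 9))
print(xadd(0xFFFF, 2))
```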
 
NOP	       Delay 1 cycle.
WAIT	       Wait for coprocessor unit.
LOCK	       Hardware memory bus semaphore.
HLT	       Wait for a reset or interrupt.
   220		   nop
   233		   wait
   360		   lock
   364		   hlt
 
INT N	       push [E]FLAGS, CS, [E]IP; TF = 0;
	       if (the Nth entry in the IDT is an Interrupt Gate) IF = 0;
	       jmp to the far address listed under the Nth entry in the IDT
INTO	       if (OF) INT 4
IRET	       if (NT) return to task listed under TSS.BackLink;
	       else pop [E]IP, CS, [E]FLAGS;
   314		   int 3
   315 Db	   int Db
   316		   into
   317		   iret
 
FLAGS
-----
Comments:
   * No flags are affected except the explicit moves to the FLAGS register:
     POPF[D] and SAHF, but SAHF only sets the arithmetic flags (except OF).
POPF	       pop FLAGS
POPFD	       pop EFLAGS
PUSHF	       push FLAGS
PUSHFD	       push EFLAGS
SAHF	       FLAGS = (FLAGS & ~0xd5) | (AH & 0xd5)
LAHF	       AH = FLAGS;
   234		   pushf / (#) pushfd
   235		   popf  / (#) popfd
   236		   sahf
   237		   lahf
CMC	       CF = !CF
CLC	       CF = 0
STC	       CF = 1
CLI	       IF = 0 (Interrupts off)
STI	       IF = 1 (Interrupts on)
CLD	       DF = 0 (Set string ops to increment)
STD	       DF = 1 (Set string ops to decrement)
   365		   cmc
   370		   clc
   371		   stc
   372		   cli
   373		   sti
   374		   cld
   375		   std
 
CONDITIONAL OPERATIONS
----------------------
(NOTE: The values listed for CC are in octal).
 
CC   Condition(s)  Definition	    Descriptions
07   A	NBE	   !CF && !ZF	    x > y   x > 0  (unsigned)
03   AE NB	   !CF		    x >= y  x >= 0 (unsigned)
02   B	NAE	    CF		    x < y   x < 0  (unsigned)
06   BE NA	    CF || ZF	    x <= y  x <= 0 (unsigned)
17   G	NLE	    SF == OF && !ZF x > y   x > 0  (signed)
15   GE NL	    SF == OF	    x >= y  x >= 0 (signed)
14   L	NGE	    SF != OF	    x < y   x < 0  (signed)
16   LE NG	    SF != OF || ZF  x <= y  x <= 0 (signed)
04   E	Z	    ZF		    x == y  x == 0
05   NE NZ	   !ZF		    x != y  x != 0
00   O		    OF		    Overflow (signed overflow)
01   NO 	   !OF		    No overflow (signed overflow)
02   C		    CF		    Carry (unsigned overflow)
03   NC 	   !CF		    No carry (unsigned overflow)
10   S		    SF		    (Negative) sign
11   NS 	   !SF		    No (negative) sign
12   P	PE	    PF		    Parity [even]
13   NP PO	   !PF		    No parity (parity odd)
CC   cc 	   Cond.
 
Jcc Rel        if (Cond) EIP += Rel;
SETcc L        L = (Cond)? 1: 0;
#  017 200+CC Cw   jcc Cw
#  017 220+CC x0m  setcc Eb
   160+CC	   jcc Cb
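
One way to read the CC table: each odd code is the negation of the even code
just below it, so only 8 base conditions are needed.  The Python sketch below
relies on that observation; the function name cond and its keyword arguments
are made up for illustration.

```python
def cond(cc, CF=0, ZF=0, SF=0, OF=0, PF=0):
    """Evaluate condition code cc (0-15) against the given flag values."""
    base = [OF,                 # 00 O
            CF,                 # 02 B / C
            ZF,                 # 04 E / Z
            CF or ZF,           # 06 BE
            SF,                 # 10 S
            PF,                 # 12 P
            SF != OF,           # 14 L
            ZF or SF != OF      # 16 LE
            ][cc >> 1]
    return bool(base) != bool(cc & 1)    # odd codes invert the base condition

print(cond(0o07))                 # A: no carry and not zero
print(cond(0o17, SF=1, OF=1))     # G: SF == OF and not zero
```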
 
STACK OPERATIONS
----------------
Comments:
   * PUSHA[D] uses the value SP had before the operation started.
   * POPA[D] doesn't actually affect [E]SP, which is why it's bracketed out.
   * POP CS is not allowed because it's already subsumed by the RET (far)
     operation.  Instead, 017 is used as a 2-byte operation prefix.
   * POP SS inhibits interrupts in order to allow [E]SP to be altered in the
     following operation -- for what should be obvious reasons.
PUSH E	       SP -= sizeof E; SS:[SP] = E;
PUSHA	       push AX, CX, DX, BX, SP, BP, SI, DI
PUSHAD	       push EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI
#  017 240	   push FS
#  017 250	   push GS
   0s6		   push SR   (s = 0-3)
   12r		   push Rw
*  140		   pusha / (#) pushad
*  150 Dw	   push Dw
*  152 Dc	   push Dc
   377 x6m	   push Ew
POP L	       L = SS:[SP]; SP += sizeof L;
POPA	       pop DI, SI, BP, (SP), BX, DX, CX, AX
POPAD	       pop EDI, ESI, EBP, (ESP), EBX, EDX, ECX, EAX
#  017 241	   pop FS
#  017 251	   pop GS
   0s7		   pop SR    (s = 0, 2-3)
   13r		   pop Rw
*  141		   popa  / (#) popad
   217 x0m	   pop Ew
 
TRANSFER OPERATIONS
-------------------
Comments:
   * XCHG can have its operands listed in either order.
   * MOV CS, ... is not allowed, since this is already subsumed by JMPs.
   * LCS ... is not allowed either for the same reason.
   * XCHG AX, AX is one and the same as NOP.
XCHG L, E      <L, E> = <E, L>
   206 xrm	   xchg Rb, Eb
   207 xrm	   xchg Rw, Ew
   22r		   xchg AX, Rw	(r != 0)
MOV L, E       L = E;
   210 xrm	   mov Eb, Rb
   211 xrm	   mov Ew, Rw
   212 xrm	   mov Rb, Eb
   213 xrm	   mov Rw, Ew
   214 xsm	   mov Es, SR	(s = 0-3,   (#) 4-5)
   216 xsm	   mov SR, Es	(s = 0,2-3, (#) 4-5)
   240 Dw	   mov AL, [Dw]
   241 Dw	   mov AX, [Dw]
   242 Dw	   mov [Dw], AL
   243 Dw	   mov [Dw], AX
   26r Db	   mov Rb, Db
   27r Dw	   mov Rw, Dw
   306 x0m Db	   mov Eb, Db
   307 x0m Dw	   mov Ew, Dw
LEA L, An      L = &An;
   215 xrm	   lea Rw, En  (x != 3)
LSeg L, Af     Seg:L = &Af;
#  017 262 xrm	   lss Rw, Ef  (x != 3)
#  017 264 xrm	   lfs Rw, Ef  (x != 3)
#  017 265 xrm	   lgs Rw, Ef  (x != 3)
   304 xrm	   les Rw, Ef  (x != 3)
   305 xrm	   lds Rw, Ef  (x != 3)
 
ADDRESSING
----------
Comments:
   * The current mode of the machine determines its default mode (16 or 32
     bits).
   * RAND: and ADDR: (not standard names, since Intel has none) are
     prefixes that alter the default for the next instruction only.
   * RAND: changes the word size between 16 and 32 bits.
   * ADDR: changes the address size between 16 and 32 bits.
   * seg: cannot override the implied ES:[DI] operand in any string op,
     but can override the DS in the implied DS:[SI] operands there.
seg:	       Segment override prefix
ADDR:	       Address size toggle
RAND:	       Operand size toggle
   046		   ES:
   056		   CS:
   066		   SS:
   076		   DS:
#  144		   FS:
#  145		   GS:
#  146		   RAND:
#  147		   ADDR:
 
PORT I/O
--------
Comments:
   * In protected mode the user of these operations must pass the I/O
     Privilege Level (IOPL) else they are blocked by an interrupt.
     This allows the Operating System to spool I/O devices in a
     multitasking system (since the OS handles interrupts) to avoid having
     processes all trying to use the same device at once.
IN ACC, Port   ACC = IO[Port]
   344 Db	   in AL, Db
   345 Db	   in AX, Db
   354		   in AL, DX
   355		   in AX, DX
OUT Port, ACC  IO[Port] = ACC
   346 Db	   out Db, AL
   347 Db	   out Db, AX
   356		   out DX, AL
   357		   out DX, AX
 
STRING OPERATIONS
-----------------
Comments:
   * In all these operations below, Src denotes DS:[ESI] and Dest ES:[EDI].
   * Dest cannot be overridden by a segment prefix, only Src.
   * The pointers (ESI, EDI) are bumped up (DF = 0) or down (DF = 1) after
     the operation by sizeof Operand.
   * ACC is either AL, AX or EAX depending on the operand size.
   * The flags altered are exactly those altered by the corresponding
     MOV, IN, OUT, or CMP operation.  That is, only SCAS and CMPS alter the
     flags (in the same way as CMP), and these are therefore the only ones
     that can be prefixed by REP[N]E/REP[N]Z.
   * REP can be used with all string ops, but REP LODS doesn't do anything
     sensible.
INS	       in Dest, DX
OUTS	       out DX, Src
MOVS	       mov Dest, Src
CMPS	       cmp Src, Dest
STOS	       mov Dest, ACC
LODS	       mov ACC, Src
SCAS	       cmp ACC, Dest
*  154		   insb
*  155		   insw  / (#) insd
*  156		   outsb
*  157		   outsw / (#) outsd
   244		   movsb
   245		   movsw / (#) movsd
   246		   cmpsb
   247		   cmpsw / (#) cmpsd
   252		   stosb
   253		   stosw / (#) stosd
   254		   lodsb
   255		   lodsw / (#) lodsd
   256		   scasb
   257		   scasw / (#) scasd
REP Op	       while (CX-- > 0) Op
REPE /REPZ Op  while (CX-- > 0 && ZF) Op
REPNE/REPNZ Op while (CX-- > 0 && !ZF) Op
   362		   repne / repnz
   363		   rep / repe / repz
 
CONTROL FLOW
------------
Comments:
   * The distinction between near and far jumps/calls/returns is built right
     into the 8086 language, which pretty much forces you to explicitly
     declare a routine as "near" or "far" and be consistent about it.  The
     intended usage runs pretty much like C's static vs. global functions,
     with each C file being analogous to an 8086 segment.
   * The 8086 was specifically designed to be a Pascal (and PL/I) machine,
     though.  Intel wrongly assumed that one of these languages would become
     as dominant as C is now.  So the ENTER and LEAVE operators were added
     (and BOUND to do array bounds-checking).  The segmentation structure was
     intended to support these types of languages.
JCXZ Rel       if (!CX) IP += Rel;
JECXZ Rel      if (!ECX) IP += Rel;
LOOPcc Rel     if (--CX && cc) IP += Rel;
   340 Cb	   loopnz Cb / loopne Cb
   341 Cb	   loopz Cb  / loope Cb
   342 Cb	   loop Cb
   343 Cb	   jcxz Cb   / (#) jecxz Cb
JMP Rel        IP += Rel;
JMP FAR Af     CS:IP = Af;
CALL Rel       push IP;      IP += Rel;
CALL FAR Af    push CS, IP;  CS:IP = Af;
   232 Af	   call far Af
   350 Cw	   call Cw
   351 Cw	   jmp Cw
   352 Af	   jmp far Af
   353 Cb	   jmp Cb
   377 x2m	   call En
   377 x3m	   call far Ef
   377 x4m	   jmp En
   377 x5m	   jmp far Ef
RET Params     pop IP;	     SP += Params (default: Params = 0)
RET FAR Params pop IP, CS;   SP += Params (default: Params = 0)
   302 Dw	   ret Dw
   303		   ret
   312 Dw	   ret far Dw
   313		   ret far
ENTER Locs, N push EBP;
	      (sub EBP, 4;  push [EBP]) N-1 times, if N > 0
	      mov EBP, ESP
	      (add EBP, 4*(N-1);  push EBP),	   if N > 0
	      sub ESP, Locs
LEAVE	      mov ESP, EBP;   pop EBP
*  310 Dw Db	   enter Dw, Db
*  311		   leave
 
SYSTEM CONTROL & MEMORY PROTECTION
----------------------------------
BOUND A, AA   if (A not in range AA[0]..AA[1]) INT 5
ARPL L, E     ZF = (L.RPL < E.RPL);
	      if (ZF) L.RPL = E.RPL;
*  142 xrm	   bound Rw, Ed
$  143 xrm	   arpl Es, Rw
 
SLDT Sel      Sel = LDTR
STR Sel       Sel = TR
LLDT Sel      LDTR = Sel
LTR Sel       TR = Sel
VERR Sel      ZF = (Sel is accessible and has read-access)
VERW Sel      ZF = (Sel is accessible and has write-access)
LAR L, Sel    ZF = (Sel is accessible);
	      if (ZF) L = the access rights of Sel's descriptor.
LSL L, Sel    ZF = (Sel is accessible);
	      if (ZF) L = the segment limit of Sel's descriptor.
$  017 000 x0m	   sldt Ew
$  017 000 x1m	   str Ew
$  017 000 x2m	   lldt Ew
$  017 000 x3m	   ltr Ew
$  017 000 x4m	   verr Ew
$  017 000 x5m	   verw Ew
$  017 002 xrm	   lar Rw, Ew
$  017 003 xrm	   lsl Rw, Ew
 
SGDT Desc     Desc = GDTR
SIDT Desc     Desc = IDTR
LGDT Desc     GDTR = Desc
LIDT Desc     IDTR = Desc
$  017 001 x0m	   sgdt Ep
$  017 001 x1m	   sidt Ep
$  017 001 x2m	   lgdt Ep
$  017 001 x3m	   lidt Ep
 
SMSW L	      L = MSW ... note that MSW is CR0 bits 0-15.
LMSW E	      MSW = E
CLTS	      MSW.3 = 0 ... clears the Task Switched flag.
$  017 001 x4m	   smsw Ew
$  017 001 x6m	   lmsw Ew
$  017 006	   clts
 
INVD	      Invalidate internal cache.
WBINVD	      Invalidate internal cache, after writing it back.
INVLPG Ea     Invalidate Ea's page.
@  017 010	   invd
@  017 011	   wbinvd
@  017 001 x7m	   invlpg Ea
 
MOV Reg, SysReg
MOV SysReg, Reg
#  017 040 3nr	   mov Rd, CRn	 (n = 0, 2-3)
#  017 041 3nr	   mov Rd, DRn	 (n = 0-3, 6-7)
#  017 042 3nr	   mov CRn, Rd	 (n = 0, 2-3)
#  017 043 3nr	   mov DRn, Rd	 (n = 0-3, 6-7)
#  017 044 3nr	   mov Rd, TRn	 (n = 6-7)
#  017 046 3nr	   mov TRn, Rd	 (n = 6-7)
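
   As an illustration of how the "3nr" byte is put together, here is a
minimal sketch that encodes the first form above.  The helper names are
invented for this example, and it assumes the Rd digit follows the usual
register numbering (EAX = 0, ECX = 1, EDX = 2, EBX = 3, ...).

```python
def octal_byte(x, r, m):
    """Pack three octal digits (x: 0-3, r and m: 0-7) into one byte."""
    if not (0 <= x <= 3 and 0 <= r <= 7 and 0 <= m <= 7):
        raise ValueError("digit out of range")
    return (x << 6) | (r << 3) | m


def mov_from_cr(n, r):
    """Encode 017 040 3nr: mov Rd, CRn, where r is the Rd register digit."""
    return bytes([0o017, 0o040, octal_byte(3, n, r)])


# mov eax, cr0  ->  017 040 300  (hex 0F 20 C0)
print(mov_from_cr(0, 0).hex())
```

The other five forms differ only in the second byte (041-046).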
 
CO-PROCESSOR ESCAPE SEQUENCE
----------------------------
Comments:
   * This escape sequence is intended to be used with an external co-processor
     with the most common application being the 80x87 floating point unit.
   * Starting in the 80486, the floating point unit was made internal to the
     processor.
 
ESC TL, Ea  Escape, operation TL, address mode Ea.
   33T xLm	   esc TL Ea
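
   A minimal sketch of pulling the digits T, L, x and m back out of an
escape sequence follows.  The function name is hypothetical; the layout is
just the "33T xLm" pattern given above.

```python
def decode_esc(op, modrm):
    """Split the byte pair 33T xLm into the digits (T, L, x, m)."""
    if op >> 3 != 0o33:
        raise ValueError("not an escape opcode (330-337 octal)")
    t = op & 7                  # low octal digit of the first byte
    x = modrm >> 6              # address mode digit
    l_digit = (modrm >> 3) & 7  # middle digit, the L half of the operation
    m = modrm & 7
    return t, l_digit, x, m


# Hex D9 EB is octal 331 353: T = 1, L = 5, x = 3, m = 3.
print(decode_esc(0xD9, 0xEB))
```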
 
(6) Floating Point Operations
   The Floating Point unit consists of 8 internal registers arranged in a
circular stack, and the Control Word (CW), Status Word (SW) and Tag Word (TW)
registers.  The floating point stack registers all store data in Real80
format (described below).
   Operations are carried out on data in the following formats (low-order bits
on right):
 
   INTEGER: 16/32/64 bits (Int16, Int32, Int64)
   BCD: (BCD80)
	 S 0000000  D D D D D D D D D D D D D D D D D D
	   S = 1-bit sign (1 = negative, 0 = positive)
	   D = 4-bit digit (encodes digits 0-9).
   FLOATING POINT: 32/64/80 bits (Real32, Real64, Real80)
	 S Exponent Mantissa
	   S = 1-bit sign (1 = negative, 0 = positive)
	   Exponent = 8/11/15 bit biased exponent
	   Mantissa = 23/52/64 bit binary fraction.
   The values of floating point numbers in each format are as follows:
	Real32: (-1)^S (1 + Mantissa/2^23) x 2^(Exponent - 127)
	Real64: (-1)^S (1 + Mantissa/2^52) x 2^(Exponent - 1023)
	Real80: (-1)^S	   Mantissa/2^63   x 2^(Exponent - 16383)
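
   The Real32 case can be checked against a real bit pattern with a short
sketch like the one below.  It assumes the usual reading of the format, in
which the 23 mantissa bits are a fraction added to an implicit leading 1,
and it handles normal numbers only.  The function name is made up here.

```python
import struct


def real32_value(bits):
    """Evaluate the Real32 formula for a normal number's 32-bit pattern."""
    s = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent in (0, 0xFF):
        raise ValueError("special encoding, not a normal number")
    # (-1)^S x (1 + Mantissa/2^23) x 2^(Exponent - 127)
    return (-1) ** s * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)


# Take the bit pattern Python itself produces for -6.25 and decode it again.
bits = struct.unpack("<I", struct.pack("<f", -6.25))[0]
print(hex(bits), real32_value(bits))
```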
 
The floating point formats do not cover all the logical combinations of binary
0's and 1's, and the remaining combinations are defined for special purposes:
 
	 Sign  Exponent      Mantissa	   Meaning
	  S   0 0 0 ... 0   0 0 0 ... 0       0
	  S   0 0 0 ... 0   ... 1 ...	   DENORMAL (Infinitesimal)
	  S   1 1 1 ... 1   0 0 0 ... 0    INFINITY
	  S   1 1 1 ... 1   0 ... 1 ...    Signalling NaN (Not a Number)
	  S   1 1 1 ... 1   1 ...	   Quiet NaN
 
This is all IEEE standard format.  Quiet NaN's are set by the FP Unit to
indicate invalid operations.
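
   The table can be turned into a classifier, sketched here for Real32 only
(Real80 keeps its leading bit explicitly in the mantissa, so its bit tests
would differ).  The function name and the returned strings are invented for
this example.

```python
def classify_real32(bits):
    """Name the row of the special-value table a 32-bit pattern falls in."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if mantissa == 0 else "denormal"
    if exponent == 0xFF:
        if mantissa == 0:
            return "infinity"
        # Top mantissa bit set: quiet NaN.  Clear: signalling NaN.
        return "quiet NaN" if mantissa & 0x400000 else "signalling NaN"
    return "normal"


for pattern in (0x80000000, 0x00000001, 0x7F800000, 0x7F800001, 0x7FC00000):
    print(hex(pattern), classify_real32(pattern))
```

The sign bit S plays no part in the classification, as in the table.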
 
Notation:
ST(n) -- the nth item below the stack top.
ST ----- ST(0), the stack top.
Int*, BCD*, Real* -- described above.
 
All Int*, BCD*, and Real* operands are stored in memory and are encoded in
the 80x86's current addressing mode (16 or 32 bit).  All opcodes are
listed in the format:
 
		  T L xm    for    8086 escape code 33T xLm
 
Since only memory addresses are used in the operations, that frees up all
the combinations xm where x = 3.  These are generally used to encode the
operations that do not involve memory addresses.  In the following
presentation where "xm" is listed generally, it is understood that x is not 3.
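
   To go from a listing entry "T L xm" to actual bytes, something like the
following sketch will do.  The function name is hypothetical, and it covers
only the two opcode bytes, not any displacement that a memory operand needs.

```python
def fp_opcode(t, l_digit, xm):
    """Turn the listing form 'T L xm' into the two bytes 33T xLm.

    xm is passed as a two-digit octal number, e.g. 0o33.
    """
    x, m = xm >> 3, xm & 7
    if not (0 <= t <= 7 and 0 <= l_digit <= 7 and 0 <= x <= 3):
        raise ValueError("digit out of range")
    return bytes([0o330 | t, (x << 6) | (l_digit << 3) | m])


# An entry listed as 1 5 33 becomes 331 353, i.e. hex D9 EB.
print(fp_opcode(1, 5, 0o33).hex())
```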
 
   The operations FENI and FDISI are specific to the 8087, and FSETPM to the
80287.  FUCOM*, FPREM1, and the trig. operations FSIN, FCOS, and FSINCOS are
present only in the 80387 and after.
 
DATA TRANSFER
-------------
Comments:
   * The following table is used:
      P    0	2    3
     F-OP fld  fst  fstp
     I-OP fild fist fistp
FLD Arg       ST = (Real80)Arg
FST Arg       Arg = (typeof Arg)ST
FSTP Arg      Arg = (typeof Arg)ST; pop();
FXCH Arg      Arg <--> ST, with appropriate type conversions.
   1 P xm	   F-OP Real32
   3 P xm	   I-OP Int32
   5 P xm	   F-OP Real64
   7 P xm	   I-OP Int16
   3 5 xm	   fld Real80
   3 7 xm	   fstp Real80
   7 4 xm	   fbld BCD80
   7 5 xm	   fild Int64
   7 6 xm	   fbstp BCD80
   7 7 xm	   fistp Int64
   1 0 3m	   fld ST(m)
   1 1 3m	   fxch ST(m)
   5 2 3m	   fst ST(m)
   5 3 3m	   fstp ST(m)
 
COMPARISON
----------
Comments:
   * The following table is used:
      P    2	 3
     F-OP fcom	fcomp
     I-OP ficom ficomp
FCOM Arg      cmp ST, Arg
FCOMP Arg     cmp ST, Arg; pop();
   0 P xm	   F-OP Real32
   2 P xm	   I-OP Int32
   4 P xm	   F-OP Real64
   6 P xm	   I-OP Int16
   0 P 3m	   F-OP ST(m)
FCOMPP	      cmp ST, ST(1); pop(); pop();
   6 3 31	   fcompp
FTST	      cmp ST, 0.0
   1 4 34	   ftst
FXAM	      examine ST
   1 4 35	   fxam
FUCOM Arg     unordered compare ST, Arg
FUCOMP Arg    unordered compare ST, Arg; pop();
FUCOMPP       unordered compare ST, ST(1); pop(); pop();
   5 4 3m	   fucom ST(m)
   5 5 3m	   fucomp ST(m)
   2 5 31	   fucompp
 
ARITHMETIC OPERATIONS
---------------------
Comments:
   * The following table is used:
      P    0	  1	 4	5      6       7
     F-OP fadd	 fmul	fsub   fsubr  fdiv   fdivr
     I-OP fiadd  fimul	fisub  fisubr fidiv  fidivr
     P-OP faddp  fmulp	fsubp  fsubrp fdivp  fdivrp
   * Dest is ST and Src the listed operand except where noted below.
FADD Arg      Dest += Src
FSUB Arg      Dest -= Src
FSUBR Arg     Dest = Src - Dest
FMUL Arg      Dest *= Src
FDIV Arg      Dest /= Src
FDIVR Arg     Dest = Src/Dest
   0 P xm	   F-OP Real32
   2 P xm	   I-OP Int32
   4 P xm	   F-OP Real64
   6 P xm	   I-OP Int16
   0 P 3m	   F-OP ST(m)
   4 P 3m	   F-OP ST(m)	(Dest = ST(m), Src = ST)
   6 P 3m	   P-OP ST(m)	(Dest = ST(m), Src = ST)
 
CONSTANTS
---------
FLD1	      ST = 1.0
FLDL2T	      ST = log_2(10)
FLDL2E	      ST = log_2(e)
FLDPI	      ST = pi
FLDLG2	      ST = log_10(2)
FLDLN2	      ST = ln(2)
FLDZ	      ST = 0.0
   1 5 30	   fld1
   1 5 31	   fldl2t
   1 5 32	   fldl2e
   1 5 33	   fldpi
   1 5 34	   fldlg2
   1 5 35	   fldln2
   1 5 36	   fldz
 
BUILT-IN FUNCTIONS
------------------
Comments:
   * The stack replacements entail pop()'s.
FCHS	      ST = -ST
FABS	      ST = |ST|
F2XM1	      ST = 2^ST - 1
FYL2X	      Replace the stack: ST(1), ST -> ST(1)*log_2(ST)
FPTAN	      Replace the stack: ST -> tan(ST), 1.0
FPATAN	      Replace the stack: ST(1), ST -> atan(ST(1)/ST)
FXTRACT       Replace the stack: ST -> exponent(ST), mantissa(ST)
FPREM1	      ST = remainder(ST/ST(1)), IEEE consistent
FPREM	      ST = remainder(ST/ST(1))
FYL2XP1       Replace the stack: ST(1), ST -> ST(1)*log_2(ST + 1)
FSQRT	      ST = sqrt(ST)
FSINCOS       Replace the stack: ST -> sin(ST), cos(ST)
FRNDINT       ST = round(ST)
FSCALE	      ST *= 2^(int)ST(1)
FSIN	      ST = sin(ST)
FCOS	      ST = cos(ST)
   1 4 30	   fchs
   1 4 31	   fabs
   1 6 30	   f2xm1
   1 6 31	   fyl2x
   1 6 32	   fptan
   1 6 33	   fpatan
   1 6 34	   fxtract
   1 6 35	   fprem1
   1 7 30	   fprem
   1 7 31	   fyl2xp1
   1 7 32	   fsqrt
   1 7 33	   fsincos
   1 7 34	   frndint
   1 7 35	   fscale
   1 7 36	   fsin
   1 7 37	   fcos
 
CONTROL
-------
Comments:
   * The save and load operations for the environment and state are used
     primarily for multitasking applications where 2 or more processes are
     using the FP unit concurrently.
FNOP	      Delay 1 cycle.
FLDENV Arg    Load FP environment from [Arg]
FLDCW Arg     CW = Arg
FSTENV Arg    Save FP environment to [Arg]
FSTCW Arg     Arg = CW
FDECSTP       TOP = (TOP - 1) mod 8
FINCSTP       TOP = (TOP + 1) mod 8
FENI	      Enable interrupts (8087 only)
FDISI	      Disable interrupts (8087 only)
FCLEX	      Clear out FP exception flags
FINIT	      Initialize FP registers
FSETPM	      Enter Protected Mode (80287 only)
FFREE ST(m)   Mark register m as unused.
FRSTOR Arg    Restore FP state from [Arg]
FSAVE Arg     Save FP state to [Arg]
FSTSW Arg     Arg = SW
   1 2 30	   fnop
   1 4 xm	   fldenv Ea
   1 5 xm	   fldcw Ea
   1 6 xm	   fstenv Ea
   1 7 xm	   fstcw Ea
   1 6 36	   fdecstp
   1 6 37	   fincstp
   3 4 30	   feni
   3 4 31	   fdisi
   3 4 32	   fclex
   3 4 33	   finit
   3 4 34	   fsetpm
   5 0 3m	   ffree ST(m)
   5 4 xm	   frstor Ea
   5 6 xm	   fsave Ea
   5 7 xm	   fstsw Ea
   7 4 30          fstsw AX
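
   The circular stack behind ST(n), FDECSTP and FINCSTP can be modelled with
a toy class like the one below.  This is a sketch, not a description of the
hardware: the class and method names are made up, the tag word and all
exceptions are ignored, and it assumes that a push decrements TOP and a pop
increments it.

```python
class FpStack:
    """Toy model of the 8-register circular stack (values only)."""

    def __init__(self):
        self.top = 0                 # the 3-bit TOP field
        self.regs = [None] * 8       # physical registers 0-7

    def phys(self, n):
        """Physical register number that ST(n) currently names."""
        return (self.top + n) % 8

    def push(self, value):
        """What a load such as fld does to the stack."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def pop(self):
        """The pop() used by fstp, fcomp and friends."""
        value = self.regs[self.top]
        self.regs[self.top] = None   # stands in for marking it unused
        self.top = (self.top + 1) % 8
        return value

    def st(self, n=0):
        return self.regs[self.phys(n)]


fp = FpStack()
fp.push(1.0)
fp.push(2.0)
print(fp.top, fp.st(0), fp.st(1))
```

FDECSTP and FINCSTP move TOP the same way push and pop do, but leave the
register contents alone.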

