Intel 80x86 Assembly Language

The assembly language for the Intel 80x86 architecture is significantly more sophisticated than the assembly language for our IJVM machine. You might want to look at a description of the Intel instruction set before continuing.

Basic Format

All of our assembly language programs will have a similar format:

#make_COM#
ORG 100h

< Code goes here >

< Variable Declarations go here >

end

Variable Declarations

Variable declarations allocate space for a value that can be accessed and/or modified by our assembly language program. Unlike the IJVM architecture, there are no constants at the Intel assembly language level. In a high-level language such as C++ or Java, it is a syntax error to modify a variable that has been declared as a constant. Consequently, it is the compiler that enforces the rule that a constant cannot be modified by the program. In an assembly language program, it is up to the programmer to make sure that no instruction modifies a memory location that is being interpreted as a constant.
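
As a minimal sketch of that last point, the program below declares a byte that is meant to be a constant and then overwrites it. The name LIMIT is hypothetical and chosen only for illustration; the DB syntax is described in the next section. Nothing in the assembler flags the second instruction as an error.

#make_COM#
ORG 100h

      mov AL, LIMIT    ;read the "constant" (AL = 10)
      mov LIMIT, 0     ;this write assembles without complaint
      ret

LIMIT db 10            ;intended as a constant, but it is just a byte in memory

end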

Byte Variables (8-Bits)

Syntax: name DB value, where name is the name of the variable and value is its initial value. "DB" means "define byte".

Word Variables (16-Bits)

Syntax: name DW value, where name is the name of the variable and value is its initial value. "DW" means "define word".

Assembly language is not case sensitive so, for example, DB can be entered as db (or dB or Db, for that matter). Byte values can be entered in decimal, hexadecimal, octal, binary, or as a character. The only difference with 2-byte variables is that you cannot use a character (since, by definition, a character is only one byte).

In the following assembly language program, the first five declarations (A-E) each initialize their variable to decimal 65. The remaining four declarations (F-I) each initialize their variable to decimal 66.

#make_COM#
include 'emu8086.inc'
ORG 100h
 
mov AX, 0
ret
 
A db 65        ;decimal
B db 41h       ;hexadecimal
C db 101o      ;octal
D db 1000001b  ;binary
E db 'A'       ;character
 
F dw 66
G dw 42h
H dw 102o
I dw 1000010b
 
end

This program doesn't really do anything. Because a program must include at least one executable statement (or the assembler won't generate any code), I've added an instruction to move a zero into the AX register. If you compile and load this program into the emulator, you'll see the following:

[Figure: EMU screen showing the program loaded in memory, with two red rectangles marking the variables]

The MOV instruction is a 3-byte instruction at addresses 0100h through 0102h (B8h is the opcode and the 2-byte immediate mode operand is 0000h). The RET instruction at address 0103h is a 1-byte instruction (C3h is the opcode). The 1-byte variables A through E are stored at addresses 0104h through 0108h (the upper red rectangle) and all have the same value (41h, which is 65 in decimal and is the ASCII code for 'A').

The 2-byte variables F through I are stored at addresses 0109h through 0110h (the lower red rectangle). Variable F is stored in 0109h and 010Ah with the low-order byte (42h) in 0109h and the high-order byte (00h) in 010Ah. Notice that the remaining three 2-byte variables have exactly the same value.
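
The byte order can be checked from code as well as from the memory display. The fragment below is a sketch that assumes F is declared as in the program above and that BX is free to use as a pointer. It reads the two bytes of F one at a time and reassembles them in AX.

      lea BX, F        ;BX = address of F
      mov AL, [BX]     ;AL = low-order byte of F (42h)
      mov AH, [BX+1]   ;AH = high-order byte of F (00h)
                       ;AX now holds 0042h, the same value "mov AX, F" loads

Because the low-order byte sits at the lower address, it is the byte that lands in AL.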

Example 1: Basic Arithmetic and Logic

The first IJVM program we looked at is given below. Next to the IJVM assembly language code is the Intel assembly language code that will accomplish the same result. While both assembly language programs do the same thing, they are substantially different because the IJVM is a stack-based machine and the Intel machine is a traditional register-based machine. The IJVM program uses zero-address and one-address instructions. The Intel program uses two-address instructions (in this case, a register address and a memory address).

Java-Like      IJVM Assembly   Intel Assembly

                               #make_COM#
                               ORG 100h

C = A + B;     ILOAD A         mov ax, A
               ILOAD B
               IADD            add ax, B
               ISTORE C        mov C, ax
D = A - B;     ILOAD A         mov ax, A
               ILOAD B
               ISUB            sub ax, B
               ISTORE D        mov D, ax
E = A AND B;   ILOAD A         mov ax, A
               ILOAD B
               IAND            and ax, B  
               ISTORE E        mov E, ax
F = A OR B;    ILOAD A         mov ax, A
               ILOAD B
               IOR             or  ax, B
               ISTORE F        mov F, ax  
               HALT            ret

                               A dw 129
                               B dw 127
                               C dw 0
                               D dw 0
                               E dw 0
                               F dw 0
                               
                               end

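As written, the Intel program has 12 data memory accesses (8 reads and 4 writes), not counting instruction fetches. It can be modified to run faster by loading A and B into registers once and reusing them, which reduces the number of data memory accesses to a bare minimum (2 reads and 4 writes).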
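
One possible way to reach that minimum is sketched below. It assumes BX and DX are free to use as scratch registers and that the variable declarations are the same as in the program above. A and B are each read from memory once; every later use comes from a register.

      mov bx, A        ;read A once
      mov dx, B        ;read B once
      mov ax, bx
      add ax, dx
      mov C, ax        ;C = A + B
      mov ax, bx
      sub ax, dx
      mov D, ax        ;D = A - B
      mov ax, bx
      and ax, dx
      mov E, ax        ;E = A AND B
      mov ax, bx
      or  ax, dx
      mov F, ax        ;F = A OR B
      ret

The program is three instructions longer, but the extra instructions are register-to-register moves that do not touch data memory.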

Example 2: If-Then-Else

Write a program that will implement the following Java-like program:

int A, B;
if (A < 0)
    B = -A;
else
    B = A;

As we noted earlier in our discussion of IJVM programming, using the test "A < 0" requires a rearrangement of the logic. As a general rule, the else clause immediately follows the branching instruction and the if clause follows the else clause:

if condition is true go to IF_CLAUSE (conditional branch)
    else clause code goes here
    go to REST_OF_PROGRAM (unconditional branch)
IF_CLAUSE:
    if clause code goes here
REST_OF_PROGRAM:
    rest of program goes here

Here is an Intel assembly language program that will perform the desired task (I've not shown the variable declarations):

      mov AX, A
      cmp AX, 0
      jl  if_clause
      mov B, AX      ;This is the else clause
      jmp done
if_clause:
      neg AX         ;This is the if clause
      mov B, AX
done:
      ret

If we want to keep the if-clause and the else-clause in the original order and use the same test, the logic gets a little more complicated:

if condition is true go to IF_CLAUSE (conditional branch)
    go to ELSE_CLAUSE (unconditional branch)
IF_CLAUSE:
    if clause code goes here
    go to REST_OF_PROGRAM (unconditional branch)
ELSE_CLAUSE:
    else clause code goes here
REST_OF_PROGRAM:
    rest of program goes here

Our Intel program might look like this:

      mov AX, A
      cmp AX, 0
      jl  if_clause
      jmp else_clause
if_clause:
      neg AX
      mov B, AX
      jmp done
else_clause:
      mov B, AX
done:
      ret

Think about which approach would be easiest for a compiler. Rearranging the if and else clauses means the compiler can't generate the code for the if clause until it has processed the else clause. Then it has to go back in the source code and process the if clause. This could get really messy in a nested if-then-else structure. Processing the if and else clauses in the order in which they appear in the source code would be much easier.

As assembly language programmers, we don't have to think like a compiler. We are free to arrange our code any way we want in order to accomplish the given task. In this case, it is actually easier to test the opposite condition (test "A >= 0" rather than "A < 0") and jump, if necessary, directly to the else clause:

      mov AX, A
      cmp AX, 0
      jge else_clause
      neg AX           ;this is the if clause
      mov B, AX
      jmp done
else_clause:
      mov B, AX        ;this is the else clause
done:
      ret

If you think about it, you'll see that we don't really need a "mov B, AX" instruction in two different places. We can simplify our program even more:

      mov AX, A
      cmp AX, 0
      jge store_B
      neg AX
store_B:
      mov B, AX
      ret
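
For reference, here is the simplified version as a complete program. The variable declarations are an assumption made for this sketch, since they are not shown above; the initial value of A is an arbitrary negative test value written in hexadecimal.

#make_COM#
ORG 100h

      mov AX, A
      cmp AX, 0
      jge store_B      ;A >= 0: store it unchanged
      neg AX           ;A < 0: AX = -A
store_B:
      mov B, AX
      ret

A dw 0FFFBh            ;-5 as a 16-bit two's complement value (sample input)
B dw 0                 ;receives 5 when the program runs

end

One edge case is worth keeping in mind: if A holds 8000h (-32768), the neg instruction leaves 8000h in AX, because +32768 cannot be represented in a signed 16-bit word.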

Example 3: Looping

Write a program that will implement the following Java-like program:

sum = 0;
counter = 1;
while (counter < 11)
{
    sum = sum + counter;
    counter++;
}

As in IJVM, it is a little easier to write the assembly language code if we convert the continuation condition ("counter < 11") to a termination condition ("counter >= 11"):

      mov sum, 0
      mov counter, 1
top:
      cmp counter, 11
      jge done         ;terminate when counter >= 11
      mov ax, sum
      add ax, counter
      mov sum, ax      ;sum = sum + counter
      inc counter
      jmp top
done:
      ret

Using registers would make the program smaller and faster:

      mov ax, 0        ;ax holds sum
      mov cx, 1        ;cx holds counter
top:
      cmp cx, 11
      jge done
      add ax, cx
      inc cx
      jmp top
done:
      mov sum, ax
      ret

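Modifying the logic to count down and use the "loop" instruction makes the program even smaller and, I suspect, faster. Because addition is commutative, adding 10, 9, ..., 1 produces the same sum as adding 1, 2, ..., 10: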

      mov ax, 0
      mov cx, 10
top:
      add ax, cx
      loop top
 
      mov sum, ax
      ret

Remember that the "loop" instruction automatically decrements the CX register and then jumps back up to the top of the loop if the value in CX is not zero. When CX reaches zero, the loop terminates and execution continues with the next instruction.
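
To make that behavior concrete, the sketch below spells out the same loop with separate "dec" and "jnz" instructions. It is offered only as an illustration of what "loop" does to CX, not as a replacement for it, and it assumes sum is declared as a word variable.

      mov ax, 0
      mov cx, 10
top:
      add ax, cx
      dec cx           ;the decrement that "loop" performs on CX
      jnz top          ;the jump that "loop" takes while CX is not zero

      mov sum, ax      ;sum = 10 + 9 + ... + 1 = 55 (37h)
      ret

The two forms are not identical in every respect: "dec" changes the flags, while "loop" leaves them alone. Also note that if CX is zero when either loop is entered, the decrement wraps around to FFFFh and the body runs 65,536 times, so a loop whose count might be zero needs a guard such as "jcxz" in front of it.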