The assembly language for the Intel 80x86 architecture is significantly more sophisticated than the assembly language for our IJVM machine. You might want to look at a description of the Intel instruction set before continuing.
All of our assembly language programs will have a similar format:
#make_COM#
ORG 100h

< Code goes here >

< Variable Declarations go here >

end
Variable declarations allocate space for a value that can be accessed and/or modified by our assembly language program. Unlike the IJVM architecture, there are no constants at the Intel assembly language level. In a high-level language such as C++ or Java, it is a syntax error to modify a variable that has been declared as a constant. Consequently, it is the compiler that enforces the rule that a constant cannot be modified by the program. In an assembly language program, it is up to the programmer to make sure that no instruction modifies a memory location that is being interpreted as a constant.
Syntax: name DB value, where name is the name of the variable and value is its initial value. "DB" means "define byte".
Syntax: name DW value, where name is the name of the variable and value is its initial value. "DW" means "define word".
Assembly language is not case sensitive so, for example, DB can be entered as db (or dB or Db, for that matter). Byte values can be entered in decimal, hexadecimal, octal, binary, or as a character. The only difference with 2-byte variables is that you cannot use a character (since, by definition, a character is only one byte).
In the following assembly language program, the first five declarations (A-E) all initialize the variable to decimal 65. The remaining four declarations (F-I) all initialize the variable to decimal 66.
#make_COM#
include 'emu8086.inc'
ORG 100h

mov AX, 0
ret

A db 65         ;decimal
B db 41h        ;hexadecimal
C db 101o       ;octal
D db 1000001b   ;binary
E db 'A'        ;character
F dw 66
G dw 42h
H dw 102o
I dw 1000010b

end
This program doesn't really do anything. Because a program must include at least one executable statement (or the assembler won't generate any code), I've added an instruction to move a zero into the AX register. If you compile and load this program into the emulator, you'll see the following:
The MOV instruction is a 3-byte instruction (B8h is the opcode and the 2-byte immediate mode operand is 0000h) stored at addresses 0100h through 0102h. The RET instruction at address 0103h is a 1-byte instruction (C3h is the opcode). The 1-byte variables A through E are stored at addresses 0104h through 0108h (the upper red rectangle) and all have the same value: 41h, which is 65 in decimal and the ASCII code for 'A'.
The 2-byte variables F through I are stored at addresses 0109h through 0110h (the lower red rectangle). Variable F is stored in 0109h and 010Ah with the low-order byte (42h) in 0109h and the high-order byte (00h) in 010Ah. Notice that the remaining three 2-byte variables have exactly the same value.
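The low-byte-first layout described above can be checked by reading a 2-byte variable one byte at a time. The following is a minimal sketch, assuming the same emu8086 conventions as the program above; the variable name WVAL and its value 1234h are hypothetical, chosen only so that the two bytes differ.

```asm
#make_COM#
ORG 100h

        lea bx, WVAL     ; BX = address of WVAL
        mov al, [bx]     ; AL = byte at the lower address  (34h, the low-order byte)
        mov ah, [bx+1]   ; AH = byte at the higher address (12h, the high-order byte)
        ret              ; AX now holds 1234h again

WVAL    dw 1234h         ; hypothetical variable; stored in memory as 34h, 12h

end
```

Single-stepping this in the emulator should show AL receiving 34h first and AH receiving 12h second, which matches the order in which the bytes of F appear in memory.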
The first IJVM program we looked at is given below. Next to the IJVM assembly language code is the Intel assembly language code that will accomplish the same result. While both assembly language programs do the same thing, they are substantially different because the IJVM is a stack-based machine and the Intel machine is a traditional register-based machine. The IJVM program uses zero-address and one-address instructions. The Intel program uses two-address instructions (in this case, a register address and a memory address).
Java-Like       IJVM Assembly    Intel Assembly
                                 #make_COM#
                                 ORG 100h
C = A + B;      ILOAD A          mov ax, A
                ILOAD B
                IADD             add ax, B
                ISTORE C         mov C, ax
D = A - B;      ILOAD A          mov ax, A
                ILOAD B
                ISUB             sub ax, B
                ISTORE D         mov D, ax
E = A AND B     ILOAD A          mov ax, A
                ILOAD B
                IAND             and ax, B
                ISTORE E         mov E, ax
F = A OR B      ILOAD A          mov ax, A
                ILOAD B
                IOR              or ax, B
                ISTORE F         mov F, ax
                HALT             ret
                                 A dw 129
                                 B dw 127
                                 C dw 0
                                 D dw 0
                                 E dw 0
                                 F dw 0
                                 end
As written, the Intel program makes 12 data memory accesses (8 reads and 4 writes), not counting instruction fetches. It can be modified to run faster by loading A and B into registers once, which reduces the number of data accesses to a bare minimum (2 reads and 4 writes).
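One way to reach that minimum is sketched here. It is only an illustration, not the one required solution: the choice of BX to hold B and CX as a scratch register is an assumption, and other register assignments would work equally well.

```asm
#make_COM#
ORG 100h

        mov ax, A        ; read A once   (memory read 1)
        mov bx, B        ; read B once   (memory read 2)

        mov cx, ax       ; C = A + B
        add cx, bx
        mov C, cx        ; memory write 1

        mov cx, ax       ; D = A - B
        sub cx, bx
        mov D, cx        ; memory write 2

        mov cx, ax       ; E = A AND B
        and cx, bx
        mov E, cx        ; memory write 3

        mov cx, ax       ; F = A OR B
        or  cx, bx
        mov F, cx        ; memory write 4

        ret

A dw 129
B dw 127
C dw 0
D dw 0
E dw 0
F dw 0

end
```

The trade-off is a few extra register-to-register moves in exchange for six fewer memory reads. With A = 129 and B = 127, the program leaves C = 256, D = 2, E = 1, and F = 255, the same results as the original version.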
Write a program that will implement the following Java-like program:
int A, B
if (A < 0)
    B = -A;
else
    B = A;
As we noted earlier in our discussion of IJVM programming, using the test "A < 0" requires a rearrangement of the logic. As a general rule, the else clause immediately follows the branching instruction and the if clause follows the else clause:
        if condition is true go to IF_CLAUSE    (conditional branch)
        else clause code goes here
        go to REST_OF_PROGRAM                   (unconditional branch)
IF_CLAUSE:
        if clause code goes here
REST_OF_PROGRAM:
        rest of program goes here
Here is an Intel assembly language program that will perform the desired task (I've not shown the variable declarations):
        mov AX, A
        cmp AX, 0
        jl  if_clause
        mov B, AX        ;This is the else clause
        jmp end
if_clause:
        neg AX
        mov B, AX
end:
        ret
If we want to keep the if-clause and the else-clause in the original order and use the same test, the logic gets a little more complicated:
        if condition is true go to IF_CLAUSE    (conditional branch)
        go to ELSE_CLAUSE                       (unconditional branch)
IF_CLAUSE:
        if clause code goes here
        go to REST_OF_PROGRAM                   (unconditional branch)
ELSE_CLAUSE:
        else clause code goes here
REST_OF_PROGRAM:
        rest of program goes here
Our Intel program might look like this:
        mov AX, A
        cmp AX, 0
        jl  if_clause
        jmp else_clause
if_clause:
        neg AX
        mov B, AX
        jmp end
else_clause:
        mov B, AX
end:
        ret
Think about which approach would be easier for a compiler. Rearranging the if and else clauses means the compiler can't generate the code for the if clause until it has processed the else clause. Then it has to go back in the source code and process the if clause. This could get really messy in a nested if-then-else structure. Processing the if and else clauses in the order in which they appear in the source code would be much easier.
As assembly language programmers, we don't have to think like a compiler. We are free to arrange our code any way we want in order to accomplish the given task. In this case, it is actually easier to test the opposite condition (test "A >= 0" rather than "A < 0") and jump, if necessary, directly to the else clause:
        mov AX, A
        cmp AX, 0
        jge else_clause
        neg AX           ;this is the if clause
        mov B, AX
        jmp end
else_clause:
        mov B, AX
end:
        ret
If you think about it, you'll see that we don't really need a "mov B, AX" instruction in two different places. We can simplify our program even more:
        mov AX, A
        cmp AX, 0
        jge store_B
        neg AX
store_B:
        mov B, AX
        ret
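Since the fragments above leave out the variable declarations, here is a sketch of the simplified version as a complete program that can be assembled and stepped through. The initial value of A is a hypothetical test value (-5, written in hexadecimal as its 16-bit two's complement form); any other value can be substituted.

```asm
#make_COM#
ORG 100h

        mov AX, A
        cmp AX, 0
        jge store_B      ; A >= 0: AX already holds the value to store
        neg AX           ; A < 0:  AX = -A
store_B:
        mov B, AX
        ret

A dw 0FFFBh              ; hypothetical test value: -5 in 16-bit two's complement
B dw 0                   ; holds 5 after the program runs

end
```

One boundary case is worth keeping in mind: if A is -32768 (8000h), NEG leaves the value unchanged, because +32768 cannot be represented in a signed 16-bit word.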
Write a program that will implement the following Java-like program:
sum = 0;
counter = 1;
do while (counter < 11)
{
    sum = sum + counter;
    counter++;
}
As in IJVM, it is a little easier to write the assembly language code if we convert the continuation condition ("counter < 11") to a termination condition ("counter >= 11"):
        mov sum, 0
        mov counter, 1
top:
        cmp counter, 11
        jge end
        mov ax, sum
        add ax, counter
        mov sum, ax
        inc counter
        jmp top
end:
        ret
Using registers would make the program smaller and faster:
        mov ax, 0
        mov cx, 1
top:
        cmp cx, 11
        jge end
        add ax, cx
        inc cx
        jmp top
end:
        mov sum, ax
        ret
Modifying the logic to use the "loop" instruction, which counts CX down from 10 to 1 and therefore adds the same ten values in reverse order, makes the program even smaller and, I suspect, faster:
        mov ax, 0
        mov cx, 10
top:
        add ax, cx
        loop top
        mov sum, ax
        ret
Remember that the "loop" instruction automatically decrements the CX register and then jumps back up to the top of the loop if the value in CX is not zero. When CX reaches zero, the loop terminates and execution falls through to the next instruction.
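To make the behavior of "loop" concrete, the sketch below spells out its two steps with separate instructions. It is meant as an illustration of what the instruction does rather than a recommended replacement, and the declaration of sum is an assumption, since the fragments above do not show it.

```asm
#make_COM#
ORG 100h

        mov ax, 0
        mov cx, 10
top:
        add ax, cx
        dec cx           ; step 1 of "loop": CX = CX - 1
        jnz top          ; step 2 of "loop": jump back while CX is not zero
        mov sum, ax      ; sum = 10 + 9 + ... + 1 = 55 (37h)
        ret

sum dw 0                 ; assumed declaration (not shown in the fragments above)

end
```

The two forms are not exactly identical: DEC changes the flags, while LOOP leaves them untouched. In both forms, entering the loop with CX equal to zero makes the decrement wrap around to FFFFh, so the body runs 65,536 times instead of zero times.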