Interpreting Machine Language Programs

The machine language instruction set of a computer is the lowest-level language available to application programmers. (There have been some machines whose microprograms could be modified, but they are very rare, and even on those machines no one would write applications at the microinstruction level.) A complete description of the machine language instruction set for our example architecture can be found in the documentation for the simulator and will not be repeated here. The purpose of this page is to consider how one might write a microprogram to interpret a machine language program.

Memory

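Our example architecture has a 64KB memory divided into four 16KB blocks or sections. The starting addresses below are word addresses; since each word is 4 bytes, each block spans 0x1000 words: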

Machine language code starting at 0x0
Stack starting at 0x1000
Local variables starting at 0x2000
Constants starting at 0x3000

While this is a simplification of how memory would be used in a real machine, it will serve our purpose well. Four registers are associated with these blocks of memory. The PC (program counter) contains the address of a byte in the machine language code section of memory. The SP (stack pointer) contains the address of the word at the top of the stack (the last word pushed onto the stack). The LV (local variables) register contains the address of the first word of data in the current local variables frame. The CPP (constant pool pointer) contains the address of the first constant in the constant pool.
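The block sizes only add up if the starting addresses above are word addresses, which is also how the later stack discussion treats SP. The Python sketch below assumes exactly that (4-byte words, word-addressed SP, LV, and CPP) and maps each block to its byte range. The names `SECTIONS`, `section_of_word`, and `byte_range` are hypothetical and exist only for this illustration.

```python
WORD_SIZE = 4                 # bytes per word
WORDS_PER_SECTION = 0x1000    # 4096 words = 16 KB

# (section name, first word address), assuming word addressing
SECTIONS = [
    ("code", 0x0000),
    ("stack", 0x1000),
    ("local variables", 0x2000),
    ("constants", 0x3000),
]

def section_of_word(word_addr: int) -> str:
    """Name of the section containing a word address."""
    for name, start in SECTIONS:
        if start <= word_addr < start + WORDS_PER_SECTION:
            return name
    raise ValueError(f"{word_addr:#x} is outside the 64 KB memory")

def byte_range(word_addr: int) -> tuple[int, int]:
    """First and last byte addresses covered by one 4-byte word."""
    first = word_addr * WORD_SIZE
    return first, first + WORD_SIZE - 1

for name, start in SECTIONS:
    last_word = start + WORDS_PER_SECTION - 1
    first_byte, _ = byte_range(start)
    _, last_byte = byte_range(last_word)
    size_kb = (last_byte - first_byte + 1) // 1024
    print(f"{name}: words {start:#06x}-{last_word:#06x}, "
          f"bytes {first_byte:#06x}-{last_byte:#06x} ({size_kb} KB)")
```

Under this assumption the byte-addressed PC and the word-addressed SP, LV, and CPP describe the same 64KB without overlap.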

Boot-Up

During the boot-up process, the CPU is put into an initial state with predetermined values in its registers. A cold boot-up occurs when a computer is turned on; a warm boot-up occurs when a computer is restarted or reset while it is already on. Following a boot-up or reset, every register in our architecture is zero except for those listed below (MPC, which is also zero, is included because it determines where the microprogram starts):

Register	Value
PC	0xFFFFFFFF
SP	0x00001000
LV	0x00002000
CPP	0x00003000
MPC	0x000

The values of SP, LV, and CPP should make perfect sense given the memory partition described in the preceding section: each register points to its corresponding block of memory. By that same logic, the PC should contain 0 rather than 0xFFFFFFFF, which is -1 in 32-bit two's complement. As we will see, the microprogram that interprets machine language programs increments the PC from -1 to 0 immediately before fetching the first byte of the machine language program. Finally, every microprogram begins execution at control store address 0x000, which is why MPC starts at 0x000.
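To see why -1 works as a reset value for the PC, here is a small Python sketch of 32-bit register arithmetic. The helper names are made up for illustration; the only point is that adding 1 to 0xFFFFFFFF in a 32-bit register discards the carry and leaves 0.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def increment32(value: int) -> int:
    """Add 1 as a 32-bit adder would, discarding any carry out of bit 31."""
    return (value + 1) & MASK32

pc = 0xFFFFFFFF              # PC after a reset
print(to_signed32(pc))       # -1
pc = increment32(pc)         # the microprogram's first PC=PC+1
print(hex(pc))               # 0x0
```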

Fetch, Decode, Execute Cycle

Our machine language programs will be stored as a sequence of bytes. The process of executing machine language instructions can be broken down into three steps: fetch, decode, and execute. These three steps are repeated in an endless loop (or at least until the computer is shut down or reset). As programmers, you have been taught to be careful to avoid infinite loops. Yet, that is exactly what we want in our microprogram:

fetch opcode of first instruction into the MBR
do forever
    1. decode instruction in the MBR
    2. execute instruction
    3. fetch the opcode of the next instruction into the MBR
end do
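A rough Python model of the loop above, working at the machine language level rather than the microcode level. It is only a sketch: it handles just the three instructions discussed later on this page (BIPUSH 0x10, POP 0x57, IADD 0x60), keeps the stack in a Python list instead of memory, ignores 32-bit overflow, and adds a hypothetical HALT opcode (0xFF) so that it can stop. None of these simplifications come from the simulator.

```python
HALT = 0xFF   # hypothetical opcode, only so the sketch terminates

def run(program: bytes) -> list[int]:
    """Interpret a byte string; returns the final stack (top at the end)."""
    stack: list[int] = []
    pc = 0
    opcode = program[pc]              # fetch opcode of first instruction
    while True:                       # "do forever"
        # 1. decode
        if opcode == HALT:
            return stack
        # 2. execute
        if opcode == 0x10:            # BIPUSH: one signed byte operand
            pc += 1
            operand = program[pc]
            stack.append(operand - 256 if operand >= 128 else operand)
        elif opcode == 0x57:          # POP
            stack.pop()
        elif opcode == 0x60:          # IADD (no 32-bit wraparound here)
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
        # 3. fetch the opcode of the next instruction
        pc += 1
        opcode = program[pc]
```

For example, `run(bytes([0x10, 5, 0x10, 0xFE, 0x60, 0xFF]))` pushes 5 and -2, adds them, and returns `[3]`.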

Fetch

What makes writing (and reading) microprograms difficult is that a microinstruction can do several things at the same time, a characteristic that can be very beneficial in making our microprograms as compact as possible. For example, whenever we use the value in the MBR, we can also begin the fetch of the next byte of the machine language program, which will be either an operand for the current instruction or the opcode of the next instruction. By initiating the fetch as early as possible, the next byte of the program will often be available as soon as it is needed, with no delay. If we waited to fetch it until we actually needed it, we would lose an extra clock cycle, since the result of a fetch is not available in the MBR until the beginning of the second clock cycle following the fetch.

The fetch, then, is not literally the third step in a strictly sequential code sequence. The fetch, decode, execute cycle works more like this:

fetch opcode of first instruction into the MBR
do forever
    1. decode the instruction in MBR and fetch the next byte
    2. execute the instruction (fetching a new byte each time you
       use the value in the MBR)
end do

If the current instruction has no operands, then the next opcode will already have been fetched when the current instruction was decoded in step 1. If the current instruction does have operands, the next byte is fetched from memory as soon as each operand is pulled from the MBR, so when execution finishes in step 2, the next opcode will already have been fetched. In either case, when control returns to step 1, the opcode of the next instruction will already be in the MBR.
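The same toy interpreter can be restructured to follow this fetch-on-use discipline: every time a byte is taken from the MBR, the next byte is fetched immediately. In this sketch the hypothetical `Prefetcher` class stands in for PC and MBR. Memory latency is not modeled, only the order of the fetches, and running off the end of the byte string ends the run.

```python
from typing import Optional

class Prefetcher:
    """Stand-in for PC and MBR (hypothetical, for illustration only)."""

    def __init__(self, program: bytes) -> None:
        self.program = program
        self.pc = 0
        self.mbr = self._read(0)          # fetch opcode of first instruction

    def _read(self, addr: int) -> Optional[int]:
        # None marks "past the end of the program" in this sketch
        return self.program[addr] if addr < len(self.program) else None

    def take(self) -> int:
        """Use the byte in MBR and, in the same step, fetch the next byte."""
        used = self.mbr
        if used is None:
            raise IndexError("ran past the end of the program")
        self.pc += 1
        self.mbr = self._read(self.pc)
        return used

def run(program: bytes) -> tuple[list[int], list[int]]:
    """Return (final stack, opcodes in the order they were decoded)."""
    f = Prefetcher(program)
    stack: list[int] = []
    decoded: list[int] = []
    while f.mbr is not None:
        opcode = f.take()                 # 1. decode and fetch the next byte
        decoded.append(opcode)
        if opcode == 0x10:                # BIPUSH: operand is already in MBR
            operand = f.take()            # use operand, fetch next opcode
            stack.append(operand - 256 if operand >= 128 else operand)
        elif opcode == 0x57:              # POP: next opcode fetched at decode
            stack.pop()
        elif opcode == 0x60:              # IADD
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
        # In every case f.mbr now holds the next opcode (or None at the end).
    return stack, decoded
```

The handlers never fetch an opcode themselves; by the time a handler finishes, `take()` has already left the next opcode in `mbr`, which is the property the paragraph above describes.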

Decode

In our architecture, decoding is amazingly simple. The 1-byte opcode of each machine language instruction is the address in the control store at which the microcode for executing that instruction begins. Notice that since the opcode is only 8 bits long, the starting address for the execution of each instruction must be in the lower half of the control store. The remaining microinstructions can be at any address and do not have to be sequential within the control store.

At this point, you can begin to appreciate the JMP bit in the microinstruction format. When this bit is set, the 8 low-order bits of the MPC take on the value of MBR OR NEXT_ADDR. If NEXT_ADDR is zero, then MPC = MBR OR 0x000 or, more simply, MPC = MBR (assuming of course that JMPZ and JMPN are both zero).
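A small Python sketch of this address calculation. It assumes a 9-bit MPC (a 512-entry control store, consistent with the "lower half" remark above), assumes the high bit of the new MPC is taken from NEXT_ADDR unchanged, and treats JMPZ and JMPN as zero. The function name is hypothetical.

```python
def next_mpc(next_addr: int, mbr: int, jmp: bool) -> int:
    """Next MPC value; JMPZ and JMPN are assumed to be 0."""
    next_addr &= 0x1FF                    # 9-bit NEXT_ADDR (assumed)
    if not jmp:
        return next_addr
    high_bit = next_addr & 0x100          # passed through unchanged (assumed)
    low_bits = (next_addr & 0xFF) | (mbr & 0xFF)
    return high_bit | low_bits

# The main loop's "goto (MBR OR 0x0)": NEXT_ADDR = 0, so MPC = MBR
print(hex(next_mpc(0x000, 0x10, True)))   # 0x10 (BIPUSH microcode)
print(hex(next_mpc(0x000, 0x60, True)))   # 0x60 (IADD microcode)
```

Because an 8-bit MBR can only contribute values 0x00 to 0xFF, every dispatch with NEXT_ADDR = 0 lands in the lower half of the control store.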

Execute

As implied in the preceding paragraph, the execution of every machine language instruction is carried out by a sequence of microinstructions, the first of which is located at the control address specified by the opcode of the machine language instruction.

Basic Loop

Just two of the microinstructions below, those at 0x000 and 0x002, start the execution of the machine language interpreter (the walk-through that follows never visits the one at 0x001):

0x000 ALU=0; goto 0x2
0x001 PC=PC+1; goto 0x40
0x002 PC=PC+1; fetch; goto (MBR OR 0x0)

Consider the following sequence of instructions:

0x000 Do nothing. Go to 0x002.
0x002 Increment PC from -1 to 0 and fetch the first opcode. Since the MBR is currently 0, go to 0x000.
0x000 Do nothing while waiting for the fetch to complete. Go to 0x002.
0x002 Increment PC from 0 to 1 and fetch the next byte of the machine language program. The MBR contains the opcode of the first instruction so the interpreter will jump to the section of microcode that executes that instruction.
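The four steps above can be reproduced with a cycle-by-cycle Python sketch. It models only the microinstructions at 0x000 and 0x002 and assumes the timing rule from the Fetch section: a byte fetched during cycle k lands in the MBR at the beginning of cycle k + 2. The function name and the trace format are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def startup_trace(program: bytes, max_cycles: int = 10):
    """Return ([(cycle, MPC, PC after the cycle, MBR during the cycle)], final MPC)."""
    pc, mbr, mpc = 0xFFFFFFFF, 0, 0x000       # register values after reset
    arrivals = {}                             # cycle -> byte landing in MBR
    trace = []
    for cycle in range(max_cycles):
        if cycle in arrivals:
            mbr = arrivals.pop(cycle)         # fetched byte arrives
        if mpc == 0x000:                      # ALU=0; goto 0x2
            next_mpc = 0x002
        elif mpc == 0x002:                    # PC=PC+1; fetch; goto (MBR OR 0x0)
            pc = (pc + 1) & MASK32
            arrivals[cycle + 2] = program[pc]
            next_mpc = mbr | 0x000
        else:
            break                             # control reached an opcode's microcode
        trace.append((cycle, mpc, pc, mbr))
        mpc = next_mpc
    return trace, mpc

trace, final_mpc = startup_trace(bytes([0x10, 0x05]))
for cycle, addr, pc, mbr in trace:
    print(f"cycle {cycle}: MPC={addr:#05x} PC={pc:#x} MBR={mbr:#04x}")
print(f"dispatch to {final_mpc:#05x}")
```

For a program that starts with a BIPUSH opcode (0x10), the trace visits 0x000, 0x002, 0x000, 0x002 and then leaves the loop for 0x010, matching the four steps listed above.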

The block of microcode that executes a machine language instruction also fetches the opcode of the next instruction. Consequently, the last microinstruction in the block has 0x002 in its next address field, which sends the interpreter back to the beginning of the loop at 0x002.

Some Examples

Let's consider a few simple machine language instructions to see how they might be interpreted.

BIPUSH

The BIPUSH instruction has an opcode of 0x10 and a 1-byte operand that represents a signed value in the range -128 to 127. This instruction pushes its operand onto the stack. There are two registers associated with the stack: the stack pointer (SP) and the top of stack register (TOS). The SP register contains the address of the top of the stack (the address of the last word pushed onto the stack). Initially, SP points at 0x1000, which indicates that the stack is empty. When a word of data is pushed onto the stack, SP must be incremented by 1 and the data written to that word address. The TOS register contains a copy of whatever is currently on the top of the stack. Consequently, whenever a word is pushed onto the stack, that word must also be loaded into the TOS register.

Given that the instruction at 0x002 has already initiated the fetch of the byte operand for this instruction, the following actions must take place:

Retrieve the operand when it becomes available in the MBR.
Increment the SP register.
Write the operand (with sign extension) to the SP address.
Copy the operand (with sign extension) to the TOS.
Increment the PC and fetch the next opcode.
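The net effect of these five actions, ignoring timing, can be sketched in Python as follows. Registers and memory are plain dictionaries keyed by register name and word address; that is a modeling choice for this example, not something the simulator defines, and the function names are hypothetical.

```python
def sign_extend8(byte: int) -> int:
    """Sign-extend an 8-bit value to a 32-bit two's complement pattern."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte

def bipush(regs: dict[str, int], memory: dict[int, int], operand: int) -> None:
    """Machine-language-level effect of BIPUSH (not the microcode)."""
    value = sign_extend8(operand)      # action 1, with sign extension
    regs["SP"] += 1                    # action 2
    memory[regs["SP"]] = value         # action 3
    regs["TOS"] = value                # action 4
    regs["PC"] += 1                    # action 5 (the fetch itself is not modeled)

regs = {"SP": 0x1000, "TOS": 0, "PC": 0}
memory: dict[int, int] = {}
bipush(regs, memory, 0xFE)             # operand byte 0xFE is -2
print(hex(regs["SP"]), hex(memory[0x1001]))   # 0x1001 0xfffffffe
```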

Here is the microcode that performs these five actions:

0x010 SP=MAR=SP+1; goto 0x016
...
0x016 PC=PC+1; fetch; goto 0x017
0x017 TOS=MDR=MBR; wr; goto 0x002

The instruction at 0x010 increments SP and loads the MAR with the new value in preparation for the memory write (action 2 and part of action 3).

The instruction at 0x016 increments PC and initiates the fetch of the next opcode (action 5). Notice in particular that it is issued early enough for one more microinstruction (the one at 0x017) to execute before control returns to the beginning of the loop. This is necessary to ensure that the fetch has completed by the time the instruction at 0x002 is executed.

The instruction at 0x017 retrieves the operand from the MBR, loads it into TOS and MDR, and initiates the write operation (action 1, the rest of action 3, and action 4). The write operation will be completed by the time the instruction at 0x002 has been executed.
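Whether 0x017 really sees the operand, and 0x002 the next opcode, depends on the fetch timing. This Python sketch replays the BIPUSH microinstructions one clock cycle at a time under the Fetch section's rule (a byte fetched in cycle k is in the MBR from the start of cycle k + 2) and records what the MBR holds during each cycle. The schedule list and the function are hypothetical modeling aids.

```python
# (control store address, does this microinstruction start a fetch?)
BIPUSH_CYCLES = [
    (0x002, True),    # PC=PC+1; fetch; goto (MBR): fetches the operand
    (0x010, False),   # SP=MAR=SP+1; goto 0x016
    (0x016, True),    # PC=PC+1; fetch; goto 0x017: fetches the next opcode
    (0x017, False),   # TOS=MDR=MBR; wr; goto 0x002: reads MBR
    (0x002, True),    # top of the loop again: dispatches on MBR
]

def mbr_timeline(upcoming: list[int], schedule, initial_mbr: int):
    """Pair each cycle's microinstruction with the byte MBR holds during it.

    upcoming: the bytes that successive fetches return, in order.
    """
    incoming = iter(upcoming)
    arrivals = {}                          # cycle -> byte landing in MBR
    mbr = initial_mbr
    timeline = []
    for cycle, (addr, starts_fetch) in enumerate(schedule):
        if cycle in arrivals:
            mbr = arrivals.pop(cycle)      # fetched byte lands at start of cycle
        timeline.append((addr, mbr))
        if starts_fetch:
            # bytes beyond the supplied list are irrelevant to this trace
            arrivals[cycle + 2] = next(incoming, None)
    return timeline

for addr, mbr in mbr_timeline([0x05, 0x60], BIPUSH_CYCLES, initial_mbr=0x10):
    print(f"{addr:#05x}: MBR={mbr:#04x}")
```

With an operand of 0x05 followed by an IADD opcode (0x60), the MBR still holds 0x05 during 0x017 and holds 0x60 when control reaches 0x002 again.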

Note that it would be a logical mistake to push a word of data onto the stack when the value in SP is 0x1FFF. Doing so would increment SP to 0x2000, which is the first word in the local variables section of memory, and writing data to that location would change the value of the variable stored there.
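The microcode itself performs no such check. A simulator could add one, though; the Python sketch below is a hypothetical guarded push that refuses to move SP past 0x1FFF, the last word of the stack section. The dictionary-based registers and memory, the exception class, and the function name are all assumptions made for the example.

```python
STACK_LIMIT = 0x1FFF      # last word address in the stack section
MASK32 = 0xFFFFFFFF

class StackOverflow(Exception):
    """Raised by the hypothetical guard below."""

def checked_push(regs: dict[str, int], memory: dict[int, int], value: int) -> None:
    """Push one word unless doing so would leave the stack section."""
    if regs["SP"] >= STACK_LIMIT:
        raise StackOverflow(
            f"push with SP={regs['SP']:#x} would write word {regs['SP'] + 1:#x}"
        )
    regs["SP"] += 1
    memory[regs["SP"]] = value & MASK32
    regs["TOS"] = value & MASK32
```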

POP

The POP instruction removes the top word of data from the stack. That is, it removes the word of data that was most recently pushed onto the stack. The opcode for this instruction is 0x57 and it has no operands. To pop a word from the stack, decrement the stack pointer and copy the new top of stack word into the TOS register. Given that the instruction at 0x002 has already initiated the fetch of the next opcode, the following actions must take place:

Decrement SP
Read the word at the new value of SP and load it into TOS.

Here is the code to perform these actions (in the order in which the instructions are executed):

0x057  SP=MAR=SP-1; rd; goto 0x00C
...
0x00C  ALU=0; goto 0x00D
0x00D  TOS=MDR; goto 0x002

The instruction at 0x057 decrements the stack pointer and reads the word at the new top of stack location (action 1 and part of action 2).

The instruction at 0x00C does nothing but wait for the read operation to complete (part of action 2).

The instruction at 0x00D loads the new top of stack value into the TOS register (completing action 2).

Note that it would be a logical mistake to pop a word of data from an empty stack (SP = 0x1000). In that case, decrementing the SP register would yield the address of the last 4-byte word in the section of memory reserved for machine language code.
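A Python sketch of POP at the machine language level, with an optional underflow guard that is a hypothetical addition rather than something the microcode does. With the guard turned off, popping an empty stack leaves SP at 0x0FFF; multiplying by the 4-byte word size gives the byte range 0x3FFC to 0x3FFF, the last word of the code section. Names and the dictionary-based model are assumptions for the example.

```python
EMPTY_SP = 0x1000   # SP value when the stack is empty
WORD_SIZE = 4       # bytes per word

def pop(regs: dict[str, int], memory: dict[int, int], check: bool = True) -> None:
    """POP: decrement SP, then reload TOS from the new top of stack.

    check=True adds a hypothetical underflow guard; the microcode has none.
    """
    if check and regs["SP"] <= EMPTY_SP:
        raise IndexError("POP on an empty stack")
    regs["SP"] -= 1
    regs["TOS"] = memory.get(regs["SP"], 0)   # unwritten words read as 0 here

regs = {"SP": EMPTY_SP, "TOS": 0}
pop(regs, {}, check=False)                    # what the real microcode would do
first_byte = regs["SP"] * WORD_SIZE
print(hex(regs["SP"]), hex(first_byte), hex(first_byte + WORD_SIZE - 1))
# prints: 0xfff 0x3ffc 0x3fff
```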

IADD

The IADD instruction pops the top two elements from the stack, adds them together and pushes the sum back onto the stack. The opcode is 0x60 and there are no operands. Given that the instruction at 0x002 has already initiated the fetch of the next opcode, the following actions must take place:

Pop first number from stack and store it in H since that is the only register that feeds the A side of the ALU.
Pop second number from stack and add it to H.
Push the resulting sum onto the stack.

Here is the code to perform these actions (in the order in which the instructions are executed):

0x060  SP=MAR=SP-1; rd; goto 0x3
...
0x003  H=TOS; goto 0x4
0x004  TOS=MDR=H+MDR; wr; goto 0x2

Keep in mind that the TOS register contains the value on the top of the stack. The instruction at 0x060 pops the first integer off the stack (the one already in TOS) and begins the process of retrieving the second (part of action 1 and part of action 2). Normally, popping the second integer off the stack would mean that we would decrement SP a second time. However, we would need to immediately increment SP when we push the sum back onto the stack. Consequently, we won't do either since decrementing and then incrementing gives us what we started with.

The instruction at 0x003 loads the first number into H and allows the retrieval of the second to complete (the rest of action 1 and part of action 2).

The instruction at 0x004 performs the addition, stores the result in TOS, and writes it to the top of the stack (the rest of action 2 and action 3). Note that the MAR was already loaded with the correct address by the instruction at 0x060. When the instruction at 0x002 executes, the write operation initiated by the instruction at 0x004 will have completed.
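To check that skipping the second decrement and the following increment gives the right answer, this Python sketch steps through the three IADD microinstructions and compares the result with an ordinary pop-pop-push version. It is a sketch under simplifying assumptions: the memory read is modeled as completing immediately, and the register and memory dictionaries and function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def iadd_reference(stack: list[int]) -> None:
    """Pop two words, push their 32-bit sum (top of stack is the list's end)."""
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) & MASK32)

def iadd_microcode(regs: dict[str, int], memory: dict[int, int]) -> None:
    """Mimic the microinstructions at 0x060, 0x003, and 0x004."""
    # 0x060: SP=MAR=SP-1; rd  (SP now points at the second operand)
    regs["SP"] = regs["MAR"] = regs["SP"] - 1
    regs["MDR"] = memory[regs["MAR"]]     # read result, available by 0x004
    # 0x003: H=TOS  (the first operand, already in TOS)
    regs["H"] = regs["TOS"]
    # 0x004: TOS=MDR=H+MDR; wr  (overwrites the second operand's word)
    regs["TOS"] = regs["MDR"] = (regs["H"] + regs["MDR"]) & MASK32
    memory[regs["MAR"]] = regs["MDR"]

# Stack holding 3 (below) and 4 (on top), pushed onto an empty stack at 0x1000
regs = {"SP": 0x1002, "TOS": 4, "MAR": 0, "MDR": 0, "H": 0}
memory = {0x1001: 3, 0x1002: 4}
iadd_microcode(regs, memory)

stack = [3, 4]
iadd_reference(stack)
print(regs["SP"] == 0x1001, memory[0x1001], regs["TOS"], stack)   # True 7 7 [7]
```

Both versions end with SP one word lower and the sum on top, which is what the single decrement at 0x060 achieves.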

In the documentation for the simulator, you can find the microcode that executes each instruction in the machine language instruction set.