
Machine Language Level

The machine language level instruction set of a computer is the lowest level language available to application programmers. (There have been some examples of machines whose microprograms could be modified but they are very rare and even with those machines no one would write applications at the microinstruction level.) A complete description of the machine language instruction set for our example architecture can be found in the documentation for the simulator and will not be repeated here. The purpose of this page is to consider the environment in which our Integer Java Virtual Machine (IJVM) programs will be executed.

Memory

Our example architecture has a 64KB memory, addressed in 32-bit words (16K words in all), divided into four 16KB blocks of 4K words each:

  1. Machine language code starting at 0x0000
  2. Stack starting at 0x1000
  3. Local variables (for main) starting at 0x2000
  4. Constants starting at 0x3000

The entire machine language program must be stored in the first block of memory (0x0000 through 0x0FFF). All constants are stored in the last block of memory (0x3000 through 0x3FFF). All variables declared in the main method are stored in the third block of memory (0x2000 through 0x2FFF). The stack and all local variables and parameters for methods other than main are stored in the second block of memory (0x1000 through 0x1FFF).
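The memory map above can be expressed as a small lookup table. This is a minimal sketch, assuming word addresses in the range 0x0000 through 0x3FFF; the names `REGIONS` and `region_of` are hypothetical and are not part of the simulator.

```python
# Hypothetical model of the four-block memory map (word addresses).
REGIONS = [
    (0x0000, 0x0FFF, "machine language code"),
    (0x1000, 0x1FFF, "stack (and frames of methods other than main)"),
    (0x2000, 0x2FFF, "local variables of main"),
    (0x3000, 0x3FFF, "constant pool"),
]


def region_of(address: int) -> str:
    """Return the name of the block that contains a word address."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    raise ValueError(f"address {address:#06x} is outside 0x0000-0x3FFF")


print(region_of(0x2003))  # local variables of main
```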

During the boot-up process, the CPU is put into an initial state with predetermined values in its registers. A cold boot-up occurs when a computer is turned on. A warm boot-up occurs when a computer is restarted or reset while it is already on. Following a boot-up or reset, the values in the registers of our architecture are all zeroes with the following exceptions:

Register    Value
SP          0x00001000
LV          0x00002000
CPP         0x00003000
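As an illustration, the reset state can be modelled as a dictionary of register values. This sketch includes only the registers mentioned on this page (PC, SP, LV, CPP and TOS); `reset_registers` is a hypothetical helper, and any other registers would likewise start at zero.

```python
def reset_registers() -> dict[str, int]:
    """Hypothetical register state after a cold or warm boot-up."""
    regs = {name: 0 for name in ("PC", "SP", "LV", "CPP", "TOS")}  # everything starts at zero...
    regs["SP"] = 0x00001000   # ...except SP (empty stack),
    regs["LV"] = 0x00002000   # LV (main's local variables),
    regs["CPP"] = 0x00003000  # and CPP (constant pool)
    return regs


print({name: hex(value) for name, value in reset_registers().items()})
```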

During program execution, the stack is the most dynamic data structure and is changed by almost every machine language instruction. The value of the LV register is constant throughout the execution of a given method but changes each time a method is invoked or a method terminates. The value stored in the CPP register never changes.

The Stack

In computer science, a stack is a last-in first-out data structure. Think of a stack of books piled up on a table, with books added or removed one at a time. Each new book is added to the top of the stack, and the only book that can be removed at any given time is the book on the top. Given these rules, the last book added to the stack is, of necessity, the first book removed from it. In computer science, push means to place a new item onto the stack and pop means to remove the top element from the stack. In the context of IJVM, the stack holds 32-bit values and is used as a scratchpad area in which to perform arithmetic and logic operations. Two registers, SP and TOS, assist in working with the stack, and all but a few of the machine language instructions manipulate it.
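The scratchpad role of the stack can be illustrated with a Python list standing in for the IJVM stack. The add step mimics an arithmetic instruction such as IADD, which pops two words and pushes their sum; the variable names are illustrative only.

```python
stack: list[int] = []  # a Python list standing in for the IJVM stack

stack.append(2)      # push 2
stack.append(3)      # push 3
b = stack.pop()      # pop the top value (3)
a = stack.pop()      # pop the next value (2)
stack.append(a + b)  # push the result, as an IADD-style operation would

print(stack)  # [5]
```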

Normally, the stack pointer (the SP register) contains the word address of the value at the top of the stack. The only exception is when SP contains 0x00001000, its initial value. In that case, the stack is empty. To push a value onto the stack, increment the stack pointer and copy the value to that memory location. To pop an item from the stack, simply decrement the stack pointer.
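The push and pop rules in the previous paragraph translate almost directly into code. In this minimal sketch memory is a Python dictionary keyed by word address, and the class name `SimpleStack` is hypothetical; `pop` also returns the old top value for convenience, although the text only requires the decrement.

```python
STACK_BASE = 0x1000  # initial SP value; SP == STACK_BASE means the stack is empty


class SimpleStack:
    """Hypothetical model of the IJVM stack without a TOS register."""

    def __init__(self) -> None:
        self.mem: dict[int, int] = {}  # word address -> 32-bit value
        self.sp = STACK_BASE

    def is_empty(self) -> bool:
        return self.sp == STACK_BASE

    def push(self, value: int) -> None:
        self.sp += 1                            # first increment SP...
        self.mem[self.sp] = value & 0xFFFFFFFF  # ...then store at the new top

    def pop(self) -> int:
        if self.is_empty():
            raise IndexError("pop from an empty stack")
        value = self.mem[self.sp]  # read the old top (a convenience, not required)
        self.sp -= 1               # popping itself is just a decrement
        return value


s = SimpleStack()
s.push(7)
s.push(9)
print(hex(s.sp))  # 0x1002
print(s.pop())    # 9
print(hex(s.sp))  # 0x1001
```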

In our architecture, there is a second register reserved for working with the stack and that is the top of stack (TOS) register. The TOS register always contains a copy of the value currently on the top of the stack. This register can speed up our machine language interpreter by eliminating the need to access memory in order to look at or use the value on the top of the stack since that value is already in a register.

In order to keep the contents of TOS up-to-date, the push and pop operations have to be modified. To push a value onto the stack, increment the stack pointer and copy the value to that memory location and to the TOS register. To pop a value from the stack, decrement the stack pointer and copy the value stored at that memory location into the TOS register.
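Adding the TOS register changes only a few lines. Again a hypothetical `TosStack` class with dictionary memory is assumed; words that were never written read as zero in this model, whereas real memory would simply hold whatever was last stored there.

```python
STACK_BASE = 0x1000  # initial SP value; SP == STACK_BASE means the stack is empty


class TosStack:
    """Hypothetical stack model that keeps a copy of the top value in TOS."""

    def __init__(self) -> None:
        self.mem: dict[int, int] = {}  # word address -> 32-bit value
        self.sp = STACK_BASE
        self.tos = 0  # TOS starts at zero after a reset

    def push(self, value: int) -> None:
        value &= 0xFFFFFFFF
        self.sp += 1
        self.mem[self.sp] = value  # store in memory...
        self.tos = value           # ...and in TOS

    def pop(self) -> int:
        if self.sp == STACK_BASE:
            raise IndexError("pop from an empty stack")
        popped = self.tos                    # the old top is already in a register
        self.sp -= 1
        self.tos = self.mem.get(self.sp, 0)  # reload TOS from the new top (unwritten words read as 0 here)
        return popped


s = TosStack()
s.push(4)
s.push(6)
print(s.tos)    # 6
print(s.pop())  # 6
print(s.tos)    # 4
```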

There are five machine language instructions whose sole purpose is to place a new value on the top of the stack:

There are three machine language instructions whose sole purpose is to remove a value from the stack:

There are eight machine language instructions that manipulate the stack:

There are two other machine language instructions that affect the stack (they will be discussed later):

Local Variables

Each method in a Java program (including "main") has its own local variable frame whose address is stored in the LV register. For the main method, the local variable frame consists of only the variables that are declared in main itself. These values are stored in the block of memory starting at 0x2000 (the initial value of the LV register). Since the LV register points to the base of the local variable frame, values within the frame are identified by an offset from the value stored in the LV register. That is, the address of a particular value is found by adding the value in the LV register and the offset. Suppose four variables have been declared in main:

Variable   Offset   Address of Variable
Var1       0        LV + 0
Var2       1        LV + 1
Var3       2        LV + 2
Var4       3        LV + 3
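The offsets in the table become addresses by simple addition. This sketch assumes LV still holds its boot value of 0x2000, as it does while main is running; `local_address` is a hypothetical helper name.

```python
LV = 0x00002000  # LV for main, set at boot-up


def local_address(lv: int, offset: int) -> int:
    """Indexed addressing: base register value plus offset."""
    return lv + offset


for offset, name in enumerate(["Var1", "Var2", "Var3", "Var4"]):
    print(f"{name}: offset {offset} -> {local_address(LV, offset):#06x}")
# Var1: offset 0 -> 0x2000
# Var2: offset 1 -> 0x2001
# Var3: offset 2 -> 0x2002
# Var4: offset 3 -> 0x2003
```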

For other methods, the local variable frame contains a pointer to the previous value of the PC register (the return address), parameters (if any), local variables (if any), the previous value of the PC register, and the previous value of the LV register. The pointer to the return address, the return address itself, and the previous value of the LV register are all needed to return to the invoking method when a method terminates. The local variable frames for methods other than main are stored on the stack described above (0x1000 through 0x1FFF).

Because the local variable frame is stored on the stack, it is also referred to as a stack frame. In both our C++ and Java programming classes, the concept of a stack frame was used when we tried to illustrate function calls. Even then, we pointed out that the stack frame included more than just the parameters and local variables. Now we see that the "more" includes the previous value of the PC and LV registers.

As noted earlier, the address of a particular value in the local variable frame is found by adding the value in the LV register and the corresponding offset. Consider a method with three parameters and two local variables. The offsets and corresponding addresses are given in the table below:

Item                          Offset   Address of Item
Pointer to Previous PC Value  0        LV + 0
Param1                        1        LV + 1
Param2                        2        LV + 2
Param3                        3        LV + 3
Var1                          4        LV + 4
Var2                          5        LV + 5
Previous PC Value             6        LV + 6
Previous LV Value             7        LV + 7

Note that the value stored at mem[LV + 0] is LV + 6, the address of the previous PC value. Read the documentation of the INVOKEVIRTUAL and IRETURN instructions for a more detailed look at the creation of a local variable frame on the stack when a method is invoked and the removal of the local variable frame from the stack when the method terminates.
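The layout in the table generalizes to any number of parameters and local variables. The following sketch computes each item's offset from LV, assuming the ordering shown in the table (pointer, parameters, local variables, previous PC, previous LV); `frame_layout` and the LV value 0x1001 are hypothetical, and the exact frame built by INVOKEVIRTUAL is described in the simulator documentation, not reproduced here.

```python
def frame_layout(num_params: int, num_locals: int) -> list[tuple[str, int]]:
    """Return (item, offset-from-LV) pairs in the order used by the frame table."""
    layout = [("Pointer to Previous PC Value", 0)]
    offset = 1
    for i in range(1, num_params + 1):
        layout.append((f"Param{i}", offset))
        offset += 1
    for i in range(1, num_locals + 1):
        layout.append((f"Var{i}", offset))
        offset += 1
    layout.append(("Previous PC Value", offset))
    layout.append(("Previous LV Value", offset + 1))
    return layout


offsets = dict(frame_layout(3, 2))
lv = 0x1001                                       # hypothetical LV for the invoked method
link_pointer = lv + offsets["Previous PC Value"]  # the value stored at mem[LV + 0]
print(offsets["Previous PC Value"], offsets["Previous LV Value"])  # 6 7
print(hex(link_pointer))                          # 0x1007
```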

Constant Pool

All constants, no matter where they are declared, are stored in the fourth block of memory (0x3000 through 0x3FFF). The CPP register (constant pool pointer) contains the address 0x3000, which is the base address of the constant pool. Each constant is associated with an offset, and the address of the constant is found by adding the value of the CPP register and the corresponding offset. Consider a program in which four constants have been declared:

Constant    Offset   Address of Constant
Constant1   0        CPP + 0
Constant2   1        CPP + 1
Constant3   2        CPP + 2
Constant4   3        CPP + 3
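Reading a constant is the same kind of base-plus-offset calculation, this time using CPP as the base. The pool contents below are made-up values, and `load_constant` is a hypothetical helper; it roughly corresponds to the address calculation an instruction such as LDC_W performs before pushing the constant.

```python
CPP = 0x00003000  # base address of the constant pool (never changes)

# Made-up constant pool contents, keyed by word address.
memory = {CPP + 0: 10, CPP + 1: -1, CPP + 2: 0x7FFFFFFF, CPP + 3: 42}


def load_constant(offset: int) -> int:
    """Read the constant at CPP + offset (indexed addressing)."""
    return memory[CPP + offset]


print(load_constant(3))  # 42
```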

Indexed Addressing

The technique that we have described for using the LV and CPP registers is so common that it has a name. A machine language instruction that uses indexed addressing (also called base register addressing) has, as its operand, an integer value (the offset) that is added to the value in the corresponding base register to determine the physical address of the data in memory. A huge advantage of indexed addressing is that the entire block of memory can be moved to (or loaded into) a different location in memory and the only change necessary is to load the corresponding base register with the new address.
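The relocation advantage can be demonstrated directly: copy a block to a new location, reload the base register, and the same offsets still reach the same values. This is a minimal sketch with dictionary memory; `read` and `relocate` are hypothetical helper names, and the copy assumes the source and destination blocks do not overlap.

```python
def read(memory: dict[int, int], base: int, offset: int) -> int:
    """Indexed (base register) addressing: fetch the word at base + offset."""
    return memory[base + offset]


def relocate(memory: dict[int, int], old_base: int, new_base: int, size: int) -> None:
    """Copy `size` words from old_base to new_base (blocks must not overlap)."""
    for i in range(size):
        memory[new_base + i] = memory[old_base + i]


memory = {0x2000 + i: v for i, v in enumerate([11, 22, 33, 44])}
lv = 0x2000
before = [read(memory, lv, k) for k in range(4)]

relocate(memory, 0x2000, 0x2800, 4)
lv = 0x2800  # the only change needed: load the base register with the new address
after = [read(memory, lv, k) for k in range(4)]

print(before == after)  # True
```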