The present invention pertains to processor design. In particular, the present invention pertains to a processor capable of executing instructions containing symbolic references (operands specified as symbols, rather than as “numeric references” or run-time executable functions that evaluate to specific memory locations or “slots”) that are dynamically resolved to numeric references during execution.
A pipelined processor is a processor having several sequentially connected stages or sections. For example, a processor can have an instruction cache, an instruction decoder, a register read stage, an execution stage and a post-execution register write stage. The instruction cache is a memory for storing instructions pending execution. Because of the temporal locality of reference (tendency to repeat execution of a recently executed instruction) and spatial locality of reference (tendency to execute instructions in memory locations near the memory locations of a recently executed instruction) properties of instruction execution, the instruction cache reduces the number of accesses to a slower main memory, thereby increasing instruction execution speed. The instruction decoder decodes the instruction into control signals for selecting/retrieving the appropriate operands and for performing arithmetic, logic or memory access operations on such operands. The register read stage may be provided in certain processors having a register file for quickly and efficiently retrieving information contained in, or indexed by, registers of the register file. The execution stage illustratively includes one or more functional units or sections for executing the operation on the operands as specified in the decoded instruction. The post-execution register write stage is for quickly and efficiently storing values in the register file. Other stages or configurations are also known in the prior art.
In addition to the above noted stages, a data cache is also often provided in the processor. Like the instruction cache, the data cache stores data, e.g., different operands, pending execution thereon. The data cache is separated from the instruction cache because the temporal and spatial locality of reference properties of operands are typically independent of those of instructions.
Several architectures are known for connecting the instruction and data caches to the other stages of the processor, such as shown in FIGS. 1-3. FIG. 1 shows an architecture which is similar to that employed in a processor manufactured by MIPS™. An instruction cache 12 and a separate data cache 14 feed the other stages 10 in parallel. Either cache 12 or 14 can access the bus interface 16 so as to receive information (e.g., instructions or data) from, or transmit information to, devices outside the processor, such as the processor bus, L2 cache, main memory, etc. The architecture of FIG. 1 is characterized as conserving manufacturing costs and integrated circuit chip space and also improving execution speed as a result of separating the instruction cache 12 from the data cache 14.
A shortcoming of the architecture of FIG. 1 is its inability to support instruction code self-modification. That is, instructions executing in stages 10 cannot modify instructions that reside in the instruction cache 12. The reason is as follows. Consider that both the data cache 14 and the instruction cache 12 store facsimile copies of instructions and data of a main memory. If another device requests to access the same data or instructions for which the instruction cache 12 or data cache 14 has a copy, the instruction cache 12 or data cache 14 must relinquish control of (i.e., invalidate or erase) such instructions or data. Furthermore, if the copy of such instruction or data in the instruction cache 12 or data cache 14 has been modified since it was retrieved from the main memory, such modifications must be written back to the main memory. Since the same data or instructions may be modified repeatedly, the instruction cache 12 and data cache 14 preferably defer writing back modified instructions or data until another device requests such instructions or data; until the instruction cache 12 or data cache 14 runs out of storage space; or as a result of a specific program instruction to do so. (This manner of deferring updates of modifications to information is referred to as a “write back” policy or scheme.) The problem is that all modified information, whether such modified information is program data or executable instructions, is outputted to the data cache 14. Thus, it is possible that the following scenario can occur: the instruction cache 12 writes back a copy of a to-be-modified instruction shortly after submitting the instruction that modifies it to the other stages 10 for execution. The modified instruction is written to the data cache 14. Before the data cache 14 writes back the modified instruction, the instruction cache 12 reads in the same instruction for execution.
However, because the instruction cache 12 has no way of knowing that the data cache 14 possesses a modified version of the instruction (indeed the data cache 14 does not even know this), the instruction cache 12 obtains the stale unmodified copy of the instruction from the main memory. This result is a serious violation of the memory coherency restriction imposed on all caches 12 and 14 because the program would not execute as specified in the program instructions, possibly causing an unrecoverable system error or crash.
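The stale-fetch scenario above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `WriteBackCache`, the address `0x100` and the instruction strings are hypothetical and are not taken from any of the figures; the model merely mimics two independent write-back caches in front of one main memory with no snooping between them.

```python
# Toy model of the FIG. 1 hazard: separate write-back caches, no snooping.
# The address 0x100 and the instruction strings are hypothetical.
main_memory = {0x100: "ADD R1,R1,2"}

class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> cached copy
        self.dirty = set()   # addresses modified but not yet written back

    def read(self, addr):
        if addr not in self.lines:           # miss: fill from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # update deferred (write-back policy)
        self.dirty.add(addr)

    def evict(self, addr):
        if addr in self.dirty:               # write modifications back first
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

icache = WriteBackCache(main_memory)
dcache = WriteBackCache(main_memory)

icache.read(0x100)                    # instruction is fetched once
icache.evict(0x100)                   # instruction cache relinquishes its copy
dcache.write(0x100, "SUB R1,R1,2")    # self-modifying store lands in the data cache
stale = icache.read(0x100)            # refetch before the data cache writes back
dcache.evict(0x100)                   # the write-back happens too late
fresh = main_memory[0x100]
```

In this model `stale` holds the unmodified instruction while `fresh` holds the modified one, which is the coherency violation described in the text.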
FIG. 2 shows a second architecture similar to that employed in Intel's™ '486™ processor. Unlike the architecture of FIG. 1, a single unified cache 18 is provided for storing instructions and data. As a result, the performance of this processor architecture is lower than that of the processor shown in FIG. 1. Nevertheless, this architecture is inexpensive and can support self-modifying instructions.
FIG. 3 shows a third architecture similar to that employed in Intel's™ Pentium™ processor. The instruction cache 20 is modified to include a snoop circuit 22. The snoop circuit 22 keeps track of the modifications to data loaded into the data cache 14, including modifications to instructions. As such, in the above scenario, if the instruction cache 20 attempted to load a modified instruction prior to the data cache 14 writing back the modifications, the snoop circuit 22 would obtain the modified copy of the instruction from the data cache 14, not the stale copy from the main memory. Thus, the architecture of FIG. 3 has a fast execution speed and accommodates self-modifying instructions. The problem is that the complexity of the snoop circuit 22 greatly increases the cost of the architecture of FIG. 3. The increase in cost might not be justifiable for low cost processors used in certain applications.
Executable instructions are traditionally generated in one of two ways: by compiling source code instructions or by interpreting source code instructions. In compiling source code, a compiler program reads each textual source code instruction. In response to each instruction, the compiler generates one or more machine executable instructions (“object code” instructions). The source code instructions specify high-level functions applied to variables in human readable form. Source code instructions are often specified as functions performed on variables in symbolic form as it is easier for a human to distinguish variables identified by symbols rather than memory locations. For example, the source code instruction “i:=i+2;” specifies that the variable identified by the symbol i is to be incremented by 2. A later instruction “i:=i/2;” specifies that the same variable identified by the symbol i should be divided by 2. The compiler not only translates each function to one or more operations to be performed on variables, but furthermore assigns memory locations or slots to each variable and replaces the symbolic reference to the variable with a “numeric reference.” A numeric reference is either a memory address of the slot in which the variable is stored or a machine executable function that produces the slot. Thus, the instruction “i:=i+2;” may be translated to “LOAD R1,0x13F4; ADD R1,2;” assuming that the memory location or slot with address 0x13F4 has been assigned to storing the contents of the variable identified by the symbol i. The instruction “i:=i/2;” may be translated to “LOAD R3,0x13F4; DIV R3,R3,2;”. Again, the operation is performed on the slot having address 0x13F4 which the compiler assigned to the variable identified by the symbol i.
The compiler may also use more sophisticated types of addressing in translating instructions to numeric references such as indirect mode addressing, indexed mode addressing, etc. These modes are predefined, executable functions for identifying the slot of a variable. For example, consider the compilation of the source code instruction “a[i]:=a[i]+2;” which increments the ith array element of array a by 2. This instruction may be compiled to: “LOAD R3, 0x2F36; LOAD R2, 0x13F4; LOAD R1,R2(R3); ADD R1,R1,2;” wherein variable i is stored at the slot having address 0x13F4 and the array a is stored at sequentially addressed slots beginning at the address 0x2F36. Note the numeric reference “R2(R3)” which causes the value stored at the slot having the address in register R3 to be added to the value stored in the register R2 to produce a slot address operand. Source code languages that employ compilers include C, FORTRAN, PASCAL, etc.
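The compile-time binding of symbols to slots described above can be sketched in a few lines. This is an illustrative sketch only: the function `compile_increment` and the `slots` table are hypothetical, the slot addresses are the ones used in the examples above, and the three-operand `ADD` form is assumed by analogy with the `DIV R3,R3,2` example.

```python
# Slot addresses taken from the examples in the text; the table itself is
# a hypothetical stand-in for the compiler's symbol table.
slots = {"i": 0x13F4, "a": 0x2F36}

def compile_increment(symbol, amount, reg="R1"):
    """Emit object code for `symbol := symbol + amount`.

    The symbolic reference is replaced by a numeric reference (the slot
    address) once, at compile time.
    """
    addr = slots[symbol]
    return [f"LOAD {reg},0x{addr:04X}", f"ADD {reg},{reg},{amount}"]

object_code = compile_increment("i", 2)
```

Because the address is fixed in the emitted text, any later change to the slot assignment would require the instruction to be generated again.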
An alternative manner for generating executable instructions is to interpret the source code instructions. An interpreter is an executable object code program that interprets each source code instruction one by one. In the course of interpreting a source code instruction, the interpreter dynamically identifies the numeric reference for each symbolic reference in each source code instruction. This is achieved by searching a “data object” which is a dynamically created mapping between symbols and the numeric references to which they correspond. The interpreter then translates the function of the source code instruction into object code instructions or operations to be performed on the identified numeric operands in place of their respective symbolic operands. The interpreter “interprets” each source code instruction, i.e., generates the object code instructions and resolves symbolic references to numeric references, each time the source code instruction is executed. Thus, if the source code instructions implement a loop, whereby a sequence of source code instructions is executed N times, then each instruction in that loop is interpreted (including resolving each symbolic reference to a numeric reference) N times.
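The cost of resolving every symbolic reference on every execution can be made concrete with a minimal sketch. The names `DataObject` and `interpret`, the tuple instruction format and the slot numbering are all assumptions made for illustration; the sketch only counts how often the data object is searched when one instruction is executed N times.

```python
class DataObject:
    """Hypothetical dynamically created mapping from symbols to slots."""
    def __init__(self):
        self.slot_of = {}
        self.searches = 0    # number of symbol lookups performed

    def resolve(self, symbol):
        self.searches += 1
        # Allocate the next free slot the first time a symbol is seen.
        return self.slot_of.setdefault(symbol, len(self.slot_of))

def interpret(program, data_object, memory):
    for op, symbol, value in program:
        slot = data_object.resolve(symbol)   # resolved on every execution
        if op == "add":
            memory[slot] = memory.get(slot, 0) + value
        elif op == "div":
            memory[slot] = memory.get(slot, 0) // value

data_object, memory = DataObject(), {}
N = 5
interpret([("add", "i", 2)] * N, data_object, memory)   # a loop body run N times
```

After the run, the data object has been searched N times for the same symbol, which is the overhead the text attributes to the interpreter execution model.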
Compiling generates many object code instructions, but they execute fairly efficiently. Interpretation generates no or few object code instructions but executes each source code instruction at a much slower pace. This is because each source code instruction is interpreted each time it is executed in the course of executing a source code program under an interpreter execution model. Nevertheless, the interpretation execution model is much simpler to use to develop programs. Programs are normally developed as separate modules or sequences of source code instructions. The developer may write a subset of the modules, compile them, execute them to test the modules and then modify a module or add additional modules to the source code program. Each time a module is modified or added after the source code has been compiled, it is possible (in fact likely) that the allocation of slots to variables (or even to object code instructions) will be varied when the modified or added module is compiled. As such, all modules must be re-compiled. In addition, it is more difficult to employ self-modifying instructions using a compiler execution model. It is also more complicated to use “symbolic” debugging software tools under the compiler execution model, which, amongst other things, allow the programmer to execute each source code instruction one at a time and examine the values assigned to each variable as identified by its symbolic reference in the source code. Such constraints are not imposed on the interpreter execution model because slots and numeric references are dynamically generated during execution.
U.S. Pat. No. 5,367,685 proposes an alternative model for generating and executing executable object code instructions from source code instructions, which alternative model provides the advantages of both compiling and interpretation. According to this patent, a hybrid compiler/interpreter is provided. Prior to executing a source code program, a modified module is compiled. Like an ordinary compiler, the hybrid compiler/interpreter compiles source code instructions. However, in compiling such instructions, the hybrid compiler/interpreter does not resolve symbolic references to numeric references. Thus, the instructions compiled by the hybrid compiler/interpreter are intermediate form instructions containing symbolic references to variables. At run-time, all of the executed modules of the program are interpreted. During the interpreting phase, the hybrid compiler/interpreter performs the following steps. If the instruction contains a symbolic reference, the hybrid compiler/interpreter searches the data object for the numeric reference corresponding to the symbolic reference. Then, the hybrid compiler/interpreter overwrites the symbolic reference in the instruction with the numeric reference. The program counter is not advanced in this event, and the interpreter returns to the top of the program interpretation/execution loop with the program counter pointing to the same instruction for which the numeric reference has replaced the symbolic reference. Thus, the hybrid compiler/interpreter re-executes the instruction. Since no symbolic reference is present in the instruction, the instruction can be executed. In addition, because the numeric reference overwrites the symbolic reference, at each subsequent execution of the instruction, the data object need not be searched; rather, the instruction is executed as an ordinary numeric reference instruction.
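The overwrite-and-re-execute loop described above can be sketched as follows. This is not the implementation of the cited patent; it is a minimal model under assumptions: the function `run_hybrid`, the tuple instruction format and the single supported operation (an addition to a slot) are hypothetical, and a string operand stands for a symbolic reference while an integer operand stands for a numeric reference.

```python
def run_hybrid(program, data_object):
    """Interpret `program`, rewriting symbolic operands in place.

    `program` is a mutable list of (op, operand, value) tuples; only the
    hypothetical "add" operation is modeled.
    """
    memory, searches, pc = {}, 0, 0
    while pc < len(program):
        op, operand, value = program[pc]
        if isinstance(operand, str):             # symbolic reference found
            searches += 1
            slot = data_object[operand]          # search the data object
            program[pc] = (op, slot, value)      # overwrite the instruction
            continue                             # pc not advanced: re-execute
        memory[operand] = memory.get(operand, 0) + value
        pc += 1
    return memory, searches

data_object = {"i": 0x13F4}
program = [("add", "i", 2), ("add", "i", 3)]
first_memory, first_searches = run_hybrid(program, data_object)
second_memory, second_searches = run_hybrid(program, data_object)
```

The second run performs no searches because the instructions themselves now carry numeric references. The in-place write to `program` is exactly the self-modification that a cache architecture would have to support.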
Recently, the Java™ programming language has become widely popular. Java™ is an interpreted language which has become popular due to its cross-platform interoperability. Considering the wide deployment of Java™, it is desirable to design a processor particularly suited to executing Java™ source code quickly and efficiently. As noted above, U.S. Pat. No. 5,367,685 provides an efficient hybrid compiler/interpreter execution engine. It is desirable to provide a processor capable of functioning as the interpreter program described in this patent, i.e., capable of directly executing the intermediate form instructions, including resolving symbolic references to numeric references. The problem is that the particular implementation employed in U.S. Pat. No. 5,367,685 modifies instructions in the course of interpreting them. As noted above, in order to provide a processor which supports self-modifying instructions, a very expensive cache architecture with snoop circuit, such as is shown in FIG. 3, must be employed.
It is therefore an object of the present invention to overcome the disadvantages of the prior art.
It is a further object of the present invention to provide a simple processor which is capable of executing intermediate form code and which resolves symbolic references to numeric references in the course of executing each instruction.
These and other objects are achieved according to the present invention. According to the invention, a processor is provided with a decoder, a memory connected to the decoder and an execution stage connected to the decoder. The decoder receives each instruction. Each time the decoder receives an instruction, if the instruction contains a symbolic reference, the decoder determines whether or not the symbolic reference has been resolved into a numeric operand. If the symbolic reference has been resolved into a numeric operand, the memory retrieves, from a numeric reference table, a numeric operand to which the symbolic reference has been resolved. The execution stage then executes the instruction on the retrieved numeric operand in place of the symbolic reference.
If the symbolic reference has not been resolved into a numeric operand, then the execution stage searches a data object, which relates each symbolic reference to a memory slot in which a corresponding numeric operand is stored, for a numeric reference relating the symbolic reference to a corresponding numeric operand. The memory then retrieves the numeric operand, that corresponds to the unresolved symbolic reference, from the memory slot indicated by the numeric reference of the data object. The memory stores the retrieved numeric operand in the numeric reference table maintained therein. The execution stage executes the instruction on the retrieved numeric operand in place of the symbolic reference of the instruction and indicates to the decoder that the symbolic reference is resolved (for future executions).
Thus, symbolic references are resolved dynamically to numeric operands during execution, but the numeric operands do not overwrite the symbolic references in the instructions to which they correspond. Instead, the numeric operands to which the symbolic references correspond are stored in a numeric reference table. This speeds up each subsequent execution of the instruction because the symbolic reference can be quickly resolved by resort to the numeric reference table rather than requiring a search of the entire data object.
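The resolution flow summarized above can be modeled in software as a rough sketch. Everything named here is an assumption made for illustration: the function `execute`, the dictionary used as the numeric reference table, the `stats` counter and the sample instruction are hypothetical, and presence of a key in the table stands in for the resolved indication. The program is held in an immutable tuple to emphasize that instructions are never rewritten.

```python
def execute(fetch_addr, program, data_object, slots, table, stats):
    """Resolve and return (op, numeric operand) for one instruction."""
    op, symbol = program[fetch_addr]
    if fetch_addr in table:                  # resolved indication is set
        operand = table[fetch_addr]          # fast path: table lookup only
    else:
        stats["searches"] += 1
        slot = data_object[symbol]           # numeric reference from data object
        operand = slots[slot]                # numeric operand held in that slot
        table[fetch_addr] = operand          # remember for future executions
    return op, operand

program = (("push", "count"),)               # immutable: never overwritten
data_object = {"count": 7}                   # symbol -> slot (hypothetical)
slots = {7: 42}                              # slot -> numeric operand
table, stats = {}, {"searches": 0}

first = execute(0, program, data_object, slots, table, stats)
second = execute(0, program, data_object, slots, table, stats)
```

Both calls return the same operand, but only the first one searches the data object, and the instruction stream is left untouched throughout.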
“Resolved indications,” each of which indicates whether or not a specific, respective symbolic reference is resolved, can be stored in a numeric reference buffer. The resolved indications of the numeric reference buffer are indexed by the fetch address of the executed instruction. The numeric reference buffer may be implemented as a mirror buffer, meaning that the same mapping/decoding of the fetch address as employed in accessing the instruction in the instruction cache is employed in accessing the appropriate resolved indication in the numeric reference buffer. Alternatively, the numeric reference buffer may be implemented as a non-mirror buffer, meaning that a mapping/decoding of the fetch address independent of that employed in accessing the instruction in the instruction cache is employed in accessing the appropriate resolved indication in the numeric reference buffer. Preferably, the numeric reference table of numeric operands is also stored in the numeric reference buffer, wherein each numeric operand is stored in relation to its respective resolved indication and accessed (indexed) the same way. In such a case, the numeric reference buffer may take the form of a cache where the resolved indication is simply the valid bit. However, the numeric reference table may also be stored in a different memory.
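A cache-like numeric reference buffer of the kind described above might be modeled as follows. This is a sketch under assumptions, not a description of the actual circuit: the buffer size, the direct-mapped `index_of` function, the stored tag used to tell apart fetch addresses that share an index, and the class name `NumericReferenceBuffer` are all hypothetical. The valid bit plays the role of the resolved indication.

```python
NUM_ENTRIES = 8   # hypothetical buffer size

def index_of(fetch_addr):
    """Hypothetical direct mapping of a fetch address to a buffer entry.

    In a mirror buffer this would be the same mapping the instruction
    cache applies to the fetch address.
    """
    return fetch_addr % NUM_ENTRIES

class NumericReferenceBuffer:
    def __init__(self):
        self.valid = [False] * NUM_ENTRIES     # resolved indications
        self.tag = [None] * NUM_ENTRIES        # assumed: full fetch address
        self.operand = [None] * NUM_ENTRIES    # numeric reference table

    def lookup(self, fetch_addr):
        """Return the stored operand, or None if not resolved."""
        i = index_of(fetch_addr)
        if self.valid[i] and self.tag[i] == fetch_addr:
            return self.operand[i]
        return None

    def fill(self, fetch_addr, operand):
        """Record a resolved operand and set the resolved indication."""
        i = index_of(fetch_addr)
        self.valid[i], self.tag[i], self.operand[i] = True, fetch_addr, operand

buf = NumericReferenceBuffer()
miss_before = buf.lookup(3)
buf.fill(3, 42)
hit_after = buf.lookup(3)
conflict = buf.lookup(11)    # maps to the same entry, different fetch address
```

A non-mirror variant would differ only in `index_of`, which would then be chosen independently of the instruction cache's mapping.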
Instructions can also include numeric references. Illustratively, a second “static” numeric reference buffer stores resolved indications for such references. However, such indications always indicate that the numeric reference is resolved. Furthermore, no numeric operand need be specifically stored in the numeric reference buffer, since the numeric reference can be easily resolved to a numeric operand during execution.
Thus, the hybrid compiler/interpreter model can be implemented at the processor level. That is, a compiler program compiles source code instructions into intermediate instructions that retain symbolic references to operands/variables. The processor can execute such intermediate instructions and can resolve symbolic references into numeric operands during execution. Since the resolution does not modify instructions, a simple and inexpensive instruction and data cache architecture, which does not need complex snoop circuitry, can be used in the processor.