1. Field of the Invention
This invention relates to data processing apparatus and methods in which an instruction cache is used to store instructions within a central processing unit and thereby reduce the demand for instructions on an external program memory.
2. Description of the Related Art
A data processor with a conventional Von Neumann architecture is shown in FIG. 1. Both instructions and data are stored in an external memory 2 for delivery to a central processing unit (CPU) 4. Only a single instruction or datum can be supplied to the CPU 4 per cycle. Within the CPU, instructions are directed to an instruction latch 6 that feeds each instruction to a program sequencer 8, which in turn decodes the instruction and controls the operation of the CPU. Data read from the external memory 2 is directed within the CPU to a register file 10, which in turn supplies the data to computation units 12 such as adders, subtractors, multipliers and dividers. To perform an operation such as C=A+B, where A and B reside in the memory 2 and C will be stored in the register file 10, three CPU cycles are needed: one to transfer the instruction and one for each of the two data items. While the current instruction C=A+B is being executed, the next instruction is being read.
To speed up the processor by reducing the number of cycles required to execute an instruction, the basic Harvard architecture shown in FIG. 2 has been devised. In this arrangement two separate external memories are provided. A program memory 14, sometimes referred to as an instruction memory, stores only instructions, while a data memory 16 stores only data. The program memory 14 supplies one instruction per cycle to the instruction latch 6, while the data memory 16 supplies one datum per cycle to the register file 10. With this architecture the C=A+B operation takes one cycle fewer, since an instruction fetch from the program memory 14 can be performed in the same cycle as a data fetch from the data memory 16. An arithmetic operation with only a single operand, such as C=log(A), requires only a single cycle, because the single data fetch from the data memory 16 can be performed simultaneously with an instruction fetch from the program memory 14.
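The cycle counts given for the FIG. 1 and FIG. 2 arrangements can be reproduced with a deliberately simplified model, offered only as an illustrative sketch. It assumes that each external memory path delivers one word per cycle, that operands reside in memory while the result stays in the register file 10, and that pipelining details and wait states can be ignored. The function names are hypothetical.

```python
def von_neumann_cycles(num_operands: int) -> int:
    """Single shared memory path (FIG. 1): one word per cycle.

    The instruction and every memory operand cross the same path,
    so the transfers are serialized.
    """
    return 1 + num_operands


def harvard_cycles(num_operands: int) -> int:
    """Separate program and data memories (FIG. 2), one word per cycle each.

    The instruction fetch overlaps the data fetches, so the busier of
    the two paths sets the cycle count.
    """
    return max(1, num_operands)


for name, operands in [("C=A+B", 2), ("C=log(A)", 1)]:
    print(f"{name}: Von Neumann={von_neumann_cycles(operands)}, "
          f"Harvard={harvard_cycles(operands)} cycles")
# C=A+B: Von Neumann=3, Harvard=2 cycles
# C=log(A): Von Neumann=2, Harvard=1 cycles
```

In this model the Harvard gain comes entirely from overlapping the instruction fetch with the data fetch, which is why the saving is exactly one cycle per instruction.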
Because processors often consume two operands per operation, it is desirable to have two separate data sources. A modified Harvard architecture that accomplishes this is shown in FIG. 3. In this design the program memory 18 stores both instructions and data. The operation C=A+B can now be performed in a single cycle if data can be read from both the data memory 16 and the program memory 18 simultaneously. However, the instruction latch 6 still needs to receive the next instruction from the program memory 18 at the same time as data for the current instruction is called for. This produces a bottleneck on the path between the program memory 18 and the CPU, so that the next instruction and the data for the current instruction must be fetched from the program memory in two separate cycles.
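The bottleneck in the FIG. 3 arrangement can be illustrated by counting the words each memory path must deliver for one instruction. This is a simplified sketch under the same assumptions as before (one word per path per cycle, both paths operating in parallel); the function name, the `fetch_instruction` flag and the placement of A in the data memory 16 and B in the program memory 18 are hypothetical choices made for illustration.

```python
from collections import Counter


def modified_harvard_cycles(operand_sources: list[str], fetch_instruction: bool = True) -> int:
    """Cycles for one instruction in the FIG. 3 arrangement.

    operand_sources names the memory holding each operand: "data" for the
    data memory 16 or "program" for the program memory 18. Each memory
    delivers at most one word per cycle, and both work in parallel.
    """
    transfers = Counter(operand_sources)
    if fetch_instruction:
        # The next instruction also has to come over the program-memory path.
        transfers["program"] += 1
    # The busiest path sets the cycle count (never less than one cycle).
    return max([1, *transfers.values()])


# C=A+B with A in the data memory and B in the program memory.
print(modified_harvard_cycles(["data", "program"]))                           # 2
print(modified_harvard_cycles(["data", "program"], fetch_instruction=False))  # 1
```

The second call shows the single-cycle case that would be possible if the next instruction did not have to share the program-memory path with the operand.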
A known architecture that solves this problem by adding an instruction cache 20 within the CPU is shown in FIG. 4. Typical programs executed by the CPU spend most of their execution time in a few main routines or tight loops in which instructions are repeated many times. The instruction cache 20 stores each instruction the first time it is fetched from the program memory. Thereafter, whenever the same instruction is called for, it is supplied from the instruction cache 20 to the instruction latch 6 over a dedicated internal path within the CPU. This relieves the load on the external path between the CPU and the program memory, leaving that path free to deliver data. Architectures of the type illustrated in FIG. 4 are used, for example, in the Analog Devices ADSP-2100, the Texas Instruments TMS320 and the Motorola MC68020 data processors.
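The effect of caching each instruction on its first fetch can be sketched by counting how many instruction fetches in a loop still use the external program-memory path. This is an illustrative model only: it assumes an unbounded cache with no eviction, and the instruction addresses and loop count are hypothetical. It does not model the actual cache organization of any of the processors named above.

```python
def program_memory_fetches(trace: list[int]) -> int:
    """Count instruction fetches that must use the external program-memory path.

    Each instruction address is cached the first time it is fetched; later
    executions of that address are served from the on-chip cache.
    Capacity is unlimited in this sketch.
    """
    cache: set[int] = set()
    external = 0
    for address in trace:
        if address not in cache:
            # Miss: fetch over the external path and fill the cache.
            external += 1
            cache.add(address)
        # Hit: supplied over the internal path, so the external path stays free for data.
    return external


loop_body = [0x10, 0x11, 0x12, 0x13]  # hypothetical addresses of a 4-instruction loop
trace = loop_body * 100               # the loop executed 100 times
print(len(trace), program_memory_fetches(trace))  # 400 4
```

Only the first pass through the loop loads the external path with instruction fetches; because the sketch never evicts anything, it also needs room for every distinct instruction it encounters.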
While the instruction cache technique illustrated in FIG. 4 effectively relieves the communication path between the CPU and the external program memory and allows faster operation, it requires storage capacity for all of the different instructions that can be anticipated, and the cache must therefore be relatively large and expensive.