A microprocessor includes a datapath portion and a control portion. Data and addresses are manipulated in the datapath portion. The control portion is operative to decode instructions in a program into a form suitable for controlling that manipulation. Programs typically are stored in a main memory external to the chip and include sequences of instructions and data at specified addresses in the memory.
The control portion of the microprocessor conveniently comprises a programmable logic array (PLA) for decoding instructions from main memory as well as auxiliary logic circuitry for applying decoded instructions to the datapath. A PLA includes an input register and an output register each having a set of latches. Instructions from main memory are applied to the latches of the input register typically during a first phase of each clock cycle of operation. During a second phase of each cycle, the latches of the output register are set to provide the binary code for controlling the datapath for the next subsequent cycle of operation. An instruction applied to the input register is called an op-code, and the output of the PLA (output register) is called a line of microcode. Each such line of microcode determines the "state" of the microprocessor for the instant cycle of operation.
A PLA is characterized by feedback loops between the output register and the input register. These feedback loops carry binary data back to the input register to modify some bits of the input to the PLA in a manner to generate a sequence of related states. A PLA is able, thus, to generate a sequence of related microcode lines in response to each of one or more instructions in the program.
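The feedback behavior described above can be illustrated with a minimal sketch. The table contents, the op-code names, and the convention that state 0 marks the end of a sequence are all hypothetical and are chosen only to show how fed-back bits let one op-code produce several related microcode lines.

```python
# Hypothetical toy PLA: (op-code, fed-back state bits) -> (microcode line, next state bits).
# The entries below are illustrative assumptions, not taken from any real device.
PLA_TABLE = {
    ("LOAD", 0): ("drive_address", 1),
    ("LOAD", 1): ("read_memory", 2),
    ("LOAD", 2): ("latch_register", 0),
    ("NOP", 0): ("idle", 0),
}


def run_instruction(opcode):
    """Return the sequence of microcode lines generated for one op-code."""
    state = 0  # state bits fed back from the output register to the input register
    lines = []
    while True:
        microcode, next_state = PLA_TABLE[(opcode, state)]
        lines.append(microcode)
        if next_state == 0:  # assumed convention: state 0 ends the sequence
            return lines
        state = next_state
```

In this sketch, `run_instruction("LOAD")` yields three microcode lines from a single op-code, one per cycle, because the state bits change the PLA input on each pass.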
As is most often the case, data located at more than a single address in the main memory are required in order for even a single instruction to produce useful results. These data must be accessed in main memory and moved ("fetched") to on-chip registers in the datapath under the control of consecutive microcode lines generated in response to the single instruction. It typically takes a number of clock cycles to accomplish this movement of data even in response to a single instruction.
The requisite number of clock cycles for such movement is reduced if the microprocessor includes an on-chip queue in which the instructions and data for a portion of a program can be stored. If this portion of the program is "prefetched" (i.e., fetched during earlier cycles) and stored in consecutive locations in the on-chip queue, the program can then be executed without wasting extra cycle time to access data stored in the main memory. Instead, the requisite instructions and data, when required, are obtained in a single cycle from the first location in the queue. Instructions in the queue are then applied to the input register of the PLA, and data in the queue are applied to elements of the datapath. Limitations imposed upon the speed of microprocessor operation by the bandwidth of the input/output (I/O) bus which carries instructions from main memory are thus reduced in microprocessors which include a program queue in which prefetched instructions and data are stored temporarily.
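The cycle saving attributed to a prefetch queue can be sketched as a simple count. The four-cycle cost of a main-memory access is an assumed figure used only for illustration; the one-cycle read from the first location in the queue follows the description above.

```python
from collections import deque

MEMORY_ACCESS_CYCLES = 4  # assumption: cycles for one access to main memory
QUEUE_READ_CYCLES = 1     # one cycle to read the first location in the queue


def cycles_without_queue(program):
    """Every instruction or datum is fetched from main memory when needed."""
    return len(program) * MEMORY_ACCESS_CYCLES


def cycles_with_prefetched_queue(program):
    """The program portion is assumed to be already prefetched into the queue."""
    queue = deque(program)
    cycles = 0
    while queue:
        queue.popleft()  # read from the first location in the queue
        cycles += QUEUE_READ_CYCLES
    return cycles
```

The sketch deliberately ignores the cycles spent prefetching, on the assumption that they overlap with earlier execution.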
A macro-rom is used to store, on-chip, frequently used programs called "routines." Such routines are often called for in the execution of certain instructions called "macro-instructions." A macro-rom is a word-organized, on-chip read-only memory (ROM) operative to generate an output sequence of binary codes (coded words) in response to a corresponding sequence of input codes. The input codes are applied to the macro-rom from an on-chip register controlled by the output register of the PLA.
Operation of the macro-rom is initiated when a program in main memory calls for a macro-instruction to be applied to the input register of the PLA. The PLA responds to generate microcode, specified bits of which set specified latches of the output register of the PLA for configuring the datapath elements (e.g., the queue, counter, address register, . . . ) to execute routines stored in the macro-rom and for activating the macro-rom as well. In turn, the macro-rom applies appropriate portions of the routine to the PLA input register. The routine is selected by the macro-instruction, which specifies the address in the macro-rom at which the first byte of the selected routine is stored.
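Routine selection by starting address can be sketched as a lookup into a word-organized memory. The ROM contents, the macro-instruction names, and the use of an "end" word to terminate a routine are hypothetical; the text above does not state how the end of a routine is marked.

```python
# Hypothetical word-organized macro-rom; each list element is one coded word.
MACRO_ROM = ["push", "add", "pop", "end", "shift", "store", "end"]

# Assumed mapping from macro-instruction to the address of its routine's first word.
ROUTINE_START = {"MACRO_A": 0, "MACRO_B": 4}


def read_routine(macro_instruction):
    """Return the coded words of the routine selected by the macro-instruction."""
    address = ROUTINE_START[macro_instruction]
    words = []
    while MACRO_ROM[address] != "end":  # assumed terminator word
        words.append(MACRO_ROM[address])
        address += 1  # consecutive input codes step through the routine
    return words
```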
Consecutive macro-rom outputs typically are not applied directly to the PLA because a macro-rom instruction is not necessarily aligned in a proper field for the input register of the PLA, and execution is slow due to the requirement of several clock cycles for accessing a macro-rom memory to obtain an instruction. Instead, the selected macro-rom program is also stored in the queue. However, the selected routine cannot be stored in the queue without first erasing all unexecuted data then stored in the queue when the macro-rom is activated. The reason for this is that the queue is a sequential memory which can be loaded only from one end and read out only from the other. In the absence of erasing the unexecuted data, the routine from the macro-rom thus would not be located properly with respect to the unexecuted program already in the queue and would often occupy more space than would be available in the queue. Consequently, for proper operation, the unexecuted program is erased and the queue is filled with a routine from the macro-rom.
When the routine is completely executed, execution of the (main) program in main memory resumes. But a significant latency time is incurred before such execution can resume. The latency time arises because the queue had been erased and the appropriate segment of the main program must again be fetched into the queue. Several clock cycles are required for such a fetch operation. This latency is undesirably long.
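The erase-and-refetch penalty can be sketched with a first-in, first-out queue model. The class name, the capacity, and the refill cost of four cycles per entry are assumptions made for illustration; only the load-at-one-end, read-at-the-other behavior and the need to erase unexecuted entries come from the description above.

```python
from collections import deque

REFILL_CYCLES_PER_ENTRY = 4  # assumption: cycles to refetch one entry from main memory


class ProgramQueue:
    """Sequential memory: loaded only at one end, read only from the other."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def load(self, item):
        if len(self.entries) >= self.capacity:
            raise OverflowError("queue full")
        self.entries.append(item)

    def read(self):
        return self.entries.popleft()

    def flush(self):
        """Erase all unexecuted entries and report what was lost."""
        discarded = list(self.entries)
        self.entries.clear()
        return discarded


def run_macro(queue, routine):
    """Erase the queue, run the routine from it, then refetch the lost entries."""
    discarded = queue.flush()
    for word in routine:
        queue.load(word)
    executed = [queue.read() for _ in range(len(routine))]
    # Latency before the main program can resume: every erased entry is refetched.
    latency = len(discarded) * REFILL_CYCLES_PER_ENTRY
    for item in discarded:
        queue.load(item)
    return executed, latency
```

In this model the latency grows with the number of unexecuted entries erased when the macro-rom is activated, which is the cost the paragraph above identifies.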
Another problem with the use of a macro-rom stems from the fact that all data to be fetched from the main memory (operands, operand descriptors, or pointers) associated with a macro-instruction must be fetched and stored in on-chip registers (and not in the queue) prior to the activation of the macro-rom. The reason for this is that any such operands or operand descriptors associated with the macro-instruction would otherwise be erased from the queue and therefore would be inaccessible once the activation of the macro-rom occurs.