1. Field of the Invention
This invention relates to computer processors and, more particularly, to methods and apparatus for rapidly determining the length of instructions being processed from a stream of sequential instructions of variable lengths which are not differentiated one from another.
2. History of the Prior Art
Computer designers are continually attempting to make computers run faster. One way in which this may be accomplished is to make the computer process instructions faster. Typically, a computer processor handles the instructions of any process in sequential order, one after another. Thus, instruction one must be processed or at least decoded before instruction two can start. For example, computers based on the microprocessors manufactured by Intel Corporation including the 8086, 8088, 80286, i386™, and i486™ microprocessors (hereinafter referred to as the Intel microprocessors) operate in response to instructions which vary in length from one byte to fifteen bytes. These instructions are arranged in existing programs for the Intel microprocessors to be manipulated in typical sequential order.
One way in which the speed of computers is increased is by pipelining instructions. Instead of running through each instruction until it is complete and then commencing the next instruction, the stages of an instruction are overlapped so that no part of the computer lies idle while another stage is being accomplished. The computers based on the Intel microprocessors pipeline instructions so that each phase of the operation may be handled in one clock period. The phases or steps into which an instruction for an Intel microprocessor is divided include fetching the instruction from wherever it is stored, decoding the instruction, executing the instruction, and then writing the results of the execution to storage for later use. Each of the different steps is designed to require one clock period. Different portions of the computers accomplish each of the steps in the pipeline on sequential instructions during each clock period. Thus, during a first clock period the prefetch portion of the computer fetches an instruction from storage and aligns it so that it is ready for decoding. During a second clock period the prefetch portion of the computer fetches the next instruction from storage and aligns it so that it is ready for decoding in the third clock period. A decoder portion of the computer does the first stage of decoding of the first instruction fetched during the second clock period. The decoder portion of the computer does the first stage of decoding of the second instruction fetched during the third clock period. By pipelining instructions the overall speed of operation is significantly increased.
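The overlap described above may be illustrated by a minimal sketch. It assumes an idealized pipeline of exactly the four phases named above, each taking one clock period, with no stalls. The names STAGES, schedule, pipelined_clocks, and unpipelined_clocks are hypothetical and chosen only for illustration; they do not describe any actual Intel microprocessor.

```python
# Idealized four-phase pipeline; one clock period per phase, no stalls (assumption).
STAGES = ["fetch", "decode", "execute", "writeback"]


def schedule(n_instructions):
    """Return {clock: [(instruction_number, phase), ...]} with 1-based clocks."""
    table = {}
    for i in range(n_instructions):
        for s, phase in enumerate(STAGES):
            # Instruction i+1 enters the pipeline one clock after instruction i.
            table.setdefault(i + s + 1, []).append((i + 1, phase))
    return table


def pipelined_clocks(n_instructions):
    """Clock periods needed when the phases of successive instructions overlap."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + n_instructions - 1


def unpipelined_clocks(n_instructions):
    """Clock periods needed when each instruction completes before the next starts."""
    return len(STAGES) * n_instructions


if __name__ == "__main__":
    for clock, work in sorted(schedule(3).items()):
        print(clock, work)
    print(pipelined_clocks(10), unpipelined_clocks(10))
```

In this sketch the second clock period holds both the decoding of instruction one and the fetching of instruction two, which is the overlap the paragraph above describes.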
The instructions are available to be prefetched on the bus or from a cache memory as a stream of bytes in which no instruction is differentiated from any other. These instructions must be prefetched from these sources in one clock period. Thus, each instruction (in general) appears in order in any process. In order to determine the length of an instruction being prefetched for processing at any time, previous computers using the Intel microprocessors took a sufficient number of bytes to determine the type of instruction and its length and decoded the instruction to determine its content. Once this has been accomplished, the length of the instruction being processed and the starting point for the next instruction in sequence are known and can be fed back to the prefetch unit. This has forced the decoding of instructions in all previous computers based on the Intel microprocessors to be conducted serially. This also makes the time used in determining the length of an instruction critical to the speed of operation of the computer.
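The serial dependence described above may be sketched as follows. The sketch assumes a toy encoding, which is not the encoding used by the Intel microprocessors, in which the low four bits of the first byte of each instruction give its length in bytes. The functions toy_length and find_boundaries are hypothetical names used only for illustration.

```python
def toy_length(stream, start):
    """Hypothetical stand-in for decoding: the low nibble of the first byte
    is taken to be the instruction length (1 to 15 bytes)."""
    length = stream[start] & 0x0F
    if length == 0:
        raise ValueError("toy encoding does not allow a zero-length instruction")
    return length


def find_boundaries(stream):
    """Return the starting offset of each instruction in an undifferentiated
    byte stream. Each start is known only after the previous instruction has
    been examined, so the loop is inherently serial."""
    starts = []
    pos = 0
    while pos < len(stream):
        starts.append(pos)
        pos += toy_length(stream, pos)  # next start depends on this length
    return starts


if __name__ == "__main__":
    stream = bytes([0x02, 0xAA, 0x01, 0x03, 0xBB, 0xCC])
    print(find_boundaries(stream))
```

The loop cannot examine the second instruction until the length of the first has been determined, which is why the time taken by the length determination bounds the rate at which instructions can be supplied.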
To date the process for accomplishing this length determination has been able to keep up with the speed of the computers. However, new processors are being devised in which the present methods of instruction length determination are not fast enough. A computer can be made to process instructions faster if two or more instructions can be run simultaneously. This may be accomplished by providing more than one processing path or channel for the instructions handled by the computer and running the processing paths simultaneously whenever possible. A computer having a processor with two or more processing paths which are capable of simultaneously processing general machine instructions which are normally run serially is called a superscalar computer.
If any new computer is to be commercially successful, it must have a base of application programs which it can run when it is introduced in order to be of interest to users. The most economic way to provide such programs is to design the new computer to operate the application programs designed for an earlier computer or family of computers.
In order to provide a superscalar computer which is able to function with software used by older computers using the Intel microprocessors, the new machine must be able to understand and process the instructions of that software. Since the instructions used for the Intel microprocessors vary in length from one byte to fifteen bytes and are arranged in existing programs to be manipulated in sequential order as a stream of bytes in which no instruction is differentiated from any other, the determination of the length of instructions has become more critical. The prior art method of first decoding an instruction to determine its length in order to determine the starting point for the next instruction in sequence is not fast enough.
Since a superscalar machine must process at least two instructions simultaneously, it must decode two instructions simultaneously. However, to select the correct starting point for a second instruction, it must know where a first instruction ends so that it may know where the next (second) instruction begins. Yet only by decoding the first instruction can it know the length of the first instruction and, thus, where the second instruction begins. The entire purpose of a superscalar machine, which is to process two instructions at the same time, is defeated if the processing of the second instruction must await the decoding of the first instruction before it can begin.
An arrangement used in the prefetch operation for determining the ends of individual instructions in a stream of instructions without decoding the instruction in the decode circuitry is described in U.S. patent application Ser. No. 07/831,942, now abandoned, entitled End Bit Markers, E. Grochowski et al, filed on even date herewith, and assigned to the assignee of the present invention. One of the problems encountered in designing the arrangement of this patent application was to provide an arrangement for deriving from the stream of instruction data available a sufficient amount of data to include the two sequential instructions which are to be processed by the two channels of the super scaler processor while maintaining the speed of the operation of the computer. Such an arrangement is described in U.S. patent application Ser. No. 07/831,968, now abandoned, entitled Two Stage Window Multiplexors For Deriving Variable Length Instructions From A Stream Of Instructions, E. Grochowski, filed on even date herewith, and assigned to the assignee of the present invention.
In order for the arrangements described in each of the above-mentioned patent applications to function correctly, and even though the instructions need not be decoded to determine an instruction length to derive a starting point for each next instruction, the length of each instruction must still be determined. Moreover, the length of instructions processed must be determined in one clock period in order for the superscalar machine to perform correctly in maintaining the operation of a pair of pipelines. However, the typical method of determining the length of an instruction uses a plurality of adders in a chain with a carry generation operation. Such an operation cannot be carried out within the clock period provided.
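The difficulty with a chain of adders may be illustrated by a minimal sketch. It assumes, purely for illustration, that an instruction length is formed by summing the lengths of its component fields with four-bit ripple-carry adders connected in series. The functions full_adder, ripple_add, and chained_length and the example field lengths are hypothetical, and the sketch does not model actual gate delays.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def ripple_add(x, y, width=4):
    """Add two width-bit values one bit at a time. Each bit position needs the
    carry produced by the position below it."""
    carry = 0
    result = 0
    for bit in range(width):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result, carry


def chained_length(component_lengths, width=4):
    """Sum component lengths with adders in series (assumed arrangement).
    Returns (total_length, number_of_adders_in_the_chain)."""
    total = 0
    adders_in_chain = 0
    for length in component_lengths:
        # Each adder must wait for the running sum from the adder before it.
        total, carry = ripple_add(total, length, width)
        if carry:
            raise OverflowError("length does not fit in the adder width")
        adders_in_chain += 1
    return total, adders_in_chain


if __name__ == "__main__":
    # Hypothetical field lengths in bytes, chosen only as an example.
    print(chained_length([1, 1, 1, 4, 4]))
```

Each adder in such a chain depends on the result of the one before it, and each bit within an adder depends on the carry from the bit below, so the result becomes available only after a long sequence of dependent steps.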