1. Field of the Invention
This invention relates in general to the field of microelectronics, and more particularly to apparatus in a pipeline microprocessor for efficiently performing string scan and compare operations.
2. Description of the Related Art
Early microprocessors executed one instruction at a time. Accordingly, each individual instruction was fetched from memory and all of the functions prescribed by the instruction were executed by functional units within the microprocessors until all the functions were completed. At that point, the individual instruction was retired and a next instruction was fetched from memory for execution.
Although execution of program instructions in an early microprocessor was simple to understand, the practical effect of their execution was quite slow. Since that time, microprocessor designers have repeatedly focused on modifying the architecture of microprocessors to improve the execution speed, or throughput, of instructions. More recently, pipeline architectures have prevailed in the art as a means for increasing instruction throughput. A pipeline architecture breaks down the functional units of a microprocessor into a sequence of successive operations, very much analogous to the staging of an assembly line. Accordingly, it is possible, and highly desirable from a throughput standpoint, that a particular stage of the microprocessor is performing an operation prescribed by a first instruction while a stage immediately preceding the particular stage is performing another operation prescribed by a second instruction that follows the first instruction in an application program. Efficient throughput in a microprocessor is attained when all pipeline stages are performing operations. Problems of inefficiency occur when a particular pipeline stage takes too long to perform its prescribed operation. In this circumstance, a stall signal is issued to preceding pipeline stages that forces them to hold until the particular pipeline stage completes its function.
Pipeline architectures have continued to evolve to the point that many operations that are prescribed by program instructions (also called macro instructions) can be accomplished in a single traversal of the pipeline. For example, a register-to-register add operation is accomplished by retrieving two register operands simultaneously from registers within a register stage, adding the two operands together to produce a result in a following execution stage, and finally writing the result back to a result register in a result write back stage that follows the execution stage. A single instruction to perform the register-to-register add operation is thus configured to propagate through successive pipeline stages in synchronization with a pipeline clock, and the end outcome is that a user experiences register-to-register addition in a single pipeline cycle.
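The overlap described above can be illustrated with a minimal sketch. It assumes a hypothetical three-stage pipeline containing only the register, execution, and write back stages named in the example, with no stalls; the stage labels, the function name `run_pipeline`, and the instruction labels are illustrative and are not taken from any particular microprocessor.

```python
# Toy model of a three-stage pipeline: register read -> execute -> write back.
# Stage names and instruction labels are hypothetical, chosen for illustration.
STAGES = ["register", "execute", "write_back"]


def run_pipeline(instructions):
    """Return a per-cycle trace showing which instruction occupies each stage.

    Assumes one instruction enters the pipeline per cycle and that every
    stage finishes its work in exactly one cycle (no stalls).
    """
    trace = []
    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that entered `depth` cycles ago
            in_range = 0 <= index < len(instructions)
            occupancy[stage] = instructions[index] if in_range else None
        trace.append(occupancy)
    return trace


trace = run_pipeline(["add1", "add2", "add3"])
for cycle, occupancy in enumerate(trace, start=1):
    print(cycle, occupancy)
```

In the third cycle of this trace all three stages are busy with different add instructions, which is the condition the text identifies as efficient throughput: once the pipeline is full, one instruction completes per cycle.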
Although the operations prescribed by many macro instructions can be executed in a single traversal through the pipeline, there remain numerous instructions whose prescribed operations are so complex that they cannot be executed in a single traversal. One such class of macro instructions is the so-called string compare instructions, such as a scan string instruction or a compare string instruction. This class of instructions indirectly prescribes the location of one or two operands, which must be retrieved from data memory and compared to one another or to a third operand stored in an internal register to generate a comparison result. This type of operation is known as a load-compare operation. Yet, most present day microprocessors have a particular pipeline stage that is capable either of 1) accessing operands in memory or 2) performing an arithmetic or logical computation using provided operands. Consequently, both of these types of operations cannot be performed during the same pipeline cycle within that particular stage. Accordingly, a load-compare operation requires that two sub-operations be performed. First, the operand(s) must be retrieved from memory. Following this, the retrieved operand(s) must be compared to generate the result. Hence fetching of subsequent instructions must be stalled while the operation to retrieve the operand(s) (i.e., the first sub-operation) from memory is provided. When the comparison operation (i.e., the second sub-operation) is provided, fetching is allowed to resume.
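The decomposition described above can be sketched as follows. This is a simplified model, assuming a stage that performs exactly one function per pipeline cycle; the names `StageFunction`, `decompose`, and `stall_cycles` are hypothetical and serve only to illustrate why a load-compare operation occupies such a stage for two cycles.

```python
from enum import Enum


class StageFunction(Enum):
    """The two mutually exclusive functions of the stage in this model."""
    MEMORY_ACCESS = "memory access"
    ALU = "arithmetic or logical computation"


# Hypothetical mapping from an operation to the per-cycle functions it needs.
SUB_OPERATIONS = {
    "load": [StageFunction.MEMORY_ACCESS],
    "compare": [StageFunction.ALU],
    "load-compare": [StageFunction.MEMORY_ACCESS, StageFunction.ALU],
}


def decompose(operation):
    """Return the sequence of stage functions, one per pipeline cycle."""
    return SUB_OPERATIONS[operation]


def stall_cycles(operation):
    """Cycles beyond the first for which the stage stays occupied.

    Under the one-function-per-cycle assumption, each extra sub-operation
    holds up the preceding stages for one cycle.
    """
    return len(decompose(operation)) - 1


for name in SUB_OPERATIONS:
    print(name, [f.value for f in decompose(name)], stall_cycles(name))
```

Under this model a plain load or a plain compare incurs no stall, while a load-compare incurs one, because the memory access and the comparison cannot share a cycle in the same stage.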
Stalling the pipeline for one or more cycles is disadvantageous from a throughput perspective, and a single load-compare operation results in at least one pipeline stall. But when string compare macro instructions are employed iteratively many times over, as is typically seen within many application programs, the disadvantages caused by stalls during a single iteration of a string compare operation are further exacerbated in proportion to the number of prescribed iterations.
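To make the proportionality concrete, the following sketch models an iterated scan string operation as a load followed by a compare on each iteration. It assumes exactly one stall per iteration, which is the minimum stated above; the function name `scan_string`, its parameters, and the byte-string "memory" are hypothetical and do not reproduce the semantics of any specific instruction set.

```python
def scan_string(memory, start, count, target):
    """Scan up to `count` bytes beginning at `start` for the byte `target`.

    Returns (match_address_or_None, iterations, stall_cycles).
    Each iteration is modeled as two sub-operations in a stage that can
    perform only one of them per cycle, so one stall is charged per iteration.
    """
    iterations = 0
    stalls = 0
    for offset in range(count):
        iterations += 1
        operand = memory[start + offset]  # sub-operation 1: load from memory
        stalls += 1                       # assumed: one stall while the load occupies the stage
        if operand == target:             # sub-operation 2: compare
            return start + offset, iterations, stalls
    return None, iterations, stalls


text = b"pipeline"
found = scan_string(text, 0, len(text), ord("l"))
missing = scan_string(text, 0, len(text), ord("z"))
print(found)
print(missing)
```

In both calls the stall count equals the iteration count, so a scan that runs to the end of a long string without a match pays the full per-iteration penalty on every element.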
Any type of operation in a pipeline microprocessor that requires multiple pipeline cycles to accomplish is problematic in that inefficient utilization of the pipeline stages is experienced. When this inefficient utilization is compounded by iterative situations, the execution speed of a microprocessor suffers. Therefore, what is needed is an apparatus in a microprocessor that enables a load-compare operation to be accomplished in a single pipeline cycle.