Historically, the central processing unit (CPU) of computer systems consisted of a single semiconductor chip known as a microprocessor. This microprocessor executed the programs stored in the main memory by fetching their instructions, examining them, and then executing them one after another. Due to rapid advances in semiconductor technology, faster, more powerful and flexible microprocessors were developed to meet the demands imposed by ever more sophisticated and complex software.
Presently, the state of the art in microprocessor design has come to a point where designing the next generation of microprocessors is incredibly costly, labor-intensive, and time-consuming. However, new applications such as multimedia, which integrates text, audio, speech, video, data communications, and other time-correlated data to create a more effective presentation of information, require a large amount of processing power to handle in a real-time environment. And with the explosion in network and file server applications, there is a need for processing vast amounts of data in a fast, efficient manner. The trend is toward even longer and more complex software programs. The processing required to run these applications in real time is starting to overwhelm even the most powerful of microprocessors.
One solution is to implement multiple processors. A single complex task can be broken into sub-tasks, and each sub-task is processed individually by a separate processor. For example, in a multi-processor computer system, word processing can be performed as follows. One processor can be used to handle the background task of printing a document, while a different processor handles the foreground task of interfacing with a user typing on another document. In this way, both tasks are handled in a fast, efficient manner. Because the various tasks or functions are distributed across several processors rather than handled by a single CPU, the computing power of the overall system is enhanced. And depending on the complexity of a particular job, additional processors may be added. Utilizing multiple processors has the added advantage that two or more processors may share the same data stored within the system.
In multi-processor systems, care must be taken to maintain processor consistency. Processor consistency is inherently assumed by existing software written for many multi-processor system architectures. For example, assume that processor P1 is a producer of information and processor P2 is the consumer of information. P1 performs a write operation W1 to location 1 followed by a write operation W2 to location 2. Location 2 contains a flag variable that signals that the data in location 1 is valid. Processor P2 continuously performs read operation R2 on location 2 until the flag becomes valid. After the flag is observed valid, P2 performs a read operation R1 on location 1 to read the data. In order for this algorithm to execute successfully in a multi-processor system, the order in which W1 and W2 are written by processor P1 must be the same order in which locations 1 and 2 appear to be updated to processor P2 when it performs R2 and R1.
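The producer/consumer exchange described above can be sketched in a few lines. This is only an illustration under stated assumptions: two Python threads stand in for processors P1 and P2, the dictionary keys "data" and "flag" stand in for location 1 and location 2, and the payload value 42 is arbitrary. None of these names come from the text.

```python
import threading

# Assumed stand-ins: "data" plays the role of location 1, "flag" of location 2.
shared = {"data": 0, "flag": 0}
result = []

def producer():
    shared["data"] = 42   # W1: write the payload to location 1
    shared["flag"] = 1    # W2: mark the payload valid in location 2

def consumer():
    while shared["flag"] == 0:      # R2: poll the flag until it becomes valid
        pass
    result.append(shared["data"])   # R1: read the payload from location 1

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
p.join()
c.join()
print(result[0])
```

The consumer relies entirely on seeing the write to "data" no later than the write to "flag". The CPython interpreter happens to preserve that order for this sketch; the point of the passage is that multi-processor hardware must provide the same guarantee for such code to remain correct.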
One method of ensuring processor consistency is to impose a master-slave arrangement, whereby one of the processors (the master) controls the other processors (the slaves). However, this arrangement is quite slow and inefficient. Another method involves imposing a strict ordering regimen. Both stores and loads are executed in order (when a program changes a value held in memory, it is performing a store; when a program retrieves data from memory, it is performing a load). In other words, the stores and loads are executed in the same sequence implied by the source program. However, one drawback of in-order execution is that performance suffers. Operating a processor in an out-of-order fashion allows the processor to exploit parallelism present in the source code. By implementing multiple execution units, parallel sequences can be processed at the same time, thereby minimizing the processing time.
In an out-of-order processor, loads are allowed to pass other loads. This phenomenon is known as speculative loading. However, when a load passes another load which has not yet been executed, there is a potential for a violation of processor ordering. Hence, processor ordering and cache coherency are especially critical for high performance processors that utilize out-of-order processing.
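To make the hazard concrete, the following sketch enumerates every interleaving of the producer's two writes with the consumer's two reads, once with the reads in program order and once with R1 executed ahead of R2. It is a simplified model, not a description of any particular processor: the tuple encoding, the function name `outcomes`, the payload value 42, and the reduction of the polling loop to a single read of the flag are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical model: location 1 holds the data, location 2 holds the flag.
W1 = ("P1", "write", "data", 42)
W2 = ("P1", "write", "flag", 1)
R1 = ("P2", "read", "data", None)
R2 = ("P2", "read", "flag", None)

def outcomes(p1_exec_order, p2_exec_order):
    """Collect every (flag_seen, data_seen) pair over all global schedules
    that respect each processor's given execution order."""
    found = set()
    for schedule in permutations(p1_exec_order + p2_exec_order):
        p1_part = [op for op in schedule if op[0] == "P1"]
        p2_part = [op for op in schedule if op[0] == "P2"]
        if p1_part != p1_exec_order or p2_part != p2_exec_order:
            continue  # schedule reorders within a processor; skip it
        memory = {"data": 0, "flag": 0}
        seen = {}
        for _, kind, loc, val in schedule:
            if kind == "write":
                memory[loc] = val
            else:
                seen[loc] = memory[loc]
        found.add((seen["flag"], seen["data"]))
    return found

STALE = (1, 0)  # flag observed valid, yet the old data was returned

in_order = outcomes([W1, W2], [R2, R1])      # loads execute in program order
load_passed = outcomes([W1, W2], [R1, R2])   # R1 executes ahead of R2

print(STALE in in_order)     # False
print(STALE in load_passed)  # True
```

In this model the stale outcome arises only when the load of location 1 passes the load of location 2: P2 reads the old data, P1 then performs both writes, and P2 finally observes the flag as valid.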
One mechanism for handling potential violations involves the use of "fencing" operations to prevent the violations. However, this approach imposes an additional burden on programmers, who must insert the fences themselves. Furthermore, previously written code sequences would be incompatible with this approach. Thus, there is a need in multi-processor computer systems for an apparatus and method for maintaining a processor ordering model. It would be preferable if such an apparatus or method could maintain the processor ordering model in an out-of-order environment, and yet be compatible with previously written source code.