1. Field of the Invention
This invention relates in general to the field of microelectronics, and more particularly to a technique for incorporating selective suppression of store checking features at the instruction level into an existing microprocessor instruction set architecture.
2. Description of the Related Art
Since microprocessors were first fielded in the early 1970s, their use has grown exponentially. Originally applied in the scientific and technical fields, microprocessors have moved over time from those specialty fields into commercial and consumer products such as desktop and laptop computers, video game controllers, and many other common household and business devices.
Along with this explosive growth in use, the art has experienced a corresponding technology pull that is characterized by an escalating demand for increased speed, expanded addressing capabilities, faster memory accesses, larger operand size, more types of general purpose operations (e.g., floating point, single-instruction multiple data (SIMD), conditional moves, etc.), and added special purpose operations (e.g., digital signal processing functions and other multi-media operations). This technology pull has resulted in an incredible number of advances in the art that have been incorporated into microprocessor designs, such as extensive pipelining, super-scalar architectures, cache structures, out-of-order processing, burst access mechanisms, branch prediction, and speculative execution. Quite frankly, a present-day microprocessor is an amazingly complex and capable machine in comparison to its 30-year-old predecessors.
But unlike many other products, microprocessors are subject to another very important factor that has constrained, and continues to constrain, the evolution of their architecture. This factor, legacy compatibility, accounts for much of the complexity present in a modern microprocessor. For market-driven reasons, many producers have opted, in each new design that incorporates new architectural features, to retain all of the capabilities required to ensure compatibility with older, so-called legacy application programs.
Nowhere has this legacy compatibility burden been more noticeable than in the development history of x86-compatible microprocessors. It is well known that a present-day virtual-mode, 32-/16-bit x86 microprocessor is still capable of executing 8-bit, real-mode application programs that were produced during the 1980s. And those skilled in the art will also acknowledge that a significant amount of corresponding architectural “baggage” is carried along in the x86 architecture for the sole purpose of retaining compatibility with legacy applications and operating modes. Yet while developers have in the past been able to incorporate newly developed architectural features into existing instruction set architectures, the means by which use of these features is enabled, namely programmable instructions, has become scarce. More specifically, certain instruction sets of interest have no “spare” instructions that would give designers a way to incorporate newer features into an existing architecture.
In the x86 instruction set architecture, for example, there are no remaining undefined 1-byte opcode states. All 256 opcode values in the primary 1-byte x86 opcode map are taken up with existing instructions. As a result, x86 microprocessor designers today must choose either to provide new features or to retain legacy compatibility. If new programmable features are to be provided, then they must be assigned to opcode values in order for programmers to exercise those features. And if spare opcode values do not remain in an existing instruction set architecture, then some of the existing opcode values must be redefined to provide for specification of the new features. Thus, legacy compatibility is sacrificed in order to make way for new feature growth.
There are a number of features that programmers desire in a present day microprocessor, but which have heretofore been precluded from incorporation because of the aforementioned reasons. One particular feature that is desirable for incorporation is store check suppression control at the instruction level.
Since virtually all microprocessors utilize multi-stage pipeline architectures, it is possible, indeed probable, that an instruction being fetched into the pipeline is the target of a pending store operation that is proceeding toward completion in later stages of the pipeline but has not yet completed. That is, the data to be stored to the destination location has not yet been written to memory (i.e., external memory or internal cache). This situation can arise under many different conditions. For example, the store instruction may still be proceeding through a pipeline stage that precedes the stage dedicated to writing memory. Alternatively, the store instruction may have been allowed to exit the pipeline while its data was placed in a pending store buffer to await a convenient time to write to memory. One skilled in the art will appreciate that pipeline architectures present microprocessor designers with various challenges relating to the synchronization of instructions that are programmed for sequential execution but are executed, in part, by parallel operations in a pipelined fashion.
Store checking is an inherent feature of all pipelined microprocessors, provided to ensure that all instructions resident within a microprocessor pipeline are indeed the instructions the application programmer intended for execution. Apparatus and means are provided within these processors' pipelines to check every instruction proceeding into the pipeline against pending store events that have yet to post to memory, and furthermore to check every instruction in preceding pipeline stages against the destination address of each store instruction when that store instruction is executed. If a pending store event is detected whose destination address corresponds to the location of an incoming instruction (the correspondence generally being determined with cache line granularity), then the pipeline is stalled and the store is allowed to post to memory. While the pipeline is stalled, the progression of instructions through the various pipeline stages is halted until the stall is removed. After the data posts, the incoming instruction is fetched again from its location and allowed to proceed through the pipeline. Conversely, if, during execution of a store instruction, an instruction is detected in a previous pipeline stage whose location (i.e., its instruction pointer (IP)) corresponds to the destination address of the store instruction, then synchronization hardware in the microprocessor stalls the pipeline and flushes every pipeline stage above the detected instruction, together with the previous pipeline stage that contains it. After the store instruction writes its data, the pipeline is refilled.
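The two checks described above can be modeled in software roughly as follows. This is a minimal sketch, not any particular processor's logic: the 64-byte line size and the names `LINE_SIZE`, `line_of`, `StoreChecker`, `must_stall_fetch`, and `stages_to_flush` are hypothetical, and the earlier pipeline stages are assumed to be listed from the fetch stage toward the execute stage.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # hypothetical cache-line size in bytes


def line_of(addr: int) -> int:
    """Map a (virtual) address to the index of the cache line containing it."""
    return addr // LINE_SIZE


@dataclass
class StoreChecker:
    # IPs of instructions in stages ahead of the store's execute stage,
    # ordered from the fetch stage (index 0) toward the execute stage.
    earlier_stage_ips: list[int] = field(default_factory=list)
    # Destination addresses of stores that have not yet posted to memory.
    pending_stores: list[int] = field(default_factory=list)

    def must_stall_fetch(self, ip: int) -> bool:
        """Check 1: an incoming instruction whose cache line matches a pending
        store must wait for that store to post and then be refetched."""
        return any(line_of(dest) == line_of(ip) for dest in self.pending_stores)

    def stages_to_flush(self, dest: int) -> int:
        """Check 2: when a store executes, return how many stages, counted from
        the fetch stage, must be flushed (0 if no earlier instruction matches)."""
        deepest = -1
        for stage, ip in enumerate(self.earlier_stage_ips):
            if line_of(ip) == line_of(dest):
                deepest = stage  # keep the matching stage closest to execute
        return deepest + 1


if __name__ == "__main__":
    checker = StoreChecker(earlier_stage_ips=[0x1000, 0x1004, 0x2000],
                           pending_stores=[0x1038])
    print(checker.must_stall_fetch(0x1000))  # True: same line as store to 0x1038
    print(checker.stages_to_flush(0x1010))   # 2: first two stages hold matches
```

Because every comparison is made on line indices rather than exact byte ranges, the comparators stay narrow, which reflects the cost trade-off that motivates cache line granularity.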
Store checking is an extremely onerous task, requiring an amount of hardware that grows in proportion to the number of pipeline stages in a microprocessor. This is why, as alluded to above, store destinations and instruction locations are typically compared only with cache line granularity. Furthermore, because of the complexities inherent in translating virtual addresses to physical addresses, store checking is also generally performed using virtual addresses rather than physical addresses.
At present, a programmer has no control over the store checking features of a microprocessor. If the programmer chooses to employ self-modifying code techniques, then he or she must ensure that subsequent instructions which are the targets of previous store operations are indeed the instructions desired for execution of the corresponding application program. This can be accomplished at the source code level, although such a programming technique may not be desirable. Yet a microprocessor does not execute source code. Automated compilers generate the microprocessor's instruction stream from the provided source code, and, because of a given compiler's alignment properties, the resulting instruction stream may very well interlace code and data within the same cache line. Hence, even though a programmer has provided means for ensuring the coherency of self-modifying source code, code compilation may disadvantageously introduce pipeline synchronization events.
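The effect of interlaced code and data can be illustrated with a small example. This is a sketch under stated assumptions: the 64-byte line size, the addresses, and the instruction and data lengths are all hypothetical values chosen for illustration, and `same_line` and `bytes_overlap` are hypothetical helper names. A store to a data item that shares a cache line with an instruction, but does not overlap any of the instruction's bytes, still matches under a line-granular check.

```python
LINE_SIZE = 64  # hypothetical cache-line size in bytes


def same_line(a: int, b: int) -> bool:
    """Line-granular comparison, as used by typical store checking hardware."""
    return a // LINE_SIZE == b // LINE_SIZE


def bytes_overlap(start_a: int, len_a: int, start_b: int, len_b: int) -> bool:
    """Exact byte-range comparison, shown only for contrast."""
    return start_a < start_b + len_b and start_b < start_a + len_a


# Hypothetical compiler output: a 5-byte instruction at 0x4010 and a 4-byte
# data variable at 0x4020, both placed in the same 64-byte line.
INSN_ADDR, INSN_LEN = 0x4010, 5
DATA_ADDR, DATA_LEN = 0x4020, 4

line_hit = same_line(DATA_ADDR, INSN_ADDR)
real_hit = bytes_overlap(DATA_ADDR, DATA_LEN, INSN_ADDR, INSN_LEN)
print(line_hit, real_hit)  # True False
```

The store to the variable never modifies the instruction, yet the line-granular check still reports a match and would trigger a synchronization event of the kind described above.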
A programmer, for various performance reasons, may desire to precede an instruction with a store that modifies the instruction's location, but the desired execution sequence is that the former contents of the location be executed. This is presently not possible because store checking mechanisms preclude such a sequence of execution events.
Therefore, what is needed is an apparatus and method that incorporate suppression of store checking features into an existing microprocessor architecture having a completely full opcode set, where incorporation of the suppression features allows a conforming microprocessor to retain the capability to execute legacy application programs while concurrently providing application programmers and/or compilers with the capability to control whether or not store checking is performed on any given instruction.