1. Field of the Invention
The present invention relates in general to computer architecture and in particular to a method and system for organizing memory access.
2. Description of the Related Art
Video, graphics, communications and multimedia applications require high throughput processing power. As consumers increasingly demand these applications, microprocessors have been tailored to accelerate multimedia and communications applications.
Media extensions, such as the Intel MMX™ technology, introduced an architecture and instructions to enhance the performance of advanced media and communications applications, while preserving compatibility with existing software and operating systems. The new instructions operated in parallel on multiple data elements packed into 64-bit quantities. The instructions accelerated the performance of applications with computationally intensive algorithms that performed localized, recurring operations on small native data. These multimedia applications included motion video, combined graphics with video, image processing, audio synthesis, speech synthesis and compression, telephony, video conferencing, and two- and three-dimensional graphics applications.
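By way of illustration only, the following Python sketch shows what operating in parallel on multiple data elements packed into a 64-bit quantity can look like for eight unsigned 8-bit elements added with wraparound. It emulates the effect in software with bit masks and is not the hardware implementation of any media extension; the function names pack_u8, unpack_u8 and packed_add_u8 are hypothetical and introduced here for illustration.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
LOW7 = 0x7F7F7F7F7F7F7F7F   # low seven bits of every 8-bit lane
HIGH = 0x8080808080808080   # top bit of every 8-bit lane


def pack_u8(values):
    """Pack eight unsigned 8-bit values into one 64-bit integer."""
    assert len(values) == 8
    return int.from_bytes(bytes(values), "little")


def unpack_u8(word):
    """Unpack a 64-bit integer into a list of eight unsigned 8-bit values."""
    return list(word.to_bytes(8, "little"))


def packed_add_u8(a, b):
    """Add eight 8-bit lanes at once, each lane wrapping modulo 256.

    The low seven bits of each lane are added first, so no carry can
    cross into a neighboring lane; the top bit of each lane is then
    combined separately with XOR.
    """
    low = (a & LOW7) + (b & LOW7)
    return (low ^ ((a ^ b) & HIGH)) & MASK64


if __name__ == "__main__":
    x = pack_u8([1, 2, 3, 4, 5, 6, 7, 8])
    y = pack_u8([10, 20, 30, 40, 50, 60, 70, 250])
    print(unpack_u8(packed_add_u8(x, y)))
```

A single call to packed_add_u8 processes all eight elements together, which is the property that makes packed instructions attractive for the localized, repetitive operations described above.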
Although parallel operations on data can accelerate overall system throughput, a problem arises when memory is shared among processors. For example, suppose a processor performs data decompression of a video image. If an external agent or another processor issues a memory load or store while the image data is not yet complete, that agent would receive incomplete or corrupt image data. The situation becomes particularly acute because many multimedia applications now require communications and data exchange among many external agents, such as external graphics processors.
Thus, what is needed is a method and system that allow a computer architecture to perform computations in parallel, yet guarantee the integrity of a memory load or store.
The load fencing process and system receive a load fencing instruction that separates memory load instructions into older loads and newer loads. A load buffer within a memory ordering unit is allocated to the load fencing instruction. The load instructions newer than the load fencing instruction are stalled. The older load instructions are gradually retired. When all older loads from the memory subsystem are retired, the load fencing instruction is dispatched.
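The ordering behavior summarized above can be illustrated with a minimal software sketch. The following Python model is an assumption-laden simplification, not the claimed hardware: the Entry class, the simulate function, the per-load latency counter, and the one-cycle delay before newer loads resume are all hypothetical and introduced only for illustration. Each load is given a latency in cycles, a load "retires" when its latency reaches zero, and loads newer than an undispatched fence make no progress.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One load-buffer entry (hypothetical model, not the patented circuit)."""
    name: str
    is_fence: bool = False
    latency: int = 0     # remaining memory latency in cycles (loads only)
    done: bool = False   # load retired, or fence dispatched


def simulate(program):
    """Run (kind, name, latency) tuples given in program order.

    kind is "load" or "lfence". Returns the names in the order in which
    loads retire and fences dispatch.
    """
    # Allocate a buffer entry to every instruction, oldest first.
    buffer = [Entry(name, kind == "lfence", latency)
              for kind, name, latency in program]
    order = []
    while not all(e.done for e in buffer):
        blocked = False          # an undispatched fence lies ahead of us
        older_all_done = True    # every older entry has completed
        for e in buffer:
            if e.done:
                continue
            if e.is_fence:
                if older_all_done and not blocked:
                    # All older loads retired: dispatch the fence.
                    e.done = True
                    order.append(e.name)
                # Newer loads stay stalled for the rest of this cycle.
                blocked = True
            elif not blocked:
                # Load is not behind a pending fence: it makes progress.
                e.latency -= 1
                if e.latency <= 0:
                    e.done = True
                    order.append(e.name)
            if not e.done:
                older_all_done = False
    return order


if __name__ == "__main__":
    print(simulate([("load", "A", 3), ("load", "B", 1)]))
    print(simulate([("load", "A", 3), ("lfence", "F", 0), ("load", "B", 1)]))
```

In this model, the fast newer load B completes before the slow older load A when no fence is present, whereas with the fence between them B is stalled until A has retired and the fence has been dispatched.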