1. Field of the Invention
The invention relates to the field of computer systems. More specifically, the invention relates to the selective writing of data elements from packed data based on a mask using predication.
2. Background Information
Computer technology continues to evolve at an ever-increasing rate. Gone are the days when the computer was merely a business tool primarily used for word-processing and spreadsheet applications. Today, with the evolution of multimedia applications, computer systems have become a common home electronic appliance, much like the television and home stereo system. Indeed, the line between computer systems and other consumer electronic appliances has become blurred, as multimedia applications executing on an appropriately configured computer system will function as a television set, a radio, a video playback device, and the like. Consequently, the market popularity of computer systems is often decided by the amount of memory they contain and the speed at which they can execute such multimedia applications.
Those skilled in the art will appreciate that multimedia and communications applications require the manipulation of large amounts of data represented in a small number of bits to provide the true-to-life renderings of audio and video we have come to expect. For example, to render a 3D graphic, a relatively large collection of individual data items (e.g., eight-bit data) must be similarly processed.
One common operation required by such applications is the writing of selected data items from a collection of data items to memory. Whether a given data item is to be written to memory is based upon a mask. One approach to moving selected bytes of data uses a test, branch, and write series of instructions. In accordance with this approach, one or more of the mask bits for each corresponding data item is tested, and a branch is used to either write or bypass writing the byte to memory. However, this approach suffers a performance penalty for branch mispredictions.
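The test, branch, and write sequence described above can be sketched in C as follows. This is only an illustrative sketch: the function name `masked_write_branch` is hypothetical, and the choice of the most significant bit of each mask byte as the tested bit is an assumption, since the text says only that one or more mask bits are tested.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies src[i] to dst[i] only where the mask selects it.
   Assumption: the most significant bit of each mask byte is the select bit. */
static void masked_write_branch(uint8_t *dst, const uint8_t *src,
                                const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (mask[i] & 0x80) {   /* test the mask bit, then branch */
            dst[i] = src[i];    /* write the selected byte */
        }
        /* otherwise the write is bypassed and dst[i] is left unchanged */
    }
}
```

Because the outcome of the `if` depends on the mask data, a processor may mispredict the branch whenever the mask pattern is irregular, which is the penalty noted above.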
To avoid this branch misprediction penalty, a Single Instruction, Multiple Data (SIMD) processor architecture is used to support a SIMD "Byte Mask Write" instruction to write packed data from one storage location to another (see U.S. patent application Ser. No. 09/052,802; filed Mar. 31, 1998, now U.S. Pat. No. 6,173,393). FIG. 1 is a block diagram illustrating specialized parallel circuitry for implementing a SIMD Byte Mask Write instruction in a SIMD architecture. FIG. 1 illustrates the SIMD byte masked quadword move instruction (MASKMOVQ), which moves up to 64 bits representing integer data from a first SIMD register, labeled MM1 and denoted by the first operand SRC1 100, to a memory location 106 implicitly specified by a register, using the byte packed data mask located in a second SIMD register, labeled MM2 and denoted by the second operand SRC2 102. Bytes 110 and 114 of the register MM1 100 are write-enabled by bytes 108 and 112 of the mask stored in the register MM2 102.
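The effect of the byte masked quadword move can be modeled in scalar C as shown below. This is a behavioral sketch, not the parallel circuitry of FIG. 1: the function name `maskmovq_model` is hypothetical, the two 64-bit registers are modeled as `uint64_t` values, and two details are assumptions not stated in the text, namely that bit 7 of each mask byte is the write-enable bit and that the least significant byte maps to the lowest memory address.

```c
#include <stdint.h>

/* Scalar model of a byte-masked 64-bit move (illustrative only).
   src  models the data register (MM1); mask models the mask register (MM2).
   Assumption: bit 7 of each mask byte write-enables the matching data byte.
   Assumption: byte lane 0 (least significant) goes to the lowest address. */
static void maskmovq_model(uint8_t *dst, uint64_t src, uint64_t mask)
{
    for (int lane = 0; lane < 8; lane++) {
        uint8_t m = (uint8_t)(mask >> (8 * lane));      /* mask byte for lane */
        if (m & 0x80) {
            dst[lane] = (uint8_t)(src >> (8 * lane));   /* enabled: write it */
        }
    }
}
```

In the hardware described above, all eight byte lanes are handled at once by dedicated circuitry; the loop here only reproduces the result that reaches memory.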
As illustrated in FIG. 1, this SIMD Byte Mask Write instruction requires specialized circuitry in the processor to process each byte of a packed data item in parallel. Although the parallel nature of this specialized circuitry achieves relatively good processor throughput, the circuitry requires valuable die area and is utilized only for graphics and similar types of processing.
A method and apparatus for selectively writing data elements from packed data based upon a mask using predication are described. In one embodiment of the invention, for each data element of a packed data operand, the following is performed in parallel processing units: determining a predicate value for the data element from one or more bits of a corresponding packed data mask element indicating whether the data element is selected for writing to a corresponding storage location, and storing in the corresponding storage location the data element based on the predicate value.
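The two steps named above, determining a predicate value and storing based on it, can be illustrated with the following C sketch. All function names are hypothetical, and the use of the top bit of each mask byte is an assumption. C has no predicated store, so the guard is modeled with a conditional; on hardware that supports predication, the predicate would instead qualify the store instruction itself, and a compiler is free to lower this C code to a branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: derive a predicate (0 or 1) from one bit of a mask element.
   Assumption: the top bit of the mask byte is the selecting bit. */
static int predicate_from_mask(uint8_t mask_elem)
{
    return (mask_elem >> 7) & 1;
}

/* Step 2: a store guarded by the predicate. This models a predicated store;
   it is not one, because C offers no way to express predication directly. */
static void predicated_store(uint8_t *dst, uint8_t value, int pred)
{
    if (pred) {
        *dst = value;
    }
}

/* Applies both steps to every element. The loop is sequential here, whereas
   the text describes the elements being handled in parallel processing units. */
static void masked_write_predicated(uint8_t *dst, const uint8_t *src,
                                    const uint8_t *mask, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int p = predicate_from_mask(mask[i]);
        predicated_store(&dst[i], src[i], p);
    }
}
```

Separating the predicate computation from the store mirrors the wording of the summary: the mask is consulted once to produce a predicate value, and the store consumes only that value.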