A known way to increase the performance of a computer system is to include a local, high-speed memory known as a cache. A cache increases system performance because there is a high probability that once the central processing unit (CPU) has accessed a data element at a particular address, its next access will be to an adjacent address. The cache therefore fetches, from a slower main memory or lower-level cache, the data located adjacent to the requested piece of data and stores it. In very high performance computer systems, several caches may be placed in a hierarchy. The cache which is closest to the CPU, known as the upper-level or "L1" cache, is the highest-level cache in the hierarchy and is generally the fastest. Other, generally slower caches are then placed in descending order in the hierarchy, starting with the "L2" cache and continuing down to the lowest-level cache, which is connected to main memory. Note that typically the L1 cache is located on the same integrated circuit as the CPU and the L2 cache is located off-chip. However, as time passes, it is reasonable to expect that lower-level caches will eventually be combined with the CPU on the same chip.
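The benefit of fetching adjacent data can be illustrated with a toy model. The following is a minimal sketch, not a description of any particular cache: the class name `ToyCache`, the line size, the number of lines, and the least-recently-used replacement policy are all assumptions chosen for illustration. It counts hits and misses for a sequential access pattern and for a pattern that never touches adjacent addresses.

```python
from collections import OrderedDict

LINE_SIZE = 8   # addresses fetched together as one line (assumed value)
NUM_LINES = 4   # lines the toy cache can hold (assumed value)


class ToyCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, line_size=LINE_SIZE, num_lines=NUM_LINES):
        self.line_size = line_size
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line tag -> True, ordered by recency
        self.hits = 0
        self.misses = 0

    def access(self, address):
        # All addresses within the same line share one tag.
        tag = address // self.line_size
        if tag in self.lines:
            self.hits += 1
            self.lines.move_to_end(tag)  # mark as most recently used
        else:
            # A miss fetches the whole line, including adjacent addresses.
            self.misses += 1
            self.lines[tag] = True
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used


def hit_rate(addresses):
    """Run an address trace through a fresh cache and return hits/accesses."""
    cache = ToyCache()
    for address in addresses:
        cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


# Sequential trace: each access is adjacent to the previous one.
sequential_rate = hit_rate(range(64))

# Strided trace: every access lands in a different line.
strided_rate = hit_rate(i * LINE_SIZE for i in range(64))

print(sequential_rate, strided_rate)
```

Under these assumptions, the sequential trace misses only on the first address of each line, while the strided trace never reuses a fetched line, which is why a cache helps most when accesses are to adjacent addresses.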
Recently, microprocessors designed for desktop applications such as personal computers (PCs) have been modified to increase processing efficiency for multimedia applications. For example, a video program may be stored in a compression format known as the Moving Picture Experts Group MPEG-2 format. When processing the MPEG-2 data, the microprocessor must create frames of decompressed data quickly enough for display on the PC screen in real time. The video frame can be represented as a two-dimensional vector, wherein each pixel location corresponds to a unique row and column of the vector. In order to display the data in real time, the microprocessor must process this two-dimensional vector quickly. However, conventional microprocessors for desktop applications contain only integer and floating-point scalar processing capability. What is needed, then, is a data processor for desktop applications which can process vector data quickly as well. This need is met by the present invention, whose features and advantages will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.
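To make the cost of scalar processing concrete, the following minimal sketch represents a frame as a two-dimensional array indexed by row and column and applies one operation per pixel. The frame dimensions, the helper names `make_frame` and `brighten_scalar`, and the brightness adjustment itself are hypothetical choices for illustration; they are not taken from the MPEG-2 format or from the detailed description.

```python
ROWS, COLS = 4, 6  # tiny frame size, assumed for illustration only


def make_frame(rows, cols):
    """Build a test frame where each pixel value depends on its row and column."""
    return [[(r * cols + c) % 256 for c in range(cols)] for r in range(rows)]


def brighten_scalar(frame, offset):
    """Add an offset to every pixel, one scalar operation at a time.

    Returns the new frame and the number of per-pixel operations performed.
    """
    result = []
    operations = 0
    for row in frame:
        new_row = []
        for pixel in row:
            # Saturate at 255 so the value stays within 8 bits.
            new_row.append(min(255, pixel + offset))
            operations += 1
        result.append(new_row)
    return result, operations


frame = make_frame(ROWS, COLS)
brightened, scalar_ops = brighten_scalar(frame, 250)
print(scalar_ops)  # one operation per pixel: ROWS * COLS
```

With scalar processing the operation count grows with the number of pixels in the frame, which is the pressure on real-time display that the paragraph above describes.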