1. Field of the Invention
The present invention relates generally to data processing techniques for portable multimedia apparatuses, and in particular, to a subword parallelism technique for efficiently processing multimedia data and to a method for converting input data.
2. Description of the Related Art
In a multichannel image coding scheme, a standard image can be expressed as an image signal based on vector values, and each pixel of the image is composed of three components: Red, Green and Blue (RGB). However, the RGB color space does not match the way human beings perceive color. To address this, image and video processing systems convert each frame from the RGB color space into the YCbCr color space. YCbCr is a color coordinate space based on human color perception. Because the human eye is less sensitive to high-frequency variation in the chrominance components Cb and Cr, these components can be subsampled without producing color distortion that is visible to the naked eye. In addition, the luminance component Y of the image can be processed independently of the chrominance components Cb and Cr.
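The RGB-to-YCbCr conversion and chrominance subsampling described above can be sketched in C as follows. This is a minimal sketch, assuming full-range ITU-R BT.601 (JFIF) coefficients, which the text does not specify; the helper names `clamp_round_u8`, `rgb_to_ycbcr` and `subsample_2x2` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: clamp to [0, 255], then round to nearest integer. */
static uint8_t clamp_round_u8(double v)
{
    if (v < 0.0)   v = 0.0;
    if (v > 255.0) v = 255.0;
    return (uint8_t)(v + 0.5);
}

typedef struct { uint8_t y, cb, cr; } YCbCr;

/* Assumption: full-range BT.601 (JFIF) coefficients; the source text
 * does not state which conversion matrix is used. */
static YCbCr rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b)
{
    YCbCr out;
    out.y  = clamp_round_u8(        0.299    * r + 0.587    * g + 0.114    * b);
    out.cb = clamp_round_u8(128.0 - 0.168736 * r - 0.331264 * g + 0.5      * b);
    out.cr = clamp_round_u8(128.0 + 0.5      * r - 0.418688 * g - 0.081312 * b);
    return out;
}

/* Rounded average of a 2x2 block of chroma samples (4:2:0 subsampling):
 * four Cb (or Cr) samples are replaced by one. */
static uint8_t subsample_2x2(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)(((unsigned)a + b + c + d + 2u) / 4u);
}
```

For example, pure red (255, 0, 0) maps to Y = 76, and black maps to (0, 128, 128). Averaging each 2×2 block is only one common subsampling filter; others are possible.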
Meanwhile, image processing also uses a subword parallelism technique, which operates simultaneously on several small data elements such as 8-bit pixels. In subword parallelism, the small elements are packed into one wide register and processed in parallel.
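Subword parallelism can also be emulated in software on an ordinary 32-bit integer, a trick often called SIMD within a register (SWAR). The sketch below is such a software analogue, not the hardware of the invention; the function names and the lane order (pixel 0 in the least significant byte) are illustrative assumptions.

```c
#include <stdint.h>

/* Pack four 8-bit pixels into one 32-bit word (assumed order:
 * p0 in the least significant byte). */
static uint32_t pack4_u8(uint8_t p0, uint8_t p1, uint8_t p2, uint8_t p3)
{
    return (uint32_t)p0 | ((uint32_t)p1 << 8) |
           ((uint32_t)p2 << 16) | ((uint32_t)p3 << 24);
}

/* Extract lane i (0..3) from a packed word. */
static uint8_t lane_u8(uint32_t w, int i)
{
    return (uint8_t)(w >> (8 * i));
}

/* Add four 8-bit lanes with one 32-bit addition. Bit 7 of every lane is
 * masked off first, so no carry can cross into the neighbouring lane;
 * the top bits are then restored with XOR. Each lane wraps modulo 256. */
static uint32_t swar_add_u8(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}
```

Masking the top bit of each lane keeps carries from spilling into the next pixel, which is what a partitioned hardware adder does. Each lane still wraps, however: `swar_add_u8(0xFF, 0x01)` yields 0.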
FIG. 1 is a conceptual diagram of the conventional subword parallelism technique.
Referring to FIG. 1, a 32-bit datapath is partitioned into four 8-bit Arithmetic Logic Units (ALUs) 110, 120, 130 and 140, which process two 32-bit data words 11 and 13.
Words 11 and 13 each include three subwords holding Y, Cb and Cr information, so the 8 least significant bits (LSBs) of each word are unused. Each subword is computed in its associated ALU 110, 120, 130 or 140, and the results are output as another word 15.
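The FIG. 1 arrangement can be modeled in C as below. This is a sketch under stated assumptions: the byte positions of Y, Cb and Cr, the names `pack_ycbcr` and `four_alu_add`, and the parameter names (borrowed from reference numerals 11, 13 and 15) are hypothetical, and each 8-bit ALU is modeled as an addition truncated to 8 bits.

```c
#include <stdint.h>

/* Assumed layout: Y in bits 31..24, Cb in 23..16, Cr in 15..8,
 * and the 8 least significant bits left unused (zero). */
static uint32_t pack_ycbcr(uint8_t y, uint8_t cb, uint8_t cr)
{
    return ((uint32_t)y << 24) | ((uint32_t)cb << 16) | ((uint32_t)cr << 8);
}

/* Model of four independent 8-bit ALUs: each lane is added separately,
 * truncated to 8 bits (carry discarded), and written into its own field
 * of the output word. Lane 0, the unused LSB byte, is processed too. */
static uint32_t four_alu_add(uint32_t w11, uint32_t w13)
{
    uint32_t w15 = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint32_t shift = 8u * (uint32_t)lane;
        uint32_t a = (w11 >> shift) & 0xFFu;
        uint32_t b = (w13 >> shift) & 0xFFu;
        w15 |= ((a + b) & 0xFFu) << shift;
    }
    return w15;
}
```

The loop makes the cost of this layout visible: one of the four lanes always operates on the unused LSB byte, so a quarter of the datapath does no useful work for YCbCr pixels.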
However, this subword parallelism technique incurs processing overhead that degrades data-processing performance, not only because the chrominance data is not arranged in a power-of-two range, but also because the type of the stored data is not suitable for computation.
FIGS. 2A and 2B are conceptual diagrams of packing and unpacking processes in the conventional subword parallelism technique.
In FIG. 2A, the subword parallelism technique adds, in parallel, the 8-bit values Y1, Cb1 and Cr1 stored in a first register R1 to the corresponding 8-bit values Y0, Cb0 and Cr0 stored in a second register R2, and stores each result in the corresponding 8-bit region of a third register R3. However, the sum of two 8-bit values can require 9 bits (for example, 200 + 100 = 300, which exceeds the 8-bit maximum of 255), causing overflow. In that case, the correct result cannot be obtained.
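A partitioned 8-bit adder silently discards each lane's carry. The sketch below, a hypothetical software model rather than the circuit of FIG. 2A, computes the wrapped lane sums and a mask flagging which lanes overflowed; the function names and the use of bit 7 of each lane as the overflow flag are assumptions.

```c
#include <stdint.h>

/* Lane-wise 8-bit add that wraps modulo 256, like a partitioned adder. */
static uint32_t swar_add_u8(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Returns a mask with bit 7 of a lane set when that lane's true sum
 * exceeded 255, i.e. when the 8-bit result stored in R3 is wrong.
 * Uses the carry-out identity c = (a & b) | ((a | b) & ~s), evaluated
 * at the top bit of each lane. */
static uint32_t swar_overflow_u8(uint32_t a, uint32_t b)
{
    uint32_t s = swar_add_u8(a, b);
    return ((a & b) | ((a | b) & ~s)) & 0x80808080u;
}
```

With 200 and 100 in lane 0, the stored result is 44 (300 mod 256) and the lane-0 flag is set, while lane 1 (10 + 10 = 20) is unaffected.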
To solve this problem, the conventional subword parallelism technique uses an unpack instruction. That is, it moves the 8-bit Y1 value in the first register R1 into a fourth 32-bit register (not shown) and the 8-bit Y0 value in the second register R2 into a fifth 32-bit register (not shown), performs the addition on these registers, and stores the result in a sixth 32-bit register (not shown).
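The unpack-then-add sequence can be sketched as follows, assuming a packed 32-bit word with lane 0 in the least significant byte. The names `unpack_lane_u8` and `unpacked_add_lane` are hypothetical, and each C `uint32_t` value stands in for one of the extra 32-bit registers.

```c
#include <stdint.h>

/* Unpack: move one 8-bit lane of a packed word into its own 32-bit
 * value, zero-extended (the "fourth" or "fifth" register). */
static uint32_t unpack_lane_u8(uint32_t packed, int lane)
{
    return (packed >> (8 * lane)) & 0xFFu;
}

/* Add one lane of R1 and R2 in full 32-bit precision (the "sixth"
 * register), so sums up to 510 are kept exactly instead of wrapping. */
static uint32_t unpacked_add_lane(uint32_t r1, uint32_t r2, int lane)
{
    return unpack_lane_u8(r1, lane) + unpack_lane_u8(r2, lane);
}
```

Because each operand is zero-extended before the addition, 200 + 100 is kept exactly as 300. The price is several extra operations per lane (shift, mask and a separate add) in place of a single packed addition.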
FIG. 2B illustrates an exemplary method of packing 16-bit values stored in a first register R1 and a second register R2 into a 32-bit register partitioned into 8-bit fields.
In this case, any of C0, C1, C2 and C3 that is greater than 255 (binary 11111111) is saturated: 255 is stored in its designated 8-bit field of the partitioned third register R3. However, this packing and unpacking increases the number of instructions executed during image processing, degrading performance. Therefore, various processor architectures have been proposed to reduce this computational overhead.
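The saturating pack step can be sketched as below. It assumes the four results C0 to C3 are unsigned 16-bit values and that C0 goes into the least significant byte of R3; both assumptions, and the names `sat_u8` and `pack_sat_u16_to_u8`, are illustrative rather than taken from FIG. 2B.

```c
#include <stdint.h>

/* Saturate a 16-bit value to the 8-bit range: anything above 255
 * (binary 11111111) becomes 255. */
static uint32_t sat_u8(uint16_t c)
{
    return c > 255u ? 255u : (uint32_t)c;
}

/* Pack four 16-bit results C0..C3 into the 8-bit fields of one 32-bit
 * word, saturating each one (assumed order: C0 in the lowest byte). */
static uint32_t pack_sat_u16_to_u8(uint16_t c0, uint16_t c1,
                                   uint16_t c2, uint16_t c3)
{
    return sat_u8(c0) | (sat_u8(c1) << 8) |
           (sat_u8(c2) << 16) | (sat_u8(c3) << 24);
}
```

Each lane needs its own compare and shift here, on top of the shifts and masks spent unpacking, which illustrates why the packing/unpacking round trip costs more instructions than one packed addition.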
FIG. 3 is a conceptual diagram of a conventional subword parallelism technique that uses a 48-bit datapath to solve the computation overhead problem.
Referring to FIG. 3, four 12-bit ALUs 310, 320, 330 and 340 are used to process 8-bit pixels. The subword parallelism technique performs computation on 8-bit data signals in their associated 12-bit ALUs and stores the results in a 12-bit storage 37. Because each 12-bit lane leaves 4 bits of headroom above an 8-bit operand, the overflow that can occur in 8-bit computation is avoided. However, because it uses 12-bit ALUs, this technique may increase the size and cost of the hardware.
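The 48-bit arrangement can be imitated in software by spacing four 8-bit pixels 12 bits apart inside a 64-bit integer. This is a hypothetical model of FIG. 3 (the lane order and the names `pack4_12` and `lane12` are assumptions), in which one ordinary 64-bit addition acts on all four 12-bit lanes at once.

```c
#include <stdint.h>

#define LANE_BITS 12
#define LANE_MASK 0xFFFu

/* Place four 8-bit pixels into four 12-bit lanes of a 48-bit value held
 * in a uint64_t (bits 48..63 unused); p0 in the lowest lane (assumed). */
static uint64_t pack4_12(uint8_t p0, uint8_t p1, uint8_t p2, uint8_t p3)
{
    return (uint64_t)p0 | ((uint64_t)p1 << 12) |
           ((uint64_t)p2 << 24) | ((uint64_t)p3 << 36);
}

/* Read back one 12-bit lane (0..3). */
static uint32_t lane12(uint64_t w, int lane)
{
    return (uint32_t)((w >> (LANE_BITS * lane)) & LANE_MASK);
}
```

Since 255 + 255 = 510 is far below the 12-bit limit of 4095, a plain `+` on two packed values never carries into a neighboring lane; up to 16 pixels (16 × 255 = 4080) could be accumulated per lane before overflow. The cost, mirrored in hardware, is that 48 bits of datapath carry only 32 bits of pixel data.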