The invention relates to data processing systems using vector processing and Very Long Instruction Word (VLIW) architecture, more particularly to the concatenation of codewords of variable length.
A frame of image can be represented by a matrix of points referred to as pixels. Each pixel has one or more attributes representing the color associated with the pixel. Video streams are represented by consecutive frames of images. To efficiently store or transport image and video information, it is necessary to use data compression technologies to compress the data representing the attributes of each pixel of each frame of the images.
Various standards have been developed for representing image or video information in compressed formats, including Digital Video (DV) formats, MPEG2 and MPEG4 formats from the Moving Picture Experts Group, ITU standards (e.g., H.261 or H.263) from the International Telecommunication Union, JPEG formats from the Joint Photographic Experts Group, and others.
Many standard formats (e.g., DV, MPEG2 or MPEG4, H.261 or H.263) use block based transform coding techniques. For example, 8×8 two-dimensional blocks of pixels are transformed into the frequency domain using Forward Discrete Cosine Transformation (FDCT). The transformed coefficients are further quantized and coded using zero run length encoding and variable length encoding.
Zero run length encoding is a technique for converting a list of elements into an equivalent string of run-level pairs, where each non-zero element (level) in the list is associated with a zero run value (run) which represents the number of consecutive elements of zero immediately preceding the corresponding non-zero element in the list. After zero run length encoding, strings of zeros in the list are represented by zero run values associated with non-zero elements. For example, the non-zero elements and their associated zero run values can be interleaved into a new list to represent the original list of elements with strings of zeros.
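The run-level conversion described above can be illustrated with a minimal scalar sketch in Python. The function names are hypothetical, and returning the count of trailing zeros separately is an assumption made here for illustration; actual formats typically signal the end of a block in their own way.

```python
def zero_run_length_encode(elements):
    """Convert a list into (run, level) pairs.

    Each non-zero element (level) is paired with the number of zeros
    (run) immediately preceding it. Zeros after the last non-zero
    element are returned separately as a count (an assumption of this
    sketch, not something the text specifies).
    """
    pairs = []
    run = 0
    for value in elements:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs, run


def zero_run_length_decode(pairs, trailing_zeros):
    """Rebuild the original list from run-level pairs."""
    elements = []
    for run, level in pairs:
        elements.extend([0] * run)
        elements.append(level)
    elements.extend([0] * trailing_zeros)
    return elements


coefficients = [5, 0, 0, 3, 0, 1, 0, 0, 0]
pairs, trailing = zero_run_length_encode(coefficients)
print(pairs, trailing)
```

Decoding the pairs together with the trailing-zero count reproduces the original list, which shows that the run-level representation is equivalent to the list it replaces.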
Variable length coding is a coding technique often used for lossless data compression. Codes of shorter lengths (e.g., Huffman codewords) are assigned to frequently occurring fixed-length data (or symbols) to achieve data compression. Variable length encoding is widely used in compressing video data.
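As a rough illustration of variable length coding, the following Python sketch uses a small hand-made prefix-free code table. The table, the symbols, and the function names are hypothetical and are not taken from any standard; the sketch only shows how frequent symbols can be given shorter codewords.

```python
# Hypothetical prefix-free code table: "a" is assumed to be the most
# frequent symbol, so it receives the shortest codeword.
CODE_TABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def vlc_encode(symbols):
    """Concatenate the codeword of each symbol into one bit string."""
    return "".join(CODE_TABLE[s] for s in symbols)


def vlc_decode(bits):
    """Decode a bit string; works because no codeword prefixes another."""
    inverse = {code: sym for sym, code in CODE_TABLE.items()}
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    return symbols


encoded = vlc_encode("aabac")
print(encoded, len(encoded))
```

With a fixed-length code, four symbols need 2 bits each, so the five-symbol input would take 10 bits; the variable length encoding in this sketch uses 8.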
After the Forward Discrete Cosine Transformation and quantization, the frequency coefficients are typically reordered in a zigzag order so that the zero coefficients are grouped together in a list of coefficients, which can be more effectively encoded using a zero run length encoding technique. The energy of a block of pixels representing a block of an image is typically concentrated in the lower frequency area. When the coefficients are reordered in a zigzag order, the coefficients for the lower frequencies are located before those for higher frequencies in the reordered list of coefficients. Thus, non-zero coefficients are more likely to concentrate in the front portion of the reordered coefficient list, and zero coefficients are more likely to concentrate in the end portion of the reordered list.
Since compressing images is a computationally intensive operation, it is desirable to have highly efficient methods and apparatuses to perform run length encoding and variable length encoding.
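The effect of zigzag reordering can be sketched in Python as follows. The traversal shown is the conventional anti-diagonal pattern, which is assumed here because the text does not spell out the exact scan; the function names and the sample 4×4 block are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zigzag order.

    Positions are visited one anti-diagonal (row + column = s) at a
    time, alternating direction on each diagonal.
    """
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block into a list using the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Hypothetical quantized block with its energy in the low frequencies
# (top-left corner).
block = [
    [9, 4, 0, 0],
    [3, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(block))
```

In this sample, the four non-zero coefficients land at the front of the reordered list and the twelve zeros form a single run at the end, which is the arrangement that zero run length encoding handles most compactly.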
Methods and apparatuses for concatenating codewords of variable lengths using a vector processing unit are described here.
In one aspect of the invention, a method for execution by a microprocessor to concatenate codewords of variable lengths includes: receiving a plurality of codewords from a first vector register; receiving a plurality of lengths representing bit lengths of the plurality of codewords respectively; generating a first bit stream by concatenating the plurality of codewords; summing the plurality of lengths to generate the bit length of the first bit stream; and outputting the first bit stream and its bit length; wherein the above operations are performed in response to the microprocessor receiving a single instruction.
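The net result of the operations listed above can be modeled with a scalar Python sketch. This is only a behavioral illustration, not the vector instruction itself: the function name is hypothetical, and the sketch assumes that each codeword is right-aligned in its element, that the first codeword ends up at the most significant end of the stream, and that a zero-length entry contributes no bits.

```python
def concat_codewords(codewords, lengths):
    """Concatenate variable-length codewords into one bit stream.

    Returns the bit stream as an integer together with its total bit
    length, which is the sum of the individual lengths.
    """
    stream = 0
    total_length = 0
    for code, length in zip(codewords, lengths):
        code &= (1 << length) - 1          # keep only the low `length` bits
        stream = (stream << length) | code  # append the codeword to the stream
        total_length += length              # accumulate the stream length
    return stream, total_length


# Four codewords: "10", "0", "111", "0101"
stream, total = concat_codewords([0b10, 0b0, 0b111, 0b0101], [2, 1, 3, 4])
print(format(stream, "0{}b".format(total)), total)
```

The total length has to be carried alongside the stream because leading zero bits of the first codeword cannot be recovered from the integer value alone.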
In one example according to this aspect, summing the plurality of lengths is performed concurrently while generating the first bit stream. The plurality of lengths are received from the first vector register; and the first bit stream and its bit length are output into a vector register. A plurality of indicators are generated, each of which indicates whether or not a corresponding one of the plurality of lengths is zero. Each of the plurality of indicators is stored in a bit in a condition register. In one example, generating the plurality of indicators is also performed concurrently while generating the first bit stream.
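The zero-length indicators described in this example can be modeled in Python as a bit mask. The function name and the bit layout (bit i of the result corresponds to the i-th length) are assumptions made for illustration; the text states only that each indicator occupies one bit in a condition register.

```python
def zero_length_indicators(lengths):
    """Return a mask whose bit i is set when lengths[i] is zero."""
    mask = 0
    for i, length in enumerate(lengths):
        if length == 0:
            mask |= 1 << i
    return mask


# The second and fourth lengths are zero, so bits 1 and 3 are set.
mask = zero_length_indicators([2, 0, 3, 0])
print(format(mask, "04b"))
```

Each indicator depends only on its own length, so in hardware the indicators could be produced independently of the concatenation, consistent with the concurrent generation mentioned above.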
The present invention includes apparatuses which perform these methods, including data processing systems which perform these methods, and computer readable media which when executed on data processing systems cause the systems to perform these methods.
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follow.