The invention relates to data processing systems using vector processing and Very Long Instruction Word (VLIW) architecture, more particularly to packing bitstreams.
An image frame can be represented by a matrix of points referred to as pixels. Each pixel has one or more attributes representing the color associated with the pixel. Video streams are represented by consecutive image frames. To efficiently store or transport image and video information, it is necessary to use data compression technologies to compress the data representing the attributes of each pixel of each frame.
Various standards have been developed for representing image or video information in compressed formats, which include Digital Video (DV) formats, MPEG2 or MPEG4 formats from the Moving Picture Experts Group, ITU standards (e.g., H.261 or H.263) from the International Telecommunication Union, JPEG formats from the Joint Photographic Experts Group, and others.
Many standard formats (e.g., DV, MPEG2 or MPEG4, H.261 or H.263) use block based transform coding techniques. For example, 8×8 two-dimensional blocks of pixels are transformed into the frequency domain using Forward Discrete Cosine Transformation (FDCT). The transformed coefficients are further quantized and coded using zero run length encoding and variable length encoding.
Zero run length encoding is a technique for converting a list of elements into an equivalent string of run-level pairs, where each non-zero element (level) in the list is associated with a zero run value (run) which represents the number of consecutive elements of zero immediately preceding the corresponding non-zero element in the list. After zero run length encoding, strings of zeros in the list are represented by zero run values associated with non-zero elements. For example, the non-zero elements and their associated zero run values can be interleaved into a new list to represent the original list of elements with strings of zeros.
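The conversion described above can be illustrated with a minimal sketch. The function name `zero_run_length_encode` and the choice to return the count of trailing zeros separately are assumptions made for illustration only; practical codecs typically signal trailing zeros with an end-of-block code instead.

```python
def zero_run_length_encode(elements):
    """Convert a list into (run, level) pairs.

    Each pair holds the number of zeros immediately preceding a
    non-zero element (run) and the non-zero element itself (level).
    Zeros after the last non-zero element have no level to attach to,
    so their count is returned separately (an illustrative choice).
    """
    pairs = []
    run = 0
    for value in elements:
        if value == 0:
            run += 1          # extend the current string of zeros
        else:
            pairs.append((run, value))
            run = 0           # a non-zero element ends the run
    return pairs, run


pairs, trailing_zeros = zero_run_length_encode([5, 0, 0, 3, 0, 1, 0, 0])
print(pairs, trailing_zeros)
```

In this example the element 3 is preceded by two zeros and the element 1 by one zero, so the pairs are (0, 5), (2, 3) and (1, 1), with two trailing zeros left over.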
Variable length coding is a coding technique often used for lossless data compression. Codes of shorter lengths (e.g., Huffman codewords) are assigned to frequently occurring fixed-length data (or symbols) to achieve data compression. Variable length encoding is widely used in compressing video data.
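As a rough illustration of the principle, the sketch below encodes symbols with a small prefix-free code table. The table `CODE_TABLE` and the function names are hypothetical and are not taken from any standard; actual formats define their own codeword tables.

```python
# Hypothetical prefix-free code table: the most frequent symbol ("a")
# receives the shortest codeword.
CODE_TABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def vlc_encode(symbols, table=CODE_TABLE):
    """Concatenate the variable-length codeword of each symbol."""
    return "".join(table[s] for s in symbols)


def vlc_decode(bits, table=CODE_TABLE):
    """Decode by matching codewords bit by bit (valid for prefix-free codes)."""
    inverse = {code: sym for sym, code in table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return symbols


encoded = vlc_encode("aabac")
print(encoded, len(encoded))
```

With four symbols, a fixed-length code needs two bits per symbol, or ten bits for the five symbols above; the variable-length encoding uses eight bits because the frequent symbol "a" costs only one bit.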
After the Forward Discrete Cosine Transformation and quantization, the frequency coefficients are typically reordered in a zigzag order so that the zero coefficients are grouped together in a list of coefficients, which can be more effectively encoded using a zero run length encoding technique. The energy of a block of pixels representing a block of image is typically concentrated in the lower frequency area. When the coefficients are reordered in a zigzag order, the coefficients for the lower frequencies are located relatively before those for higher frequencies in the reordered list of coefficients. Thus, non-zero coefficients are more likely to concentrate in the front portion of the reordered coefficient list; and zero coefficients are more likely to concentrate in the end portion of the reordered list.
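The reordering can be sketched as follows, assuming the conventional zigzag scan used by JPEG-style codecs (the text above does not fix a particular scan pattern, and some formats use variants). The names `zigzag_order` and `zigzag_scan` are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zigzag order.

    Positions are grouped by anti-diagonal (row + column). Odd diagonals
    are walked with the row increasing and even diagonals with the row
    decreasing, which produces the conventional zigzag path.
    """
    coords = [(r, c) for r in range(n) for c in range(n)]

    def key(rc):
        r, c = rc
        diagonal = r + c
        return (diagonal, r if diagonal % 2 else -r)

    return sorted(coords, key=key)


def zigzag_scan(block):
    """Flatten a square block of coefficients into a zigzag-ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# A small 4 x 4 block whose non-zero coefficients sit at low frequencies.
block = [
    [9, 4, 0, 0],
    [3, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(block))
```

For this block the four non-zero coefficients come first in the scanned list and the remaining twelve entries are zeros, which is the arrangement that zero run length encoding handles well.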
Since compressing images is a computationally intensive operation, it is desirable to have highly efficient methods and apparatuses to perform run length encoding and variable length encoding.
Methods and apparatuses for concatenating codewords of variable lengths using a vector processing unit are described here.
In one aspect of the invention, a method for execution by a microprocessor to pack bit streams of variable lengths includes: receiving a first bit segment from a first vector register; receiving a second bit segment from a second vector register; determining whether or not the sum of the bit length of the first bit segment and the bit length of the second bit segment is larger than a required length; generating a third bit segment from the first and second bit segments; and outputting the third bit segment in a third vector register; where the above operations are performed in response to the microprocessor receiving a first single instruction. The third bit segment is generated from concatenating the first bit segment and a beginning portion of the second bit segment such that the bit length of the third bit segment is equal to the required length when the sum is larger than the required length; and the third bit segment is generated from concatenating the first and second bit segments when the sum is not larger than the required length.
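The packing rule of this aspect can be modeled in software as a minimal sketch. It is a scalar illustration only, not the single-instruction vector implementation: each bit segment is represented as an integer value plus a bit length, and the function name `pack_bits`, its parameters, and the assumption that the first bit length does not exceed the required length are all hypothetical.

```python
def pack_bits(first, first_len, second, second_len, required_len):
    """Concatenate two bit segments, truncating to required_len if needed.

    Segments are (value, length) pairs with the most significant bit
    first. Assumes first_len <= required_len (an illustrative assumption).
    Returns the third bit segment as (value, length).
    """
    total = first_len + second_len
    if total > required_len:
        # Keep only the beginning portion of the second segment so that
        # the result is exactly required_len bits long.
        take = required_len - first_len
        head = second >> (second_len - take)
        return (first << take) | head, required_len
    # The sum fits: concatenate both segments in full.
    return (first << second_len) | second, total


# Sum (3 + 5 = 8) exceeds the required length of 6 bits.
value, length = pack_bits(0b101, 3, 0b11001, 5, 6)
print(format(value, "0{}b".format(length)), length)

# Sum (3 + 2 = 5) does not exceed the required length of 8 bits.
value, length = pack_bits(0b101, 3, 0b11, 2, 8)
print(format(value, "0{}b".format(length)), length)
```

In the first call only the leading three bits of the second segment (110) are appended, giving the six-bit result 101110; in the second call both segments are concatenated in full, giving the five-bit result 10111.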
In one example according to this aspect, the first bit length is zero in one scenario; the second bit length is zero in another scenario. An indicator is generated to indicate whether or not the second bit length is zero.
When the sum is larger than the required length, information is generated to specify an ending portion of the second bit segment that contains the bits in the second bit segment that are not in the beginning portion of the second bit segment. In the execution of a second single instruction, the ending portion of the second bit segment is received from the second vector register to generate a fourth bit segment from the ending portion of the second bit segment.
An overflow indicator is generated to indicate whether or not the sum is larger than the required length and stored in a bit of the third vector register and/or in a condition register.
A full indicator is generated to indicate whether or not the third bit length is equal to the required length and stored in a bit of the third vector register and/or in a condition register.
In one example, an underflow indicator is also generated to indicate whether or not the sum is smaller than the required length.
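The indicators and the ending portion described above can be modeled together in a short sketch. Bit segments are written as strings of "0" and "1" for readability; the `PackResult` type, the function `pack_with_flags`, and the way the follow-up step reuses the same function on the ending portion are assumptions made for illustration and do not describe how the indicators are laid out in a vector or condition register.

```python
from dataclasses import dataclass


@dataclass
class PackResult:
    bits: str           # third bit segment
    remainder: str      # ending portion of the second segment, if any
    overflow: bool      # sum of lengths is larger than the required length
    full: bool          # third bit length equals the required length
    underflow: bool     # sum of lengths is smaller than the required length
    second_empty: bool  # second bit length is zero


def pack_with_flags(first, second, required_len):
    """Pack two bit strings and derive the indicators.

    Assumes len(first) <= required_len (an illustrative assumption).
    """
    total = len(first) + len(second)
    take = min(len(second), required_len - len(first))
    bits = first + second[:take]
    return PackResult(
        bits=bits,
        remainder=second[take:],
        overflow=total > required_len,
        full=len(bits) == required_len,
        underflow=total < required_len,
        second_empty=len(second) == 0,
    )


step1 = pack_with_flags("101", "11001", 6)
print(step1)

# A hypothetical follow-up step starts a new segment from the ending portion.
step2 = pack_with_flags("", step1.remainder, 6)
print(step2)
```

In the first step the overflow and full indicators are both set and the ending portion is 01. The follow-up step then produces a two-bit segment with the underflow indicator set, meaning more bits can still be appended.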
The present invention includes apparatuses which perform these methods, including data processing systems which perform these methods, and computer readable media which when executed on data processing systems cause the systems to perform these methods.
Other features of the present invention will be apparent from the accompanying drawings and from the detailed description which follows.