This invention relates to manipulating data, for example for encoding or decoding digital video signals.
It is becoming increasingly common for video signals to be transmitted or stored in a digital, rather than an analogue, format. Digital video signals are usually compressed before transmission or storage (using a standard compression system such as MPEG-2, H.261 or H.263) and decompressed before playback. Several video compression standards use block-format video encoding, in which the pixels of the image to be compressed are split into blocks of adjacent pixels and each block is then compressed by a series of steps. This is efficient because most naturally-occurring images have areas which look fairly uniform. When the image is compressed block by block, this local uniformity reduces the amount of data needed to describe each block, and hence the image.
The first step of a typical block-format compression process is to split the image into smaller component blocks of adjacent pixels. Typically, the image is split into macroblocks (MBs), each of which consists of 256 pixels in a 16×16 array. The image in a macroblock is characterised by a luminance value (Y) for each pixel and by two sets of chrominance values (U and V). In what is known as the 4:2:0 format (as used in many video compression standards), the U and V values are each held in an 8×8 array, so that each chrominance sampling point covers four luminance pixels (see FIG. 1). The main purpose of splitting the image in this way is to make the job of spatial compression easier: only a small section of the image needs to be examined at a time, and so the task, although less efficient, is less complicated.
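By way of illustration only, the macroblock arithmetic described above can be sketched in Python. The function names `macroblock_origins` and `samples_per_macroblock_420` are hypothetical, and the sketch assumes that the image dimensions are exact multiples of the macroblock size.

```python
MB_SIZE = 16  # macroblock edge length in luminance pixels, as stated in the text


def macroblock_origins(width, height, mb_size=MB_SIZE):
    """Return the top-left (x, y) corner of every macroblock in raster order.

    Assumption: width and height are exact multiples of mb_size.
    """
    if width % mb_size or height % mb_size:
        raise ValueError("image dimensions must be multiples of the macroblock size")
    return [(x, y)
            for y in range(0, height, mb_size)
            for x in range(0, width, mb_size)]


def samples_per_macroblock_420(mb_size=MB_SIZE):
    """Return (luminance samples, chrominance samples) for one 4:2:0 macroblock."""
    luma = mb_size * mb_size             # one Y value per pixel: 16 x 16 = 256
    chroma_edge = mb_size // 2           # U and V are each 8 x 8
    chroma = 2 * chroma_edge * chroma_edge
    return luma, chroma
```

For a 352×288 image this gives 22 × 18 = 396 macroblocks, each carrying 256 luminance samples and 128 chrominance samples.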
The usual technique used next is a discrete cosine transform (DCT). This works in much the same way as the Fourier transform, but in two dimensions on a set of pixels. Each coefficient in the DCT output represents a wave in the pixel domain, with the amplitude determined by the value of the coefficient, and the frequency in both dimensions determined by the position in the coefficient matrix. Moving to the right or the bottom of the DCT coefficient matrix increases the frequency of this wave. A superposition of a number of these waves leads to a reconstruction of the original image.
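A minimal sketch of a two-dimensional DCT on an 8×8 block and of its inverse is given below, to illustrate how a superposition of the waves reconstructs the original pixels. It assumes the orthonormal DCT-II scaling commonly used in image coding; the names `dct2` and `idct2` are hypothetical, and the direct summation is written for clarity rather than speed.

```python
import math

N = 8  # block edge length


def _scale(k):
    # Normalisation factor of the orthonormal DCT-II (an assumed convention).
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def _basis(pos, freq):
    # One-dimensional cosine wave of index `freq` sampled at pixel `pos`.
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT: block[y][x] (pixels) -> coeffs[v][u] (frequencies)."""
    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += block[y][x] * _basis(x, u) * _basis(y, v)
            coeffs[v][u] = (2.0 / N) * _scale(u) * _scale(v) * total
    return coeffs


def idct2(coeffs):
    """Inverse 2-D DCT: sum the weighted waves to rebuild the pixels."""
    block = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (_scale(u) * _scale(v) * coeffs[v][u]
                              * _basis(x, u) * _basis(y, v))
            block[y][x] = (2.0 / N) * total
    return block
```

With this scaling, a perfectly uniform block produces a single non-zero coefficient in the top-left (zero-frequency) position, which is the local uniformity that the later compression steps exploit.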
At this point, compression can begin on each DCT-transformed luminance and chrominance matrix by removing some of the coefficients from the matrix and quantising others. This leads to inaccuracies in the reconstituted image (lossy compression), but this is often acceptable, and the resulting matrix is easier to compress since it contains less information.
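The effect of quantisation can be sketched as follows. This is a simplified illustration that assumes a single uniform step size for every coefficient, whereas practical coders generally vary the step with frequency; the names `quantise` and `dequantise` are hypothetical.

```python
def quantise(coeffs, step):
    """Divide each coefficient by the step size and round to an integer level.

    Coefficients much smaller than the step become zero, which is how
    small values are effectively removed from the matrix.
    """
    return [[round(c / step) for c in row] for row in coeffs]


def dequantise(levels, step):
    """Approximate the original coefficients from the integer levels."""
    return [[level * step for level in row] for row in levels]
```

For example, with a step of 16 the coefficients 800, 12, -7 and 3 become the levels 50, 1, 0 and 0, which dequantise to 800, 16, 0 and 0. The differences from the original values are the inaccuracy that makes the compression lossy.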
Another refinement to the compression process is the use of run-length encoding. This is a useful way of compressing sparse matrices. The technique involves thinking of the matrix as a long string of data, much as would be the case in a computer's memory. Run-length encoding (RLE) then consists of describing that string as a number indicating the length of a series of zeroes, followed by a non-zero data element, followed by a number of zeroes, followed by a non-zero data element, and so on.
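A minimal sketch of this form of run-length encoding is shown below. Each output pair holds the number of zeroes preceding a non-zero element, followed by that element. Dropping the trailing zeroes and passing the string length to the decoder is an assumption made for this sketch, as are the names `rle_encode` and `rle_decode`.

```python
def rle_encode(values):
    """Encode a sequence as (zero_run, non_zero_value) pairs.

    Assumption: trailing zeroes are not emitted; the decoder is told
    the total length and pads with zeroes.
    """
    pairs = []
    run = 0
    for value in values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


def rle_decode(pairs, length):
    """Rebuild the sequence of the given length from (run, value) pairs."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        out.append(value)
    if len(out) > length:
        raise ValueError("pairs describe more elements than the stated length")
    out.extend([0] * (length - len(out)))
    return out
```

For instance, the twelve-element string 50, 1, 0, 0, -3, 0, 0, 0, 2, 0, 0, 0 is encoded as the four pairs (0, 50), (0, 1), (2, -3) and (3, 2).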
To improve compression yet further, these RLE strings are Huffman-encoded. Huffman encoding consists of expressing each data item as a symbol, in this case the number of zeroes (the run-length) followed by the data item. Huffman encoding relies on prior knowledge of the probability of occurrence of each symbol, such that the most likely symbols are encoded with fewer bits than their original representation, whereas the least likely symbols are encoded with more bits. With sufficient knowledge of the likely data set, the number of bits required to represent that set is reduced, since the most frequently occurring symbols are represented by a small number of bits.
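The principle can be illustrated with a short sketch that builds a Huffman code from observed symbol counts. Standard video coders typically use fixed, pre-computed code tables rather than building a code from the data in this way, so the following is an illustration of the principle only; the name `huffman_code` is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string.

    Symbols here would be (run_length, value) pairs, but any hashable
    items may be used.
    """
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit to be representable.
        return {next(iter(freq)): "0"}
    # Heap entries are (count, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker ensures the dicts are never compared.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)  # two least likely subtrees
        count_b, _, codes_b = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in codes_a.items()}
        merged.update({sym: "1" + code for sym, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, tie, merged))
        tie += 1
    return heap[0][2]
```

If five symbols occur 8, 4, 2, 1 and 1 times respectively, the resulting code lengths are 1, 2, 3, 4 and 4 bits, so the sixteen symbols occupy 30 bits in total rather than the 48 bits needed by a fixed three-bit code.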
The success of a Huffman encoder relies on the predictability of its input data. In the example of the block of DCT coefficients outlined above, a raster scanning mechanism does not give particularly predictable data since the values tend to cluster in the top-left corner of the matrix (the low-frequency area). Thus scanning the first few lines will tend to give a data burst, followed by a few zeroes, followed by a slightly shorter data burst, followed by a few more zeroes, and so on. It is more efficient to group the non-zero data together, leading to a more predictable run-length, and so better Huffman compression. This is achieved by zigzag scanning.
FIG. 2 shows the scanning route for a standard zigzag scan of an 8×8 pixel block. This can be used for the 8×8 U and V blocks of a macroblock and, by splitting the 16×16 luminance block into four 8×8 blocks, for the luminance data too. In this way, each macroblock can be represented by six 8×8 blocks (four Y, one U and one V) in the 4:2:0 format. Clearly, other forms of scan could be used, for instance unidirectional rather than bi-directional scans, or scans at angles other than 45°.
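A zigzag scanning order of this kind can be generated by walking the anti-diagonals of the block and alternating direction on each one. The sketch below assumes the conventional order that starts at the top-left element and moves first to the right, which may differ in detail from the route of FIG. 2; the name `zigzag_order` is hypothetical.

```python
def zigzag_order(n=8):
    """Return the raster indices (row * n + column) of an n x n block
    in zigzag scanning order."""
    order = []
    for diagonal in range(2 * n - 1):           # row + column is constant on a diagonal
        first_row = max(0, diagonal - n + 1)
        last_row = min(diagonal, n - 1)
        rows = range(first_row, last_row + 1)
        if diagonal % 2 == 0:
            rows = reversed(rows)                # even diagonals run bottom-left to top-right
        for row in rows:
            order.append(row * n + (diagonal - row))
    return order
```

For an 8×8 block the first ten entries are 0, 1, 8, 16, 9, 2, 3, 10, 17 and 24, so the low-frequency coefficients in the top-left corner are visited first and the zeroes tend to be gathered at the end of the string.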
The zigzag scanned data is then Huffman encoded, so a simplified I-frame (spatial only) compression method could be summarised as the steps shown in FIG. 3.
In order to perform real-time video compression or decompression there is a need to perform these steps very quickly.
In practice, the zigzag encoding illustrated in FIG. 2 is performed by reading each element of the input matrix array, accessing a look-up table as illustrated in FIG. 4 to find the element's destination location in the output array and then storing the element at that location in the output array. Similar procedures are also used for other applications to reorder sets of data.
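The element-by-element procedure described above can be sketched as follows. To keep the example short, a 3×3 block and a hand-written destination table are used in place of the 8×8 table of FIG. 4; the table contents and the names `DEST_3X3`, `reorder` and `restore` are assumptions made for illustration.

```python
# Destination table for a zigzag scan of a 3 x 3 block (illustrative only):
# DEST_3X3[i] is the output position of the element at raster position i.
DEST_3X3 = [0, 1, 5,
            2, 4, 6,
            3, 7, 8]


def reorder(block, dest_table):
    """Read each input element, look up its destination, and store it there."""
    out = [None] * len(block)
    for position, element in enumerate(block):
        out[dest_table[position]] = element
    return out


def restore(scanned, dest_table):
    """Undo reorder(): fetch each element back from its destination slot."""
    return [scanned[dest_table[position]] for position in range(len(scanned))]
```

Each element requires one read, one table look-up and one write, so reordering a block in this way costs a number of memory accesses proportional to the block size.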