Digital video systems have become increasingly important in the communication and broadcasting industries. The International Organization for Standardization (ISO) has established a series of standards to facilitate the standardisation of compression and transmission of digital video signals. One of these standards, ISO/IEC 13818-2, entitled “Generic Coding of Moving Pictures and Associated Audio Information” (or MPEG-2 for short, where “MPEG” stands for “Moving Picture Experts Group”), was developed in the 1990s. MPEG-2 has been used to encode digital video for a wide range of applications, including Standard Definition Television (SDTV) and High Definition Television (HDTV) systems.
In a typical MPEG-2 encoding process, digitized luminance and chrominance components of video pixels are input to an encoder and stored in macroblock (MB) structures. Three types of pictures are defined by the MPEG-2 standard: the “I-picture”, the “P-picture”, and the “B-picture”. Depending on the picture type, Discrete Cosine Transform (DCT) and Motion Compensated Prediction (MCP) techniques are used in the encoding process to exploit the spatial and temporal redundancy of the video signal, thereby achieving compression.
The I-picture represents an Intra-coded picture that can be reconstructed without referring to the data in other pictures. Luminance and chrominance data for each intra-coded MB in the I-picture are first transformed to the frequency domain using a block-based DCT, to exploit spatial redundancy that may be present in the I-picture. Then the high frequency DCT coefficients are coarsely quantised according to the characteristics of the human visual system. The quantised DCT coefficients are further compressed using Run-Level Coding (RLC) and Variable Length Coding (VLC), before finally being output into the compressed video bit-stream.
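The quantisation and run-level steps described above can be illustrated with a minimal sketch. It assumes the DCT coefficients of one block are already available in scan order, and it uses a hypothetical list of per-coefficient step sizes that grow with frequency; the function names, the truncating division, and the step values are illustrative assumptions and are not taken from the MPEG-2 standard.

```python
def quantise(coeffs, steps):
    """Divide each coefficient by its step size and truncate toward zero.

    Larger steps at higher frequencies give the coarser quantisation
    described in the text (the step values used here are illustrative only).
    """
    return [int(c / s) for c, s in zip(coeffs, steps)]


def run_level_encode(levels):
    """Return (run, level) pairs, where run counts the zeros before each
    non-zero level. Trailing zeros are dropped, as they would be implied
    by an end-of-block marker."""
    pairs = []
    run = 0
    for value in levels:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


# Hypothetical coefficients in scan order (low to high frequency).
coeffs = [120, -33, 7, 0, 14, 3, 0, -2]
steps = [8, 8, 10, 12, 12, 16, 16, 20]

levels = quantise(coeffs, steps)
pairs = run_level_encode(levels)
print(levels)  # [15, -4, 0, 0, 1, 0, 0, 0]
print(pairs)   # [(0, 15), (0, -4), (2, 1)]
```

In a real encoder each (run, level) pair would then be mapped to a variable length code; that table lookup is omitted here.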
Both the P-picture and the B-picture represent inter-coded pictures that are coded using motion compensated data based upon other pictures. FIG. 10 illustrates the concept of a picture that is composed using inter-coding. For an inter-coded MB 104 in a current picture 101, the MCP technique is used to reduce the temporal redundancy with respect to the reference pictures (these being pictures that adjoin the current picture 101 on the temporal scale, such as a “previous picture” 102 and a “next picture” 103 in FIG. 10). This is done by searching a search area 105 in the reference picture 102 to find a block which minimizes a difference criterion (such as mean square error) between itself and the MB 104.
The block that yields the minimal difference over the search area is termed “the best match block”, being the block 106 in FIG. 10. Then the displacements between the MB 104 and the block 106 in the horizontal (X) and vertical (Y) directions are calculated, to form the motion vector (MV) 107 which is associated with the MB 104. After that, the pixel-wise difference between the MB 104 and the block 106, which is referred to as the “motion residue”, is calculated and compressed using block-based DCT and quantisation. Finally, the motion vector and the associated quantised motion residues are entropy-encoded using VLC and output to the compressed video bit-stream.
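The search for the best match block can be sketched as an exhaustive search over a square window. This is a minimal illustration and not an encoder implementation: pictures are assumed to be lists of rows of luminance values, the difference criterion is the sum of squared differences (which ranks candidates the same way as mean square error), and the names `ssd`, `best_match`, and the sign convention of the returned vector are assumptions made for this example.

```python
def ssd(cur, ref, cx, cy, rx, ry, n):
    """Sum of squared differences between the n x n block of `cur` at
    (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum((cur[cy + j][cx + i] - ref[ry + j][rx + i]) ** 2
               for j in range(n) for i in range(n))


def best_match(cur, ref, cx, cy, n, search):
    """Full search within +/- `search` pixels of (cx, cy).

    Returns the motion vector (dx, dy) pointing from the current block to
    the best match block in `ref`, and the pixel-wise motion residue.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = ssd(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    residue = [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
                for i in range(n)] for j in range(n)]
    return (dx, dy), residue


# Synthetic 8 x 8 reference picture, and a current picture whose content
# is the reference shifted by 2 pixels in X and 1 pixel in Y.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if x + 2 < 8 and y + 1 < 8 else 0
        for x in range(8)] for y in range(8)]

mv, residue = best_match(cur, ref, cx=2, cy=2, n=2, search=2)
print(mv)       # (2, 1)
print(residue)  # [[0, 0], [0, 0]]
```

The cost of this search grows with the square of the search range for every block, which gives a sense of why the MCP stage dominates encoder complexity.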
The principal difference between a P-picture and a B-picture lies in the fact that a MB in a P-picture only has one MV which corresponds to the best-matched block in the previous picture (i.e., vector 107 for block 106 in picture 102), while a MB in a B-picture (or a “bidirectional-coded MB”) may have two MVs, one “forward MV” which corresponds to the best-matched block in the previous picture, and one “backward MV” which corresponds to the best-matched block in the next picture (i.e., vector 109 for block 108 in picture 103). The motion residue of a bidirectional-coded MB is calculated as an average of the motion residue produced by the forward MV and by the backward MV.
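As a worked illustration of the bidirectional case, the sketch below averages the residues obtained from a forward and a backward best-matched block. The block values and function names are hypothetical, and floating-point averaging is used for clarity; an actual codec would use integer arithmetic with a defined rounding rule.

```python
def block_residue(cur_block, pred_block):
    """Pixel-wise difference between the current block and a prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur_block, pred_block)]


def bidirectional_residue(cur_block, fwd_block, bwd_block):
    """Average of the residues produced by the forward and backward
    best-matched blocks."""
    fwd = block_residue(cur_block, fwd_block)
    bwd = block_residue(cur_block, bwd_block)
    return [[(f + b) / 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(fwd, bwd)]


cur_block = [[10, 20], [30, 40]]
fwd_block = [[8, 20], [31, 36]]   # best match in the previous picture
bwd_block = [[12, 18], [29, 40]]  # best match in the next picture

print(bidirectional_residue(cur_block, fwd_block, bwd_block))
# [[0.0, 1.0], [0.0, 2.0]]
```

In this example the averaged residue is smaller in magnitude than either single-direction residue, which is the benefit the bidirectional mode aims for.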
With the diversity of digital video applications, it is often necessary to convert a compressed MPEG-2 bit-stream from one resolution to another. Examples include conversion from HDTV to SDTV, or from the originally encoded bit-rate to a different bit-rate for re-transmission. In this description, the input to a resolution conversion module is referred to as the input stream (or input compressed stream if appropriate), and the output from the resolution conversion module is referred to as the scaled output stream (or scaled compressed output stream if appropriate).
One solution to this requirement uses a “tandem transcoder”, in which a standard MPEG-2 decoder and encoder are cascaded to provide the resolution and bit-rate conversions. However, fully decoding and re-encoding MPEG-2 compressed bit-streams demands heavy computational resources, particularly for the operation-intensive MCP module in the MPEG-2 encoder. As a result, the tandem transcoding approach can be an inefficient solution for resolution or bit-rate conversion of compressed bit-streams.
Recently, new types of video transcoders have been used to address the computational complexity of the tandem solution. For example, the computational cost of operation-intensive modules, such as the MCP on the encoding side, has been avoided by predicting the output parameters from side information extracted from the input compressed bit-stream. The output parameters usually include the encoding mode (such as intra-coded, inter-coded, or bidirectional-coded) and the MV value associated with the current MB (or “current coding unit”). The side information may include the encoding mode, the motion vector, the motion residues, and the quantisation parameters associated with each MB unit. Such transcoders are able to achieve a faster speed than the tandem solution, at the cost of marginal video quality degradation.
When down-converting compressed bit-streams, such as when converting a high quality HDTV MPEG-2 bit-stream to a moderate quality SDTV MPEG-2 bit-stream, the prediction of the output motion vectors can be performed using a motion summarization algorithm based upon the input motion data of a supporting area from which the output macroblock is downscaled.
One type of motion summarization algorithm predicts the output MV as a weighted average of all the input MV candidates. The algorithm determines the significance of each MV candidate according to some activity measure, which can be the overlap region size, the corresponding motion residue energy, the coding complexity, or another measure. However, this approach is sensitive to outliers in the MV candidates (these being MVs whose values differ significantly from those of their neighbours), which can reduce the performance in non-smooth motion regions.
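The weighted average and its sensitivity to outliers can be seen in a small sketch. The candidate values and the function name are hypothetical; the weights stand in for whichever activity measure is chosen, and any rescaling of the vector to the output resolution is left out.

```python
def weighted_average_mv(candidates, weights):
    """Predict one output MV as the weighted mean of the input MV
    candidates. Each candidate is an (x, y) pair; each weight is its
    significance (for example, overlap region size)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    x = sum(mv[0] * w for mv, w in zip(candidates, weights)) / total
    y = sum(mv[1] * w for mv, w in zip(candidates, weights)) / total
    return (x, y)


# Three consistent candidates and one outlier, all equally weighted.
candidates = [(4, 0), (4, 0), (4, 0), (-20, 12)]
weights = [1, 1, 1, 1]

print(weighted_average_mv(candidates, weights))  # (-2.0, 3.0)
```

A single outlier pulls the result to (-2.0, 3.0), a vector that matches none of the four candidates.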
Order-statistics-based algorithms have been developed to counter the aforementioned outlier problem. These include scalar median, vector median, and weighted median algorithms. These algorithms are able to overcome the effects of outlier motion vectors, and some of them (e.g., the weighted median) are able to take into account the significance of each MV candidate. However, these algorithms do not perform well in the vicinity of a steep motion transition such as an object boundary.
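A vector median can be sketched as choosing the candidate with the smallest total distance to all other candidates. The weighted form shown here, which scales each distance by the weight of the other candidate, is one common formulation and is an assumption for this example; the text does not specify which variant is meant, and the function name is hypothetical.

```python
import math


def vector_median(candidates, weights=None):
    """Return the candidate MV that minimises the (optionally weighted)
    sum of Euclidean distances to all candidates. The result is always
    one of the input vectors, so an outlier cannot drag it to an
    in-between value."""
    if weights is None:
        weights = [1.0] * len(candidates)

    def cost(v):
        return sum(w * math.hypot(v[0] - u[0], v[1] - u[1])
                   for u, w in zip(candidates, weights))

    return min(candidates, key=cost)


candidates = [(4, 0), (4, 0), (4, 0), (-20, 12)]

print(vector_median(candidates))                 # (4, 0)
print(vector_median(candidates, [1, 1, 1, 10]))  # (-20, 12)
```

With equal weights the outlier is rejected; giving it a dominant weight makes it the selected vector, which shows how the weighted form reflects candidate significance.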
There have been some techniques based upon a hybrid of the weighted average and the weighted median, which produce an output by selecting the better of the two according to block-based matching criteria. However, due to the limitations of the weighted average and the weighted median (i.e., the weighted average tends to smooth a steep motion transition, while the weighted median tends to shift the position of a motion transition), the hybrid technique is still unable to provide good results in the vicinity of object boundaries and high texture areas.
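The selection step of such a hybrid can be sketched as follows. This is an illustration under stated assumptions: a weighted vector median stands in for the weighted median, and `match_cost` is a hypothetical callable that would, in practice, evaluate a block-based matching criterion for a given vector against the decoded pictures. Here it is replaced by a simple stand-in function.

```python
import math


def weighted_average(mvs, weights):
    """Weighted mean of the MV candidates."""
    total = sum(weights)
    return (sum(mv[0] * w for mv, w in zip(mvs, weights)) / total,
            sum(mv[1] * w for mv, w in zip(mvs, weights)) / total)


def weighted_vector_median(mvs, weights):
    """Candidate with the smallest weighted sum of distances to the rest."""
    def cost(v):
        return sum(w * math.hypot(v[0] - u[0], v[1] - u[1])
                   for u, w in zip(mvs, weights))
    return min(mvs, key=cost)


def hybrid_mv(mvs, weights, match_cost):
    """Compute both predictions and keep whichever has the lower
    block-matching cost."""
    options = [weighted_average(mvs, weights),
               weighted_vector_median(mvs, weights)]
    return min(options, key=match_cost)


mvs = [(4, 0), (4, 0), (4, 0), (-20, 12)]
weights = [1, 1, 1, 1]


# Stand-in for a real block-matching cost: distance to an assumed
# true motion of (4, 0).
def match_cost(mv):
    return abs(mv[0] - 4) + abs(mv[1])


print(hybrid_mv(mvs, weights, match_cost))  # (4, 0)
```

Because the output is always one of only two predictions, the hybrid cannot do better than the better of the two, which is consistent with the limitation noted above.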