The demand to incorporate video data in numerous transmission and storage systems, and the desire to improve the quality of video in such systems, have prompted rapid advancement in digital video compression techniques. Compression of digital video without significant degradation in quality is possible because of the high degree of spatial, temporal, and spectral redundancy in video sequences. Video encoders exploit the spatial, temporal, and spectral correlations in uncompressed video data to generate compressed video streams through complex predictive coding techniques.
During the past decade, a number of ISO/ITU/SMPTE video coding standards targeting the vast range of video applications have evolved. These standards include H.261, MPEG1, MPEG2, H.263, MPEG4, VC-1, and AVC/H.264. Each new video coding standard improves on the coding efficiency of its predecessor by introducing more complex and efficient prediction and estimation tools. Both the coding efficiency of video coding algorithms and their computational load have therefore risen sharply.
The issue of computational complexity has become more significant with the arrival of the H.264/AVC (ISO/IEC 14496-10) video coding standard, as this standard offers more coding options than the previous standards. The H.264/AVC standard delivers higher compression efficiency relative to the earlier standards but at the cost of higher computational load. The higher computational load is evident from the comprehensive set of video coding tools that the H.264/AVC standard provides. The tools include multiple prediction block sizes for Intra (I), Predicted (P), and Bi-directionally predicted (B) type pictures, multiple short-term and long-term reference frames for P and B type pictures, multiple hypothesis prediction modes, generalized B images that can act as predictors for other B images, arithmetic coding, and in-loop deblocking. In order to encode a video frame, an encoder has to select among numerous Inter and Intra macroblock prediction modes to obtain the optimum encoding mode. Such a selection process is time-consuming but vital to achieve the compression performance provided by the H.264/AVC standard.
The high computational complexity of the H.264/AVC standard presents a major hurdle in the implementation of H.264/AVC compliant encoders and decoders, particularly in real-time, resource-constrained environments. This can be appreciated from the fact that encoders generating H.264/AVC compliant streams are generally four to five times more computationally demanding than MPEG2 encoders. This fact is significant in consumer electronics, where the success of a system depends largely on its cost competitiveness, and where digital signal processors (DSPs) and other devices having low or limited computing power are frequently used. The emergence of high definition television (HDTV) has raised the stakes further by increasing the computational demand severalfold. H.264/AVC offers multiple spatial prediction modes that predict a block from its neighboring blocks. However, this prediction model is cumbersome and less effective for highly textured images.
In order to help deploy low cost systems, there is a need for methods and systems that are capable of reducing the computational complexity of encoders and decoders compliant with a specific standard, such as H.264/AVC, without compromising coding efficiency. There is also a need for video coding and decoding techniques that can reduce the computational complexity without massive changes in the embedded prediction algorithms prescribed by video coding standards, such as H.264/AVC.
This disclosure describes unique techniques and embodiments of video coding and decoding that meet one or more of the above needs. According to one embodiment, a sub-sampled image prediction method is merged with a video coding/decoding standard, such as the H.264/AVC encoding process, in a unique way so that the generated compressed video bit streams remain compliant with the H.264/AVC standard. In one aspect, the disclosure makes use of the multiple reference frames tool and the concept of generalized B images, as provided by the H.264/AVC standard, thereby taking full advantage of H.264/AVC coding tools while also reaping the benefits of sub-sampled image prediction. A higher resolution input image is sub-sampled to form a set of lower resolution sub-sampled images. Utilizing the high degree of correlation among the sub-sampled images in a set, a motion compensated prediction of a sub-sampled image in a set is performed from another sub-sampled image in the set. Employing a multiple reference frame paradigm as provided by the H.264/AVC standard, the above prediction is compared with predictions from other sub-sampled images in the same set or in previously coded sets, and the best predictors are used to code a slice or macroblock of the current sub-sampled image.
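The sub-sampling step described above can be illustrated with a minimal Python sketch. It assumes a 2x2 polyphase pattern in which each sub-sampled image takes every second sample in each direction, starting at a different row and column offset; the disclosure does not fix the pattern at this point, so the function name `subsample`, the `factor` parameter, and the raster ordering of the set are illustrative assumptions only.

```python
def subsample(image, factor=2):
    """Split a 2-D image (a list of rows) into factor*factor sub-sampled images.

    Assumption: polyphase sub-sampling. Sub-image number r*factor + c holds
    the samples at rows r, r+factor, ... and columns c, c+factor, ...
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be multiples of the factor")
    sub_images = []
    for r in range(factor):          # row offset (phase) of the sub-image
        for c in range(factor):      # column offset (phase) of the sub-image
            sub_images.append([row[c::factor] for row in image[r::factor]])
    return sub_images


# A 4x4 input image yields a set of four 2x2 sub-sampled images.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
sub_images = subsample(image)
```

Because neighboring samples of a natural image tend to be similar, the sub-sampled images in such a set resemble one another closely, which is the correlation the prediction described above relies on.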
In one aspect, an exemplary encoding process according to this disclosure divides a higher resolution input image into a corresponding set of lower resolution sub-sampled images, and feeds the sub-sampled images in appropriate order to a video encoder compliant with a specific video coding standard, such as the H.264/AVC standard. Each set of sub-sampled images corresponding to a higher resolution input image comprises a first sub-sampled image and subsequent sub-sampled images. In one embodiment, the video encoder is an H.264 encoder and encodes the first image of each set either as an independent I picture, or as a P or B picture with respect to the first image(s) of other set(s), while any subsequent image of a set is coded as a regular P or B picture with respect to the first image or a subsequent image of the same set, or an image of a previously coded set. All sub-sampled images of a set are coded either in Intra predictive coding format or in motion compensated Inter predictive coding format as prescribed by a video coding standard, such as the H.264/AVC standard. The compressed streams generated by the exemplary coding process can be decoded by a decoder conforming to the same video coding standard, such as the H.264/AVC standard. The amalgamation of sub-sampled image prediction with H.264 tools reduces the computational complexity of the encoding process.
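One possible coding order and reference assignment consistent with the paragraph above is sketched below in Python. This is a hypothetical planning helper, not an encoder: the function `coding_schedule`, the `intra_period` parameter, and the particular choice of candidate references (earlier images of the same set plus the same-position image of the previous set) are assumptions made for illustration. The disclosure allows other choices, including B pictures, which this sketch omits.

```python
def coding_schedule(num_sets, images_per_set=4, intra_period=2):
    """Return a list of (set_index, image_index, picture_type, references).

    Assumptions for illustration only:
      * the first image of every intra_period-th set is an I picture;
      * other first images are P pictures predicted from the first image
        of the previous set;
      * subsequent images are P pictures whose candidate references are
        the earlier images of the same set and, when available, the
        same-position image of the previous set.
    """
    schedule = []
    for s in range(num_sets):
        for k in range(images_per_set):
            if k == 0:
                if s % intra_period == 0:
                    schedule.append((s, k, "I", []))
                else:
                    schedule.append((s, k, "P", [(s - 1, 0)]))
            else:
                refs = [(s, j) for j in range(k)]
                if s > 0:
                    refs.append((s - 1, k))
                schedule.append((s, k, "P", refs))
    return schedule


# Two input images, each split into two sub-sampled images.
plan = coding_schedule(num_sets=2, images_per_set=2, intra_period=2)
```

In an actual encoder the reference lists would be managed by the multiple reference frame mechanism of the standard, and the encoder would pick the best predictor per slice or macroblock from these candidates.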
In decoding the video streams generated by the exemplary encoding process of this disclosure, a decoder is utilized to rearrange the decoded lower resolution sub-sampled images of each set into corresponding higher resolution output images. The output images can be displayed or stored on appropriate devices.
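The rearrangement step can be sketched in Python as follows. The sketch assumes that each set was formed by polyphase sub-sampling with the sub-images stored in raster order of their row and column offsets; the function name `interleave` and this layout are illustrative assumptions, since the actual arrangement depends on the sub-sampling pattern used by the encoder.

```python
def interleave(sub_images, factor=2):
    """Rebuild a higher resolution image from factor*factor decoded sub-images.

    Assumption: sub-image number r*factor + c holds the samples located at
    row offset r and column offset c of the higher resolution image.
    """
    sub_h, sub_w = len(sub_images[0]), len(sub_images[0][0])
    output = [[None] * (sub_w * factor) for _ in range(sub_h * factor)]
    for index, sub in enumerate(sub_images):
        r, c = divmod(index, factor)          # offsets of this sub-image
        for y, row in enumerate(sub):
            for x, value in enumerate(row):
                output[y * factor + r][x * factor + c] = value
    return output


# Four decoded 2x2 sub-images are rearranged into one 4x4 output image.
decoded_set = [
    [[0, 2], [8, 10]],
    [[1, 3], [9, 11]],
    [[4, 6], [12, 14]],
    [[5, 7], [13, 15]],
]
output_image = interleave(decoded_set)
```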
According to another embodiment of this disclosure, an exemplary coding process utilizes a unique spatially scalable H.264 encoding paradigm that does not require up-scaling of the base layer for predictive encoding of the enhancement layer. In one embodiment, the first sub-sampled image of a set corresponding to each input image acts as the base layer, while the enhancement layer comprises all subsequent sub-sampled images of the set that are predicted from the base layer image and/or one or more enhancement layer images through motion compensated prediction. Without affecting the integrity of the video stream, the decoding process may choose to decode just the base layer, the base layer and some parts of the enhancement layer, or the base layer and the entire enhancement layer.
According to yet another embodiment of this disclosure, an exemplary encoding and decoding process utilizes proprietary extensions to the H.264 encoding and decoding processes for further improvement in coding efficiency. The encoding process may choose to enhance a reference sub-sampled image of a set prior to predicting other sub-sampled images through motion compensated prediction, thereby forming predictors of better quality. Enhancement may be carried out through any filtering or sharpening technique. Moreover, in one aspect, the exemplary encoding process may utilize the high degree of correlation between the sub-sampled images of a set, and decide not to encode motion vector data of the motion vectors between two sub-sampled images of a set. The motion vector data can be readily derived within the decoding process by considering the sub-sampling order. Furthermore, the exemplary encoding process may decide not to encode the motion vector data of the motion vectors between two sub-sampled images of two different sets, and instead reuse the motion vectors between two previously coded sub-sampled frames of the same sets.
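One plausible reading of how motion vector data could be derived from the sub-sampling order is sketched below in Python. The assumptions are loud ones: a polyphase sub-sampling pattern with sub-images numbered in raster order of their offsets, and a purely geometric displacement that ignores any scene motion. The helper names `implied_motion_vector` and `to_quarter_pel` are hypothetical and are not taken from the disclosure or from the H.264 standard.

```python
from fractions import Fraction


def implied_motion_vector(ref_index, cur_index, factor=2):
    """Displacement (dy, dx), in sub-sampled pixels, from the current
    sub-image's grid to the matching positions on the reference grid.

    Assumption: sub-image number r*factor + c samples the higher
    resolution image at row offset r and column offset c, so two
    sub-images of one set differ by a fixed fractional-pixel shift.
    """
    ref_r, ref_c = divmod(ref_index, factor)
    cur_r, cur_c = divmod(cur_index, factor)
    dy = Fraction(cur_r - ref_r, factor)
    dx = Fraction(cur_c - ref_c, factor)
    return dy, dx


def to_quarter_pel(mv):
    """Express a displacement in quarter-sample units (exact when the
    factor is 1, 2, or 4)."""
    dy, dx = mv
    return int(dy * 4), int(dx * 4)


# Predicting the fourth sub-image of a 2x2 set from the first one.
mv = implied_motion_vector(ref_index=0, cur_index=3)
mv_qpel = to_quarter_pel(mv)
```

Under these assumptions, a decoder that knows the sub-sampling order can recompute the same shift, so it would not need to be transmitted.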
Additional advantages and novel features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The advantages of the present teachings may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below. The drawing figures depicted herein are by way of example only, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements.