Aspects of this invention relate generally to data processing, and more particularly to a method and apparatuses for high quality, fast intra coding usable for creating digital video content.
Video compression technology enables the creation, distribution, receipt, and/or display of digital video data, which includes any pre-recorded or live electronic signals representing video images, by sources such as consumer devices (for example, personal computers, hard-drive storage devices, digital televisions, digital video camera recorders, digital video disk recorders/players, digital set-top boxes, telecommunications devices, and video production devices, among other devices), television networks and stations, studios, Internet broadcasters, wireless operators, and cable/satellite operators, among others.
Various industry specifications, or standards, relating to video compression technology have been promulgated by groups desiring, among other things, to ensure interoperability between devices and systems that create, deliver, receive and/or display digital video data. The Video Coding Experts Group (“VCEG”) of the International Telecommunication Union's Telecommunication Standardization Sector (“ITU-T”) and the Moving Picture Experts Group (“MPEG”) of the International Organization for Standardization/International Electrotechnical Commission (“ISO/IEC”), for example, are jointly developing a video compression standard referred to by the ITU-T as “H.264,” and by the ISO/IEC as “MPEG-4 Advanced Video Coding,” which is embodied in a document entitled “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264, ISO/IEC 14496-10 AVC),” Pattaya, Thailand, 7-14 Mar., 2003 (hereinafter, the video compression standard will be referred to as the “H.264/AVC Standard”). The H.264/AVC Standard is hereby incorporated by reference in its entirety for all purposes, as if set forth in full herein.
The H.264/AVC Standard defines, among other things, a video coding layer (“VCL”) to produce a digital representation of input video images. An encoder/decoder pair (“CODEC”) implementing the VCL of the H.264/AVC Standard generally performs the well-known functions of prediction, transformation, quantization, and entropy coding, to produce/decode an encoded bit stream having a particular syntax. Each picture of an input video is partitioned into fixed-sized blocks of data called macroblocks that cover a rectangular picture area of 16×16 samples of the luminance (“luma”) component of the picture color, and 8×8 samples of each of the two chrominance (“chroma”) components of the picture color. All luma and chroma samples of a macroblock are either spatially or temporally predicted, the prediction residuals thereof are transformed using an integer transform, and the transform coefficients are quantized and transmitted using entropy-coding methods.
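By way of illustration only, the macroblock partitioning described above can be sketched as follows. The function names are hypothetical, and the rounding up of picture dimensions that are not multiples of 16 is an assumption made for the sketch rather than a statement of what the H.264/AVC Standard requires.

```python
MB_LUMA_SIZE = 16    # luma samples per macroblock side, as stated above
MB_CHROMA_SIZE = 8   # samples per side for each of the two chroma components


def macroblock_grid(width, height, mb_size=MB_LUMA_SIZE):
    """Return (columns, rows, total) of macroblocks covering a picture.

    Assumption for this sketch: dimensions that are not a multiple of
    mb_size are rounded up, as if the picture were padded.
    """
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows, cols * rows


def samples_per_macroblock():
    """Count luma plus chroma samples carried by one macroblock."""
    luma = MB_LUMA_SIZE * MB_LUMA_SIZE
    chroma = 2 * MB_CHROMA_SIZE * MB_CHROMA_SIZE  # two chroma components
    return luma + chroma


if __name__ == "__main__":
    # A 176x144 picture is covered by 11 x 9 macroblocks.
    print(macroblock_grid(176, 144))
    print(samples_per_macroblock())
```

Under these assumptions, each macroblock carries 256 luma samples and 128 chroma samples.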
Macroblocks are organized into slices, which are subsets of a given picture that are independently decodable. Each macroblock may be coded using one of several coding types, depending on the slice type of the macroblock. One type of slice is an intra-(“I”) slice, which provides for the coding of macroblocks without referring to other pictures within the input video sequence (hereinafter referred to as “intra coding”). The H.264/AVC Standard specifies techniques for intra coding luma-component macroblocks as 16 4×4 blocks or as a single 16×16 block. Chroma-component macroblocks are intra coded in the same manner as 16×16 luma-component macroblocks. Each 4×4 block contains sixteen pixels.
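As a minimal sketch, the division of a 16×16 luma-component macroblock into sixteen 4×4 blocks may be expressed as follows. The function name is hypothetical, and the blocks are returned in simple raster order for readability, which is an assumption of the sketch and not necessarily the block ordering used by the H.264/AVC Standard.

```python
def split_into_4x4(macroblock):
    """Split a 16x16 macroblock into sixteen 4x4 blocks.

    macroblock: list of 16 rows, each a list of 16 sample values.
    Returns a list of 16 blocks in raster order (an assumption of this
    sketch); each block is a list of 4 rows of 4 samples.
    """
    blocks = []
    for by in range(0, 16, 4):        # top row of each 4x4 block
        for bx in range(0, 16, 4):    # left column of each 4x4 block
            blocks.append([row[bx:bx + 4] for row in macroblock[by:by + 4]])
    return blocks


if __name__ == "__main__":
    # Fill a macroblock with its own raster index so blocks are easy to check.
    mb = [[y * 16 + x for x in range(16)] for y in range(16)]
    blocks = split_into_4x4(mb)
    print(len(blocks), blocks[0][0], blocks[1][0])
```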
The H.264/AVC Standard designates prediction modes, which are used to generate predictive pixel values. There are nine prediction modes for 4×4 luma-component blocks and four prediction modes for 16×16 luma- and chroma-component blocks. The reference software of H.264/AVC, popularly known as JM (Joint Model) software, uses a full search (“FS”) algorithm for determining the prediction mode with which a given macroblock should be encoded. The FS algorithm calls for examining each of the pixels in a macroblock using each of the nine prediction modes to determine the prediction mode that yields predictive pixel values closest to original samples of the picture of the input video.
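The exhaustive nature of a full search can be illustrated with the simplified sketch below for a single 4×4 block. It is not the JM implementation: only three of the nine 4×4 prediction modes (vertical, horizontal, and DC) are modeled, the sum of absolute differences is assumed as the measure of closeness, the neighboring samples above and to the left are assumed to be available, and all function names are hypothetical.

```python
def predict_vertical(top, left):
    """Copy the four samples above the block down every row."""
    return [list(top) for _ in range(4)]


def predict_horizontal(top, left):
    """Copy each sample to the left of the block across its row."""
    return [[left[y]] * 4 for y in range(4)]


def predict_dc(top, left):
    """Fill the block with the rounded mean of the eight neighbors."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


# Only three of the nine 4x4 modes are modeled in this sketch.
MODES = {
    "vertical": predict_vertical,
    "horizontal": predict_horizontal,
    "dc": predict_dc,
}


def sad(block, pred):
    """Sum of absolute differences (assumed cost measure)."""
    return sum(abs(block[y][x] - pred[y][x])
               for y in range(4) for x in range(4))


def full_search(block, top, left, modes=MODES):
    """Evaluate every candidate mode on every pixel and keep the cheapest."""
    best_mode, best_cost = None, None
    for name, predictor in modes.items():
        cost = sad(block, predictor(top, left))
        if best_cost is None or cost < best_cost:
            best_mode, best_cost = name, cost
    return best_mode, best_cost


if __name__ == "__main__":
    block = [[10, 20, 30, 40] for _ in range(4)]
    print(full_search(block, top=[10, 20, 30, 40], left=[0, 0, 0, 0]))
```

Because every candidate mode is evaluated over all sixteen pixels of every block, the work grows in direct proportion to the number of modes examined, which is the cost that fast mode selection techniques seek to reduce.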
Although the H.264/AVC Standard has higher compression efficiency than previous video compression technologies such as MPEG-2, the computational complexity, or cost, of intra coding I-slice type macroblocks (and also of coding P-slice type macroblocks, motion estimation, and block selection algorithms) according to the FS algorithm is high. Such coding is therefore very processor-intensive, which may affect the design and/or cost of H.264/AVC Standard-compliant CODECs, or other hardware, software, or firmware.
Other proposed fast intra coding prediction mode selection techniques relevant to the H.264/AVC Standard include: (1) using an edge map histogram for macroblocks to reduce the number of prediction modes used for mode decisions (see Feng Pan et al., “Fast Mode Decision for Intra prediction,” JVT-G013, Pattaya, Thailand, 7-14 Mar., 2003); (2) performing a combined motion estimation and prediction mode decision, based on comparisons of block energies with a threshold to eliminate certain prediction modes (see Yin Peng et al., “Fast Mode Decision and Motion Estimation for JVT/H.264,” ICIP 2003); and (3) reducing the number of prediction modes used for mode decisions according to, among other things, a locally adaptive threshold factor based on a frequency term associated with local image information (see Bojun Meng et al., “Efficient Intra-Prediction Mode Selection for 4×4 Blocks in H.264,” ICME 2003, III-521-III-524 (“Meng et al.”)). Considerable computation is necessary, however, to find edge map histograms and to determine block energies. In addition, Meng et al. do not disclose how to compute the frequency term in the proposed adaptive threshold factor, the computation of which undoubtedly increases computational complexity.
There is therefore a need for a computationally efficient algorithm for use in determining optimal prediction modes for intra coding I-slice type macroblocks in the context of the H.264/AVC Standard and other data processing applications, which algorithm accurately preserves decoded video quality while also allowing a tunable tradeoff between computational complexity and decoded video quality.