Video codecs are employed to convert an initial video sequence (a set of video images, also called pictures or frames) into an encoded bitstream (a set of compressed video sequence binary data), and also to convert video sequence binary data produced by a video codec system into a reconstructed video sequence (a decoded set of video images, or reconstructed frames). (While the expression “codec” can be viewed as referring to a component that serves to both “code” and “decode” information, as used herein it will be understood that a “codec” at least serves to code information and may, or may not, also serve to decode information.) Hereinafter, the terms “frame” and “picture” are used interchangeably.
Video coding for telecommunication applications has evolved through the development of a number of video coding standards in a continued effort to maximize coding efficiency and hence reduce bandwidth requirements. One relatively new international video coding standard, H.264/AVC, is designed to enable improved operation for a broad variety of applications which may be deployed over existing and future networks.
It is known that video compression relies on two basic assumptions. The first is that human sensitivity to noise in the picture (frame) is highly dependent on the frequency of the noise. The second is that in a picture sequence every picture usually has a lot in common with the preceding picture. In a picture the large objects result in low spatial frequencies, whereas small objects result in high spatial frequencies. The noise detected by human vision is mostly at low spatial frequencies. The data may be compressed by sending information describing only the difference between one picture and the next, and raising the noise where it cannot be detected, thus shortening the length of data words.
A video sequence contains a significant amount of statistical and subjective redundancy within and between pictures, which can be reduced by data compression techniques to make its size smaller. For still pictures (as in the JPEG format), intra-frame or spatial redundancy is exploited by treating each picture individually, without reference to any other picture. In Intra-coding the main step is to perform a spatial frequency analysis of the image, using the known technique of the Discrete Cosine Transform (DCT). The DCT converts input pixels into a form in which the redundancy can be identified. The picture is broken up into rectangular areas called macroblocks and converted one macroblock at a time. In most video coding formats the reduction of spatial redundancy is achieved by intra prediction, which means that the texture of the current macroblock is predicted from the reconstructed texture of one or more previously coded macroblocks, and the transform is applied to the macroblock texture prediction residual. Other kinds of texture transform may also be used instead of the DCT.
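The spatial frequency analysis described above can be illustrated with a minimal sketch. It assumes a textbook floating-point, orthonormal 2-D DCT-II applied to a single square block; it is not the exact transform of any particular standard, and the function name dct_2d and the sample block are introduced here for illustration only.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # basis[k][i] is the k-th cosine basis function sampled at position i.
    basis = [[alpha(k) * math.cos((2 * i + 1) * k * math.pi / (2 * n))
              for i in range(n)] for k in range(n)]

    # Transform the columns first, then the rows (separable transform).
    tmp = [[sum(basis[u][y] * block[y][x] for y in range(n))
            for x in range(n)] for u in range(n)]
    return [[sum(tmp[u][x] * basis[v][x] for x in range(n))
             for v in range(n)] for u in range(n)]

# A perfectly flat 8x8 block (hypothetical sample data): all of its energy
# ends up in the single DC coefficient, and every other coefficient is zero
# up to floating-point rounding.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
```

For the flat block, only coeffs[0][0] is significantly different from zero, which is the sense in which the transform exposes redundancy: a smooth area is described by very few coefficients.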
For moving pictures, Inter-coding is used to exploit the redundancy between pictures, which gives a higher compression factor than Intra-coding. The “difference” picture is produced by taking the difference between every pixel in the picture and a corresponding prediction pixel derived from the previously coded pictures. After that the transform is applied to the blocks of the difference picture. All major prior video coding standards use a transform block size of 8×8.
The new H.264 video coding standard provides the possibility of using two types of texture transforms in High Profile Coding. In the H.264 standard a picture is divided into macroblocks, each covering an area of 16×16 samples of the luma component, which represents brightness, and 8×8 samples of each of the two chroma components, which represent the extent to which the color deviates from gray to blue and from gray to red. The 16×16 macroblock texture prediction residual for the luma component may be either divided into 16 blocks of size 4×4, each of which is subject to the 4×4 texture transform, or divided into 4 blocks of size 8×8, to which the 8×8 transform is applied.
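The two ways of partitioning the 16×16 luma residual can be sketched as follows. This is an illustrative helper only: split_into_blocks is a hypothetical name, the residual values are made up, and the blocks are listed in plain raster order, which is an assumption of this sketch and not necessarily the block order used by the standard.

```python
def split_into_blocks(macroblock, size):
    """Split a square macroblock (list of rows) into size x size blocks,
    listed in raster order (left to right, top to bottom)."""
    n = len(macroblock)
    blocks = []
    for top in range(0, n, size):
        for left in range(0, n, size):
            rows = macroblock[top:top + size]
            blocks.append([row[left:left + size] for row in rows])
    return blocks

# Hypothetical 16x16 residual whose sample value encodes its own position.
residual = [[16 * y + x for x in range(16)] for y in range(16)]

blocks_4x4 = split_into_blocks(residual, 4)   # 16 blocks for the 4x4 transform
blocks_8x8 = split_into_blocks(residual, 8)   # 4 blocks for the 8x8 transform
```

Both partitions cover the same 256 residual samples; the choice only determines which transform is applied to them.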
For many application settings, efficient video compression in the H.264 standard depends on choosing the optimal transform size for the macroblocks where both types of texture transform can be used. However, simple selection criteria such as the texture prediction residual SAD (sum of absolute differences) or the Hadamard transform appear to be hardly applicable for this purpose.
Many prior art practitioners consider that the only known method which is adequate for making a transform size decision is a method based on a full rate-distortion optimization. However, this method requires full block texture processing (including the texture transform, quantization of 256 transform coefficients, their de-quantization and inverse transform) for both transform types. Moreover, for both transform types the calculation of the number of entropy coding bits is required. This makes this method, as well as most or possibly even all other known conventional rate-distortion optimization methods, quite slow and complicated, which calls for a new efficient algorithm for deciding the optimal transform size in the H.264 standard.
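The final comparison step of such a rate-distortion decision can be sketched as below, assuming the usual Lagrangian form J = D + λ·R, where D is the distortion, R the number of entropy coding bits and λ a weighting factor. The function names, the candidate figures and the λ values are hypothetical; in a real encoder D and R would come from the full transform, quantization, reconstruction and entropy coding of each candidate, which is exactly the expensive part.

```python
def rd_cost(distortion, bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * bits

def choose_transform_size(candidates, lam):
    """Return the candidate name with the smallest Lagrangian cost.

    candidates maps a transform name to a (distortion, bits) pair that is
    assumed to have been measured by fully coding the macroblock.
    """
    return min(candidates, key=lambda name: rd_cost(*candidates[name], lam))

# Made-up measurements for one macroblock: (distortion, bits).
measured = {"4x4": (1200, 310), "8x8": (1350, 260)}

choice_high_lambda = choose_transform_size(measured, lam=4.0)  # bits weigh more
choice_low_lambda = choose_transform_size(measured, lam=1.0)   # distortion weighs more
```

With these made-up figures the 8×8 transform wins at λ = 4.0 (2390 against 2440) and the 4×4 transform wins at λ = 1.0 (1510 against 1610), which shows that the decision depends on the operating point as well as on the texture.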
The most computationally complex part of H.264 video encoding is the procedure for choosing the optimal macroblock type (intra 4×4, intra 16×16, inter) and the parameters related to the chosen type. Hereafter it is assumed for simplicity that the choice is made among the intra 4×4, intra 16×16 and inter macroblock types. In most H.264 encoders the following scheme is used for this decision-making procedure. At the first stage the best parameters for each of the three possible macroblock types are derived. Namely, a calculation is made of the optimal motion data for the Inter macroblock type, the optimal Intra 16×16 prediction mode for the Intra 16×16 type and the 16 optimal intra 4×4 prediction modes for the Intra 4×4 type. After that some quality function (such as SAD or a rate-distortion Lagrangian value) is calculated for the optimal collection of parameters for each of the three possible macroblock types. Finally, the macroblock type providing the best value of the quality function is chosen.
Since a large variety of fast and efficient motion estimation methods are known, the calculation of the optimal prediction modes for the Intra 4×4 macroblock type becomes the most computationally intensive part of the decision-making method described above. Indeed, one of the nine Intra 4×4 prediction modes must be selected for each of the sixteen 4×4 sub-blocks. Therefore, the decision-making procedure would be much faster and simpler if the calculation of the optimal parameters for the Intra 4×4 macroblock type could be omitted for at least a part of the picture macroblocks. Of course, this should be done without reducing the compression efficiency.
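To give a sense of the per-sub-block work involved, the following minimal sketch selects a prediction mode for one 4×4 sub-block by SAD. It is deliberately simplified: only three of the nine modes (vertical, horizontal and DC) are modelled, both neighbours are assumed to be available, and the function names and sample values are hypothetical. An encoder would repeat a search of this kind over all nine modes for each of the sixteen sub-blocks.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))

def predict(mode, above, left):
    """Build a 4x4 prediction from the 4 samples above and the 4 to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(above) for _ in range(4)]
    if mode == "horizontal":    # copy the left column across
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbouring samples
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)

def best_intra_mode(block, above, left,
                    modes=("vertical", "horizontal", "dc")):
    """Return (mode, cost) for the mode with the smallest SAD."""
    costs = {m: sad(block, predict(m, above, left)) for m in modes}
    best = min(costs, key=costs.get)
    return best, costs[best]

# Hypothetical sub-block with purely vertical structure.
above = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
mode, cost = best_intra_mode(block, above, left)
```

For this sample the vertical mode reproduces the block exactly, so it is selected with a SAD of 0, against 240 for the horizontal mode and 176 for the DC mode.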
Many of the motion estimation algorithms used in video encoding are based on a best matching block search. That is, the current texture block is compared with some set of reference picture “candidate” texture blocks of the same size. The reference picture block from this set that provides the best correspondence to the current texture block is considered as the reference block, and the spatial offset between the current block position and the reference block position is considered as a motion vector. Thus, two algorithmic components completely define such motion estimation methods. The first component is the algorithm for generating the set of spatial positions of the candidate reference blocks. The second component is the method for measuring the correspondence of the texture blocks. The most conventional cost function for measuring the correspondence of texture blocks is the weighted sum of the SAD (Sum of Absolute Differences) of the corresponding pixel values of the blocks and the motion vector cost.
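The two components named above can be sketched together in a few lines. The sketch assumes the simplest candidate generator (an exhaustive search over a square window) and a cost of the form SAD + λ·(|dx| + |dy|), where the second term is only a stand-in for the real motion vector cost. The function names, the weighting factor and the synthetic pictures are hypothetical.

```python
def block_sad(cur, ref, cx, cy, rx, ry, size):
    """SAD between the block at (cx, cy) in cur and the block at (rx, ry) in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def full_search(cur, ref, cx, cy, size, search_range, lam):
    """Exhaustive block matching; returns (cost, dx, dy) of the best candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            # Weighted sum of texture SAD and a simple motion vector cost proxy.
            cost = (block_sad(cur, ref, cx, cy, rx, ry, size)
                    + lam * (abs(dx) + abs(dy)))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Synthetic 12x12 pictures: the current picture shows the reference content
# displaced by 2 samples horizontally and 1 sample vertically.
ref = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
cur = [[7 * (x + 2) + 13 * (y + 1) for x in range(12)] for y in range(12)]

best = full_search(cur, ref, cx=4, cy=4, size=4, search_range=3, lam=1.0)
```

Here the search returns a cost of 3.0 with the offset (2, 1): the texture matches exactly at that position, so the whole cost is the motion vector term. Fast motion estimation methods differ mainly in replacing the exhaustive double loop with a much smaller candidate set.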
However, in some situations the cost functions described above provide wholly inadequate texture prediction and cause notable visual artifacts in low bit-rate video coding. For example, if the pictures contain a nonlinearly moving object on a slowly varying low-complexity background, then typical residual artifacts can sometimes be noted along the object trajectory, caused by improper motion compensation. That is why the choice of an appropriate cost function appears to be highly important for providing sufficient visual quality in low bit-rate video sequences.
A quantization parameter is used for determining the quantization of transform coefficients in the H.264 standard. There is a need to optimize the calculation of macroblock quantization parameters in order to provide more adequate and efficient encoding in terms of video compression efficiency and visual quality.
The idea of pre-processing the sequence of frames using the pixels of the current and previous frames has been repeatedly treated in the prior art. However, most processing algorithms suffer from very high complexity.
A typical traditional approach is to carry out each transform sequentially for the whole video frame. Each High Definition (HD) video frame, however, occupies about 3 MB of memory. Therefore, such an approach leads to extensive data exchanges between the processor and external memory. Furthermore, similar post-processing transforms should be done on the decoder side as well. Thus, pre- and post-processing appear to be the most resource-consuming operations for HD video.
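The figure of about 3 MB per frame can be checked with simple arithmetic. The sketch below assumes a 1920×1080 frame with 8-bit samples and 4:2:0 chroma subsampling (each chroma plane has half the width and half the height of the luma plane); these parameters and the function name are assumptions made for illustration, since the frame format is not specified above.

```python
def frame_bytes_420(width, height, bytes_per_sample=1):
    """Size in bytes of one frame with 4:2:0 chroma subsampling."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)   # two quarter-size chroma planes
    return (luma + chroma) * bytes_per_sample

hd_frame = frame_bytes_420(1920, 1080)          # 3,110,400 bytes
hd_frame_mib = hd_frame / (1024 * 1024)         # roughly 2.97 MiB
```

Every full-frame processing pass therefore moves roughly this amount of data in each direction between the processor and external memory, which is why chaining several sequential whole-frame transforms is costly.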
Many film-originated videos exhibit a so-called film-grain effect. Film grain results from the physical granularity of the photographic emulsion. Preservation of the grainy appearance of film is often desirable for artistic reasons. On the other hand, film grain has very high entropy and is not very compressible using traditional compression schemes, such as the MPEG2 or H.264 standards. Moreover, compressing the video without first removing the film grain partially removes and distorts the film-grain effect. A better approach might be to remove the film grain from the original video before encoding, and to later emulate a similar film-grain effect on the decoder side, using some predefined model and film-grain parameters. The H.264 standard defines some optional messages to pass such parameters on to the decoder. However, there is a need to meaningfully calculate such parameters on the encoder side in the first place.
In view of the increasing use of real-time and close-to-real-time video compression and the arrival of the H.264/AVC standard that improves the quality of real-time video communications, there is a need for new effective algorithms applicable to video encoders that are compatible with the new improved H.264/AVC standard.