Video codecs are employed to convert an initial video sequence (a set of video images, also called pictures or frames) into an encoded bitstream (a set of compressed video sequence binary data), and to convert the binary data produced by a video codec system into a reconstructed video sequence (a decoded set of video images, or reconstructed frames). Hereinafter, the terms “frame” and “picture” are used interchangeably.

It is known that video compression relies on two basic assumptions. The first is that human sensitivity to noise in a picture (frame) is highly dependent on the frequency of the noise. The second is that in a picture sequence every picture has a lot in common with the preceding picture. In a picture, large objects result in low spatial frequencies, whereas small objects result in high spatial frequencies. The noise detected by human vision is mostly at low spatial frequencies. The data may therefore be compressed by sending only the difference between one picture and the next, and by raising the noise where it cannot be detected, thus shortening the length of the data words. A video sequence contains a significant amount of statistical and subjective redundancy within and between pictures, which can be reduced by data compression techniques to make its size smaller.

For still pictures (as in the JPEG format), intra-frame coding exploits spatial redundancy, treating each picture individually, without reference to any other picture. The main step in intra-coding is a spatial frequency analysis of the image using the known technique of the Discrete Cosine Transform (DCT). The DCT converts input pixels into a form in which the redundancy can be identified. The frame is broken up into rectangular areas called macroblocks and converted one block at a time. A typical two-dimensional (2D) block is 8×8 pixels. The 2D-DCT converts the block into a block of 64 coefficients. A coefficient is a number that describes the amount of a particular spatial frequency present in the block. The coefficients are then zig-zag scanned, weighted and run-length coded.
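The intra-coding steps described above (2D-DCT of an 8×8 block, zig-zag scan, weighting, run-length coding) can be illustrated with a minimal Python sketch. It is not the implementation of any particular codec: the function names are hypothetical, the single uniform weighting step of 16 is an assumption standing in for a real weighting matrix, and the DCT is written in its direct, unoptimized form for readability.

```python
import math

N = 8  # block size taken from the text: 8x8 pixels, 64 coefficients


def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def zigzag_order(n=N):
    """Return (row, col) pairs in zig-zag order, lowest frequencies first."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; the direction alternates with the diagonal index.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def run_length(values):
    """Encode as (zeros_before, nonzero_value) pairs; trailing zeros are dropped."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs


def encode_intra_block(block, step=16):
    """DCT, uniform weighting (assumed step), zig-zag scan, run-length coding."""
    coeffs = dct_2d(block)
    weighted = [round(coeffs[r][c] / step) for r, c in zigzag_order()]
    return run_length(weighted)
```

For a flat block, only the lowest-frequency (DC) coefficient is non-zero, so the 64 input pixels collapse into a single pair, which shows where the redundancy reduction comes from.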
For moving pictures, inter-coding is known to exploit the redundancy between pictures, which gives a higher compression factor than intra-coding. The “difference” picture is produced by subtracting every pixel in one picture from the pixel at the same position in the next picture. The difference picture may then be compressed using intra-coding with DCT.
In the case of significant movement between the pictures resulting in large differences, it is known to use motion compensation (MC), which allows a higher compression factor. According to the known MC technique, at the coder, successive pictures are compared and the shift of an area from one picture to the next is measured to produce motion vectors. The codec attempts to model the object in the new picture from the previous picture using motion vectors. Each macroblock has its own motion vector, which applies to the whole block. The vector from the previous picture is coded and vector differences are sent. Any discrepancies are eliminated by comparing the model with the actual picture. The codec sends the motion vectors and the discrepancies. The decoder performs the inverse process, shifting the previous picture by the vectors and adding the discrepancies to produce the next picture. The quality of a reconstructed video sequence is measured as the total deviation of its pixels from the initial video sequence. The increased use of real-time digital video communication applications, such as video conferencing and video telephony, presents an ever-increasing demand for high video quality.
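The motion compensation loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any specific codec: frames are plain lists of pixel rows, the motion search is an exhaustive integer-pel search that minimizes the sum of absolute differences (SAD), and the block size, search range and all function names are hypothetical. Coding of vector differences is omitted.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def estimate_motion(prev, cur, top, left, size=4, search=2):
    """Full search: (dy, dx) pointing at the best-matching block in prev."""
    target = get_block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(target, get_block(prev, top, left, size))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, get_block(prev, y, x, size))
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best


def encode_inter_block(prev, cur, top, left, size=4, search=2):
    """Coder side: return the motion vector and the discrepancies (residual)."""
    dy, dx = estimate_motion(prev, cur, top, left, size, search)
    model = get_block(prev, top + dy, left + dx, size)
    actual = get_block(cur, top, left, size)
    residual = [[a - m for a, m in zip(ra, rm)] for ra, rm in zip(actual, model)]
    return (dy, dx), residual


def decode_inter_block(prev, vector, residual, top, left):
    """Decoder side: shift the previous picture by the vector, add the residual."""
    dy, dx = vector
    model = get_block(prev, top + dy, left + dx, len(residual))
    return [[m + r for m, r in zip(rm, rr)] for rm, rr in zip(model, residual)]


def mean_squared_error(original, reconstructed):
    """One common way to express the deviation of reconstructed pixels."""
    diffs = [(p - q) ** 2 for ro, rr in zip(original, reconstructed)
             for p, q in zip(ro, rr)]
    return sum(diffs) / len(diffs)
```

When the content of a block has merely shifted between pictures, the search finds the shift, the residual becomes all zeros, and only the vector carries information. The mean squared error shown here is one possible deviation measure; the text does not prescribe a specific one.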
In view of the increasing use of real-time and close to real-time video compression, and the arrival of a new standard improving the quality of real-time video communication, there is a need for new, effective algorithms applicable to different types of video codecs, which can be used in video encoders complying with ITU-T Recommendation H.264, also known as MPEG-4 Part 10, or AVC (ISO/IEC 14496-10), etc.
Most known block-based video coding systems, such as MPEG-4 or ITU-T H.264, use coding algorithms with the common steps of dividing each video frame into blocks of pixels (pels); predicting the block pixels using an “inter” prediction or “intra” prediction technique; transforming the texture prediction error blocks; predicting the motion vectors and calculating the motion vector prediction differences; and coding the quantized transform coefficients of the texture prediction error, the motion vector prediction differences, the intra prediction types and the auxiliary frame data.
The idea of pre-processing a motion picture sequence using the pixels of the current and previous frames has been repeatedly treated in the prior art. However, most such algorithms suffer either from possible over-smoothing, due to the application of spatial filters together with temporal ones, or from very high complexity. The advantages of the proposed method are relatively low complexity (depending mostly on the motion estimation and on the way the block edges are smoothed) and efficient denoising that preserves the original image details well (especially at high noise levels).
The ability to create error-resilient streams is also highly important for industrial codecs used in broadcasting, streaming and other applications operating in error-prone environments. One universal class of methods for creating error-resilient streams is the class of intra update methods (also called intra refresh methods). These methods are based on inserting extra INTRA macroblocks into the inter-coded frames. For texture prediction, these INTRA macroblocks should use only the reconstructed texture of previously coded macroblocks of the current frame that are INTRA coded as well. Thus, all the INTRA macroblocks of each frame will be decoded correctly even if the texture of the previously decoded frames is lost or corrupted. There are several conventional approaches to the INTRA update method.
The simplest approach is to insert the INTRA macroblocks at random positions in the frame, with a probability corresponding to the expected loss rate. According to another approach, the INTRA macroblocks are inserted into the current frame according to a pre-specified geometric scheme that changes from frame to frame by some pre-specified rule. The main drawback of such methods is that they lead to enormous bitrate growth.
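The two conventional approaches above can be sketched as follows. Both functions are illustrative assumptions rather than schemes taken from a specific encoder: the names are hypothetical, and the column-by-column sweep is only one example of a pre-specified geometric scheme.

```python
import random


def random_intra_refresh(num_macroblocks, loss_rate, rng):
    """Mark each macroblock INTRA independently with probability loss_rate."""
    return [rng.random() < loss_rate for _ in range(num_macroblocks)]


def cyclic_intra_refresh(mb_cols, mb_rows, frame_index):
    """Mark one macroblock column INTRA per frame, sweeping left to right."""
    col = frame_index % mb_cols
    return [[c == col for c in range(mb_cols)] for _ in range(mb_rows)]


# Example: with a 10% expected loss rate, roughly one macroblock in ten is
# forced to INTRA mode in every frame, regardless of its content.
rng = random.Random(0)
intra_flags = random_intra_refresh(num_macroblocks=99, loss_rate=0.1, rng=rng)
```

Neither scheme looks at the frame content, so INTRA macroblocks are spent even where inter-coding would be far cheaper, which is consistent with the bitrate drawback noted above.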
Another class of INTRA update schemes is highly dependent on the texture and motion of the current frame. In these methods the INTRA macroblocks are inserted either in the areas of highest activity, determined by the average motion vector magnitude, or by using a loss-aware rate-distortion optimization scheme under the assumption that the current macroblock may be lost with a given probability. Such methods are described, for example, in the following papers:
Yao Wang, Stephan Wenger, Jiangtao Wen, and Aggelos K. Katsaggelos, “Review of Error Resilient Coding Techniques for Real-Time Video Communication”, IEEE Signal Processing Magazine, vol. 17, no. 4, pp. 61-82, July 2000;
R. Zhang, S. L. Regunathan and K. Rose, “Video Coding with Optimal Inter/Intra Mode Switching for Packet Loss Resilience”, IEEE Journal on Selected Areas in Communications, Special Issue on Error-Resilient Image and Video Transmission, vol. 18, no. 6, pp. 966-976, June 2000; and
Minyoung Kim, Hyunok Oh, Nikil Dutt, Alex Nicolau, Nalini Venkatasubramanian, “PBPAIR: An Energy-efficient Error-resilient Encoding Using Probability Based Power Aware Intra Refresh”, ACM SIGMOBILE Mob. Comput. Commun. Rev. 10(3): 58-69, 2006.
The drawback of these schemes is that they do not take into account that a high potential reconstruction error, caused by the loss of the current macroblock or of the previous frame texture, will necessarily increase the reconstruction error of the inter macroblocks in subsequent frames that refer to the current macroblock.
In most encoders that support different motion compensation block sizes, a separate motion estimation procedure is used for each block size. This increases the complexity of the motion estimation algorithm and can make it difficult to maintain efficient interconnections between the motion vectors used in texture blocks of different sizes.
The new H.264 Standard improved the accuracy of the motion vector calculation by using quarter-pel-accurate motion compensation. However, during motion estimation and motion compensation, a fairly complicated interpolation procedure is needed to calculate the pixel values at non-integer coordinates. In order to provide adequate motion estimation using known methods, it is necessary either to store in memory a 4-times-zoomed frame, or to perform non-integer pixel interpolation during the motion estimation. Both methods have their disadvantages. In the first case, the memory required for the reference frames increases 16-fold, since the frame is enlarged 4 times in each dimension. The second method increases the computational complexity of the algorithm and leads to an additional CPU load.
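The trade-off between the two methods can be made concrete with a short sketch. The first function computes the storage needed for a 4-times-zoomed frame. The other two interpolate on the fly along a single row of luma samples, using the six-tap (1, -5, 20, 20, -5, 1) half-sample filter and the rounded averaging for quarter-sample positions that H.264 defines for luma. The function names, the one-dimensional simplification, the clamping of indices at the row borders and the 8-bit sample range are assumptions made for illustration.

```python
def upsampled_frame_bytes(width, height, factor=4, bytes_per_sample=1):
    """Storage for a frame enlarged `factor` times in each dimension."""
    return (width * factor) * (height * factor) * bytes_per_sample


HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)  # six-tap luma filter, sum of taps = 32


def half_pel(row, i):
    """Interpolated sample midway between row[i] and row[i + 1]."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(HALF_PEL_TAPS):
        j = min(max(i - 2 + k, 0), last)  # clamp at the borders (assumption)
        acc += tap * row[j]
    return min(max((acc + 16) >> 5, 0), 255)  # round, divide by 32, clip to 8 bits


def quarter_pel(row, i):
    """Sample a quarter of the way from row[i] to row[i + 1]."""
    return (row[i] + half_pel(row, i) + 1) >> 1


# One 1920x1080 8-bit luma plane: 2,073,600 bytes at integer resolution,
# but 33,177,600 bytes when stored as a 4-times-zoomed frame.
full_hd = upsampled_frame_bytes(1920, 1080, factor=1)
full_hd_zoomed = upsampled_frame_bytes(1920, 1080, factor=4)
```

Every on-the-fly half-sample value costs six multiply-accumulate operations plus rounding and clipping, and it has to be recomputed for each candidate vector unless it is cached, which is the CPU load the second method trades against memory.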
Upcoming high-performance systems will integrate tens of multithreaded processor cores on a single chip, resulting in hundreds of concurrent threads sharing system resources. The proposed new modular video encoding and decoding design fits such multi-core systems much better. It is based on splitting all coding operations into separate “tasks”. Such an architecture makes it possible to load a large number of cores even for one-slice-per-picture coding. Another benefit of the proposed design is the high flexibility of system integration: any required system (encoder, decoder, transcoder, etc.) can easily be constructed from an appropriate set of modules. Moreover, such an encoder system can be assembled dynamically, depending on the available resources, to control load balancing for maximum encoding quality.
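The idea of splitting coding operations into separate tasks can be illustrated with a small, generic scheduler. This is only a sketch of the general principle under stated assumptions and does not reproduce the proposed design: the task names, the dependency format and the `run_tasks` function are hypothetical, and Python's standard `concurrent.futures` thread pool stands in for the cores of a multi-core system.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, workers=4):
    """Run a set of tasks on a worker pool, respecting their dependencies.

    tasks maps a task name to (function, list_of_dependency_names); each
    function receives a dict with the results of its dependencies.
    """
    results = {}
    pending = dict(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Every task whose dependencies are finished can run concurrently.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cyclic or missing dependency")
            futures = {}
            for name in ready:
                func, deps = pending[name]
                futures[name] = pool.submit(func, {d: results[d] for d in deps})
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


# Hypothetical example: two independent motion estimation tasks feed one
# entropy coding task; the numbers stand in for real intermediate data.
example_tasks = {
    "motion_estimation_row_0": (lambda deps: 1, []),
    "motion_estimation_row_1": (lambda deps: 2, []),
    "entropy_coding": (
        lambda deps: deps["motion_estimation_row_0"] + deps["motion_estimation_row_1"],
        ["motion_estimation_row_0", "motion_estimation_row_1"],
    ),
}
example_results = run_tasks(example_tasks)
```

Because the scheduler only sees tasks and dependencies, the same mechanism could in principle drive an encoder, a decoder or a transcoder assembled from different sets of modules.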