Video codecs are employed to convert an initial video sequence (a set of video images, also called pictures or frames) into an encoded bitstream (a set of compressed video sequence binary data), and to convert the binary data produced by a video codec system back into a reconstructed video sequence (a decoded set of video images, or reconstructed frames). Hereinafter, the terms “frame” and “picture” are used interchangeably.
It is known that video compression relies on two basic assumptions. The first is that human sensitivity to noise in a picture (frame) is highly dependent on the frequency of the noise. The second is that in a picture sequence every picture has a lot in common with the preceding picture. In a picture, large objects result in low spatial frequencies, whereas small objects result in high spatial frequencies. The noise detected by human vision is mostly at low spatial frequencies. The data may therefore be compressed by sending only the difference between one picture and the next, and by raising the noise where it cannot be detected, thus shortening the length of the data words. A video sequence contains a significant amount of statistical and subjective redundancy within and between pictures, which can be reduced by data compression techniques to make the sequence smaller.
For still pictures (as in the JPEG format), intra-frame coding is used, which exploits spatial redundancy and treats each picture individually, without reference to any other picture. In intra-coding the main step is to perform a spatial frequency analysis of the image using the known technique of the Discrete Cosine Transform (DCT). The DCT converts input pixels into a form in which the redundancy can be identified. The frame is broken up into rectangular areas called macroblocks and is converted one block at a time. A typical two-dimensional (2D) block is 8×8 pixels. The 2D DCT converts the block into a block of 64 coefficients. A coefficient is a number which describes the amount of a particular spatial frequency that is present. The coefficients are then zig-zag scanned, weighted and run-length coded.
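The intra-coding steps described above (DCT of an 8×8 block, zig-zag scan, weighting, run-length coding) may be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not the method of any particular codec: the function names are hypothetical, the weighting is reduced to a single uniform step `step` rather than a per-frequency table, and the DCT is the direct, unoptimized orthonormal form.

```python
import math

N = 8  # block size taken from the text (8x8 pixels)

def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out

def zigzag_order(n=N):
    """(row, col) positions in zig-zag order, lowest spatial frequencies first."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length(values):
    """Encode as (zero_run, value) pairs; trailing zeros collapse to 'EOB'."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs

def encode_block(block, step=16):
    """DCT, uniform weighting (assumed single step), zig-zag scan, run-length."""
    coeffs = dct_2d(block)
    scanned = [round(coeffs[r][c] / step) for r, c in zigzag_order()]
    return run_length(scanned)

# A flat block has energy only in the lowest-frequency (DC) coefficient.
flat = [[128] * N for _ in range(N)]
encoded_flat = encode_block(flat)
```

For a flat block, all 63 higher-frequency coefficients round to zero, so the run-length stage collapses them into the single end-of-block marker. This is the redundancy that the transform makes visible.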
For moving pictures, inter-coding is known to exploit redundancy between pictures, which gives a higher compression factor than intra-coding. The “difference” picture is produced by subtracting every pixel in one picture from the pixel in the same position in the next picture. The difference picture may then be compressed using intra-coding with DCT.
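As a minimal sketch of the difference picture described above, assuming frames are represented as lists of rows of integer pixel values (the function names are hypothetical):

```python
def difference_picture(prev, nxt):
    """Subtract each pixel of the previous picture from the co-located pixel of the next."""
    return [[n - p for p, n in zip(prev_row, next_row)]
            for prev_row, next_row in zip(prev, nxt)]

def reconstruct(prev, diff):
    """Decoder side: add the difference picture back onto the previous picture."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(prev, diff)]

prev_frame = [[10, 10], [10, 10]]
next_frame = [[10, 12], [9, 10]]
diff = difference_picture(prev_frame, next_frame)
```

When successive pictures are similar, most entries of the difference picture are zero or small, which is what makes it cheaper to code than the picture itself.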
In the case of significant movement between the pictures resulting in large differences, it is known to use motion compensation (MC), which allows a higher compression factor. According to the known MC technique, at the coder, successive pictures are compared and the shift of an area from one picture to the next is measured to produce motion vectors. The codec attempts to model the object in the new picture from the previous picture using motion vectors. Each macroblock has its own motion vector, which applies to the whole block. The vectors from the previous picture are coded and vector differences are sent. Any discrepancies are found by comparing the model with the actual picture. The codec sends the motion vectors and the discrepancies. The decoder performs the inverse process, shifting the previous picture by the vectors and adding the discrepancies to produce the next picture. The quality of a reconstructed video sequence is measured as the total deviation of its pixels from the initial video sequence. The increased use of real-time digital video communication applications, such as video conferencing and video telephony, presents an ever-increasing demand for high video quality.
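The motion compensation loop described above may be sketched as follows. This is an illustration only: it assumes an exhaustive (full) search at integer-pixel accuracy and the sum of absolute differences (SAD) as the matching cost, neither of which is specified in the text, and all function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a shifted reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def estimate_motion(cur, ref, bx, by, size, search):
    """Full search for the (dx, dy) shift within +/-search that minimizes SAD."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference picture.
            if not (0 <= by + dy and by + dy + size <= h
                    and 0 <= bx + dx and bx + dx + size <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost

def predict_block(ref, bx, by, mv, size):
    """Model the current block by shifting the reference picture by the motion vector."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]

# A 2x2 bright object moves one pixel to the right between two 8x8 pictures.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (2, 3):
        ref[y][x] = 200
        cur[y][x + 1] = 200

mv, cost = estimate_motion(cur, ref, bx=2, by=2, size=4, search=2)
pred = predict_block(ref, 2, 2, mv, 4)
# Discrepancies (residual) that the coder would send alongside the vector.
residual = [[cur[2 + y][2 + x] - pred[y][x] for x in range(4)] for y in range(4)]
# Decoder: shifted previous picture plus discrepancies gives the new block.
decoded = [[pred[y][x] + residual[y][x] for x in range(4)] for y in range(4)]
```

In this example the vector points one pixel to the left in the reference picture, the model matches the actual block exactly, and the discrepancies are all zero.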
In view of the increasing use of real-time and near-real-time video compression, and the arrival of a new standard improving the quality of real-time video communication, there is a need for new effective algorithms applicable to different types of video codecs, which can be used in video encoders complying with ITU-T Recommendation H.264, also known as MPEG-4 Part 10, or AVC (ISO/IEC 14496-10), etc.
Most known block-based video coding systems, such as MPEG-4 or ITU-T H.264, use coding algorithms with the common steps of dividing each video frame into blocks of pixels (pels); predicting the block pixels using an “inter” prediction or “intra” prediction technique; transforming the texture prediction error blocks and quantizing the transform coefficients; predicting the motion vectors and calculating the motion vector prediction differences; and coding the quantized transform coefficients of the texture prediction error, the motion vector prediction differences, the intra prediction types and the auxiliary frame data.
In most encoders that deal with different motion compensation block sizes, a separate motion estimation procedure is used for each block size. This increases the complexity of the motion estimation algorithm and could present a problem in providing efficient interconnections between the motion vectors used in texture blocks of different sizes.
The new H.264 Standard improved the accuracy of motion vector calculation by using quarter-pel-accurate motion compensation. However, during motion estimation and motion compensation a rather complicated interpolation procedure is needed for calculating the pixel values at non-integer coordinates. In order to provide adequate motion estimation using known methods, it is necessary either to store in memory a frame zoomed 4 times in each dimension, or to perform non-integer pixel interpolation during the motion estimation. Both methods have their disadvantages. In the first case the memory required for reference frames is increased 16 times, since the number of stored samples grows by a factor of 4 both horizontally and vertically. The second method increases the computational complexity of the algorithm and leads to additional CPU load.
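The 16-fold memory figure can be checked with a short calculation. The frame size (352×288, one byte per luma sample) is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
def reference_memory_bytes(width, height, zoom=1, bytes_per_sample=1):
    """Bytes needed to store one reference plane zoomed by `zoom` in each dimension."""
    return (width * zoom) * (height * zoom) * bytes_per_sample

base = reference_memory_bytes(352, 288)            # integer-pel reference plane
zoomed = reference_memory_bytes(352, 288, zoom=4)  # quarter-pel reference plane
ratio = zoomed // base
```

Under these assumptions the integer-pel plane occupies 101,376 bytes and the quarter-pel plane 1,622,016 bytes, a ratio of 16.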
Thus, there is a need for new methods and algorithms which reduce the computational complexity of motion estimation without significant CPU load or the use of additional memory, as well as a need for new common motion estimation methods that sufficiently improve the interconnections of motion vectors for blocks of different sizes.
The objects of the proposed video coding/decoding method are: to increase the quality of a reconstructed video sequence at a given size of the compressed video sequence binary data; and to reduce the size of the compressed video sequence binary data at a given quality of the reconstructed video sequence.
An object of the proposed codec is to provide real-time encoding/decoding of video data on a PC platform without an acceleration board.