In video coding, an original image is converted into a digital format by sampling in space and time, and by quantizing in brightness or color. The digitally formatted image comprises a sequence of image frames. Each image frame has an array of pixels, each pixel having a value corresponding to the brightness or color of the image at the point represented by the pixel. A sequence of such image frames provides a video image.
A typical image frame comprising 1024×1024 pixels, each pixel having a value between 0 and 255 (that is, one byte per pixel), comprises 1 megabyte of data. For typical television systems, 25 images per second are displayed. Thus, the data rate for such video images is 25 megabytes per second. Such a data rate would prohibit the transmission of digital video images over most communication systems, since it would require much, if not all or more, of the communication system's available data bandwidth. Consequently, the transmission of real-time digital video images would be either prohibitively expensive or not possible, owing to the data rate that the communication system would need to support.
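The arithmetic behind these figures can be checked with a short sketch. The function name `raw_video_rate` is hypothetical and introduced only for illustration; one byte per pixel is assumed, consistent with pixel values between 0 and 255.

```python
def raw_video_rate(width, height, bytes_per_pixel, frames_per_second):
    """Return (bytes per frame, bytes per second) for uncompressed video."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes, frame_bytes * frames_per_second


# 1024 x 1024 pixels, one byte per pixel, 25 frames per second.
frame_bytes, bytes_per_second = raw_video_rate(1024, 1024, 1, 25)

MEGABYTE = 2 ** 20  # 1,048,576 bytes
print(frame_bytes / MEGABYTE)       # megabytes per frame
print(bytes_per_second / MEGABYTE)  # megabytes per second
```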
However, it is well known to encode image frames in order to reduce the amount of data necessary to represent a particular image. One example of intra-frame coding is Run Length Coding (RLC), in which a series of identical message elements is transmitted by way of a code representing the element and the number of successive occurrences. Another form of coding is termed Variable Length Coding (VLC), sometimes known as entropy coding. This form of coding is a bit-rate reduction method based on the fact that the probability of occurrence of an element generated by a source encoded in n bits is not the same for all elements amongst the 2^n different possibilities. Thus, it is advantageous to encode the most frequently occurring elements with fewer than n bits and less frequent elements with more bits, resulting in an average length that is less than the fixed length of n bits. A particularly well-known method for Variable Length Coding is Huffman coding.
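Both techniques can be illustrated with a minimal sketch. The function names, the sample pixel row and the sample message are assumptions made for illustration only. The Huffman routine returns only the code length assigned to each element, which is sufficient to compare the average length against a fixed-length code.

```python
import heapq
from collections import Counter
from itertools import groupby


def run_length_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(symbols)]


def run_length_decode(pairs):
    """Expand (symbol, run length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built from
    the symbol frequencies observed in `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tie breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# Run length coding: 20 pixel values become 3 (value, count) pairs.
pixel_row = [0] * 12 + [255] * 3 + [0] * 5
encoded_row = run_length_encode(pixel_row)

# Variable length coding: 4 distinct elements need n = 2 bits each at fixed length.
message = "aaaaaaaabbbbccdd"
lengths = huffman_code_lengths(message)
fixed_bits = 2 * len(message)
huffman_bits = sum(lengths[s] for s in message)
print(encoded_row, lengths, fixed_bits, huffman_bits)
```

In this sample the most frequent element receives the shortest code, so the total number of bits falls below the fixed-length total.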
Yet another form of coding which can be applied to images is the Discrete Cosine Transform (DCT). The Discrete Cosine Transform is a particular case, applied to discrete or sampled signals, of the Fourier Transform, which decomposes a periodic signal into a series of sine and cosine harmonic functions. The signal can then be represented by a series of coefficients of each of these functions. The image frame is a sampled bi-dimensional signal, and a bi-dimensional DCT (in the horizontal and vertical directions) transforms the brightness (luminance) or color (chrominance) values of a group of pixels into a matrix of coefficients of the same size, each coefficient representing the amplitude of one of the cosine harmonic functions. A feature of DCT coding is that the energy of a block or group of pixels is concentrated in a relatively small number of coefficients situated in the top left-hand corner of the coefficient matrix, that is, at the low spatial frequencies. Additionally, these coefficients are typically decorrelated from each other. Due to the psycho-physiological aspects of human vision, i.e. a reduced sensitivity to high spatial frequencies, it is possible to eliminate coefficient values below a threshold that is a function of frequency without any perceptible degradation of picture quality. The eliminated values are replaced by 0. The remaining coefficients are quantized.
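The energy concentration described above can be observed with a minimal sketch of a bi-dimensional DCT. The orthonormal DCT-II normalization, the 8×8 block size, the sample block, the linear threshold function and the quantization step are all assumptions chosen for illustration and are not taken from any particular coding standard.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def threshold_and_quantize(coeffs, threshold_fn, step):
    """Replace coefficients below a frequency-dependent threshold by 0 and
    quantize the remaining ones to integer multiples of `step`."""
    return [[0 if abs(c) < threshold_fn(u, v) else round(c / step)
             for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]


# A smooth 8x8 luminance block (gentle horizontal and vertical gradient).
block = [[100 + 2 * x + y for x in range(8)] for y in range(8)]
coeffs = dct_2d(block)

pixel_energy = sum(p * p for row in block for p in row)
total_energy = sum(c * c for row in coeffs for c in row)
dc_share = coeffs[0][0] ** 2 / total_energy  # share held by the top-left coefficient

# Hypothetical threshold that grows with spatial frequency.
quantized = threshold_and_quantize(coeffs, lambda u, v: 1.0 + u + v, step=4.0)
print(dc_share)
print(quantized[0])
```

Because this normalization preserves energy, `total_energy` matches `pixel_energy` up to rounding, and for a smooth block nearly all of it sits in the top left-hand coefficient.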
In typical coding systems, DCT coding will be followed by run length coding or variable length coding in order to further compress the data necessary to represent the image.
The foregoing coding techniques are known as intra-coding techniques since the spatial content is encoded one image frame at a time. However, it has been observed that there is typically very little change in content between two successive frames. That is to say, the temporal correlation between two successive frames is high. The high temporal correlation between two successive frames may be utilized to reduce the amount of information needed to represent an image, since only the difference between two successive frames is needed. Such coding dramatically reduces the amount of information necessary to represent an image frame and, consequently, the data rate necessary to support communication of a video image. By utilizing such coding techniques, the transmission of digital video images over many communication systems becomes feasible, since the data rate is significantly reduced. Such a scheme can be further improved if the changes between two successive image frames can be predicted. In such an enhanced scheme, only the parameters which describe the predicted changes from a previous frame to a current frame would be necessary to represent the current image. This would result in a large reduction in the information needed to represent the image and, consequently, in even easier transmission of the images over communication systems. However, it is not possible to predict spatial content in a current frame that did not exist in a previous frame; for example, previously hidden background or views of a rotating three-dimensional object may emerge in the current frame that were not visible in the previous frame. Thus, there will be a difference between a predicted current frame and the true current frame. This difference information is necessary to properly represent the current image frame. This difference is known as the prediction error.
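The simplest use of temporal correlation, transmitting only the difference between two successive frames, can be sketched as follows. The function names and the small 4×4 frames are hypothetical and serve only to illustrate the idea.

```python
def frame_difference(previous, current):
    """Pixel-wise difference (current - previous) of two equally sized frames."""
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def reconstruct(previous, difference):
    """Rebuild the current frame from the previous frame and the difference."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(previous, difference)]


previous_frame = [[10, 10, 10, 10],
                  [10, 50, 50, 10],
                  [10, 50, 50, 10],
                  [10, 10, 10, 10]]

# The current frame differs from the previous one in a single pixel.
current_frame = [row[:] for row in previous_frame]
current_frame[1][1] = 60

difference = frame_difference(previous_frame, current_frame)
nonzero = sum(1 for row in difference for d in row if d != 0)
print(difference)
print(nonzero)
```

The difference frame consists almost entirely of zeros, which run length or variable length coding can then represent compactly, and the decoder recovers the current frame exactly by adding the difference to the previous frame.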
Most of the changes between two successive frames are typically caused by object or camera motion. These changes can be predicted (or estimated) by comparing the two frames and determining from which location in the previous frame a pixel in the current frame has moved. The motion of this pixel can then be described by a motion vector. The motion vector and the prediction error are all that is needed to characterize the difference between the current frame and the previous frame for that pixel. Thus, temporal correlation is exploited by estimating the motion of pixels, thereby reducing the amount of information required to encode a video image while maintaining a visual quality similar to that achieved by intra-coding.
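One common way to estimate such motion is exhaustive block matching, sketched below. This is an assumption made for illustration rather than a method stated in the text above: the motion is estimated for a small block of pixels instead of a single pixel, the matching cost is the sum of absolute differences (SAD), and the function names, frame contents and search range are hypothetical. The motion vector returned points from the block in the current frame to its best match in the previous frame.

```python
def sad(previous, current, top, left, dy, dx, size):
    """Sum of absolute differences between the current block at (top, left)
    and the previous-frame block displaced by (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[top + y][left + x]
                         - previous[top + dy + y][left + dx + x])
    return total


def estimate_motion(previous, current, top, left, size, search):
    """Full search over displacements within +/- search pixels.
    Returns (dy, dx, cost) for the displacement with the lowest SAD."""
    height, width = len(previous), len(previous[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the previous frame.
            if not (0 <= top + dy and top + dy + size <= height
                    and 0 <= left + dx and left + dx + size <= width):
                continue
            cost = sad(previous, current, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2], best[0]


# An 8x8 scene in which a 2x2 object moves one pixel down and two pixels right.
previous = [[0] * 8 for _ in range(8)]
current = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        value = 200 + 10 * (2 * y + x)
        previous[2 + y][2 + x] = value
        current[3 + y][4 + x] = value

dy, dx, cost = estimate_motion(previous, current, top=3, left=4, size=2, search=2)

# Prediction error: current block minus the motion-compensated previous block.
prediction_error = [[current[3 + y][4 + x] - previous[3 + dy + y][4 + dx + x]
                     for x in range(2)] for y in range(2)]
print((dy, dx), cost, prediction_error)
```

Here the object moves rigidly, so the motion vector alone predicts the block and the prediction error is zero everywhere; with newly uncovered content the error would be non-zero and would have to be transmitted as well.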