This invention relates to a data compression technique which is useful in the field of video teleconferencing and more particularly to an improved motion compensated image sequence compression technique.
In video teleconferencing, one desires to obtain the best possible picture quality at the receiver while transmitting the minimum amount of information necessary to provide the picture. Thus, there exists a tradeoff between picture quality and signal bandwidth. A number of approaches aimed at reducing the amount of information which must be transmitted have been developed. Among the best known of these data compression techniques are the spatial domain techniques, the transform domain techniques and the motion compensated techniques.
Spatial domain techniques are those which exploit the inherent spatial redundancy in television images to predict the intensity of a picture element based upon the intensity of previously transmitted picture elements. These were the first approaches to image compression, and many of the algorithms were subsequently prototyped in hardware.
Video teleconferencing algorithms which utilize linear transformations such as the Fourier, Hadamard, slant, or cosine transforms are known as transform domain techniques. These transforms differ in the amount of data compression they achieve and in their computational complexity, but they all pack most of the signal energy into a few coefficients, which allows more data compression with less picture quality degradation than with spatial domain coders. As great as the data compression of these algorithms is, however, the difficulty of implementing them in real time is equally great, as shown by the paucity of existing hardware.
Motion-compensation techniques predict the frame-to-frame (or field-to-field) motion of a pixel (picture element) and then access the intensity values from the previous frame (or field). The assumption is that predicting the motion and accessing the intensity values from the previous frame (or field) results in a better prediction of the intensity values than trying to predict the intensity values directly. It has been shown that in general motion-compensation techniques improve the predictions of the intensity values in the images.
There have been two basic approaches to motion compensation: block-matching and pel-recursion. In block-matching, a block of intensity values in a frame is compared with blocks of intensity values in the previous frame until a best match is determined. From this, an interframe displacement vector (how much the block has moved between frames) can be estimated for the whole block in the frame being transmitted. Poor estimates result if all pixels in the block do not move the same way. With a pel-recursive approach, on the other hand, a displacement is determined for each pel. This technique allows for a more exact estimation of the intensity value and has the ability to handle scale changes (zooming, movement perpendicular to the image plane).
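The block-matching search described above can be sketched as follows. This is a minimal illustration, not a method taken from any cited system: the sum of absolute differences (SAD) as the matching criterion, the exhaustive search, and the names `sad`, `block_match`, `size` and `search` are all assumptions made for the example. The displacement convention is that a pel at position z in the current frame matches the pel at z - d in the previous frame.

```python
def sad(cur, prev, top, left, dy, dx, size):
    """Sum of absolute differences between a block in `cur` and the block
    displaced by (dy, dx) in `prev` (assumed matching criterion)."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c] - prev[top + r - dy][left + c - dx])
    return total


def block_match(cur, prev, top, left, size=4, search=2):
    """Exhaustively try every displacement within +/- `search` pels and
    return the (dy, dx) with the smallest SAD, together with that SAD."""
    rows, cols = len(prev), len(prev[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose displaced block leaves the previous frame.
            if not (0 <= top - dy and top - dy + size <= rows):
                continue
            if not (0 <= left - dx and left - dx + size <= cols):
                continue
            cost = sad(cur, prev, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


# Synthetic data (hypothetical): the current frame is the previous frame
# shifted down by 1 pel and right by 2 pels.
prev = [[r * 100 + c for c in range(12)] for r in range(12)]
cur = [[prev[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(12)]
       for r in range(12)]
d, cost = block_match(cur, prev, top=4, left=4)
```

Because one vector is assigned to the whole block, the search succeeds here only because every pel in the block moves identically, which is the limitation noted in the text.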
In both block-matching and pel-recursion the prediction can be backward or forward, i.e., the displacement can be determined from previously transmitted information only (backward) or from past values and the current value (forward). Forward prediction requires explicit transmission of information about the displacement value, whereas backward prediction does not. The advantage of the forward prediction technique is that the presumably better estimate of the displacement vector reduces the error in the intensity prediction. The majority of the previously developed approaches have used backward prediction, which yields the benefits of reduced bit rates, lower computational requirements and faster prediction/estimation techniques.
Although motion-compensation techniques have existed for over 10 years, there is significant room for improvement. It is an object of the present invention to provide an improved motion prediction technique in which the total prediction error is decreased and the resulting picture quality is thus improved.
The video image consists of a series of images which appear in sequence on a display device, such as a cathode ray tube (CRT). The instantaneous image at any given point in time, or "frame," is a matrix of picture elements (pels). A matrix containing 262,144 pels is typical. The goal in motion prediction techniques, including the particular technique to be presently described, is to predict which pel intensity values will change in the next frame by more than a fixed, predetermined threshold, to determine what each such intensity value will be, and to transmit only the predicted difference to the receiving end. This is generally accomplished by recursive updating techniques on a pel-by-pel basis.
The basic pel-recursive technique and algorithm for estimating the displacement of a moving object in an image sequence are described in A. N. Netravali and J. D. Robbins, "Motion Compensated Television Coding, Part I," BSTJ, Vol. 58, No. 3, pp. 631-670, March 1979, and representative systems employing this technique and algorithm are described in U.S. Pat. Nos. 4,218,703; 4,218,704 and 4,278,996.
In the development of the basic pel-recursive displacement estimation technique, the intensity values within a frame are represented by $I(z,t)$, where $z$ is a two-dimensional spatial vector and $t$ indexes the frame in time. If an object moves with purely translational motion, then for some $d$, where $d$ is the two-dimensional translational displacement vector of the object point during the time interval $[t-1,t]$,
$$I(z,t) = I(z-d,\,t-1).$$
A function called the displaced frame difference (DFD) may be defined as follows:
$$\mathrm{DFD}(z,d^{i}) = I(z,t) - I(z-d^{i},\,t-1),$$
where $d^{i}$ is an estimate of the displacement vector. The DFD converges to zero as $d^{i}$ converges to the actual displacement, $d$, of the object point. Thus what is sought is an iterative algorithm of the form
$$d^{i+1} = d^{i} + \text{update term},$$
where for each step, the update term seeks to improve the estimate of d. The ultimate goal is minimization of the magnitude of the prediction error, DFD. This can be accomplished by minimization techniques such as a steepest descent or gradient method.
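As an illustration of such a gradient method, the sketch below takes the update term to be the negative gradient of half the squared DFD with respect to the displacement, which gives an update of the form d - eps * DFD * grad I(z - d, t-1). The step size `eps`, the bilinear interpolation, the central-difference gradient and the function names are assumptions of this example, not details stated in the text.

```python
def sample(frame, y, x):
    """Bilinearly interpolate `frame` at the real-valued position (y, x),
    clamping positions to the frame border."""
    rows, cols = len(frame), len(frame[0])
    y = min(max(y, 0.0), rows - 1.0)
    x = min(max(x, 0.0), cols - 1.0)
    y0, x0 = min(int(y), rows - 2), min(int(x), cols - 2)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * frame[y0][x0] + fx * frame[y0][x0 + 1]
    bot = (1 - fx) * frame[y0 + 1][x0] + fx * frame[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bot


def dfd(cur, prev, z, d):
    """Displaced frame difference: I(z, t) - I(z - d, t-1)."""
    return cur[z[0]][z[1]] - sample(prev, z[0] - d[0], z[1] - d[1])


def update(cur, prev, z, d, eps):
    """One steepest-descent step on DFD**2 / 2 with respect to d."""
    y, x = z[0] - d[0], z[1] - d[1]
    # Central-difference spatial gradient of the previous frame at z - d.
    gy = (sample(prev, y + 1, x) - sample(prev, y - 1, x)) / 2.0
    gx = (sample(prev, y, x + 1) - sample(prev, y, x - 1)) / 2.0
    e = dfd(cur, prev, z, d)
    return (d[0] - eps * e * gy, d[1] - eps * e * gx)


# Synthetic intensity ramp (hypothetical data), displaced by (1.0, 0.5).
prev = [[2.0 * r + 3.0 * c for c in range(12)] for r in range(12)]
cur = [[sample(prev, r - 1.0, c - 0.5) for c in range(12)] for r in range(12)]
z, d = (5, 5), (0.0, 0.0)
start = abs(dfd(cur, prev, z, d))
for _ in range(20):
    d = update(cur, prev, z, d, eps=0.05)
end = abs(dfd(cur, prev, z, d))
```

Note that a single pel constrains the displacement only along the local intensity gradient, so in this example the magnitude of the DFD shrinks toward zero without `d` necessarily reaching the true displacement.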
The basic pel-recursive motion compensated prediction technique generally consists of the following sequence of operations or steps:
(1) calculating an initial displacement estimate $d^{0}$ of an object point at the current pel,
(2) generating a predicted intensity value $I(z,t)$ for the object point at the current pel by accessing the intensity value of the object point at a displaced location in the previous frame, $I(z-d^{0},\,t-1)$,
(3) calculating the difference between the actual intensity of the object point at the current pel and said predicted intensity, and
(4) correcting the initial displacement estimate of the object point at the current pel, if necessary.
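The four steps above can be sketched for a single scan line as follows. This is a deliberately simplified, one-dimensional illustration: the spatial initial estimate (carrying the previous pel's displacement forward), the gradient-based correction, and the parameters `eps` and `threshold` are assumptions of the example rather than values given in the text.

```python
def interp(line, x):
    """Linearly interpolate a scan line at real-valued position x (clamped)."""
    x = min(max(x, 0.0), len(line) - 1.0)
    x0 = min(int(x), len(line) - 2)
    f = x - x0
    return (1 - f) * line[x0] + f * line[x0 + 1]


def pel_recursive_line(cur, prev, eps=0.02, threshold=0.5):
    """Run the four steps over one scan line.

    Returns the per-pel prediction errors and the displacement estimate
    held after each pel has been processed.
    """
    errors, estimates = [], []
    d = 0.0
    for z in range(len(cur)):
        d0 = d                              # step 1: initial estimate from the previous pel
        predicted = interp(prev, z - d0)    # step 2: intensity at the displaced location
        e = cur[z] - predicted              # step 3: actual minus predicted intensity
        if abs(e) > threshold:              # step 4: correct the estimate only if necessary
            grad = (interp(prev, z - d0 + 1) - interp(prev, z - d0 - 1)) / 2.0
            d = d0 - eps * e * grad
        errors.append(e)
        estimates.append(d)
    return errors, estimates


# Synthetic scan line (hypothetical data): an intensity ramp displaced by 2 pels.
prev_line = [3.0 * x for x in range(40)]
cur_line = [interp(prev_line, x - 2.0) for x in range(40)]
errors, estimates = pel_recursive_line(cur_line, prev_line)
```

In this example the estimate is corrected pel by pel until the prediction error falls below the threshold, after which it is simply carried along the line.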
With regard to step 1, there have been two predominant methods of displacement estimation: spatial and temporal. Most systems, such as the Netravali and Robbins technique described in the aforementioned publication and patents, use a spatially adjacent displacement vector as an initial estimate. Others predict the displacement along the temporal axis. The present invention uses a third approach: projecting the displacement estimate forward along the motion trajectory (PAMT). This would require only a minimal increase in computation and memory over the temporal projection procedure.
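One possible reading of projection along the motion trajectory is sketched below; it is an interpretation of the one-sentence description above, not the procedure actually claimed. Under the convention that the pel at z in the current frame came from z - d in the previous frame, and assuming constant velocity, the estimate found at z is carried to z + d as the initial estimate for the next frame. The rounding to the nearest pel, the rule that the larger vector wins a collision, and the fallback to the same-location (temporal) estimate for pels that receive no projection are all assumptions of this example.

```python
def project_along_trajectory(disp):
    """Build initial displacement estimates for the next frame.

    `disp[r][c]` is the (dy, dx) estimated at pel (r, c) of the current frame.
    Each vector is carried to the pel it points at, (r + dy, c + dx), rounded
    to the nearest pel.
    """
    rows, cols = len(disp), len(disp[0])
    init = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dy, dx = disp[r][c]
            tr, tc = int(round(r + dy)), int(round(c + dx))
            if 0 <= tr < rows and 0 <= tc < cols:
                old = init[tr][tc]
                # Assumed collision rule: keep the vector of larger magnitude.
                if old is None or dy * dy + dx * dx > old[0] ** 2 + old[1] ** 2:
                    init[tr][tc] = (dy, dx)
    # Assumed fallback: pels with no projection reuse the same-location vector.
    for r in range(rows):
        for c in range(cols):
            if init[r][c] is None:
                init[r][c] = disp[r][c]
    return init


# Hypothetical field: static background with one pel moving 2 pels to the right.
field = [[(0, 0) for _ in range(5)] for _ in range(5)]
field[2][1] = (0, 2)
init = project_along_trajectory(field)
```

Here the moving pel's vector is placed two pels to the right, where the object is expected to be in the next frame, whereas a purely temporal prediction would leave it at the original location.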