Full motion digital image sequences in typical video applications require the processing of massive amounts of data in order to produce good quality visual images from the point of view of shape, color and motion. Data compression is often used to reduce the amount of data which must be stored and manipulated. A data compression system typically includes modelling sub-systems which are used to provide simple and efficient representations of the large amount of video data.
A number of compression systems have been developed which are well suited for video image compression. These systems can be classified into three main groups according to their operational and modelling characteristics. First, there is the causal global modelling approach. An example of this type of model is a three dimensional (3D) wire frame model, which controls spatial position and intensity at a small set of more or less fixed wire frame grid points and interpolates between the grid points. In some applications, this approach is combined with 3D ray tracing of solid objects. This wire frame approach is capable of providing very efficient and compact data representation, since it involves a very deep model, i.e., a significant amount of effort must be invested up front to develop a comprehensive model. Accordingly, this model provides good visual appearance.
However, this approach suffers from several significant disadvantages. First, this causal type model requires detailed a priori (advance) modelling information on 3D characterization, surface texture, lighting characterization and motion behavior. Second, this approach has very limited empirical flexibility in generic encoders, since once the model has been defined, it is difficult to supplement and update it dynamically as new and unexpected images are encountered. Thus, this type of model has limited usefulness in situations requiring dynamic modelling of real time video sequences.
A second type of modelling system is an empirical, updatable compression system which involves very limited model development, but provides relatively inefficient compression. The MPEG 1 and MPEG 2 compatible systems represent such an approach. For example, in the MPEG standard, an image sequence is represented as a sparse set of still image frames, e.g., every tenth frame in a sequence, which are compressed/decompressed in terms of pixel blocks, such as 8×8 pixel blocks. The intermediate frames are reconstructed based on the closest decompressed frame, as modified by additional information indicating blockwise changes representing block movement and intensity change patterns. The still image compression/decompression is typically carried out using Discrete Cosine Transforms (DCT), but other approaches such as subband, wavelet or fractal still image coding may be used. Since this approach involves very little modelling depth, long range systematic redundancies in time and space are often ignored so that essentially the same information is stored/transmitted over and over again.
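By way of illustration only, the blockwise still image transform mentioned above may be sketched in Python as follows. The sketch applies an orthonormal two-dimensional DCT (type II) to a single 8×8 pixel block and inverts it. The names `dct_2d` and `idct_2d` are hypothetical, and the quantization and entropy coding steps that an MPEG compatible codec would apply are omitted.

```python
import math

N = 8  # block size assumed here, matching the 8x8 blocks mentioned in the text


def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct_2d(block):
    """Forward 2D DCT-II of an N x N block given as a list of rows."""
    return [[sum(block[y][x] * _basis(u, y) * _basis(v, x)
                 for y in range(N) for x in range(N))
             for v in range(N)]
            for u in range(N)]


def idct_2d(coeffs):
    """Inverse of dct_2d; the basis is orthonormal, so the same basis is reused."""
    return [[sum(coeffs[u][v] * _basis(u, y) * _basis(v, x)
                 for u in range(N) for v in range(N))
             for x in range(N)]
            for y in range(N)]


# A flat block: all of its energy lands in the single DC coefficient.
flat = [[100.0] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)

# A block containing a horizontal ramp: the round trip should reproduce it.
ramp = [[float(x) for x in range(N)] for _ in range(N)]
ramp_back = idct_2d(dct_2d(ramp))
```

For smooth blocks, most of the transform coefficients are close to zero, which is what makes the block compressible. Each block is nevertheless coded on its own, so redundancy between distant blocks or distant frames is not exploited.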
A third type of modelling system is an empirical global modelling of image intensities based on factor analysis. This approach utilizes various techniques, such as principal component analysis, for approximating the intensities of a set of N images by weighted sums of F "factors." Each such factor has a spatial parameter for each pixel and a temporal parameter for each frame. The spatial parameters of each factor are sometimes referred to as "loadings", while the temporal parameters are referred to as "scores". One example of this type of approach is the Karhunen-Loeve expansion of an N×M matrix of image intensities (M pixels per frame, N frames) for compression and recognition of human facial images. This is discussed in detail in Kirby, M. and Sirovich, L. "Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, pp. 103-108 (1990), and R. C. Gonzalez and R. E. Woods, Digital Image Processing, Chapter 3.6 (Addison-Wesley Publ. Co., ISBN 0-201-50803-6, 1992), which are incorporated herein by reference.
In Karhunen-Loeve expansion (also referred to as eigen analysis, principal component analysis, Hotelling transform or singular value decomposition), the product of the loadings and the scores for each consecutive factor minimizes the squared difference between the original and the reconstructed image intensities. Each factor loading has a value for each pixel, and may therefore be referred to as an "eigen-picture"; the corresponding factor score has a value for each frame. It should be noted that the Karhunen-Loeve system utilizes factors in only one domain, i.e., the intensity domain, as opposed to the present invention which utilizes factors in multiple domains, such as intensity, address and probabilistic domains.
Such a compression system is very efficient in certain situations, such as when sets of pixels display interrelated intensity variations in fixed patterns from image to image. For example, if every time that pixels a, b, c become darker, pixels d, e, f become lighter, and vice versa, then all of pixels a, b, c, d, e, f can be effectively modelled by a single factor consisting of an eigen picture intensity loading having positive values for pixels a, b, c and negative values for pixels d, e, f. The group of pixels would then be modelled by a single score number for each image. Other interrelated pixel patterns would also give rise to additional factors.
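The six pixel example above may be sketched in Python as follows, purely as an illustration. The sketch builds five mean-centred frames in which pixels a, b, c and pixels d, e, f vary in opposite directions, and extracts the leading factor by power iteration, which is used here as a simple stand-in for a full Karhunen-Loeve computation. The function name `first_factor` and the numeric values are assumptions chosen for the example.

```python
import math


def first_factor(frames, iterations=50):
    """Return (scores, loading) of the leading factor of mean-centred frames.

    frames is a list of N frames, each a list of M pixel intensities.
    The loading is normalised to unit length.
    """
    n_pixels = len(frames[0])
    loading = [1.0] + [0.0] * (n_pixels - 1)  # arbitrary starting direction
    for _ in range(iterations):
        # Project every frame onto the current loading to obtain its score.
        scores = [sum(f[j] * loading[j] for j in range(n_pixels)) for f in frames]
        # Re-estimate the loading as the score-weighted sum of the frames.
        update = [sum(s * f[j] for s, f in zip(scores, frames))
                  for j in range(n_pixels)]
        norm = math.sqrt(sum(u * u for u in update))
        loading = [u / norm for u in update]
    scores = [sum(f[j] * loading[j] for j in range(n_pixels)) for f in frames]
    return scores, loading


# Pixels a, b, c (first three) and d, e, f (last three) change in opposition.
pattern = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
amounts = [-2.0, -1.0, 0.0, 1.0, 2.0]  # one hypothetical amount per frame
frames = [[a * p for p in pattern] for a in amounts]

scores, loading = first_factor(frames)

# One score per frame times one loading per pixel rebuilds all thirty values.
rebuilt = [[s * l for l in loading] for s in scores]
```

In this idealized case the five frames of six pixels are reproduced from one loading of six values and five scores. Real image data would leave a residual that further factors would have to absorb.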
This type of approach results in visually disruptive errors in the reconstructed image if too few factors are used to represent the original images. Additionally, if the image-to-image variations include large systematic spatial changes, such as moving objects, then the number of eigen pictures required for good visual representation will be correspondingly high. As a result, the compression rate deteriorates significantly. Thus, the Karhunen-Loeve systems of factor modelling of image intensities cannot provide the necessary compression required for video applications.
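The effect of motion on the number of factors may be illustrated with the following Python sketch, offered under simplifying assumptions. It compares two small sets of one-dimensional frames: one in which a fixed pattern merely changes brightness, and one in which the same pattern moves by one pixel per frame. The rank of the frame matrix, computed here by Gaussian elimination, equals the number of intensity factors needed for exact reconstruction. The function name `matrix_rank` and the sample values are hypothetical.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a small matrix by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank


blob = [5.0, 9.0, 5.0]
width, n_frames = 8, 6

# Case 1: the blob stays in place and only its brightness varies.
gains = [1.0, 0.8, 1.2, 0.5, 1.5, 0.9]
still = [0.0, 0.0] + blob + [0.0] * (width - 5)
flicker_frames = [[g * v for v in still] for g in gains]

# Case 2: the blob keeps its brightness but shifts one pixel per frame.
moving_frames = [[0.0] * i + blob + [0.0] * (width - 3 - i)
                 for i in range(n_frames)]

flicker_rank = matrix_rank(flicker_frames)  # a single factor suffices
moving_rank = matrix_rank(moving_frames)    # one factor per frame is needed
```

A lossy reconstruction would need fewer factors than the exact rank, but the trend is the same: pure intensity factors handle brightness changes cheaply and handle displacement poorly.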
A fourth approach to video coding is the use of object oriented codecs. This approach focuses on identifying "natural" groups of pixels ("objects") that move and/or change intensity together in a fairly simple and easily compressible manner. More advanced versions of object oriented systems introduce a certain flexibility with respect to shape and intensity of individual objects, e.g., affine shape transformations such as translations, scaling, rotation and shearing, or one factor intensity changes. However, it should be noted that the object oriented approach typically employs only single factors.
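The affine shape transformations named above may be sketched in Python as follows, as an illustration only. Each transformation is written as a 3×3 matrix in homogeneous coordinates, so that a sequence of transformations collapses into a single matrix. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def shearing(kx):
    return [[1.0, kx, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def apply(m, point):
    """Map a 2D point through the affine matrix m."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Scale by 2, rotate by 90 degrees, then translate by (10, 0).
motion = matmul(translation(10.0, 0.0),
                matmul(rotation(math.pi / 2), scaling(2.0, 2.0)))
moved = apply(motion, (1.0, 0.0))  # approximately (10.0, 2.0)

sheared = apply(shearing(0.5), (0.0, 2.0))  # x is offset by 0.5 * y
```

The first two rows of such a matrix hold six numbers, which is all that needs to be sent to move every pixel of an object, provided the object's motion really is affine.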
In prior art systems, motion is typically approximated by one of two methods. The first of these methods is incremental movement compensation over a short period of time, which is essentially a difference coding according to which the difference between pixels in a frame, n, and a previous frame, n-1, is transmitted as a difference image. MPEG is one example of this type of system. This approach allows for relatively simple introduction of new features since they are merely presented as part of the difference image. However, this approach has a significant disadvantage in that dynamic adaptation or learning is very difficult. For example, when an object is moving in an image, there is both a change in location and intensity, making it very difficult to extract any systematic data changes. As a result, even the simplest form of motion requires extensive modelling.
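A much simplified Python sketch of such frame-to-frame difference coding follows. It is not the MPEG procedure itself, which also uses blockwise motion compensation; it only shows the difference image idea on one-dimensional frames. The function names and sample frames are assumptions.

```python
def encode_differences(frames):
    """Keep the first frame and replace each later frame n by its
    pixelwise difference from frame n-1."""
    first = list(frames[0])
    diffs = [[cur - prev for cur, prev in zip(frames[n], frames[n - 1])]
             for n in range(1, len(frames))]
    return first, diffs


def decode_differences(first, diffs):
    """Rebuild all frames by accumulating the difference images."""
    frames = [list(first)]
    for diff in diffs:
        frames.append([p + d for p, d in zip(frames[-1], diff)])
    return frames


# A bright two-pixel object moving one pixel to the right per frame.
frames = [
    [0, 9, 9, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
    [0, 0, 0, 9, 9, 0],
]

first, diffs = encode_differences(frames)
# diffs[0] is [0, -9, 0, 9, 0, 0] and diffs[1] is [0, 0, -9, 0, 9, 0]
decoded = decode_differences(first, diffs)
```

Although the motion is the same from frame to frame, each difference image differs from the previous one, so the regularity of the motion is not visible in the transmitted data.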
The second method is texture mapping based on a common reference frame, according to which motion is computed relative to a common reference frame and pixels are moved from the common reference frame to synthesize each new frame. This is the approach typically employed by most wire frame models. The advantage of this approach is that very efficient and compact representation is possible in some cases. However, the significant downside to this approach is that the efficiency is only maintained as long as the moving objects retain their original intensity or texture. Changes in intensity and features are not easily introduced, since existing systems incorporate only one dimensional change models, in either intensity or address.
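The following Python sketch illustrates, under simplifying assumptions, how frames may be synthesized by moving pixels from a common reference frame, and where the approach breaks down. The function `synthesize`, the one-dimensional frames and the single rigid shift per frame are all hypothetical simplifications.

```python
def synthesize(reference, object_pixels, shift, background=0):
    """Build a frame by moving the listed reference pixels by `shift` positions.

    Pixels that are not part of the object are filled with `background`.
    """
    frame = [background] * len(reference)
    for src in object_pixels:
        dst = src + shift
        if 0 <= dst < len(frame):
            frame[dst] = reference[src]
    return frame


reference = [0, 9, 9, 0, 0, 0]
object_pixels = [1, 2]  # addresses of the object in the reference frame

# Every frame is described by one number: its shift relative to the reference.
shifts = [0, 1, 2]
frames = [synthesize(reference, object_pixels, s) for s in shifts]

# If the object also brightens, the address-only model leaves a residual.
observed = [0, 0, 0, 12, 12, 0]
residual = [o - f for o, f in zip(observed, frames[2])]
```

The residual shows the limitation described above: once the object's intensity departs from the reference, a model that changes only addresses can no longer reproduce the frame without extra information.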
Accordingly, it is an object of the present invention to provide a method and apparatus for data analysis which provides very efficient and compact data representation without requiring a significant amount of advanced modelling information, but still being able to utilize such information if it does exist.
It is also an object of the present invention to provide a method and apparatus for data analysis having empirical flexibility and capable of dynamic updating based on short and long range systematic redundancies in various domains in the data being analyzed.
It is a further object of the present invention to provide a method and apparatus for data analysis which utilizes factor analysis in multiple domains, such as address and probabilistic domains, in addition to the intensity domain. Additionally, the factor analysis is performed for individual subgroups of data, e.g., for each separate spatial object.
An additional object of the present invention is to provide a method and apparatus for data analysis which uses multiple factors in several domains to model objects. These "soft" models (address, intensity, spectral property, transparency, texture, type and time) are combined with "hard" models in order to allow for more effective learning and modelling of systematic change patterns in input data, such as a video image. Examples of such "hard" modelling are: a) conventional affine motion modelling of moving objects with respect to translation, rotation, scaling and shearing (including camera panning and zooming effects), and b) multiplicative signal correction (MSC) and extensions of this, i.e., modelling of mixed multiplicative and additive intensity effects (H. Martens and T. Naes, Multivariate Calibration, pp. 345-350 (John Wiley & Sons, 1989), which is incorporated herein by reference).
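Multiplicative signal correction in its basic form may be sketched in Python as follows, as an illustration rather than as the procedure of the present invention. An observed intensity vector is regressed on a reference vector by least squares to estimate an additive offset and a multiplicative gain, and both are then removed. The function names and sample values are assumptions.

```python
def msc_fit(signal, reference):
    """Least-squares fit of signal = offset + gain * reference."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_s = sum(signal) / n
    spread = sum((r - mean_r) ** 2 for r in reference)
    cross = sum((r - mean_r) * (s - mean_s)
                for r, s in zip(reference, signal))
    gain = cross / spread
    offset = mean_s - gain * mean_r
    return offset, gain


def msc_correct(signal, reference):
    """Remove the fitted additive and multiplicative effects from signal."""
    offset, gain = msc_fit(signal, reference)
    return [(s - offset) / gain for s in signal]


reference = [10.0, 20.0, 30.0, 40.0]

# The same pattern seen with 1.5 times the contrast and 25 units of added light.
observed = [25.0 + 1.5 * r for r in reference]

offset, gain = msc_fit(observed, reference)
corrected = msc_correct(observed, reference)
```

Two numbers per object and frame thus account for a mixed additive and multiplicative intensity change, which is the kind of compact "hard" model the paragraph above refers to.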
A further object of the present invention is the modelling of objects in domains other than the spatial domain, e.g., grouping of local temporal change patterns into temporal objects and grouping of spectral patterns into spectral objects. Thus, in order to avoid the undesirable oversimplification associated with physical objects or object oriented programming, the term "holon" is used instead of "object".
Yet another object of the present invention is the use of change data in the various domains to relate each individual frame to one or more common reference frames, and not to the preceding frame of data.