The present invention relates to methods and apparatus for the compression of image data for efficient transmission or storage and the subsequent expansion and reconstruction to a copy of the image in substantially its original form. The invention relates to transmission (or storage) of data for single images, as well as to images which are undergoing changes with time, as in the exemplary case of television images.
In a number of respects the present invention is an improvement over the invention of my U.S. Pat. No. 4,447,886, dated May 8, 1984, and entitled "Triangle and Pyramid Signal Transforms and Apparatus", the entire disclosure of which is hereby expressly incorporated by reference. However, it should be understood that significant aspects of the present invention are equally applicable as improvements to other transform techniques and are not limited to use in combination with the Triangle and Pyramid Transforms of the above-incorporated U.S. Pat. No. 4,447,886.
Image data is conveniently generated by a television camera or suitable scanner to provide an electronic video signal which can be converted to digitized samples for compression signal processing. One general compression approach for accommodating a fixed rate transmission channel involves two distinct processes: (1) A redundancy reduction process, such as a Transform operation, which generates data in the form of transform domain coefficients; and (2) A Coding process wherein the resulting transform domain data is operated on by such processes as thresholding, truncation, variable length coding representation, and prioritization. A feedback path may optionally exist from a rate buffer back to the Coding operation to provide an adaptive feature to limit the quantity of data to accommodate the fixed transmission channel capacity. The present invention relates primarily to the second of these two processes, in particular, to Zonal Coding methods for coding (and later recovering) transform domain coefficients resulting from a Transform operation such as, but not limited to, the Pyramid Transform of the above-incorporated U.S. Pat. No. 4,447,886. The invention relates both to coding of single images and to coding of motion images.
Historically, motion images have been compressed for transmission by one of two general types of compression processes: Intra-frame compression or Inter-frame compression. In an Intra-frame process, all of the compression is accomplished within each single image in a sequence of image frames. In an Inter-frame process, all of the compression is accomplished between successive image frames, taking advantage of the fact that certain areas of a motion image often do not change from one frame to the next. More recently, attempts have been made to combine the two general types of compression processes to obtain the advantages of each and, hopefully, an overall greater compression of motion image data.
Difficulties and limitations arise when both general processes are used in combination. First, the overall compression achieved cannot in general be the multiplication of the compression factors achieved by the two processes individually since the redundancy reduction for Intra-frame compression may occur in areas of a motion image which remain the same over successive images and would also be counted as compression in an Inter-frame process.
Second, a difficulty arises with either Adaptive Coding or Zonal Coding where a Transform which must be blocked into a mosaic of sub-blocks is employed as the basic Intra-frame compression instrument. The Adaptive Coding method essentially more roughly codes all transform coefficients which must be sent to effect a change due to motion in the image so as to accommodate the fixed transmission channel capacity. The Zonal Coding method selects only those lower frequency coefficients which can be accommodated by the fixed transmission channel capacity. In either case the edges of the sub-blocks become visible to the observer of the reconstructed image due to the approximations made to accommodate the fixed and limited transmission channel capacity.
Third, when all of the sub-blocks which require change are not changed as fast as the response time of the eye and only a fraction of those requiring change are actually changed, the motion image exhibits spatial warping and un-natural sticking of some of the blocked areas and not others.
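By way of illustration only, the Zonal Coding selection mentioned in the second difficulty above (keeping only those lower frequency coefficients which the fixed channel capacity can accommodate) may be sketched as follows. The tuple layout, the per-coefficient bit costs, and the function name `zonal_select` are hypothetical and are not taken from any disclosed implementation.

```python
def zonal_select(coefficients, bit_budget):
    """Keep coefficients from the lowest band upward until the budget is reached.

    coefficients: list of (band, value, bits) tuples (hypothetical layout),
                  where a lower band number means a lower spatial frequency.
    bit_budget:   bits available on the fixed-rate channel for this image.
    """
    kept, used = [], 0
    # sorted() is stable, so coefficients within a band keep their order.
    for band, value, bits in sorted(coefficients, key=lambda c: c[0]):
        if used + bits > bit_budget:
            break  # everything beyond this zone is dropped
        kept.append((band, value))
        used += bits
    return kept, used


# Hypothetical coefficients, each costing 8 bits, and a 24-bit budget.
coeffs = [(3, 5, 8), (1, 40, 8), (2, -12, 8), (1, 33, 8), (3, -2, 8)]
kept, used = zonal_select(coeffs, bit_budget=24)
```

In this example only the Band 1 and Band 2 coefficients are retained. With a block-based transform, a cut-off of this kind is what makes the sub-block edges visible.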
An Intra-frame Transform, such as the Pyramid Transform of the above-incorporated U.S. Pat. No. 4,447,886, avoids the artifact appearance of the sub-block edges since it does not require division into sub-blocks for calculation. Thus either Adaptive Coding or Zonal Coding can be performed with the Pyramid Transform coefficients to accommodate a fixed rate transmission channel without generating objectionable blocking artifacts. Moreover, the action required to make the image data generation rate be equal to the fixed channel rate can be made to apply more uniformly over those portions of the image in motion to preclude the warping appearance caused by changing some blocks but not some of their adjacent neighbors which also require change.
It is well known that a human observer is more tolerant, at times to the point of not noticing, of temporary distortion in imagery caused by either Adaptive or Zonal Coding to achieve a fixed transmission rate if all four of the following conditions are satisfied: (1) the distortion occurs uniformly in all of those portions of the image undergoing motion; (2) the distortion continues for only a short time after the motion ceases; (3) the distortion does not leave trailing pieces of previous images; and (4) the distortion does not cause stationary or apparent artifacts. Previous methods have not been able to simultaneously achieve all four of these conditions at relatively low transmission channel capacities in the range of 20,000 to 200,000 bits per second.
The present invention achieves these desired results in part by prioritizing the transform domain coefficient data to be transmitted in such a way that the image viewed at the point of reconstruction appears as quickly and as naturally as possible and, in the case of motion images, changes as quickly and as naturally as possible. The present invention provides means for Transform coefficient prioritization to achieve the aforementioned desired results using Zonal Coding, while providing for operation on a fixed, relatively low bit-rate transmission channel such as in the exemplary range of 20,000 to 200,000 bits per second given above.
It is believed that the description hereinbelow of the present invention will be best understood if read in the context of the disclosure of the above-incorporated U.S. Pat. No. 4,447,886. While reference should be had to that disclosure for a full description of the details, the following is provided by way of summary as an aid to the understanding of the present invention. For convenience, elements of the disclosure of U.S. Pat. No. 4,447,886 are referred to herein employing the terminology "Pyramid Transform" and "Original Mapping Technique".
The Pyramid Transform involves several defined basis functions which operate on input data points P(i). The basis functions are essentially weighting functions such that terms and coefficients (in the transform domain) calculated in accordance with the basis functions are each a particularly weighted average of the values of a selected plurality of input data points. Successive terms and coefficients generated in accordance with each of the defined basis functions are calculated from successive pluralities of the input data points, with overlap of input data points depending on the particular basis function.
The Pyramid Transform is organized into a plurality N of bands or levels. Band N is the highest, and Band 1 the lowest. The bands or levels are significant for two different reasons: (1) Coefficients are output from the transform process for each band; and (2) In the preferred fast calculation methods the bands represent successive stages of calculation. For forward transformation, calculation begins with the highest band, Band N, and works down. For inverse transformation (reconstruction), calculation begins with the lowest band, Band 1, and works up. Results of processing in each band are then employed as inputs for processing in the next lower band until the last band is reached.
A "B-function" is defined which is a weighting function with an envelope shaped as a pyramid for a two-dimensional transform as applies in the case of images. As a simple example of a two-dimensional B-function, the pyramid weighting function is as follows for a three-by-three matrix of input data points to yield a single B-function term:
______________________________________
1/16    1/8     1/16
1/8     1/4     1/8
1/16    1/8     1/16
______________________________________
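As a minimal sketch of how the above weights yield a single B-function term, the following applies them to a three-by-three block of input data points. The names `WEIGHTS` and `b_term` and the sample values are illustrative assumptions. Exact fractions are used so that the arithmetic can be checked by hand.

```python
from fractions import Fraction

# Pyramid weights from the three-by-three example in the text.
WEIGHTS = [
    [Fraction(1, 16), Fraction(1, 8), Fraction(1, 16)],
    [Fraction(1, 8), Fraction(1, 4), Fraction(1, 8)],
    [Fraction(1, 16), Fraction(1, 8), Fraction(1, 16)],
]


def b_term(block):
    """Return the pyramid-weighted average of a 3x3 block of samples."""
    return sum(WEIGHTS[r][c] * block[r][c] for r in range(3) for c in range(3))


# The weights sum to one, so a uniform block is reproduced unchanged.
weight_total = sum(sum(row) for row in WEIGHTS)
flat = b_term([[10, 10, 10], [10, 10, 10], [10, 10, 10]])

# A single non-zero centre sample contributes one quarter of its value.
peak = b_term([[0, 0, 0], [0, 16, 0], [0, 0, 0]])
```

Because the weights sum to one, the term is a true weighted average: a uniform area of the image yields a B-function term equal to the common sample value.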
In actual two-dimensional implementation, a stream of B-function terms for Band 1 is generated and output, with overlap of the input data points contributing to each B-function term for Band 1 such that the beginning data point for each ensuing successive B-function term for Band 1 shifts ahead by 2^N input data points in one or the other or both of the two directions of the two-dimensional set of input data points. Each term for Band 1 may be defined as a pyramid-weighted average of the values of a square array of input data points, with each side of the square containing 2^(N+1) - 1 consecutive input data points.
The disclosed implementation of the Pyramid Transform includes a fast calculation technique wherein the B-function terms for Band N only are calculated from actual input sample points. The B-function terms for all lower bands are calculated from the B-terms from the next higher band. Only those from Band 1 are output.
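The relationship between this band-by-band calculation and the Band 1 geometry stated above (a side of 2 to the power N+1, less one, input data points) can be checked with a short one-dimensional sketch. It assumes, for illustration only, that each band applies the weights 1/4, 1/2, 1/4 (one row or column factor of the three-by-three example, which is the outer product of these weights with themselves) to the B-terms of the next higher band, with a shift of two terms between outputs. The exact calculation is that of U.S. Pat. No. 4,447,886. The function names are hypothetical.

```python
from fractions import Fraction


def triangle_step(values):
    """One band of calculation: weights (1/4, 1/2, 1/4), shift of 2 per output."""
    return [
        (values[i] + 2 * values[i + 1] + values[i + 2]) / 4
        for i in range(0, len(values) - 2, 2)
    ]


def composite_weights(n_bands):
    """Effective weight of each input point in one Band 1 term after n_bands steps."""
    side = 2 ** (n_bands + 1) - 1  # input points spanned by one Band 1 term
    weights = []
    for j in range(side):
        # Feed a unit impulse at position j through all bands, Band N down to Band 1.
        values = [Fraction(int(k == j)) for k in range(side)]
        for _ in range(n_bands):
            values = triangle_step(values)
        weights.append(values[0])  # exactly one term remains
    return weights


weights_n2 = composite_weights(2)  # 1/16, 2/16, 3/16, 4/16, 3/16, 2/16, 1/16
```

Under these assumptions, two bands give a triangular envelope over 7 input points, and in general N bands give an envelope over 2^(N+1) - 1 points whose weights still sum to one. Each band costs only three multiply-adds per term, since it reuses the terms of the band above rather than returning to the input samples.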
B-function terms alone are not sufficient for reconstruction, and the Pyramid Transform accordingly has a number of additional functions defined, also as weighted averages of selected input data points, for which coefficients are generated and output for each band. The number of input data points contributing to each coefficient is the least for Band N, and increases by powers of two for each successive band below Band N. The predetermined functions are selected so as to enable reconstruction of the values of the input data points as a build-up from Band 1 upwards of linear interpolation between B-function terms for Band 1, with departures from linear interpolation being indicated by non-zero coefficients for Bands 1 through N. The build-up process begins with Band 1, and works upward through Band N.
For reconstruction (inverse transform) the processes are reversed, and the original data points are determined through algebraic manipulation.
The details of these additional coefficient-generating functions are not repeated herein, and reference should be had to U.S. Pat. No. 4,447,886 for their complete definitions and examples.
A significant aspect and advantage of the Pyramid Transform is the manner in which zero-valued coefficients are treated for efficient transmission, taking advantage of the probabilities of occurrence relationships between various coefficients which result from the finite length of the basis functions and the layering of coefficients in multiple bands. This treatment involves a mapping technique, as disclosed in U.S. Pat. No. 4,447,886, and referred to herein as the "Original Mapping Technique".
The original mapping technique is based on the following: A local length can be associated with each function. In a certain locality a high level of signal activity may produce non-zero value coefficients in multiple bands where the particular basis functions with non-zero values align with this signal activity. If there are gradual edges in the activity only the lower bands may produce non-zero coefficients; if there are steep edges then the higher bands as well as the lower bands will produce non-zero coefficients. Existence of non-zero lower band coefficients does not carry with it a high probability of higher band coefficients; existence of higher band coefficients does, however, carry with it a high probability of the existence of lower band non-zero coefficients in spatially aligned locations. This aspect for naturally occurring image signals is utilized in minimizing the amount of overhead map data which must accompany the non-zero coefficient data for reconstruction purposes.
This original mapping technique is described in greater detail hereinbelow with particular reference to FIG. 3, inasmuch as certain aspects of the present invention employ the original mapping technique, as well as augmentations and modifications thereof.