The present invention relates to video coding, and more particularly to scalable video coding in which motion estimation and compensation can be optimized as a function of a desired bit rate range, frame rate and resolution.
Three dimensional sub-band wavelet coding has been proposed as an efficient scalable video coding (SVC) technique, the development of which is described by J. Ohm in “Three Dimensional Sub-band Coding with Motion Compensation,” IEEE Trans. on Image Processing, Vol. 3, No. 9, pp. 559-571, September 1994. In such a scheme, four types of redundancy are removed: temporal, spatial, perceptual and statistical.
Temporal redundancy can be removed by performing an open loop based motion compensation, whereby the reference frames for the motion compensation are the original frames instead of the reconstructed frames used in existing standards such as the MPEG 1/2/4 compression standards and H.263/4. The open loop based motion compensation technique is referred to herein as “motion compensation temporal filtering,” or alternatively, MCTF, the development of which is described by J. Ohm in the aforementioned reference. A further refinement of MCTF was described by S. Choi and J. Woods in “Motion Compensated 3-D Sub-band Coding of Video,” IEEE Trans. on Image Processing, Vol. 8, No. 2, pp. 155-167, February 1999. In this later work, MCTF was improved by making the direction of the motion estimation the same as that of the motion compensation. In this technique, several rounds of MCTF are performed to provide the desired temporal scalability and to remove unnecessary temporal redundancy. During each MCTF round, high and low sub-band coefficients are generated for each motion compensated pair using a rate distortion optimization that employs a Lagrangian multiplier (λ), where λ corresponds to a bit rate range and to a trade-off between the motion information and the residual data. The trade-off between the quantity of transmitted motion information and residual data is an important feature of scalable video systems, whereby a large λ corresponds to a low bit rate and a small quantity of transmitted motion information, and a small λ corresponds to a high bit rate and a large quantity of transmitted motion information. Generally, the optimal point of the SVC system is the point where the first residual image is generated for each motion compensation pair, and usually only one such point exists.
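The role of λ described above can be illustrated with a minimal sketch. It assumes the commonly used Lagrangian cost J = D + λR, where R is the number of motion bits and D the residual distortion; the candidate list, its labels and its numbers are hypothetical and are not taken from the cited references.

```python
# Hypothetical candidates for one motion compensated pair, given as
# (label, motion_bits, residual_distortion). All values are illustrative.
CANDIDATES = [
    ("coarse 64x64 motion", 10, 900.0),
    ("medium 16x16 motion", 60, 400.0),
    ("fine 4x4 motion", 300, 250.0),
]


def best_candidate(candidates, lam):
    """Return the candidate minimizing the assumed cost J = D + lam * R."""
    return min(candidates, key=lambda c: c[2] + lam * c[1])


if __name__ == "__main__":
    # A small lambda favors detailed motion (high bit rate); a large lambda
    # favors sparse motion information (low bit rate).
    for lam in (0.1, 5.0, 50.0):
        label, bits, dist = best_candidate(CANDIDATES, lam)
        print(f"lambda={lam}: {label} ({bits} motion bits, distortion {dist})")
```

With these illustrative numbers, the smallest λ selects the fine motion field and the largest λ selects the coarse one, which mirrors the trade-off stated above.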
Once all of the necessary MCTF operations have been performed, a spatial transformation is typically performed on each sub-band to remove spatial redundancy. Most typically, the spatial transform used in such an operation is either a discrete cosine transformation (DCT) or a discrete wavelet transform (DWT).
Perceptual redundancy is typically removed by quantizing frequency domain residual data, usually through the use of a quantization matrix. The quantization matrix is designed according to an important feature of the human visual system (HVS), namely that the human eye is more sensitive to low frequency components than to high frequency components. Accordingly, a small matrix element (i.e., a fine quantization step) is chosen for the low frequency residual data, while a large element is chosen for the high frequency data. The quantization process is typically lossy, and SNR scalability is achieved through proper selection of quantization steps at different transmission bit rates.
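A minimal sketch of matrix based quantization follows. The 2×2 coefficient block and the quantization matrix are hypothetical values chosen only to show that a small step preserves the low frequency coefficient while a large step discards high frequency detail; rounding to the nearest integer is an assumption, not a requirement of any particular standard.

```python
def quantize(coeffs, qmatrix):
    """Divide each coefficient by its matrix element and round to an integer level."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]


def dequantize(levels, qmatrix):
    """Reconstruct approximate coefficients from the integer levels."""
    return [[level * q for level, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]


# Hypothetical frequency domain block: top-left is the lowest frequency.
COEFFS = [[100.0, 37.0],
          [-41.0, 9.0]]
# Small step at low frequency, large step at high frequency.
QMATRIX = [[4, 16],
           [16, 32]]

if __name__ == "__main__":
    levels = quantize(COEFFS, QMATRIX)
    print(levels)                       # [[25, 2], [-3, 0]]
    print(dequantize(levels, QMATRIX))  # [[100, 32], [-48, 0]]
```

The low frequency coefficient is reconstructed exactly, whereas the highest frequency coefficient is quantized to zero, which is the source of the loss noted above.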
Upon the removal of the temporal, spatial and perceptual redundancy, residual data and motion information are generated for entropy coding, which is used to remove statistical redundancy. In this process, short symbols are used to represent values which occur more frequently, and long symbols are used for values which occur less frequently. Variable length coding and arithmetic coding are exemplary coding types used in the process.
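As one illustration of variable length coding, the sketch below derives Huffman code lengths for a short run of quantized values. Huffman coding is used here only as a familiar example of the principle; the function name and the sample data are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    # Heap entries are (weight, unique tie breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical quantized values: zero dominates, as is common after quantization.
    data = [0] * 8 + [1] * 4 + [-1] * 2 + [5]
    print(huffman_code_lengths(data))
```

For this sample, the most frequent value 0 receives a 1-bit code and the rarest values receive 3-bit codes, so the 15 values occupy 25 bits rather than the 30 bits a fixed 2-bit code would need.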
Conventional scalable video coding systems typically employ small Lagrangian multipliers λ in order to obtain the optimal system performance at the highest bit rate. However, the visual quality of the lower bit rate streams in such systems is relatively poor, as the truncated bit streams contain too much motion information without sufficient residual data. An attempt to improve the coding efficiency at lower bit rates was described by H. Hang, S. Tsai, and T. Chiang in “Motion Information Scalability for MC-EZBC: Response to Call for Evidence of Scalable Video Coding,” ISO/IEC JTC1/SC29/WG11, MPEG200/m9756, July 2003, Trondheim. Therein, the motion information in the temporal decomposition is divided into a base layer and an enhancement layer. The base layer is composed of motion information from 64×64 to 16×16, while the enhancement layer consists of motion information from 8×8 to 4×4, wherein only the motion information from the base layer is sent to the decoder at the low bit rate. However, the residual data is obtained at the high bit rate, and thus a motion mismatch occurs. If the area is relatively smooth, the difference in the motion information obtained will not be too significant. However, if the area is highly textured, the difference in the motion information will result in significant distortion. As a result, coding efficiency may remain low at low bit rate transmissions with the proposed scheme.
Further disadvantageously, the proposed techniques are not optimal from an implementation or commercial point of view, in that they do not take into consideration that different providers will have varying customer compositions and accordingly varying bit rate, frame rate and resolution requirements. Table 1 illustrates an example of this.
TABLE 1
Customer Composition of Companies A & B

            QCIF        CIF         4CIF
            7.5 f/s     15 f/s      60 f/s
Company     64 kb/s     512 kb/s    2 Mb/s
A           2M          100K        10K
B           10K         100K        1.5M
Assume that Companies A and B have the illustrated customer compositions. Clearly, the optimal operating conditions for Company A are QCIF, 7.5 f/s and 64 kb/s, whereas those for Company B are 4CIF, 60 f/s and 2 Mb/s. In such an instance, the conventional video coding systems, which are designed for optimal performance at the highest bit rate, are not optimal for Company A, as the majority of its customers utilize a lower bit rate service.
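The reasoning behind this comparison can be sketched as a simple selection of the operating point serving the most customers. The data structure and function name are hypothetical, the customer counts are those of Table 1, and 2 Mb/s is written as 2000 kb/s purely for uniform units.

```python
# Operating points are (resolution, frames per second, bit rate in kb/s).
CUSTOMERS = {
    "A": {("QCIF", 7.5, 64): 2_000_000,
          ("CIF", 15, 512): 100_000,
          ("4CIF", 60, 2000): 10_000},
    "B": {("QCIF", 7.5, 64): 10_000,
          ("CIF", 15, 512): 100_000,
          ("4CIF", 60, 2000): 1_500_000},
}


def dominant_operating_point(composition):
    """Return the operating point with the largest number of customers."""
    return max(composition, key=composition.get)


if __name__ == "__main__":
    for company, composition in CUSTOMERS.items():
        point = dominant_operating_point(composition)
        share = composition[point] / sum(composition.values())
        print(f"Company {company}: {point}, {share:.1%} of customers")
```

Under these figures, roughly 95% of Company A's customers sit at the lowest operating point, which is why a system tuned only for the highest bit rate serves Company A poorly.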
Accordingly, what is needed is an improved video coding system which can provide optimal performance at arbitrary bit rates, frame rates and resolutions.