Field of Invention
The present invention relates in general to coding of image, video and other signals.
Background
It is well known that image, video and other signals contain a large amount of redundancy. An image, or still image, is a two-dimensional (2-D) spatial signal. In a digitized image signal, there is a high level of spatial correlation between adjacent pixels. In general, the spatial correlation exists in all directions in an image. For example, it exists in the horizontal direction between the current pixel and its left neighbors as well as between the current pixel and its right neighbors, in the vertical direction between the current pixel and its upper neighbors as well as between the current pixel and its lower neighbors, and in other directions too. This type of correlation is referred to as bilateral or two-sided correlation. Thus, an image signal has bilateral or two-sided spatial correlation.
A video signal is a temporally discrete image sequence, where each image in the sequence is called a video frame or frame. A video signal is three-dimensional (3-D), comprising two spatial dimensions inside each frame and one temporal dimension between frames. In a digitized video signal, there is a high level of spatial correlation between spatially adjacent pixels inside each frame, as there is in a still image; this is referred to as the intra-frame correlation in video coding. As in a still image, the intra-frame correlation is bilateral or two-sided. Additionally, there is a high level of temporal correlation between pixels in adjacent frames, which is referred to as the inter-frame correlation. In general, the inter-frame correlation exists between the pixels in the current frame and those in past frames as well as between the pixels in the current frame and those in future frames. Thus, the inter-frame correlation is bilateral or two-sided too. Therefore, a video signal has bilateral or two-sided spatial and temporal correlation.
In an image or video transmission or storage system, it is desirable to remove such redundancy in order to improve image or video quality and to reduce the required transmission bandwidth or storage capacity. Predictive coding is a common method of removing the correlation in signals. In predictive image or video coding, a pixel prediction is generated for the current pixel from its correlated pixels. The pixels used to generate the prediction are called the reference pixels. The generated prediction is subtracted from the original pixel to produce a residual pixel. The residual pixels are uncorrelated or largely uncorrelated, depending on the accuracy of the prediction. Thus, the original image or video signal is converted into a residual image or video signal. The residual signal is also called the error signal. This process is called predictive encoding and is carried out at the encoder side. The residual image or video signal is then transmitted or stored, often after further digital compression. At the decoder side, the decoding process generates the predictions and reconstructs the original pixels from the received or retrieved residual image or video signal. This process is called predictive decoding.
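The encode and decode steps described above can be illustrated with a minimal sketch. It assumes the simplest possible predictor, in which each pixel on a row is predicted by the pixel before it and the first pixel is predicted as 0. The function names and that choice of predictor are illustrative assumptions, not taken from any standard or from the cited references.

```python
def predictive_encode(pixels):
    """Convert pixels to residuals: residual = original - prediction."""
    residuals = []
    prediction = 0  # assumed prediction for the first pixel
    for p in pixels:
        residuals.append(p - prediction)
        prediction = p  # the next pixel is predicted by this one
    return residuals


def predictive_decode(residuals):
    """Rebuild pixels from residuals: original = prediction + residual."""
    pixels = []
    prediction = 0  # must match the encoder's starting prediction
    for r in residuals:
        p = prediction + r
        pixels.append(p)
        prediction = p
    return pixels


row = [100, 102, 103, 103, 101]
residual_row = predictive_encode(row)
restored_row = predictive_decode(residual_row)
```

For a smooth row such as this one, the residuals after the first sample are small values clustered around zero, which is what makes the residual signal cheaper to compress than the original.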
Pixels are normally processed in raster-scan order, i.e. from the top row to the bottom row of an image and from the leftmost pixel to the rightmost pixel on each row. Given such a raster-scan processing order, predictive image coding is called spatial causal predictive coding if it generates the prediction for the current pixel solely from reference pixels on the upper rows and on the left side of the same row. Such a prediction is called a spatial causal prediction. Conversely, predictive image coding is called spatial anticausal predictive coding if it generates the prediction for the current pixel solely from reference pixels on the lower rows and on the right side of the same row. Such a prediction is called a spatial anticausal prediction. Finally, predictive image coding is called spatial noncausal predictive coding if it generates the prediction for the current pixel from reference pixels on both the upper rows and the lower rows and/or on both the left side and the right side of the same row. Such a prediction is called a spatial noncausal prediction. Some example image predictors are shown in FIG. 1. The location of the current pixel, marked by x, is (x, y), where x is the column index, a positive integer starting from 1 at the leftmost column, and y is the row index, a positive integer starting from 1 at the topmost row. The current pixel has four nearest adjacent pixels, named A, C, E and G, at locations (x−1, y), (x, y−1), (x+1, y) and (x, y+1) respectively, and four second-nearest adjacent pixels, named B, D, F and H, at locations (x−1, y−1), (x+1, y−1), (x+1, y+1) and (x−1, y+1) respectively. For example, an image predictor that generates the prediction for the current pixel from any one or more of the reference pixels ABCD, which denotes pixels A, B, C and D, is a causal predictor.
Conversely, an image predictor that generates the prediction for the current pixel from any one or more of the reference pixels EFGH is an anticausal predictor. A predictor that generates the prediction for the current pixel from AE, CG or ACEG is a noncausal predictor.
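The classification of predictors by their reference pixels can be sketched as follows. The offsets are the (column, row) displacements of pixels A to H relative to the current pixel, as given above for FIG. 1. The helper names are hypothetical. One assumption goes beyond the text: any mixture of earlier and later reference pixels (for example A with G) is reported as noncausal, although the text only names the upper/lower and left/right combinations.

```python
# (dx, dy) offsets of the eight neighbours relative to the current pixel,
# following the locations given for FIG. 1.
NEIGHBOURS = {
    "A": (-1, 0), "B": (-1, -1), "C": (0, -1), "D": (1, -1),
    "E": (1, 0), "F": (1, 1), "G": (0, 1), "H": (-1, 1),
}


def precedes_current(offset):
    """True if the neighbour comes before the current pixel in raster order."""
    dx, dy = offset
    return dy < 0 or (dy == 0 and dx < 0)


def classify_predictor(refs):
    """Classify a predictor by its reference pixels, e.g. 'ABCD' or 'ACEG'."""
    if not refs:
        raise ValueError("a predictor needs at least one reference pixel")
    earlier = [precedes_current(NEIGHBOURS[name]) for name in refs]
    if all(earlier):
        return "causal"
    if not any(earlier):
        return "anticausal"
    return "noncausal"  # assumption: any mix of earlier and later pixels
```

With these definitions, `classify_predictor("ABCD")` yields "causal", `classify_predictor("EFGH")` yields "anticausal", and `classify_predictor("ACEG")` yields "noncausal", matching the examples in the text.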
Similarly, in predictive video coding, a predictor that generates the prediction for the current pixel from adjacent reference pixels in the same frame is called an intra-frame predictor or intra-predictor. A predictor that generates the prediction for the current pixel from reference pixels in past and/or future frames is called an inter-frame predictor or inter-predictor. Causal, anticausal and noncausal predictors are defined for intra-predictors in the same way as in predictive image coding. Likewise, inter-predictors are called causal, anticausal or noncausal if the reference frames include past frames only, future frames only, or both past and future frames, respectively.
It is to be noted that there is no interdependence in causal or anticausal prediction, while there is in noncausal prediction. For example, in FIG. 1, the causal prediction for the current pixel x depends on the reference pixels ABCD, while none of the predictions of the pixels ABCD depends on the current pixel x in return. Similarly, the anticausal prediction for the current pixel x depends on the reference pixels EFGH, while none of the predictions of the pixels EFGH depends on the current pixel in return. This allows causal predictive encoding and decoding to be done in a single, simple forward iterative process from the first pixel to the last pixel. Similarly, it allows anticausal predictive encoding and decoding to be done in a single, simple backward iterative process from the last pixel back to the first pixel, at some additional cost in memory and processing delay. However, there is interdependence in noncausal prediction. For example, assume a simple noncausal image predictor that refers to the nearest adjacent reference pixels ACEG in FIG. 1. While the prediction of the current pixel x depends on pixels ACEG, the predictions for its reference pixels ACEG also depend on the current pixel x in return. Neither a forward nor a backward single iterative process allows the predictive decoder to generate the prediction for the current pixel x, because the prediction depends on reference pixels that have not yet been decoded and are therefore unknown. The interdependence behind noncausal decoding creates a complicated computing problem.
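The decoding-order argument above can be checked mechanically. The following sketch walks a given pixel order and reports the first pixel whose in-image reference pixels have not all been decoded yet, or None if a single pass succeeds. It uses 0-based indices, unlike the 1-based indices in the text, and the function and variable names are hypothetical.

```python
def first_blocked_pixel(width, height, offsets, order):
    """Return the first pixel in `order` that needs an undecoded reference,
    or None if every pixel can be decoded in a single pass."""
    decoded = set()
    for x, y in order:
        for dx, dy in offsets:
            rx, ry = x + dx, y + dy
            inside = 0 <= rx < width and 0 <= ry < height
            if inside and (rx, ry) not in decoded:
                return (x, y)
        decoded.add((x, y))
    return None


W, H = 4, 3
forward = [(x, y) for y in range(H) for x in range(W)]  # raster-scan order
backward = list(reversed(forward))

ABCD = [(-1, 0), (-1, -1), (0, -1), (1, -1)]  # causal reference pixels
EFGH = [(1, 0), (1, 1), (0, 1), (-1, 1)]      # anticausal reference pixels
ACEG = [(-1, 0), (0, -1), (1, 0), (0, 1)]     # noncausal reference pixels

causal_forward = first_blocked_pixel(W, H, ABCD, forward)
anticausal_backward = first_blocked_pixel(W, H, EFGH, backward)
noncausal_forward = first_blocked_pixel(W, H, ACEG, forward)
noncausal_backward = first_blocked_pixel(W, H, ACEG, backward)
```

The causal and anticausal cases return None, while the ACEG predictor is blocked at the very first pixel in either direction, since that pixel already needs a neighbour that has not been decoded.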
Due to its straightforwardness and low computational cost, 2-D spatial causal predictive coding is widely adopted to compress still images and the intra-frames of video. DPCM (Differential Pulse Code Modulation) was invented as a 1-D first-order causal predictive coding [1]. It has been extended to 2-D and adopted in the lossless mode of the image coding standard JPEG (JPEG-LS) to compress still images, where the current pixel x is predicted with reference to pixels ABC as shown in FIG. 1.
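A 2-D causal predictor over pixels A, B and C can be sketched as below. The text only states that the prediction refers to ABC. The planar combination A + C − B used here is one commonly used choice and is an assumption for illustration, as are the function names and the convention that pixels outside the image count as 0.

```python
def predict_abc(img, x, y):
    """Planar prediction A + C - B from the left (A), upper-left (B) and
    upper (C) neighbours; pixels outside the image are assumed to be 0."""
    a = img[y][x - 1] if x > 0 else 0
    c = img[y - 1][x] if y > 0 else 0
    b = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a + c - b


def dpcm2d_encode(img):
    """Residual image: each pixel minus its causal prediction."""
    h, w = len(img), len(img[0])
    return [[img[y][x] - predict_abc(img, x, y) for x in range(w)]
            for y in range(h)]


def dpcm2d_decode(res):
    """Single forward raster-scan pass; each prediction uses only pixels
    that have already been reconstructed."""
    h, w = len(res), len(res[0])
    img = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            img[y][x] = res[y][x] + predict_abc(img, x, y)
    return img


image = [[10, 11, 12],
         [11, 12, 13],
         [12, 13, 14]]
residual = dpcm2d_encode(image)
restored = dpcm2d_decode(residual)
```

Because the example image is a smooth ramp, every interior residual is 0, and the decoder recovers the image exactly in one forward pass.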
In the video coding standard H.264, a frame coded with intra-prediction only is called an I-frame. H.264 adopts a causal DPCM-like inter-block prediction to compress blocks in an I-frame, where already coded blocks, including those on the left, upper left, upper and upper right, are used to predict the current block. Referring again to FIG. 1, in this case the current block is shown as x and its adjacent blocks ABCD are its reference blocks.
As image and video signals have two-sided correlations, two-sided noncausal predictive coding inherently generates predictions with higher accuracy, and permits higher performance and better compression than one-sided causal coding. Efforts have been made to build noncausal predictors. FIG. 2 shows an example of an inter-frame predictive coding scheme in H.264. In H.264, a frame coded with only causal inter-frame predictions from a past frame is called a P-frame, and a frame coded with noncausal predictions from both past and future frames is called a B-frame. Although each B-frame, such as frame 202, is coded with noncausal inter-frame predictions and thus achieves the highest compression ratio, its reference frames, such as frames 201 and 204, have to be either I-frames or P-frames, which are coded without noncausal predictions. Furthermore, each P-frame, such as frame 204, is coded with only causal predictions from a previous I-frame or P-frame, such as the I-frame 201. Thus, in general, an I-frame has the lowest compression ratio and a P-frame has a compression ratio in between. The above restriction on inter-frame prediction is carefully chosen to avoid the interdependence of fully noncausal inter-frame prediction. The result is a partially noncausal inter-frame prediction that achieves only part of the benefit.
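The restriction described above implies that a B-frame can only be coded after both of its reference frames. A minimal sketch of the resulting reordering follows, assuming a hypothetical display-order sequence of frame types and assuming that each B-frame refers to the nearest I-frame or P-frame on either side. The function name and the example sequences are illustrative and are not taken from FIG. 2.

```python
def coding_order(display_types):
    """Map a display-order list of frame types ('I', 'P', 'B') to the order
    in which frames must be coded so that references always come first."""
    order = []
    pending_b = []  # B-frames waiting for their future reference frame
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)      # code the I-frame or P-frame first
            order.extend(pending_b)  # then the B-frames that precede it
            pending_b = []
    # Trailing B-frames have no future reference in this sequence;
    # they are appended only so that every frame appears in the output.
    order.extend(pending_b)
    return order


short_gop = coding_order(list("IBBP"))
long_gop = coding_order(list("IBBPBBP"))
```

In the four-frame case the P-frame is coded second even though it is displayed last, which is the memory and delay cost of using future frames as references.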
The prior invention [2] discloses a layered video predictive coding scheme. The base layer adopts conventional causal prediction, and the enhancement layer adopts noncausal prediction with reference to the already coded pixels in the base layer. There is no interdependence in [2]. This is also a partially noncausal prediction, as the base layer cannot be coded with noncausal prediction. Further, as the base layer usually has lower resolution or lower SNR, depending on the choice of scalability, the accuracy of the noncausal prediction from the pixels in the base layer is limited too.
The prior invention [3] discloses a fully noncausal predictive coding for still images. Theoretically, in [3], the two-sided noncausal residual image signal is neither generated nor transmitted. Rather, the encoder in [3] converts the two-sided noncausal residual image signal into an equivalent one-sided representative signal through LU decomposition and inversion of the potential matrix. Practically, both the encoder and the decoder in [3] have prohibitive computational cost. The potential matrix in [3] is large. For example, in the case of the simplest noncausal predictor with reference to pixels ACEG as shown in FIG. 1, the potential matrix in [3] is over 2M×2M for a 1920×1080 image, where 2M×2M denotes 2 million rows by 2 million columns. The invention [3] breaks the conversion into a line-by-line iterative process. Each line iteration involves the decomposition and inversion of a matrix whose dimensions equal the image width. For example, the matrix is 1920×1920 for a 1920×1080 image, which is larger than the original image. As those matrices cannot be transmitted to the decoder side, the decoder in [3] needs to perform the matrix decomposition and inversion again to convert the one-sided representative signal back to the original image. The decomposition and inversion of such large matrices involve prohibitive computational cost.
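The sizes quoted above follow from simple arithmetic, shown here as a short sketch. Counting matrix entries as if every entry were stored is an assumption made only to convey scale. The variable names are illustrative, and nothing here describes how [3] actually stores its matrices.

```python
width, height = 1920, 1080

# One row and one column per pixel gives the "over 2M x 2M" figure.
pixels_per_image = width * height

# Entry count of a matrix with one row and one column per pixel.
full_matrix_entries = pixels_per_image ** 2

# Entry count of the image-width sized matrix used in each line iteration.
line_matrix_entries = width ** 2

# True: the per-line matrix alone has more entries than the image has pixels.
line_matrix_exceeds_image = line_matrix_entries > pixels_per_image
```

Here `pixels_per_image` is 2,073,600, `line_matrix_entries` is 3,686,400, and `full_matrix_entries` is roughly 4.3 × 10^12.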
Therefore, it is desirable to find fully noncausal prediction and reconstruction methods with affordable computational cost for image and video coding.