(1) Field of the Invention
The present invention relates to a method for encoding images according to the preamble of claim 1. The present invention also relates to a device for encoding images according to the preamble of claim 12. Furthermore, the present invention relates to an encoder according to the preamble of claim 23, to a decoder according to the preamble of claim 24, to a codec according to the preamble of claim 25, to a mobile terminal according to the preamble of claim 26, and to a storage medium for storing a software program according to the preamble of claim 27.
(2) Description of Related Art Including Information Disclosed Under 37 CFR 1.97 and 1.98
The image can be any digital image, a video image, a TV image, an image generated by a video recorder, a computer animation, a still image, etc. In general, a digital image consists of pixels, which are arranged in horizontal and vertical lines, the number of which in a single image is typically tens of thousands. In addition, the information generated for each pixel contains, for instance, luminance information relating to the pixel, typically with a resolution of eight bits, and in colour applications also chrominance information, e.g. a chrominance signal. This chrominance signal generally consists of two components, Cb and Cr, which are both typically transmitted with a resolution of eight bits. On the basis of these luminance and chrominance values, it is possible to form information corresponding to the original pixel on the display device of a receiving video terminal. In this example, the quantity of data to be transmitted for each pixel is 24 bits uncompressed. Thus, the total amount of information for one image amounts to several megabits. In the transmission of a moving image, several images are transmitted per second. For instance, in a TV image, 25 images are transmitted per second. Without compression, the quantity of information to be transmitted would amount to tens of megabits per second. However, for example in the Internet data network, the data transmission rate can be on the order of 64 kbits per second, which makes uncompressed real-time image transmission via this network practically impossible.
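By way of a non-limiting illustration, the arithmetic of the preceding paragraph can be sketched as follows. The frame size of 352×288 pixels is an assumption chosen only for the example, since the text does not fix a frame size; the function name `uncompressed_bit_rate` is likewise hypothetical. The figures of 24 bits per pixel, 25 images per second and 64 kbits per second are taken from the text.

```python
def uncompressed_bit_rate(width, height, bits_per_pixel=24, frames_per_second=25):
    """Return the raw bit rate, in bits per second, of an uncompressed image sequence."""
    bits_per_image = width * height * bits_per_pixel
    return bits_per_image * frames_per_second


# Assumed frame size (not stated in the text): 352 x 288 pixels.
WIDTH, HEIGHT = 352, 288
CHANNEL_BIT_RATE = 64_000  # 64 kbits per second, as in the text

bits_per_image = WIDTH * HEIGHT * 24
rate = uncompressed_bit_rate(WIDTH, HEIGHT)

print(f"bits per image:     {bits_per_image}")
print(f"raw bit rate:       {rate} bit/s")
print(f"compression needed: {rate / CHANNEL_BIT_RATE:.1f}:1")
```

Under these assumptions one image occupies roughly 2.4 megabits and the sequence roughly 61 megabits per second, so the data would have to be reduced by a factor of about 950 to fit the 64 kbits per second channel.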
To reduce the amount of information to be transmitted, a number of different compression methods have been developed, such as the JPEG, MPEG and H.263 standards. In the transmission of video, image compression can be performed either as inter-frame compression, intra-frame compression, or a combination of these. In inter-frame compression, the aim is to eliminate redundant information in successive image frames. Typically, images contain a large amount of non-varying information, for example a motionless background, or slowly changing information, for example when the subject moves slowly. In inter-frame compression, it is also possible to utilize motion compensated prediction, wherein the aim is to detect elements in the image which are moving, wherein motion vector and prediction error information are transmitted instead of transmitting the pixel values.
To enable the use of image compression techniques in real time, the transmitting and receiving video terminals should have a sufficiently high processing speed to perform compression and decompression in real time.
In several image compression techniques, an image signal in digital format is subjected to a discrete cosine transform (DCT) before the image signal is transmitted to a transmission path or stored in a storage means. Using a DCT, it is possible to calculate the frequency spectrum of a periodic signal, i.e. to perform a transformation from the time domain to the frequency domain. In this context, the word discrete indicates that separate pixels instead of continuous functions are processed in the transformation. In a digital image signal, neighbouring pixels typically have a substantial spatial correlation. One feature of the DCT is that the coefficients established as a result of the DCT are practically uncorrelated; hence, the DCT conducts the transformation of the image signal from the time domain to the (spatial) frequency domain in an efficient manner, reducing the redundancy of the image data. As such, use of transform coding is an effective way of reducing redundancy in both inter-frame and intra-frame coding.
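The energy-packing behaviour described above can be illustrated with a minimal sketch of a separable two-dimensional DCT. The sketch uses the orthonormal DCT-II, which is one common definition and is an assumption here, since the text does not specify a normalisation; the function names `dct_1d` and `dct_2d` and the test block are hypothetical.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a sequence of numbers (assumed normalisation)."""
    n = len(samples)
    result = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)
        )
        result.append(scale * total)
    return result


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major order


# Hypothetical 8x8 block with highly correlated pixels: a gentle horizontal ramp.
ramp = [[100 + 2 * x for x in range(8)] for _ in range(8)]
coefficients = dct_2d(ramp)

# Share of the total energy carried by the single coefficient at index [0][0].
total_energy = sum(c * c for row in coefficients for c in row)
dc_share = coefficients[0][0] ** 2 / total_energy
print(f"share of energy in the average-value coefficient: {dc_share:.4f}")
```

For such a smoothly varying block almost all of the signal energy falls into a handful of low-frequency coefficients, which is the property that makes transform coding effective for correlated image data.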
Current coding methods used in still image coding, and in video coding for independently coded key frames (intra-frames), use a block-based approach. In general, an image is divided into N×M blocks that are coded independently using some kind of transform coding. Pure block-based coding only reduces the inter-pixel correlation within a particular block, without considering the inter-block correlation of pixels. Therefore, pure block-based coding produces rather high bit rates even when using transform-based coding, such as DCT coding, which has very efficient energy packing properties for highly correlated data. For this reason, current digital image coding standards exploit certain methods that also reduce the correlation of pixel values between blocks.
Current digital image coding methods perform prediction in the transform domain, i.e. they try to predict the DCT coefficients of a block currently being coded using the previously coded blocks and are thus coupled with the compression method. Typically a DCT coefficient that corresponds to the average pixel value within an image block is predicted using the same DCT coefficient from the previously coded block. The difference between the actual and predicted coefficient is sent to the decoder. However, this scheme can predict only the average pixel value, and it is not very efficient.
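The prediction of the average-value coefficient from the previously coded block can be sketched as simple differential coding. This is a minimal illustration only: the initial prediction of zero for the first block, the lossless (unquantised) coefficients and the function names `encode_dc` and `decode_dc` are assumptions not taken from the text.

```python
def encode_dc(dc_values, initial_prediction=0):
    """Replace each average-value coefficient by its difference from the previous one."""
    residuals = []
    prediction = initial_prediction  # assumed starting value for the first block
    for dc in dc_values:
        residuals.append(dc - prediction)
        prediction = dc  # the block just coded predicts the next one
    return residuals


def decode_dc(residuals, initial_prediction=0):
    """Invert encode_dc by accumulating the received differences."""
    dc_values = []
    prediction = initial_prediction
    for residual in residuals:
        dc = prediction + residual
        dc_values.append(dc)
        prediction = dc
    return dc_values


# Hypothetical average-value coefficients of four consecutive blocks.
dc_values = [800, 808, 812, 640]
residuals = encode_dc(dc_values)
print(residuals)
```

Where neighbouring blocks have similar average values the differences are small, but only that single coefficient per block benefits, which reflects the limitation noted above.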
Prediction of DCT coefficients can also be performed using spatially neighbouring blocks. For example, a DCT coefficient that corresponds to the average pixel value within a block is predicted using the DCT coefficient(s) from a block to the left or above the current block being coded. DCT coefficients that correspond to horizontal frequencies (i.e. vertical edges) can be predicted from the block above the current block and coefficients that correspond to vertical frequencies (i.e. horizontal edges) can be predicted from the block situated to the left. Similar to the previous method, differences between the actual and predicted coefficients are coded and sent to the decoder. This approach allows prediction of horizontal and vertical edges that run through several blocks.
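The neighbour-based coefficient prediction described above can be sketched as follows. The sketch assumes coefficient blocks indexed as `block[v][u]`, with `v` the vertical and `u` the horizontal frequency index, so that the first row holds the horizontal frequencies and the first column the vertical frequencies. Applying both directions at once, and taking the average-value coefficient from the left block when one is available, are simplifying assumptions; all function names are hypothetical.

```python
def _combine(block, above, left, sign):
    """Add (sign=+1) or subtract (sign=-1) the neighbour-based prediction."""
    rows, cols = len(block), len(block[0])
    out = [row[:] for row in block]
    # Average-value coefficient: assumed to come from the left block if present.
    dc_source = left if left is not None else above
    if dc_source is not None:
        out[0][0] += sign * dc_source[0][0]
    if above is not None:
        for u in range(1, cols):  # horizontal frequencies (vertical edges)
            out[0][u] += sign * above[0][u]
    if left is not None:
        for v in range(1, rows):  # vertical frequencies (horizontal edges)
            out[v][0] += sign * left[v][0]
    return out


def prediction_residual(current, above=None, left=None):
    """Differences between actual and predicted coefficients (encoder side)."""
    return _combine(current, above, left, -1)


def reconstruct(residual, above=None, left=None):
    """Recover the coefficients from the received differences (decoder side)."""
    return _combine(residual, above, left, +1)


# Hypothetical 3x3 coefficient blocks, kept small for readability.
above = [[100, 40, 0], [0, 0, 0], [0, 0, 0]]
left = [[96, 0, 0], [30, 0, 0], [5, 0, 0]]
current = [[98, 42, 1], [28, 3, 0], [5, 0, 2]]

residual = prediction_residual(current, above, left)
print(residual)
```

Coefficients outside the first row and first column are passed through unchanged, which illustrates why this approach captures only horizontal and vertical edges.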
In MPEG-2 compression, the DCT is performed in blocks using a block size of 8×8 pixels. The luminance level is transformed using full spatial resolution, while both chrominance signals are subsampled. For example, a field of 16×16 pixels is subsampled into a field of 8×8 pixels. The differences in the block sizes are primarily due to the fact that the eye does not discern changes in chrominance as well as changes in luminance, wherein a field of 2×2 pixels is encoded with the same chrominance value.
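The subsampling of a chrominance field can be illustrated with the following minimal sketch, in which each field of 2×2 pixels is replaced by a single value. Forming that value as the rounded average of the four samples is an assumption, since the text does not state how the shared value is derived; the function name `subsample_chroma` is hypothetical, and even plane dimensions are assumed.

```python
def subsample_chroma(plane):
    """Halve a chrominance plane in both directions, one value per 2x2 field.

    Assumes even width and height; the shared value is the rounded average.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# Hypothetical 16x16 chrominance field with a slow diagonal gradient.
chroma = [[(x + y) * 4 for x in range(16)] for y in range(16)]
small = subsample_chroma(chroma)
print(len(small), "x", len(small[0]))
```

With two chrominance fields reduced in this way, a 16×16 pixel area is represented by 256 + 64 + 64 = 384 samples instead of 768.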
The MPEG-2 standard defines three frame types: an I-frame (Intra), a P-frame (Predicted), and a B-frame (Bi-directional). An I-frame is generated solely on the basis of information contained in the image itself, wherein at the receiving end, an I-frame can be used to form the entire image. A P-frame is typically formed on the basis of the closest preceding I-frame or P-frame, wherein at the receiving stage the preceding I-frame or P-frame is correspondingly used together with the received P-frame. In the composition of P-frames, for instance motion compensation is used to compress the quantity of information. B-frames are formed on the basis of a preceding I-frame and a following P- or I-frame. Correspondingly, at the receiving stage it is not possible to compose the B-frame until the preceding and following frames have been received. Furthermore, at the transmission stage the order of the P- and B-frames is changed, wherein the P-frame following the B-frame is received first. This tends to accelerate reconstruction of the image in the receiver.
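The change of frame order at the transmission stage can be sketched as follows. The sketch holds back each B-frame until the following I- or P-frame on which it depends has been emitted. The sequence of frame types, the function name `to_transmission_order` and the handling of trailing B-frames that have no following reference frame are assumptions made for illustration.

```python
def to_transmission_order(display_order):
    """Reorder frames so that every B-frame follows both of its reference frames.

    `display_order` is a sequence of frame types ("I", "P" or "B"); the result
    is a list of (display_index, frame_type) pairs in transmission order.
    """
    transmitted = []
    pending_b = []  # B-frames waiting for their following reference frame
    for index, frame_type in enumerate(display_order):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            transmitted.append((index, frame_type))  # reference frame goes first
            transmitted.extend(pending_b)            # then the B-frames it completes
            pending_b = []
    transmitted.extend(pending_b)  # assumed handling of trailing B-frames
    return transmitted


# Hypothetical display-order sequence of frame types.
order = to_transmission_order("IBBPBBP")
print(" ".join(f"{frame_type}{index}" for index, frame_type in order))
```

In this example the P-frame displayed fourth is transmitted second, so that the receiver already holds both reference frames when the intervening B-frames arrive.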
Intra-frame coding schemes used in prior art solutions are inefficient, with the result that transmission of intra-coded frames consumes excessive bandwidth. This limits the usage of independently coded key frames in low bit rate digital image coding applications.
The present invention addresses the problem of how to further reduce redundant information in image data and to produce more efficient coding of image data, by introducing a spatial prediction scheme involving the prediction of pixel values that offers a possibility for prediction from several directions. This allows efficient prediction of edges with different orientations, resulting in considerable savings in bit rate. The method according to the invention also uses context-dependent selection of suitable prediction methods, which provides further savings in bit rate.
The invention introduces a method for performing spatial prediction of pixel values within an image. The technical description of this document introduces a method and system for spatial prediction that can be used for block-based still image coding and for intra-frame coding in block-based video coders. Key elements of the invention are the use of multiple prediction methods and the context-dependent selection and signalling of the selected prediction method. The use of multiple prediction methods and the context-dependent selection and signalling of the prediction methods allow substantial savings in bit rate to be achieved compared with prior art solutions.
It is an object of the present invention to improve encoding and decoding of digital images such that higher encoding efficiency can be achieved and the bit rate of the encoded digital image can be further reduced.
According to the present invention, this object is achieved by an encoder for performing spatially predicted encoding of image data.