This invention relates to the encoding and decoding of interlace scanned picture signals using predictive encoding and DCT transformation, and to a recording medium on which the encoded signals are recorded.
The Moving Picture Experts Group has proposed a standard for high efficiency encoding of progressively scanned picture signals and recording of the same on digital storage media in its Committee Draft MPEG 90/176/Revision 2 (1990), subsequently published as ISO/IEC 11172-2 (1992). The high efficiency is achieved through reduction of temporal and spatial redundancy in a picture.
Storage media intended for use with such encoded picture signals, such as an optical compact disc (CD), a digital audio tape (DAT) or a magnetic disk, have a continuous transfer rate of up to 1.5 Mbit/sec. A storage medium may be directly connected to a decoder or it may be connected thereto via a transmission medium such as a computer bus, local area network (LAN) or telecommunications line.
The 1990 MPEG draft standard contemplates the implementation of special functions beyond normal forward playback, such as random accessing, high speed playback, and reverse playback.
Temporal redundancy between successive pictures is reduced by predictive encoding, wherein corrections are applied to a previously encoded picture to obtain a current picture. Predictive encoding thus avoids the need to transmit a picture in its entirety. More specifically, motion compensation vectors are applied to a previous picture to obtain a predicted picture, which is subtracted from the current picture to provide differential data. The current picture is represented by the motion compensation vectors and differential data. This technique is very efficient for a picture having little motion relative to a previous picture, that is, it permits such a picture to be represented with a substantially reduced amount of data.
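By way of illustration only, the prediction step described above may be sketched as follows. The function names, the list-of-rows picture representation, the block size, and the sign convention of the motion vector are assumptions made for this sketch and are not taken from the draft standard.

```python
def predict_block(reference, top, left, mv, size):
    """Fetch the size x size block of the reference picture displaced by mv = (dy, dx)."""
    dy, dx = mv
    return [[reference[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]

def encode_block(current, reference, top, left, mv, size):
    """Represent a block of the current picture as (motion vector, differential data)."""
    pred = predict_block(reference, top, left, mv, size)
    residual = [[current[top + r][left + c] - pred[r][c] for c in range(size)]
                for r in range(size)]
    return mv, residual

def decode_block(reference, top, left, mv, residual, size):
    """Rebuild the block by adding the differential data back onto the prediction."""
    pred = predict_block(reference, top, left, mv, size)
    return [[pred[r][c] + residual[r][c] for c in range(size)]
            for r in range(size)]

# A bright object occupies columns 1-2 of the previous picture and
# columns 2-3 of the current picture (it has moved one pixel to the right).
previous = [[0, 9, 9, 0, 0, 0] for _ in range(2)]
current = [[0, 0, 9, 9, 0, 0] for _ in range(2)]

# The block at column 2 of the current picture is found one pixel to the left
# in the previous picture, so the (assumed) motion vector is (0, -1).
mv, residual = encode_block(current, previous, top=0, left=2, mv=(0, -1), size=2)
rebuilt = decode_block(previous, 0, 2, mv, residual, 2)
```

In this example the differential data is entirely zero, so the block is conveyed by the motion vector alone, which is the case in which predictive encoding is most efficient.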
As shown in FIG. 1, three types of pictures may exist in a sequence of pictures.
An intra coded picture (I picture) is coded without reference to other pictures. An I picture permits random access to a sequence of pictures, but cannot be efficiently coded.
A predictive coded picture (P picture) is coded by predicting forward in time from a previously encoded I picture or P picture. A P picture is used as a reference for further prediction, and can be efficiently coded.
A bidirectionally coded picture (B picture) is coded using one or both of a temporally preceding (past) picture and a temporally succeeding (future) picture as reference pictures. B pictures are never used as references for prediction, but can be compressed with extreme efficiency.
A decodable sequence of pictures includes at least one I picture and a variable number of P and B pictures. One or more B pictures may be located temporally between two P pictures, or between an I picture and a P picture. When these pictures are encoded for transmission or recording, their sequence is changed from a temporal sequence to an encoded sequence, so that the decoder will have decoded the one or more pictures (I or P pictures) from which a current picture (P or B picture) is predicted before decoding of the current picture commences. The decoder returns the decoded pictures to their original temporal sequence, and presents the thus decoded sequence for display.
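The reordering described above may be sketched, under stated assumptions, as follows. Pictures are represented here by hypothetical labels such as "I0" or "B1" (type letter followed by temporal position), and the sequence is assumed to end on an I or P picture so that every B picture has a future reference. Neither the labels nor the function names come from the draft standard.

```python
def to_coded_order(display):
    """Move each I or P picture ahead of the B pictures that are predicted from it."""
    coded, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)      # wait until the future reference is emitted
        else:
            coded.append(pic)          # reference picture goes first
            coded.extend(pending_b)    # then the B pictures that depend on it
            pending_b = []
    if pending_b:
        raise ValueError("sketch assumes the sequence ends on an I or P picture")
    return coded

def to_display_order(coded):
    """Restore temporal order at the decoder: hold each reference until the next one arrives."""
    display, held = [], None
    for pic in coded:
        if pic[0] == "B":
            display.append(pic)        # B pictures are displayed as soon as decoded
        else:
            if held is not None:
                display.append(held)
            held = pic
    if held is not None:
        display.append(held)
    return display

temporal = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coded = to_coded_order(temporal)
restored = to_display_order(coded)
```

Here `coded` is `["I0", "P3", "B1", "B2", "P6", "B4", "B5"]`, so that both references of each B picture have been decoded before the B picture itself, and `restored` equals the original temporal sequence.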
Spatial redundancy within a picture is reduced by an orthogonal transformation, such as a discrete cosine transformation (DCT), of a portion of the picture from the spatial domain into the frequency domain. A block of pixel data from the picture having a dimension of, for example, 8 pixels width × 8 rows height, representing luminance or chrominance amplitudes at the respective pixels, is converted by DCT transformation into a block of 8 × 8 frequency coefficients, which is scanned in a predetermined zigzag manner from low frequency to high frequency to provide a sequence of 64 coefficients representing the amounts of respective frequencies contained in the block. The first coefficient is referred to as the DC coefficient, while the other 63 coefficients are referred to as the AC or high frequency coefficients. A pixel block representing a solid portion of an image corresponds to a DC coefficient indicating the amplitude of the solid portion, and no high frequency coefficients. A pixel block representing a highly detailed image portion corresponds to coefficient data with many non-zero AC values.
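The transformation and scanning described above may be illustrated by the following minimal sketch, which uses the direct (unoptimized) form of the two-dimensional DCT with the customary orthonormal scaling. The function names are hypothetical, and a practical encoder would use a fast algorithm rather than this quadruple loop.

```python
import math

N = 8  # block dimension used in the text

def dct_8x8(block):
    """Forward 2-D DCT of an 8x8 pixel block given as a list of 8 rows."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):              # vertical frequency index
        for v in range(N):          # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs

def zigzag_order(n=N):
    """Positions (row, col) visited along anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def zigzag_scan(coeffs):
    """Flatten the coefficient block into 64 values, low frequency first."""
    return [coeffs[r][c] for r, c in zigzag_order()]

# A solid block of amplitude 100: only the DC coefficient is non-zero.
solid = [[100] * N for _ in range(N)]
sequence = zigzag_scan(dct_8x8(solid))
```

With this scaling the DC coefficient of the solid block is 8 times the pixel amplitude, and the 63 AC coefficients vanish to within floating-point rounding, consistent with the description of a solid image portion above.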
A picture of a natural scene tends to be smooth, that is, to lack highly detailed image portions. Consequently, the pixel blocks of such a picture correspond to DCT coefficient sequences in which most of the AC coefficients are zero, that is, sequences having runs of zero data. These runs of zero data are variable length coded by representing each run as a run-length number indicating how many zeros are in the run. The run-length value is further encoded using a Huffman code.
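The run-length representation may be sketched as follows. In this sketch each non-zero coefficient is paired with the number of zeros preceding it, and a hypothetical "EOB" marker stands for the trailing zeros of the block. The pairing, the marker, and the function names are assumptions for illustration, and the subsequent Huffman coding is omitted.

```python
def run_length_encode(coeffs):
    """Replace each run of zeros by a count attached to the next non-zero value."""
    pairs, run = [], 0
    for value in coeffs:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # assumed end-of-block marker; remaining values are zero
    return pairs

def run_length_decode(pairs, length=64):
    """Expand (run, value) pairs back into a fixed-length coefficient sequence."""
    out = []
    for item in pairs:
        if item == "EOB":
            break
        run, value = item
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * (length - len(out)))
    return out

# A zigzag-scanned block from a smooth image region: three non-zero values only.
scanned = [50, 0, 0, -3, 1] + [0] * 59
encoded = run_length_encode(scanned)
decoded = run_length_decode(encoded)
```

In this example the 64 coefficients reduce to three pairs and the end marker, namely `[(0, 50), (2, -3), (0, 1), "EOB"]`, and decoding recovers the original sequence.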
At the decoder, the encoded signal is variable length decoded (inverse variable length coded), and then inverse DCT transformed to recover the original pixel data for the picture.
Applying the techniques of predictive encoding and orthogonal transformation to a picture sequence removes significant amounts of temporal and spatial redundancy from the picture sequence and results in a highly efficiently encoded representation of the picture sequence.
The 1990 MPEG draft standard is concerned with processing pictures on a frame by frame basis, and assumes that each frame is progressively scanned. In progressive scanning, the rows of pixels in a frame are scanned from top to bottom. During display, the pixels are presented in this same order.
In interlace scanning, first the odd-numbered rows of pixels forming an odd field in a frame are scanned, then the even-numbered rows of pixels forming an even field in the frame are scanned. During display, the odd field is displayed and then the even field is displayed such that its rows are interlaced with the rows in the odd field.
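The relationship between a frame and its two fields may be sketched as follows, assuming a frame is held as a list of rows numbered from one at the top and containing an even number of rows. The function names are hypothetical.

```python
def split_fields(frame):
    """Separate a frame into its odd field (rows 1, 3, 5, ...) and even field (rows 2, 4, 6, ...)."""
    odd_field = frame[0::2]   # list index 0 holds row number 1
    even_field = frame[1::2]
    return odd_field, even_field

def interlace_fields(odd_field, even_field):
    """Rebuild the frame by placing each even-field row beneath the matching odd-field row."""
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.append(odd_row)
        frame.append(even_row)
    return frame

frame = [["row1"], ["row2"], ["row3"], ["row4"]]
odd_field, even_field = split_fields(frame)
rebuilt = interlace_fields(odd_field, even_field)
```

Splitting and re-interlacing are exact inverses, so that `rebuilt` equals `frame`.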
If motion is represented in a sequence of interlace scanned pictures, each frame exhibits comb-like deformation. FIG. 2 shows an image of a car moving from the left side of the picture to the right side of the picture. When the odd field is scanned, the car is in one position. By the time that the even field is scanned, the car has advanced towards the right. During display of the interlace scanned fields of a frame representing this picture, the edges represented by the even field are shifted with respect to the edges represented by the odd field, causing the edges of an object to appear jagged. The comb deformation may be particularly seen in a vertical edge, such as the front of the car.
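The comb deformation may be illustrated numerically by the following sketch, in which a vertical edge moves two pixels to the right between the scanning of the odd field and the scanning of the even field. The picture dimensions, the amplitudes, and the "vertical activity" measure (sum of absolute differences between vertically adjacent rows) are assumptions chosen for this illustration only.

```python
def field_rows(edge_col, width, rows):
    """Rows of one field: dark to the left of a vertical edge, bright from the edge onward."""
    return [[255 if c >= edge_col else 0 for c in range(width)]
            for _ in range(rows)]

def interlace_fields(odd_field, even_field):
    """Interleave the rows of the two fields into one frame."""
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.extend([odd_row, even_row])
    return frame

def vertical_activity(rows):
    """Sum of absolute differences between each row and the row beneath it."""
    return sum(abs(a - b)
               for upper, lower in zip(rows, rows[1:])
               for a, b in zip(upper, lower))

odd_field = field_rows(edge_col=2, width=8, rows=4)    # edge position at the odd field
even_field = field_rows(edge_col=4, width=8, rows=4)   # edge has advanced two pixels
frame = interlace_fields(odd_field, even_field)

frame_activity = vertical_activity(frame)
odd_activity = vertical_activity(odd_field)
even_activity = vertical_activity(even_field)
```

Each field taken alone has no vertical variation, whereas in the interlaced frame every pair of adjacent rows differs at the two columns swept by the edge. This alternating row-to-row variation is the jagged edge described above, and it appears as high vertical frequency content when the frame is transformed as a whole.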
An interlace scanned picture having comb deformation due to motion cannot be efficiently encoded using the technique proposed in the 1990 MPEG draft standard due to the large amount of data needed to represent the moving (jagged) edges in the picture.
If this interlace scanned picture is considered as two fields which are separately encoded, the resulting signal is also encoded with low efficiency due to the inefficiency of representing stationary portions of the image with field by field encoding.
Thus, there is no known way to encode a picture having stationary portions and moving portions with high efficiency.