Orthogonal transforms are used in many kinds of digital signal processing systems, where they permit signal processing to be carried out in the frequency domain. The Fast Fourier Transform (FFT) and the Discrete Cosine Transform (DCT) are widely known examples of orthogonal transforms. An orthogonal transform analyzes, for example, a fragment of a signal in the time domain into frequency components (which vary depending on the orthogonal transform function applied) indicating the spectrum (i.e., the distribution of energy versus frequency) of the original signal fragment. By processing the frequency components (commonly called transform coefficients) that result from orthogonally transforming the signal fragment in various ways, redundancy in the original signal fragment can be reduced. In other words, by orthogonally transforming the original signal fragment and processing the resulting transform coefficients, the signal fragment can be represented using fewer bits than were originally used to represent it. Moreover, by inversely orthogonally transforming the transform coefficients, the original signal fragment in the time domain can be recovered.
Apparatus for compressing a motion picture signal and for expanding a compressed motion picture signal are common examples of digital signal processing systems that use orthogonal transform processing.
It is known that the signal power of signals having a high correlation is concentrated at lower frequencies in the frequency domain. As the concentration of signal power on a specific coordinate axis (e.g., the frequency axis) increases, signal redundancy can be progressively reduced, and the signal can be compressed more efficiently.
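The concentration of signal power at low frequencies can be illustrated with a short sketch. The code below assumes the orthonormal form of the one-dimensional DCT-II and uses made-up sample values. It compares how much of the total energy falls in the two lowest-frequency coefficients for a smoothly varying (highly correlated) fragment and for an erratic (weakly correlated) fragment.

```python
import math

def dct_ii(x):
    """Orthonormal 1-D DCT-II of a sequence of length N."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out

def low_freq_energy_fraction(x, keep=2):
    """Fraction of the total energy carried by the first `keep` DCT coefficients."""
    c = dct_ii(x)
    total = sum(v * v for v in c)
    return sum(v * v for v in c[:keep]) / total

# Hypothetical sample values chosen only for illustration.
smooth = [100, 102, 104, 106, 108, 110, 112, 114]   # highly correlated
rough = [100, 20, 180, 40, 160, 10, 200, 60]         # weakly correlated
print(round(low_freq_energy_fraction(smooth), 4))
print(round(low_freq_energy_fraction(rough), 4))
```

Because the transform is orthonormal, the coefficient energy equals the sample energy, so the printed fraction is a fair measure of how strongly the power has been concentrated.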
Since a motion picture signal is generally highly correlated, both spatially and temporally, orthogonal transform processing can be applied to concentrate the signal power on a specific coordinate axis, allowing the motion picture signal to be compressed with high efficiency.
Hitherto, an extremely large amount of information has been required to represent a motion picture, using, for example, an NTSC-standard video signal. Because of this, recording a motion picture signal has required a recording medium with a very high storage capacity if the medium is to provide an acceptably-long recording time. Moreover, the information rate at which the motion picture signal is recorded on and reproduced from such a medium has been very high. Physically large magnetic tapes or optical discs have therefore been required to store motion picture signals.
If it is desired to record a motion picture signal on a more compact recording medium with an acceptably-long recording time, signal compression must be applied to the motion picture signal to reduce the amount of information that needs to be stored. In addition, an apparatus must be provided that is capable of expanding the compressed motion picture signal reproduced from the compact recording medium.
To meet the requirements just described, various motion picture signal compression systems have been proposed that exploit the correlation between and within the portions of the motion picture signal representing the pictures constituting the motion picture signal. For example, the motion picture signal compression system proposed by the Moving Picture Experts Group (MPEG) is widely known. Since the MPEG system has been widely described in various printed publications, a detailed explanation of the MPEG system will not be repeated here.
The following description will refer frequently to a "picture." Since the signal processing techniques described herein relate to processing a motion picture signal representing a motion picture, it is to be understood that the word "picture," as generally used herein, refers to the portion of a motion picture signal that represents a picture of the motion picture. Moreover, a motion picture signal can represent a picture of the motion picture as a frame or a field. Unless stated otherwise, a "picture" means a field or a frame.
The MPEG system first determines the differences between the pictures constituting the motion picture signal to reduce the redundancy of the motion picture signal in the time domain. Then, the MPEG system reduces the redundancy of the motion picture signal in the spatial domain by applying orthogonal transform processing to blocks of inter picture differences in the spatial domain. The MPEG system applies discrete cosine transform (DCT) processing as the orthogonal transform processing. By reducing redundancy in both the time and spatial domains, the motion picture is compressed extremely efficiently. The compressed motion picture signal resulting from the compression process just described may then be recorded on a recording medium, or transmitted via a suitable transmission medium.
When the compressed motion picture signal is reproduced from the recording medium, or is received from the transmission medium, the blocks of transform coefficients resulting from the DCT transform are extracted from the compressed motion picture signal. The transform coefficients are processed using an inverse orthogonal transform (an inverse discrete cosine transform (IDCT) in the MPEG system) to recover blocks of inter picture differences in the course of reconstructing the pictures of the original motion picture signal.
An example of the construction of a motion picture signal compressor apparatus based on the MPEG system is shown in FIG. 1. In the compressor shown in FIG. 1, a digital motion picture signal is fed into the block formatting circuit 101, where it is converted from a standard video format, e.g., the NTSC standard video signal format, into a block format to provide a blocked motion picture signal. In this process, each picture of the motion picture signal is divided in the spatial domain, i.e., horizontally and vertically, into macroblocks of, e.g., 16×16 pixels. The macroblocks are further subdivided into blocks of 8×8 pixels.
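The block formatting performed by the block formatting circuit 101 can be sketched as follows. This is a minimal sketch that assumes a single luminance plane stored as a list of rows whose dimensions are multiples of 16; the chrominance blocks that an MPEG macroblock also carries are omitted, and the function names are hypothetical.

```python
def split_blocks(picture, size):
    """Split a 2-D list of pixels into size x size blocks in raster order.

    Assumes the picture height and width are multiples of `size`.
    """
    height, width = len(picture), len(picture[0])
    if height % size or width % size:
        raise ValueError("picture dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([row[left:left + size] for row in picture[top:top + size]])
    return blocks

def format_picture(picture):
    """Return (macroblock, its four 8x8 blocks) pairs for a luminance picture."""
    return [(mb, split_blocks(mb, 8)) for mb in split_blocks(picture, 16)]

# A hypothetical 32x48 luminance picture with synthetic pixel values.
picture = [[(x + y) % 256 for x in range(48)] for y in range(32)]
formatted = format_picture(picture)
print(len(formatted), len(formatted[0][1]))   # 6 macroblocks, 4 blocks each
```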
The apparatus shown in FIG. 1 compresses each picture of the motion picture signal block-by-block until all the blocks constituting the picture have been processed. The apparatus then processes another picture of the motion picture signal, which may or may not be the next picture in the sequence of pictures constituting the motion picture. In the following description of the apparatus shown in FIG. 1, the compression of one block of pixels in one picture will be described. The block of pixels being compressed is the current picture block, which is a block of the current picture. The blocked motion picture signal is delivered to the motion predictor 102. The motion predictor feeds the current picture, including the current picture block S1, block-by-block to the difference block calculating circuit 103.
When the difference block calculating circuit 103 receives the current picture block S1 from the motion predictor 102, it also receives, from the predictor 113, the matching block S2 corresponding to the current picture block. The predictor 113 derives the matching block from the reconstructed pictures stored in the picture memory group 112. The difference block calculating circuit 103 determines the pixel-by-pixel difference between the current picture block S1 and its corresponding matching block S2. The resulting block of differences, the difference block S3, is fed to the orthogonal transform circuit 104.
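The operation of the difference block calculating circuit 103 reduces to an element-wise subtraction. The sketch below uses hypothetical 2×2 excerpts of blocks for brevity; with 8-bit pixels the differences span -255 to +255, so the difference block needs a signed representation at least one bit wider than the pixels.

```python
def difference_block(current, matching):
    """Pixel-by-pixel difference S3 = S1 - S2 between two equal-sized blocks."""
    if len(current) != len(matching) or any(len(a) != len(b) for a, b in zip(current, matching)):
        raise ValueError("blocks must have the same dimensions")
    return [[c - m for c, m in zip(row_c, row_m)] for row_c, row_m in zip(current, matching)]

s1 = [[120, 121], [119, 118]]   # hypothetical excerpt of a current picture block S1
s2 = [[118, 121], [121, 118]]   # hypothetical excerpt of its matching block S2
print(difference_block(s1, s2))  # [[2, 0], [-2, 0]]
```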
The orthogonal transform circuit 104, which is normally a discrete cosine transform (DCT) circuit, applies orthogonal transform processing to the difference block S3 and feeds the resulting block of transform coefficients to the quantizer 105. The quantizer 105 quantizes the block of transform coefficients to provide a block of quantized transform coefficients. The variable-length coder 106 subjects the block of quantized transform coefficients from the quantizer 105 to variable-length coding, such as Huffman coding or run-length coding. The resulting block of coded transform coefficients is then fed via the output buffer 107 to, e.g., a digital transmission path.
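A minimal sketch of the forward path through the orthogonal transform circuit 104, the quantizer 105, and the run-length stage of the variable-length coder 106 follows. It assumes an orthonormal 8×8 DCT-II, a single uniform step size for every coefficient, and a zigzag scan order. The actual MPEG quantizer also uses weighting matrices and treats the DC coefficient differently, and the Huffman stage that would code the run-length pairs is omitted. All function names are hypothetical.

```python
import math

N = 8

def dct_1d(x):
    """Orthonormal 1-D DCT-II of N samples."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
                               for i in range(N)))
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]  # [vertical][horizontal]

def quantize(coeffs, step):
    """Uniform quantizer: divide by the step size and round to the nearest integer."""
    return [[round(c / step) for c in row] for row in coeffs]

def zigzag(n=N):
    """Zigzag scan order as (row, col) pairs, lowest frequencies first."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def run_length(levels):
    """(zero_run, level) pairs in zigzag order; trailing zeros are dropped
    (a real coder would signal them with an end-of-block code)."""
    pairs, run = [], 0
    for r, c in zigzag():
        if levels[r][c] == 0:
            run += 1
        else:
            pairs.append((run, levels[r][c]))
            run = 0
    return pairs

flat = [[4] * N for _ in range(N)]            # a hypothetical flat difference block
print(run_length(quantize(dct_2d(flat), 8)))  # [(0, 4)]
```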
A control signal indicating the number of bits stored in the output buffer 107 is fed back to the quantizer 105. The quantizer adjusts the quantizing step size in response to the control signal to prevent the output buffer from overflowing or underflowing. Increasing or decreasing the quantizing step size respectively decreases or increases the number of bits fed into the output buffer.
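The feedback from the output buffer 107 to the quantizer 105 can be sketched with a hypothetical linear rule that maps buffer fullness to a step size. The mapping itself is an assumption made for illustration; only the direction of the adjustment (a fuller buffer gives a larger step and fewer bits) comes from the description above. The default range 1 to 31 mirrors the range of MPEG's quantizer scale code.

```python
def step_size_from_buffer(bits_in_buffer, buffer_size, min_step=1, max_step=31):
    """Map output-buffer fullness to a quantizer step size (hypothetical linear rule).

    A fuller buffer yields a larger step, so fewer bits are produced; an emptier
    buffer yields a smaller step, so more bits are produced.
    """
    fullness = min(max(bits_in_buffer / buffer_size, 0.0), 1.0)  # clamp to [0, 1]
    return round(min_step + fullness * (max_step - min_step))

for bits in (0, 250_000, 500_000, 1_000_000):
    print(bits, step_size_from_buffer(bits, 1_000_000))
```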
The block of quantized transform coefficients is also delivered from the quantizer 105 to the inverse quantizer 108, which forms part of the local decoder used in the compressor to derive, from the quantized transform coefficients, the reconstructed pictures used in the prediction coding. The inverse quantizer 108 inversely quantizes the block of quantized transform coefficients by performing processing complementary to the quantizing processing performed by the quantizer 105. The resulting block of transform coefficients is fed to the inverse orthogonal transform circuit 109, where it is inversely orthogonally transformed by processing complementary to the orthogonal transform processing performed by the orthogonal transform circuit 104. The resulting restored difference block S4 is fed to the adder 110.
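The local decoder path (the inverse quantizer 108 followed by the inverse orthogonal transform circuit 109) can be sketched as follows, assuming a simple uniform quantizer with a single step size and an orthonormal 8×8 IDCT. This is not MPEG's exact inverse quantization, which also applies weighting matrices and mismatch control, and the function names are hypothetical. The final step rounds real-valued IDCT outputs to integers; how values lying exactly halfway between two integers are rounded is an implementation choice (Python's built-in round() resolves ties to even).

```python
import math

N = 8

def idct_1d(c):
    """Orthonormal 1-D inverse DCT (DCT-III), the inverse of the orthonormal DCT-II."""
    out = []
    for i in range(N):
        s = 0.0
        for k in range(N):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            s += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
        out.append(s)
    return out

def idct_2d(coeffs):
    """Separable 2-D inverse DCT: inverse-transform columns, then rows."""
    cols = [idct_1d([coeffs[k][j] for k in range(N)]) for j in range(N)]  # cols[j][row]
    rows = [[cols[j][i] for j in range(N)] for i in range(N)]             # back to row-major
    return [idct_1d(r) for r in rows]

def dequantize(levels, step):
    """Inverse of a uniform quantizer: multiply each level by the step size."""
    return [[lv * step for lv in row] for row in levels]

def restore_difference_block(levels, step):
    """Inverse quantize, inverse DCT, then round to integers to give S4."""
    spatial = idct_2d(dequantize(levels, step))
    return [[round(v) for v in row] for row in spatial]

levels = [[4] + [0] * 7] + [[0] * 8 for _ in range(7)]  # DC-only quantized block
print(restore_difference_block(levels, 8)[0])           # [4, 4, 4, 4, 4, 4, 4, 4]
```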
The adder 110 also receives the matching block S2 for the current picture block S1 from one of the picture memories in the picture memory group 112 selected by the predictor 113. The adder 110 performs pixel-by-pixel addition between the restored difference block S4 from the inverse orthogonal transform circuit 109 and the matching block S2 from the picture memory group 112 to provide the reconstructed picture block S5. The reconstructed picture block is delivered to one of the picture memories 112A to 112D selected by the selector 111, where it is stored.
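The addition performed by the adder 110 can be sketched as below. The clipping of the sum to the range 0 to 255 is an assumption (8-bit pixels) that the description above does not state; it keeps reconstructed pixels valid when the restored difference block carries quantization error. The block values are hypothetical.

```python
def reconstruct_block(restored_difference, matching, max_value=255):
    """S5 = S4 + S2 pixel by pixel, clipped to an assumed 8-bit pixel range."""
    return [[min(max(d + m, 0), max_value) for d, m in zip(row_d, row_m)]
            for row_d, row_m in zip(restored_difference, matching)]

s4 = [[2, 0], [-2, 40]]          # hypothetical restored difference block S4
s2 = [[118, 121], [121, 230]]    # hypothetical matching block S2
print(reconstruct_block(s4, s2))  # [[120, 121], [119, 255]]
```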
The reconstructed picture block is stored in the selected picture memory at the position corresponding to the current picture block, so that the reconstructed picture is built up block-by-block in that picture memory. When complete, the reconstructed picture will be used to derive matching blocks for the prediction coding of other pictures of the motion picture signal.
The motion predictor 102 determines, for each macroblock of the current picture, a motion vector between that macroblock and each of various candidate macroblocks of the other pictures of the motion picture signal stored in the motion predictor. The motion predictor also generates the sum of the absolute values of the differences (the "difference absolute value sum") between the pixels in each macroblock of the current picture and the pixels in each candidate macroblock of the other pictures. Each difference absolute value sum indicates the degree of matching between the macroblock of the current picture and a macroblock of another picture. The motion predictor feeds each motion vector and its corresponding difference absolute value sum to the prediction mode determining circuit 115.
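One common way to obtain a motion vector and its difference absolute value sum is exhaustive block matching, sketched below. The description above does not say which search strategy the motion predictor 102 uses, so the full search, the ±7 pixel search range, and the convention that the vector (dy, dx) points from the current macroblock to the matching block in the reference picture are all assumptions. The function names are hypothetical.

```python
import random

def sad(cur, ref, top, left, size):
    """Difference absolute value sum between block `cur` and the block of `ref` at (top, left)."""
    return sum(abs(cur[r][c] - ref[top + r][left + c]) for r in range(size) for c in range(size))

def full_search(cur_picture, ref_picture, mb_top, mb_left, size=16, search_range=7):
    """Exhaustive block matching; returns ((dy, dx), sad) with the smallest sum."""
    cur = [row[mb_left:mb_left + size] for row in cur_picture[mb_top:mb_top + size]]
    height, width = len(ref_picture), len(ref_picture[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = mb_top + dy, mb_left + dx
            if top < 0 or left < 0 or top + size > height or left + size > width:
                continue  # candidate block would fall outside the reference picture
            cost = sad(cur, ref_picture, top, left, size)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best

rng = random.Random(0)
size = 48
ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
# Current picture built so that its macroblock at (16, 16) equals the reference block at (18, 13).
cur = [[ref[y + 2][x - 3] if y + 2 < size and x - 3 >= 0 else 0 for x in range(size)]
       for y in range(size)]
print(full_search(cur, ref, 16, 16))  # ((2, -3), 0)
```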
The prediction mode determining circuit 115 uses the data received from the motion predictor 102 to determine the prediction mode that will be used for prediction coding the current picture relative to one or more other reconstructed pictures. The current picture can be prediction coded using any of the following prediction modes:
(1) Intra picture mode, in which the picture is compressed by itself, without reference to any other pictures. A picture coded in this way is called an I-picture.
(2) Forward prediction mode, in which prediction is carried out with reference to a reconstructed picture occurring earlier in the motion picture. A picture coded in this way is called a P-picture.
(3) Bidirectional prediction mode, in which block-by-block prediction is carried out with reference to a reference block derived from a reconstructed picture occurring earlier in the motion picture, from a reconstructed picture occurring later in the motion picture, or by performing a pixel-by-pixel linear operation (e.g., an average value calculation) between an earlier reconstructed picture and a later reconstructed picture. A picture coded in this way is called a B-picture.
In other words, an I-picture is coded entirely within the picture itself. A P-picture is predicted from a reconstructed I-picture or P-picture occurring earlier in the motion picture. A B-picture is predicted block-by-block using an earlier or a later reconstructed I-picture or P-picture, or using a block obtained by performing a linear operation on a reconstructed I-picture or P-picture occurring earlier in the motion picture and a reconstructed I-picture or P-picture occurring later in the motion picture.
The prediction mode determining circuit 115 delivers the prediction mode and the corresponding motion vector to the predictor 113 and to the readout address generator 114. The readout address generator 114 provides readout addresses to the picture memory group 112 in response to the motion vector, which causes each of the picture memories 112A through 112D to read out a block of the reconstructed picture stored therein. The location of the read-out block in the reconstructed picture is designated by the motion vector. The predictor 113 selects one of the blocks read out from the picture memories 112A through 112D in response to the prediction mode signal PM received from the prediction mode determining circuit 115. The selected block provides the matching block S2 for the current picture block S1. When the current picture block is part of a B-picture, the predictor also performs linear operations on the blocks read out from the picture memories 112A through 112D to provide the required matching block. The predictor delivers the matching block S2 to the difference block calculating circuit 103 and to the adder 110.
An example of the construction of a compressed motion picture signal expander apparatus based on the MPEG system is shown in FIG. 2. In the expander, the compressed motion picture signal, obtained directly from the compressor or reproduced from a recording medium, is fed as a bit stream into the input buffer 121, where it is temporarily stored. The compressed motion picture signal includes blocks of coded transform coefficients (including a block of coded transform coefficients representing the current picture block), together with prediction mode information, quantizing step-size information, and a motion vector for each block.
The compressed motion picture signal is read out of the input buffer 121 one picture at a time and is delivered to the inverse variable-length coder (IVLC) 122. The inverse variable-length coder 122 applies inverse variable-length coding to the compressed motion picture signal and separates it into its components, namely blocks of quantized transform coefficients, and prediction mode information, step-size information, and a motion vector for each block.
Each block of quantized transform coefficients is fed into the inverse quantizer 123, which uses the step-size information for the block to inversely quantize it and provide a block of transform coefficients. The inverse orthogonal transform circuit 124 applies inverse orthogonal transform processing, normally IDCT processing, to the block of transform coefficients to derive a restored difference block. The inverse quantizer 123 and the inverse orthogonal transform circuit 124 apply processing complementary to that applied by the quantizer 105 and the orthogonal transform circuit 104, respectively, in the compressor shown in FIG. 1.
The readout address generator 130 provides a readout address to the picture memories 128A through 128D in response to the motion vector for the current picture block received from the inverse variable-length coder 122. In response to the readout address, each of the picture memories 128A through 128D reads out a block of the reconstructed picture stored therein. The predictor 129 selects one of the blocks read out from the picture memories 128A through 128D in response to the prediction mode signal PM, also received from the inverse variable-length coder 122. The selected block provides the matching block for reconstructing the current picture block. When the current picture block is part of a picture coded as a B-picture, the predictor also performs linear operations on the blocks read out from the picture memories 128A through 128D to provide the matching block. The predictor 129 delivers the matching block to the adder 125.
The adder 125 performs a pixel-by-pixel addition between the restored difference block from the inverse orthogonal transform circuit 124 and the matching block from the predictor 129 to reconstruct the current picture block of the current picture. The selector 126 feeds the reconstructed current picture block to the one of the picture memories 128A through 128D in which the current picture is being reconstructed. The reconstructed current picture block is stored in the selected picture memory at the position of the current picture block in the reconstructed current picture. When all the reconstructed blocks of the current picture have been stored in the selected picture memory, the reconstructed current picture is ready for reading out, and also for use as a reference picture for reconstructing other pictures occurring earlier or later in the motion picture.
The reconstructed pictures stored in the picture memories 128A through 128D are read out as the output motion picture signal via the selector 126 in response to readout addresses generated by the display address generator 127. A scan converter (not shown) converts the output motion picture signal read out from the picture memories 128A through 128D to the raster format of the desired video signal format, e.g., NTSC. The resulting output motion picture signal can then be displayed on a suitable display, e.g., a CRT. In this example, the sync signal generator 131 is locked to an external sync source and periodically generates a frame sync signal for delivery to the display address generator 127. The display address generator 127 then generates readout addresses in synchronism with the frame sync signal.
The orthogonal transform circuits, for example, the DCT and IDCT circuits used in the compressor and the expander described above, perform arithmetic operations on pixel values and transform coefficients represented by integers having a finite number of bits. Consequently, the results of the orthogonal transform operations performed by these circuits can be truncated to a finite number of bits. For this reason, a difference in the real-number accuracy of the orthogonal transform operation, or a difference in the configuration of the circuit used to perform it, can change the result of the orthogonal transform operation. This can lead to a mismatch between the compressor and the expander, and to mismatches between expanders expanding a common compressed signal.
For example, in the compressor, the difference block derived from the motion picture signal is orthogonally transformed, and the resulting transform coefficients are quantized by predetermined processing in the course of generating the compressed motion picture signal. Then, in the expander, or in the local decoder in the compressor, if the real-number operational accuracy or the configuration of the inverse orthogonal transform circuit does not correspond to that of the compressor, the output of the expander may differ from the input to the compressor. Hence, the output of the expander can depend on the accuracy and the configuration of the apparatus used as the expander.
The operational accuracy or the configuration of an inverse orthogonal transform may vary depending on the apparatus used to perform it. For example, inversely transforming a block of transform coefficients using two different constructions of the same type of inverse orthogonal transform circuit may produce different results. Such a difference in the results is called an inverse orthogonal transform mismatch error (a "mismatch error").
The MPEG system defines the operational accuracy with which the DCT and the IDCT are to be performed, but does not define the operational method or the circuit configuration. This is because circuits and methods for performing DCTs and IDCTs had already been developed before the MPEG standards were established.
In the MPEG system, as described above, the compressor applies, e.g., inter-picture motion-compensated prediction coding to the motion picture signal. In this, the motion picture signal is divided into blocks; a difference block is derived from the current picture block and a matching block obtained by applying motion compensation to a reconstructed picture; the difference block is orthogonally transformed using DCT processing; the resulting transform coefficients are quantized; the quantized transform coefficients are subjected to variable-length coding; and the coded transform coefficients are assembled together with prediction mode information, quantizing step-size information, and motion vectors to provide the compressed motion picture signal.
The expander applies inverse variable-length coding to the coded transform coefficients, inverse quantizing to the quantized transform coefficients resulting from the inverse variable-length coding, and IDCT processing to the transform coefficients resulting from the inverse quantizing. The resulting restored difference block is added to a matching block obtained by applying motion compensation to a reconstructed picture in response to the motion vector. The resulting reconstructed picture block is stored as a block of a reconstructed picture, which provides a picture of the motion picture output signal and is also available for use as a reference picture.
The compressor includes a local decoder that derives, from the quantized transform coefficients, reconstructed pictures for use in carrying out the prediction coding. The local decoder includes an inverse quantizer and an inverse orthogonal transform circuit.
If the configuration of the IDCT circuit in the local decoder in the compressor differs from that of the IDCT circuit in the expander, the reconstructed pictures produced by the local decoder in the compressor may differ from the reconstructed pictures produced by the expander. This dependency of the IDCT processing on the implementation can cause problems when a compressed motion picture signal generated by a compressor conforming with the MPEG standard is recorded on a recording medium, such as an optical disc, for distribution to the public. When the compressed motion picture signal reproduced from the optical disc is expanded by expanders manufactured and sold by different makers, the reproduced picture may differ from the original picture, and the differences may depend on the particular expander used. Similar incompatibilities between different expanders may also occur when the compressed motion picture signal is distributed by a distribution system such as terrestrial or satellite broadcasting, a telephone system, an ISDN system, or a cable, wireless, or optical distribution system.
Mismatch errors are particularly problematic when inter-picture prediction coding, which can be inter-field coding or inter-frame coding, is carried out. Inter-picture prediction coding can cause mismatch errors to accumulate to the extent that they result in fatal flaws in the reconstructed pictures.
In the motion picture signal compression performed by the MPEG system, each video sequence is divided into Groups of Pictures (GOPs) of, for example, eight or twelve pictures. Each picture is classified as an I-picture, a P-picture, or a B-picture, as described above.
A B-picture is not used as a reference picture in performing motion prediction. Hence, a mismatch error occurring in a B-picture does not lead to errors in other pictures.
When a mismatch error occurs in a P-picture, the reconstructed picture with the mismatch error is stored in the picture memory for use in carrying out prediction coding. Accordingly, when inter-picture prediction coding is carried out, the error in the P-picture stored in the picture memory gradually spreads to the P-pictures and B-pictures derived from it by prediction coding. The error accumulates until the picture is replaced by an I-picture or a P-picture lacking such an error.
Similarly, when a mismatch error occurs in an I-picture, the reconstructed picture with the mismatch error is stored in the picture memory for use in carrying out prediction coding. Accordingly, when inter-picture prediction coding is carried out, the error in the I-picture stored in the picture memory spreads to the P-pictures and B-pictures derived from it by prediction coding. The error accumulates until the picture is replaced by a new I-picture lacking such an error.
Error accumulation is illustrated in FIG. 3. In FIG. 3, if the mismatch error in decoding the I-picture is EI, and the mismatch error in decoding the P-picture P1 is EP1, the error in the reconstructed P-picture P1 is EI + EP1. Further, when the mismatch error in decoding the P-picture P2 is EP2, the error in the reconstructed P-picture P2 is EI + EP1 + EP2. Even if the individual mismatch errors are small, their gradual accumulation can result in a large error.
Mismatch errors produced by the IDCT processing used in the MPEG decoders in both the compressor and the expander can be classified into two distinct types:
Type (1): Errors resulting from insufficient operational accuracy.
Type (2): Errors resulting from systematic differences in rounding.
The MPEG standard sets forth a requirement for operational accuracy. However, this requirement is not stringent enough to guarantee that a mismatch error will not occur. Therefore, a Type (1) mismatch error can occur between IDCT devices whose operational accuracy satisfies the MPEG requirement.
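A Type (1) mismatch can be reproduced with two implementations of the same one-dimensional 8-point IDCT that differ only in arithmetic precision. In the sketch below, both implementations round their outputs in the same way (half up), so any disagreement comes purely from accuracy, not from rounding convention. The "fixed-point" implementation stores its cosine weights with only 8 fractional bits; this is a hypothetical, deliberately coarse table chosen to make the effect easy to see, not a claim about any real device or about whether it would pass the MPEG accuracy test.

```python
import math

N = 8
FRAC_BITS = 8  # hypothetical precision of the fixed-point cosine table

def basis(i, k):
    """Weight of coefficient k in output sample i of an orthonormal 8-point IDCT."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * N))

COS_TABLE = [[round(basis(i, k) * (1 << FRAC_BITS)) for k in range(N)] for i in range(N)]

def idct_reference(coeffs):
    """Double-precision 1-D IDCT, rounded half up to integers."""
    return [math.floor(sum(basis(i, k) * coeffs[k] for k in range(N)) + 0.5) for i in range(N)]

def idct_fixed_point(coeffs):
    """Integer-only 1-D IDCT using the coarse table, also rounded half up."""
    half = 1 << (FRAC_BITS - 1)
    return [(sum(COS_TABLE[i][k] * coeffs[k] for k in range(N)) + half) >> FRAC_BITS
            for i in range(N)]

dc_only = [24, 0, 0, 0, 0, 0, 0, 0]
print(idct_reference(dc_only))    # [8, 8, 8, 8, 8, 8, 8, 8]
print(idct_fixed_point(dc_only))  # [9, 9, 9, 9, 9, 9, 9, 9]
```

The exact value here is 24 × 0.35355... ≈ 8.485, whereas the coarse weight 91/256 gives 8.531, so the two decoders land on opposite sides of 8.5. For many other inputs the two implementations agree (for example, a DC coefficient of 16 gives 6 in both), which is why such errors tend to look random.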
The outputs of the IDCT processing are integers. Hence, after the IDCT processing has been performed using real numbers, the processing results must be rounded. In general, the processing results are rounded to the nearest integer. However, a problem occurs when a processing result lies exactly halfway between two integers, i.e., has the form n.5, where n is any integer. The MPEG standard does not define how a processing result of n.5 should be rounded. Some IDCT devices round n.5 up, and other IDCT devices round n.5 down. Further, in some devices, whether n.5 is rounded up or down depends on the sign of the processing result. Mismatch errors resulting from the systematic rounding differences just described are Type (2) mismatch errors.
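The rounding conventions just described can be compared directly. The four functions below are common tie-breaking rules; the last corresponds to rounding up for positive values and down for negative values, i.e., rounding that depends on the sign. Which rule a particular IDCT device uses is not specified here, and the function names are illustrative.

```python
import math

def round_half_up(x):
    """Ties go toward +infinity: 2.5 -> 3, -2.5 -> -2."""
    return math.floor(x + 0.5)

def round_half_down(x):
    """Ties go toward -infinity: 2.5 -> 2, -2.5 -> -3."""
    return math.ceil(x - 0.5)

def round_half_away_from_zero(x):
    """Sign-dependent: ties go away from zero, 2.5 -> 3, -2.5 -> -3."""
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

def round_half_even(x):
    """Python's built-in round(): ties go to the even neighbour."""
    return round(x)

RULES = (round_half_up, round_half_down, round_half_away_from_zero, round_half_even)

for value in (2.5, -2.5, 3.5, 2.4):
    print(value, [rule(value) for rule in RULES])
```

All four rules agree whenever the value is not exactly halfway (2.4 rounds to 2 under every rule). They disagree only on ties, and they do so consistently, which is what makes Type (2) errors systematic.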
Type (1) mismatch errors differ from Type (2) mismatch errors in that Type (1) errors normally occur randomly, whereas Type (2) errors are systematic. Because Type (1) errors normally are random, positive errors and negative errors occur with roughly equal probability. Hence, when prediction coding is carried out over a long time, it can be assumed that Type (1) mismatch errors will usually cancel out.
On the other hand, since Type (2) mismatch errors are systematic and inherent in the IDCT processing itself, such errors consistently have the same polarity. Accordingly, when prediction coding is carried out over a long time, the mismatch error accumulates in one direction. Although each Type (2) mismatch error is only +1 or -1, if many mismatch errors accumulate in one direction, the cumulative mismatch error will be large. In U.S. patent application Ser. No. 08/202,783, of which this application is a continuation-in-part, and the disclosure of which is incorporated herein by reference, the inventor describes methods and apparatus for preventing Type (2) errors.
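The difference between random and systematic errors in a chain of predicted pictures can be shown with a toy model. The sketch assumes that each decoded picture adds a single ±1 error at one pixel, and that the error in each reconstructed picture is the running sum of the per-picture errors (EI, EI + EP1, EI + EP1 + EP2, and so on, as for FIG. 3). Real errors vary per pixel, and a chain is cut off by the next error-free I-picture; both effects are ignored here.

```python
import random
from itertools import accumulate

def chain_error(per_picture_errors):
    """Error in each reconstructed picture of a prediction chain: the running sum."""
    return list(accumulate(per_picture_errors))

rng = random.Random(1)
n = 60
random_errors = [rng.choice((-1, 1)) for _ in range(n)]  # Type (1)-like: random polarity
systematic_errors = [1] * n                               # Type (2)-like: fixed polarity

print("random:", chain_error(random_errors)[-1])
print("systematic:", chain_error(systematic_errors)[-1])
```

Random ±1 errors behave like a random walk whose typical magnitude grows roughly with the square root of the number of pictures, whereas errors of fixed polarity grow linearly.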
Even though Type (1) mismatch errors may occur quite frequently, they normally cancel out over time, and so are usually unproblematic. However, in some instances, the two (or more) blocks of DCT coefficients derived from a picture block located in the same position in two (or more) consecutively-processed P-pictures, or in one (or more) consecutively-processed P-pictures following an I-picture, can be identical. If a Type (1) mismatch error occurs when each of these identical blocks of DCT coefficients is inversely orthogonally transformed, the resulting Type (1) mismatch errors are not random, but accumulate in the second picture and in subsequent pictures. The accumulated Type (1) mismatch errors make the reconstructed pictures generated by the decoders in both the compressor and the expander different from the original picture in the motion picture signal. This degrades the picture quality that can be provided by the MPEG system.
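The effect of identical coefficient blocks can be sketched by following one pixel through a chain of P-pictures in two decoders. The model is deliberately reduced: each P-picture contributes a DC-only residual whose value at the pixel is the DC coefficient times the DC basis weight of an 8-point IDCT, and the two decoders differ only in the precision of that weight (exact versus a hypothetical 8-bit fixed-point value). Both round half up, so the disagreement is a pure Type (1) error.

```python
import math

EXACT_SCALE = math.sqrt(1.0 / 8)                # DC weight of an orthonormal 8-point IDCT
COARSE_SCALE = round(EXACT_SCALE * 256) / 256   # hypothetical 8-bit fixed-point weight

def decode_dc(coeff, scale):
    """Pixel value contributed by a DC-only coefficient, rounded half up."""
    return math.floor(coeff * scale + 0.5)

def reconstruct_chain(start_pixel, coeffs, scale):
    """Follow one pixel through successive P-pictures, each adding a decoded residual."""
    pixel, history = start_pixel, []
    for c in coeffs:
        pixel += decode_dc(c, scale)
        history.append(pixel)
    return history

same_block = [24] * 5  # the identical DC coefficient in five consecutive P-pictures
a = reconstruct_chain(100, same_block, EXACT_SCALE)
b = reconstruct_chain(100, same_block, COARSE_SCALE)
print([y - x for x, y in zip(a, b)])  # [1, 2, 3, 4, 5]
```

Because the same coefficients are decoded each time, the per-picture mismatch has the same sign and size in every picture, and the divergence between the two decoders grows by one level per picture instead of averaging out.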