1. Field of the Invention
The present invention can be used for a Moving Picture Experts Group (MPEG) moving picture compression/decompression application of motion prediction/compensation based on a discrete cosine transform and minimizes degradation of image quality.
This invention relates to a digital copy protection application for moving picture data, in which the protection is seldom removed, intentionally or unintentionally, by any user other than the author, and more particularly, relates to an MPEG2 moving picture encoder/decoder.
2. Description of the Related Art
The MPEG2 standard is a compression/decompression standard for video applications, and exploits temporal redundancy for motion compensated interpolated and predicted encoding. That is, the assumption is made that "locally" the current picture can be modeled as a translation of the picture at a previous and/or future time. "Locally" means that the amplitude and direction of the displacement are not the same everywhere in the picture.
The MPEG2 standard specifies predicted and interpolated interframe encoding and spatial domain intraframe encoding. It has block based motion compensation for the reduction of temporal redundancy, and block based Discrete Cosine Transform (DCT) compression for the reduction of spatial redundancy. The information relative to motion is based on a 16×16 array of pixels and is transmitted with the spatial information. Motion information is compressed with variable length codes, such as Huffman codes.
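By way of illustration only, block based motion estimation of the kind referred to above can be sketched as an exhaustive search that minimizes a sum of absolute differences (SAD). The function names, the SAD criterion, the search range, and the small 8×8 test picture are assumptions made for this sketch; the MPEG2 standard does not prescribe how an encoder finds its motion vectors.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=16, search=2):
    """Full search over displacements in [-search, search]; returns (cost, dx, dy).

    n=16 matches the 16x16 array of pixels named in the text; the search
    range is a hypothetical parameter of this sketch.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference picture.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Hypothetical 8x8 reference picture and a current picture whose content
# has moved one pixel to the left relative to the reference.
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]
result = best_motion_vector(cur, ref, bx=2, by=2, n=4, search=2)
```

In this example the 4×4 block at (2, 2) of the current picture is found one pixel to the right in the reference picture, so the search returns a displacement of (1, 0) with zero residual cost.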
Recently, audio/video (A/V) information expressed in digital form has become more widely used, and the ways of utilizing digital products have increased accordingly, as the digitalization of A/V applications and the popularity of the Internet have grown rapidly.
Specifically, anybody who is able to use a PC can easily copy and edit digital products, and accordingly illegal copying has become a social issue. The watermark technique has become prominent as a solution to this problem.
There are two exemplary methods for providing copy protection of digital A/V data to prevent illegal copying. The first method is encryption, i.e., a copy protection method by scrambling the digital information. The second method is a digital watermark method with the purpose of preventing the illegal use of digital information.
The first method is a technique for prevention of illegal copying of digital A/V data, by providing descramble information and password information capable of accessing and running the A/V product only in the case that the A/V product is bought legally.
The second method is a technique which relies on self-restraint by the user not to produce an illegal copy of the A/V product, by embedding ID information or a logo in the form of noise into the A/V contents data of the A/V product for the purpose of forbidding the illegal or commercial use of digital information. The watermark is applied to the original image and is invisible to a person who would copy it, but the author can prove that the copied image is his by virtue of arbitrary reverse processing.
For example, in a case where a counterfeiter forges money using a color copier, a vignette on the original bill becomes clearly visible when the bill is copied, which makes it virtually impossible to copy a bank note. This is called a visible watermark.
Also, in the case where a spy writes a message onto paper with salt water, other people think this is ordinary paper, but the paper is a medium carrying important information for the spy. The spy can see the message anytime he wants to by heating the paper. This is called an invisible watermark.
At present, the watermark technique is used for digital still images or audio, i.e., a message distinguishable from the original image is put into the image. Therefore, in the case that an author's own image circulates illegally, the image can be proved to be that of the author by performing arbitrary reverse processing.
Thus, techniques for preventing the illegal copying of digital products are increasingly being studied these days.
FIGS. 1 and 2 illustrate a conventional MPEG2 moving picture encoder and decoder respectively. FIG. 3 illustrates a structure of a video picture used in the MPEG2 moving picture encoder/decoder and, FIG. 4 illustrates three types of pictures and their relationship under the MPEG2 standard. We will explain the conventional MPEG2 moving picture encoder and decoder by referring to these figures.
MPEG encoding is a hybrid type lossy coding technique wherein the redundant information which the image signals have in the spatial domain and the temporal domain is removed and the data are compressed (refer to FIG. 3). The compression technique of the spatial domain is called intra-coding, and the image data used in intra-coding are called an intra-picture (in short, I picture). The compression technique of the temporal domain is called inter-coding, and in this case the image data are classified into two types according to two prediction methods. The first is a predicted picture (P picture), wherein the prediction error rate of the forward direction is encoded, and the second is an interpolated picture or bi-directional picture (B picture), wherein the prediction error rate of both directions is encoded.
In other words, the I picture is encoded independently of other near pictures (in this instance, the picture is a frame signal or a field signal). In the P picture, the difference signals of predicted/interpolated movement are encoded only after considering the correlation of the movement of the previous I picture or P picture. In the B picture, the difference signals of predicted/interpolated movement are coded only after considering the correlation of the movement of the previous I or P picture and the next I or P picture.
Among the three modes, that is, the forward direction mode, backward direction mode, and forward and backward direction mode, the mode having the smallest value of prediction error rate is selected in the prediction/interpolation method of the B picture.
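The mode selection described above can be sketched as follows, purely as an illustration. The mode names, the use of a sum of absolute differences as the error measure, and the rounding used in the average are assumptions of this sketch and are not taken from the encoder of FIG. 1.

```python
def select_b_mode(current, forward_pred, backward_pred):
    """Return (best_mode, errors) for one block of a B picture.

    All three arguments are flat lists of pixel values of equal length.
    The error measure (sum of absolute differences) is an assumption.
    """
    # Forward and backward mode: average of the two predictions.
    # The +1 rounds halves upward; this rounding rule is assumed here.
    interpolated = [(f + b + 1) // 2 for f, b in zip(forward_pred, backward_pred)]
    candidates = {
        "forward": forward_pred,
        "backward": backward_pred,
        "bidirectional": interpolated,
    }
    errors = {
        mode: sum(abs(c - p) for c, p in zip(current, pred))
        for mode, pred in candidates.items()
    }
    # The mode with the smallest prediction error is selected.
    best_mode = min(errors, key=errors.get)
    return best_mode, errors


# Hypothetical block lying midway between its two reference blocks.
mode, errors = select_b_mode(
    current=[10, 20, 30, 40],
    forward_pred=[8, 18, 28, 38],
    backward_pred=[12, 22, 32, 42],
)
```

Here the forward and backward predictions each leave an error of 8, while their average reproduces the current block exactly, so the forward and backward direction mode is selected.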
The picture structure of the MPEG recommendation (called a main profile, main level; MP@ML) is I, B, B, P, B, B, P . . . pictures and needs a frame memory 110 which can store at least three pictures (the cycle of a picture).
A field/frame adaptive coding method is possible in an MPEG2 encoding method so as to increase coding efficiency. The unit of the picture can be defined as a field or frame according to the purpose of the encoding. A frame/field memory 112 stores the field data or frame data to be coded.
A subtractor 134 receives the field data or frame data from the frame/field memory 112 and interpolated predicted motion data from an adaptive estimator 130, and subtracts the locally decoded, motion-interpolated I or P pictures from the pictures which are now input, in order to encode the prediction error rate of the P and B pictures. A Discrete Cosine Transform (DCT) 114 performs an orthogonal transform which transforms the spatially structured image signals from the subtractor 134 into image signals of the frequency domain. A quantizer 116 approximates signals to a typical value to map the DCT-transformed image signals to a code book which is defined in a variable length coder (VLC). Data loss occurs in the quantizer 116.
A dequantizer 122 performs an inverse process of the quantizer 116 for encoding the prediction error rate of P and B pictures. An inverse DCT (IDCT) 124 performs an inverse process of the DCT 114 for encoding the prediction error rate of the P and B pictures. A subtractor 126 performs a subtraction operation on the output from the IDCT 124 and the interpolated predicted output from the adaptive estimator 130. A frame memory 128 stores local decoding images output from the subtractor 126 according to the dequantizer 122 and the IDCT 124.
A motion estimator 132 encodes the prediction error rate of the P and B pictures output from the frame/field memory 112 and the adaptive estimator 130 is a motion compensator which interpolates predicted motion, providing its output to the subtractors 126 and 134.
An activity calculator 118 reflects the characteristics of the complexity of the input images to the quantizer 116, and a rate controller 120 sets up the quantizer 116 so that an overflow/underflow of an output buffer 138 does not happen. A VLC & MUX (variable length coder and multiplexer) 136 entropy encodes and multiplexes the signals output from the rate controller 120, the quantizer 116 and the motion estimator 132. The output buffer 138 provides a buffer for the MPEG coded bit stream output from the VLC & MUX 136.
FIG. 2 shows the conventional MPEG2 moving picture decoder. A buffer 150 stores the coded bit stream. A VLD (variable length decoder) & DEMUX 152 performs an inverse process of the VLC & MUX 136 (of FIG. 1) to decode the MPEG coded bit stream. A dequantizer 154 dequantizes the decoded output of the VLD & DEMUX 152, and an IDCT 156 performs an inverse process of the DCT on the output of the dequantizer 154. An adder 160 adds the output of the IDCT 156 to an output of a multiplexer (MUX) 170. A previous picture store 162 is a memory for motion compensation of the P or B picture output from the adder 160. A future picture store 164 is a memory for motion compensation of the P picture output from the adder 160. An adder 166 performs an addition of the outputs of the previous picture store 162 and the future picture store 164 when the motion prediction of the B picture is bi-directional. A 1/2 multiplier 168 multiplies the output of the adder 166 by 1/2 to interpolate an average value when the motion prediction of the B picture is bi-directional, and the MUX 170 multiplexes the outputs of the previous picture store 162, the 1/2 multiplier 168, the future picture store 164 and a "0" bit.
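The reconstruction path of the decoder of FIG. 2 can be sketched as follows, as an illustration only. The mode names, the truncating average, and the clamping of results to the range 0 to 255 are assumptions of this sketch; the comments indicate which element of FIG. 2 each step is intended to mirror.

```python
def select_prediction(mode, previous, future):
    """Mimic the MUX 170: choose one of four prediction sources.

    `mode` is a hypothetical label: "intra", "forward", "backward",
    or "bidirectional".
    """
    if mode == "intra":
        return [0] * len(previous)          # the "0" input of the MUX
    if mode == "forward":
        return list(previous)               # previous picture store 162
    if mode == "backward":
        return list(future)                 # future picture store 164
    if mode == "bidirectional":
        # Adder 166 followed by the 1/2 multiplier 168 (truncating average,
        # an assumption of this sketch).
        return [(p + f) // 2 for p, f in zip(previous, future)]
    raise ValueError("unknown prediction mode: " + mode)


def reconstruct(residual, mode, previous, future):
    """Mimic the adder 160: IDCT output plus the selected prediction."""
    prediction = select_prediction(mode, previous, future)
    # Clamping to 8-bit pixel values is assumed for this sketch.
    return [max(0, min(255, r + p)) for r, p in zip(residual, prediction)]


# Hypothetical two-pixel example.
residual = [5, -3]
previous = [100, 200]
future = [110, 210]
decoded = {
    mode: reconstruct(residual, mode, previous, future)
    for mode in ("intra", "forward", "backward", "bidirectional")
}
```

In the bi-directional case the two stored pictures average to [105, 205], and adding the residual yields [110, 202].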
One picture can be divided into uniformly sized regular square areas and each area is transformed. Therefore, the image is divided into image ingredients of different frequencies from an average value (DC value) to an image ingredient value of an extremely high frequency. This division process is called an orthogonal transformation and the orthogonal transformation is a discrete cosine transform (DCT).
Orthogonal transformations, because they have a frequency domain interpretation, are filter bank oriented. This means that the purpose of the DCT is to reduce the correlation of the image information. Since each DCT-transformed coefficient indicates individual frequency information, the correlation between adjacent coefficients is low. The discrete cosine transform is also localized. That is, the encoding process samples on an 8×8 spatial window, which is sufficient to compute 64 transform coefficients or sub-bands.
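As a minimal sketch of the transform discussed above, the following computes a two-dimensional DCT of one 8×8 block directly from its defining sum. The orthonormal scaling and the function name are assumptions of this sketch; it is a direct, slow evaluation and not one of the fast algorithms.

```python
import math

N = 8  # size of the spatial window named in the text


def dct_2d(block):
    """2-D DCT-II of an N x N block (list of lists), orthonormal scaling.

    F(u, v) = (2/N) * C(u) * C(v) * sum over x, y of
              f(x, y) * cos((2x+1)u*pi/(2N)) * cos((2y+1)v*pi/(2N)),
    with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
    """
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = (2.0 / N) * c(u) * c(v) * total
    return coeffs


# A perfectly flat block: all of its energy lands in the DC coefficient.
flat = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For the flat block every one of the 64 coefficients except the average value (DC value) is zero to within rounding error, which illustrates how the transform concentrates the information of a smooth area into few coefficients.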
Another advantage of the discrete cosine transform is that fast encoding and decoding algorithms are available. Additionally, the sub-band decomposition of the discrete cosine transform is sufficiently well behaved to allow effective use of psychovisual criteria.
After the discrete cosine transform, many of the higher frequency coefficients are zero. These coefficients are organized into a zigzag, as illustrated in FIG. 5, and converted into run-amplitude (run-level) pairs. Each pair indicates the number of zero coefficients and the amplitude of the non-zero coefficient that follows them. This is coded in a variable length code.
Discrete cosine transform encoding is carried out in the three stages as illustrated in FIG. 5. The first stage is the computation of the discrete cosine transform coefficients. The second stage is the quantization of the coefficients. The third stage is the conversion of the quantized transformation coefficients into run-amplitude pairs after reorganization of the data into a zigzag scanning order.
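The third stage can be sketched as follows, as an illustration under stated assumptions. The function names are hypothetical, each pair is taken to be the count of zero coefficients preceding a non-zero coefficient together with that coefficient's amplitude, and all 64 coefficients including the DC value are treated alike, which simplifies what an actual encoder does.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order.

    Positions are visited one anti-diagonal at a time; the direction of
    travel alternates between successive diagonals.
    """
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_level_pairs(block):
    """Convert a quantized block into (run, amplitude) pairs.

    `run` counts the zero coefficients that precede each non-zero
    coefficient in zigzag order. Trailing zeros produce no pair; an
    encoder would signal them with an end-of-block code.
    """
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


# Hypothetical quantized block with three non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 12
block[1][0] = 4
block[0][2] = -1
pairs = run_level_pairs(block)
```

The 64 coefficients of this block reduce to three pairs, because the zigzag order places the low frequency coefficients first and leaves the long run of high frequency zeros at the end.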
Quantization can be viewed as a shift to the right by several bits. Quantization enables a very high degree of compression, and a high output bit rate, and retains high picture quality. Quantization can be adaptive with an I picture having fine quantization to avoid "blockiness" in the reconstructed image. This is important because I pictures contain energy at all frequencies. By way of contrast, P and B pictures contain predominately high frequency energy and can be coded at a coarser quantization.
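The view of quantization as a shift to the right can be sketched as follows. This is a simplification made for illustration: the function names and the handling of negative coefficients are assumptions, and an actual MPEG2 quantizer divides by a quantization matrix entry and a scale factor rather than shifting by a fixed number of bits.

```python
def quantize(coeff, shift):
    """Quantize by shifting the magnitude right by `shift` bits."""
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) >> shift)


def dequantize(level, shift):
    """Approximate inverse: shift the magnitude left by `shift` bits."""
    sign = -1 if level < 0 else 1
    return sign * (abs(level) << shift)


# Fine quantization (small shift) versus coarse quantization (large shift)
# applied to the same hypothetical coefficient.
coeff = 100
fine = dequantize(quantize(coeff, 1), 1)      # shift by 1 bit
coarse = dequantize(quantize(coeff, 3), 3)    # shift by 3 bits
```

The bits discarded by the shift cannot be recovered, which is the data loss attributed to the quantizer: the coefficient 100 survives a one-bit shift unchanged but comes back as 96 after a three-bit shift, and a small coefficient such as 5 becomes zero.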
One challenge facing decoder designers is the accommodation of a single decoder system to a variety of display output formats, while complying fully with luminance/chrominance relationships and the MPEG2 standard. The displayed output of the decoder chip must conform to Consultative Committee International Radio (CCIR) recommendation 601. This specifies the number of luminance and chrominance pixels in a single active line, and also how the chrominance pixels are subsampled relative to the luminance signals.
The format defined as 4:2:2 is supported in most cases in industry. This defines 720 active luminance signals and 360 color difference signals, where each line of luminance signals has a corresponding line of chrominance signals. CCIR recommendation 656 goes on to define the number of active lines for National Television System Committee (NTSC) and Phase Alternation by Line (PAL) environments as 480 and 576, respectively. The contents as noted above are disclosed in U.S. Pat. No. 5,668,599.
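The sample counts of the 4:2:2 format can be checked with a short sketch. The function names are hypothetical, the subsampling simply keeps every second sample without any filtering, and two color difference components of 360 samples each per line are assumed.

```python
def subsample_422(chroma_line):
    """Keep every second chrominance sample of one line (no filtering)."""
    return chroma_line[::2]


def samples_per_frame(active_lines, luma_per_line=720):
    """Total samples in one frame: luminance plus two chrominance components."""
    chroma_per_line = luma_per_line // 2      # 360 for a 720-sample line
    return active_lines * (luma_per_line + 2 * chroma_per_line)


# One full-resolution chrominance line reduced to 4:2:2 resolution.
chroma_line = subsample_422(list(range(720)))

ntsc_samples = samples_per_frame(480)   # 480 active lines
pal_samples = samples_per_frame(576)    # 576 active lines
```

Under these assumptions a frame carries 691,200 samples with 480 active lines and 829,440 samples with 576 active lines, the chrominance accounting for half of each total.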
The MPEG2 moving picture encoder 100 performs an encoding method by utilizing the cooperation of an intracoding method on the spatial domain and an intercoding method on the temporal domain. The MPEG2 moving picture encoder 100 performs the intracoding method on the spatial domain by compressing the original image into a variable length coding of a Huffman code through the DCT 114 and the quantizer 116 and transmits the variable length code.
The MPEG2 moving picture encoder 100 performs the intercoding method on the temporal domain by decompressing the I picture compressed on the spatial domain through the dequantizer 122 and the Inverse Discrete Cosine Transform (IDCT) 124, and predicts by comparing the compressed I picture with the image being input at present through the frame memory 128 and the adaptive estimator 130, and then encodes a difference signal with the original signal by compensating motion, i.e., spatial-shifting the image being input at present as much as the predicted motion.
In the case that a method predicting motion is forward prediction, we call it the P picture and in case that a method predicting motion contains all of forward and backward predictions, we call it the B picture. Accordingly, motion prediction and compensation of P and B images are affected by the picture accuracy coded as the I picture. So, in the decoding process decoding the encoded image, first, the I picture must be decoded exactly so that the P and B images, to which the difference signals are transmitted, can be decoded accurately.
But even with the use of the copy preventing technique by encryption and scrambling as noted above, it is possible to easily copy data when duplication and key data are known. Also, the watermark technique for moving picture data has the problem that encoding efficiency can be reduced by embedding ID information and a logo in the form of noise.
The picture structure of an MPEG2 moving picture encoding method, as illustrated in FIG. 4, includes an intraframe (I picture) reducing spatial redundancy information of image information, a predicted frame (P picture) reducing interrelation between frames through forward prediction, and an interpolated frame (B picture) reducing interrelation between frames through bi-directional prediction.
Therefore, in the decoding of the image signal, the P picture can be decoded perfectly through motion compensation only in the case where the decoded previous I picture exists, and the B picture can be decoded through motion compensation only in the case where the decoded I and P pictures used in B picture prediction in the encoding process exist.
Up to now, digital watermark information that has been discrete-cosine-transformed into the form of noise has been embedded into the original image, and the I picture is coded together with this digital watermark information.
The motion prediction of the P and B pictures is performed on the basis of the locally decoded I picture. Consequently, there is a problem that errors occur in estimating the motion of the P and B pictures because of the mixed-in watermark information.
Because the watermark technique for still images, which has started to be studied recently as noted above, places the watermark data in the spatial domain, it is not suitable for an MPEG encoding method, which compresses data by removing redundancy information using the interrelation of data in the spatial domain and the temporal domain.
That is, if the quality of an image that contains the watermark information deteriorates conspicuously in comparison with the quality of the image that does not contain it, the watermark loses its meaning, because the image is degraded even though the original object contains watermark information.
Thus, the image containing the watermark information has to appear very similar to the image which does not contain the watermark information.