1. Field of the Invention
The present invention relates to a moving pictures encoding method, and an apparatus that uses same, which are applied to digital picture systems and applications, and to picture databases, utilized in a variety of fields, such as communications, broadcasting, data storage, and computers. More particularly, it relates to a moving pictures encoding method, and an apparatus that uses same, which are advantageous when encoding picture data containing a scene change.
2. Description of the Related Art
In general, the amount of information in a moving picture is huge. For this reason, when encoding moving pictures, redundancy along the spatial axis is removed by using an orthogonal transform (discrete cosine transform) process and variable-length encoding. In addition, redundancy along the time axis is removed by finding the difference between preceding and succeeding frames, and encoding the difference data.
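By way of illustration only, the removal of redundancy along the time axis can be sketched as a pixel-wise frame difference. This is a minimal sketch and not the encoder of any particular standard: the function names `frame_difference` and `reconstruct` and the small sample frames are assumptions introduced here, and an actual encoder would further apply the orthogonal transform and variable-length encoding to the difference data.

```python
def frame_difference(previous, current):
    """Pixel-wise difference (current - previous) of two equal-sized frames.

    Frames are given as lists of rows of integer pixel values (an assumed
    representation chosen only for this sketch).
    """
    return [[c - p for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


def reconstruct(previous, difference):
    """Recover the current frame from the previous frame and the difference."""
    return [[p + d for p, d in zip(prev_row, diff_row)]
            for prev_row, diff_row in zip(previous, difference)]


# Two nearly identical successive frames (hypothetical sample data).
previous_frame = [[10, 10, 10], [20, 20, 20]]
current_frame = [[10, 11, 10], [20, 20, 22]]

difference = frame_difference(previous_frame, current_frame)
print(difference)  # [[0, 1, 0], [0, 0, 2]]
```

Because successive frames are highly correlated, most entries of the difference are zero or small, which is what makes the difference data cheaper to encode than the frame itself.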
The structure of a picture will be described here to aid understanding of the following explanation.
An interlaced picture, in which every other line of a single frame is scanned so that the frame consists of an even field comprising only even-numbered scanning lines and an odd field comprising only odd-numbered scanning lines, is currently used in TV formats (NTSC: National Television System Committee, of the United States; PAL: Phase Alternation by Line, of Europe).
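The relationship between a frame and its two fields can be sketched as follows. This is an illustrative sketch under stated assumptions: scanning lines are assumed to be numbered from 1, so that the first row of the frame belongs to the odd field, the frame is assumed to have an even number of lines, and the names `split_fields` and `merge_fields` are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of scanning lines) into (odd_field, even_field).

    Assumption: scanning lines are numbered from 1, so list index 0 is
    line 1 and belongs to the odd field.
    """
    odd_field = frame[0::2]   # lines 1, 3, 5, ...
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def merge_fields(odd_field, even_field):
    """Re-interleave two fields of equal length into one frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    return frame


# A hypothetical 4-line frame; each line is labelled by its line number.
frame = [["line1"], ["line2"], ["line3"], ["line4"]]
odd_field, even_field = split_fields(frame)
print(odd_field)   # [['line1'], ['line3']]
print(even_field)  # [['line2'], ['line4']]
```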
MPEG-2 (Moving Picture Experts Group) is an encoding scheme that also supports the encoding of interlaced pictures, and that takes the frame/field distinction into consideration both for picture structure and for inter-picture motion prediction. These will be explained hereinbelow.
With MPEG-2, both frame allocation and field allocation are possible for a picture. When a frame is allocated as a picture, it is called a frame structure, and when a field is allocated, it is called a field structure. Encoding is performed on a picture in units of macro-blocks (MB).
FIG. 26 shows a frame structure macro-block MB1, a field structure odd field macro-block MB2, and a field structure even field macro-block MB3. Each macro-block (MB) consists of, for example, 16×16 pixels.
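As a rough illustration of how a picture is treated in macro-block units, the following sketch counts the 16×16 macro-blocks of a picture under the frame structure and under the field structure. The 720×480 frame size and the name `macroblock_grid` are assumptions made only for this example, and the sketch assumes picture dimensions that are exact multiples of the macro-block size.

```python
MB_SIZE = 16  # macro-block edge length in pixels, per the 16x16 example


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macro-blocks covering a picture.

    Assumption: width and height are exact multiples of mb_size.
    """
    if width % mb_size or height % mb_size:
        raise ValueError("picture size must be a multiple of the macro-block size")
    return width // mb_size, height // mb_size


# Hypothetical interlaced frame of 720x480 pixels.
frame_width, frame_height = 720, 480

# Frame structure: the whole frame is one picture.
frame_grid = macroblock_grid(frame_width, frame_height)

# Field structure: each field is a picture with half the scanning lines.
field_grid = macroblock_grid(frame_width, frame_height // 2)

print(frame_grid)  # (45, 30)
print(field_grid)  # (45, 15)
```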
With MPEG-2, there are also frame prediction and field prediction, which are well-suited to encoding an interlaced picture. One motion vector is used for frame prediction, and two motion vectors are used for field prediction.
Here, methods for the above-mentioned reduction of temporal redundancy can be broadly divided into 3 encoding methods in accordance with the scope of inter-picture prediction utilized.
The first is an intra-picture encoding method, wherein encoding is performed within a picture. The second is an inter-picture sequential predictive encoding method, wherein encoding is performed by also using inter-picture forward prediction. The third is a bi-directional predictive encoding method, wherein encoding is performed using two-directional inter-picture prediction, which combines the forward direction and the backward direction.
A picture encoded by the above-mentioned first encoding method is called an I-picture (Intra-Picture), a picture encoded by the second encoding method is called a P-picture (Predictive-Picture), and a picture encoded by the third encoding method is called a B-picture (Bi-directionally predictive-Picture).
Further, from the standpoint of whether or not each type of picture can be used by another picture as a reference picture for inter-picture prediction, an I-picture and a P-picture are referenced, but a B-picture is not referenced. Accordingly, I-pictures and P-pictures are collectively called reference pictures.
Furthermore, because the degree of temporal correlation between the pictures of moving pictures is great, more redundancy can be removed from a P-picture, which utilizes a correlation with a forward picture, than from an I-picture. Further, more redundancy can be removed from a B-picture, which also utilizes a correlation with a backward picture, than from a P-picture.
That is, viewed from the amount of data of pictures of the same picture quality, the relationship is I-picture > P-picture > B-picture. Also, because a reference picture can be used in motion prediction from another picture, and can constitute the original picture from which a predictive frame is generated, it is desirable that a reference picture be of the highest picture quality possible.
In this sense as well, it is desirable with regard to amount of data that reference picture > non-reference picture.
However, it cannot be said to be good to perform encoding using only P-pictures or B-pictures, even though they have small amounts of data. This is because when an error occurs, the error is propagated temporally by the inter-picture encoding of a P-picture or B-picture.
Consequently, it is desirable to refresh periodically with an I-picture. In practice, when encoding using B-pictures, it is common to perform encoding by changing the picture type, as shown in FIG. 27.
In FIG. 27, B indicates a B-picture, I indicates an I-picture, and P indicates a P-picture, and the same holds true for the other figures described hereinbelow. Further, an arrow of inter-picture prediction signifies that the predictive frame of the picture at the tip of an arrow is generated using the picture at the base of the arrow.
Incidentally, the size of the Group-of-Pictures (GOP) in the example of FIG. 27 is 12 pictures; that is, an I-picture refresh is performed every 12 pictures. The distance between reference pictures is 3 pictures.
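A picture type sequence with a GOP size of 12 and a reference picture distance of 3 can be generated as in the following sketch. The phase of the pattern is an assumption: the sketch places the I-picture at the first position of each GOP in input order, which may differ from the exact arrangement drawn in FIG. 27. The function name `picture_types` and its parameters are hypothetical, and the GOP size is assumed to be a multiple of the reference picture distance.

```python
def picture_types(count, gop_size=12, ref_distance=3):
    """Assign a picture type to each of `count` pictures in input order.

    Assumptions of this sketch: the first picture of each GOP is the
    I-picture, every `ref_distance`-th picture after it is a P-picture,
    and all remaining pictures are B-pictures.
    """
    types = []
    for index in range(count):
        position = index % gop_size
        if position == 0:
            types.append("I")      # periodic refresh
        elif position % ref_distance == 0:
            types.append("P")      # reference picture at a fixed distance
        else:
            types.append("B")      # non-reference picture
    return types


sequence = "".join(picture_types(24))
print(sequence)  # IBBPBBPBBPBBIBBPBBPBBPBB
```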
Furthermore, inter-picture prediction is performed in units of the macro-blocks into which a picture is divided as described above. For each macro-block of the picture being coded, the pixel-by-pixel difference from a block of the same size in the reference picture is determined.
The cumulative sum thereof is treated as a prediction error, the block with the smallest prediction error is selected, and a predictive frame is generated. Next, the difference data between the predictive frame and the macro-block being encoded is encoded. As the difference measure, the sum of absolute values of the pixel differences, the sum of squares of the pixel differences, and the like are utilized.
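The block selection described above can be sketched as an exhaustive search over a small window. This is a minimal sketch, not the search strategy of any particular encoder: the names `sad`, `ssd`, `get_block`, `best_match`, and `make_picture`, the 6×6 sample pictures, and the 2×2 block size (in place of a 16×16 macro-block) are all assumptions made to keep the example short.

```python
def sad(block_a, block_b):
    """Sum of absolute values of the pixel differences."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squares of the pixel differences."""
    return sum((a - b) ** 2 for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(picture, top, left, size):
    """Cut a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in picture[top:top + size]]


def best_match(reference, current, top, left, size, search_range, cost=sad):
    """Search the reference picture around (top, left) for the block with
    the smallest prediction error. Returns (error, (dy, dx)) or None."""
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the picture
            error = cost(target, get_block(reference, y, x, size))
            if best is None or error < best[0]:
                best = (error, (dy, dx))
    return best


def make_picture(height, width, patch, top, left):
    """Hypothetical test picture: zeros with one patch pasted in."""
    picture = [[0] * width for _ in range(height)]
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            picture[top + r][left + c] = value
    return picture


PATCH = [[5, 6], [7, 8]]
reference = make_picture(6, 6, PATCH, 1, 1)  # object at (1, 1)
current = make_picture(6, 6, PATCH, 2, 3)    # same object moved to (2, 3)

error, vector = best_match(reference, current, top=2, left=3,
                           size=2, search_range=2)
print(error, vector)  # 0 (-1, -2)
```

Passing `cost=ssd` selects the sum of squares instead of the sum of absolute values; for this sample data both measures select the same block.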
Here, using I-pictures frequently is undesirable, because the same scene is then repeatedly intra-coded at the same picture quality, thereby increasing the amount of coded data. When encoding is performed at a fixed rate, the data allocation per picture becomes that much smaller, and picture quality deteriorates.
However, when inter-picture correlation is low, as at a scene change, prediction efficiency deteriorates when inter-picture prediction is used, and in some cases, picture quality deteriorates even more than when intra-picture encoding is performed.
Therefore, as a countermeasure, there has been proposed a method wherein, in addition to encoding with an I-picture at a certain fixed interval, when a scene change SC is detected as shown in FIG. 28, the immediately succeeding picture is coded as an I-picture, that is, by intra-picture encoding.
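The picture type decision of this method can be sketched as follows. This is an illustrative sketch only: the function name `types_with_scene_change_intra` is hypothetical, a scene change is assumed to be reported as the index of the first picture of the new scene, and the fixed-interval pattern is assumed to start each GOP with an I-picture.

```python
def types_with_scene_change_intra(count, scene_changes,
                                  gop_size=12, ref_distance=3):
    """Fixed-interval I-pictures, plus an I-picture right after each
    scene change.

    `scene_changes` holds the indices (in input order) of the first
    picture following each detected scene change (an assumed convention).
    """
    types = []
    for index in range(count):
        position = index % gop_size
        if index in scene_changes:
            types.append("I")      # forced intra-picture encoding
        elif position == 0:
            types.append("I")      # periodic refresh
        elif position % ref_distance == 0:
            types.append("P")
        else:
            types.append("B")
    return types


sequence = "".join(types_with_scene_change_intra(12, {5}))
print(sequence)  # IBBPBIPBBPBB
```

In this example one GOP of 12 pictures contains two I-pictures instead of one, which illustrates how each scene change adds intra-coded data.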
Ultimately, this method is undesirable because an I-picture, or an intra-picture coded picture, is generated each time a scene change occurs, increasing the volume of coded data accordingly.
Further, to suppress as far as possible an increase in the frequency of I-pictures, there is a method, shown in FIG. 29, wherein the count value of the fixed interval I-picture cycle in use up to that point is reset when a scene change occurs. The GOP is then reconfigured starting from the picture at reset time, and encoding is performed using I-pictures at the fixed interval cycle (for example, GOP size=12).
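The resetting of the I-picture cycle count can be sketched as follows. The function name `types_with_gop_reset` is hypothetical, a scene change is assumed to be given as the index of the first picture of the new scene, and each GOP is assumed to begin with its I-picture.

```python
def types_with_gop_reset(count, scene_changes, gop_size=12, ref_distance=3):
    """Restart the GOP (reset the cycle counter) at each scene change.

    `scene_changes` holds the indices (in input order) of the first
    picture following each detected scene change (an assumed convention).
    """
    types = []
    counter = 0  # position within the current GOP
    for index in range(count):
        if index in scene_changes:
            counter = 0            # reset: a new GOP starts here
        if counter == 0:
            types.append("I")
        elif counter % ref_distance == 0:
            types.append("P")
        else:
            types.append("B")
        counter = (counter + 1) % gop_size
    return types


sequence = "".join(types_with_gop_reset(24, {5}))
print(sequence)  # IBBPBIBBPBBPBBPBBIBBPBBP
```

After the reset at picture 5, the next periodic I-picture appears 12 pictures later, at picture 17, rather than at picture 12.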
It might also be possible to have a method wherein, when a scene change occurs at a B-picture as shown in FIG. 30, intra-picture encoding is applied not to that B-picture but to the reference picture that comes immediately thereafter in the input sequence, which is made an I-picture or is intra-frame coded, while the B-picture is encoded using inter-picture prediction from the preceding and succeeding reference pictures.
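A sketch of this deferred approach follows. It rests on several assumptions: the function name `types_with_deferred_intra` is hypothetical, scene changes are given as the index of the first picture of the new scene, the GOP size is a multiple of the reference picture distance, and a scene change that happens to fall on a reference picture is handled by intra-coding that picture itself.

```python
def types_with_deferred_intra(count, scene_changes,
                              gop_size=12, ref_distance=3):
    """Keep reference pictures at their fixed positions; after a scene
    change, turn the next reference picture into an I-picture.

    `scene_changes` holds the indices (in input order) of the first
    picture following each detected scene change (an assumed convention).
    """
    types = []
    pending = False  # True while a scene change awaits a reference picture
    for index in range(count):
        position = index % gop_size
        if index in scene_changes:
            pending = True
        if position % ref_distance == 0:   # fixed reference picture slot
            if position == 0 or pending:
                types.append("I")
            else:
                types.append("P")
            pending = False
        else:
            types.append("B")              # B-picture stays a B-picture
    return types


sequence = "".join(types_with_deferred_intra(12, {4}))
print(sequence)  # IBBPBBIBBPBB
```

The B-picture at the scene change (picture 4) keeps its type, and the reference picture at position 6 becomes an I-picture, so reference pictures still appear at positions 0, 3, 6, and 9.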
However, in the case of the example shown in FIG. 30, if it is assumed that the encoding process is achieved in real-time using hardware, including input-output apparatus in particular, it is impossible to predict the cycle at which a reference picture will appear. Therefore, picture memory management becomes more difficult than when a reference picture appears at a fixed cycle.
Further, when viewed from the aspect of data allocation, changing from a P-picture to an I-picture causes much less fluctuation of data allocation than changing from a B-picture to an I-picture, making it less likely that a stream buffer will exhibit an underflow or overflow state. As a result, the method of FIG. 30 is considered desirable.
However, even a control method like that shown in FIG. 30 has problems such as the following.
Here, FIG. 31 is a diagram prepared for explaining the problems with the method of FIG. 30. To simplify the explanation, nothing is shown other than a B-picture, which is the picture being encoded, a forward reference picture, and a backward reference picture.
In general, in the case of a frame structure, in which a frame is encoded as a picture, inter-frame prediction and inter-field prediction are performed as inter-picture prediction.
More specifically, there are 3 forms of prediction in frame prediction: forward prediction, backward prediction, and bi-directional prediction, in which prediction is performed using a picture that combines predictive pictures obtained by predictions in both directions thereof. Similarly, in inter-field prediction as well, there are 3 forms of prediction: forward prediction, backward prediction, and bi-directional prediction, in which prediction is performed using a picture that combines predictive pictures obtained by predictions in both directions thereof.
Of the reference numbers of the motion vectors shown in FIG. 31, the numbers 5 and 10 are frame vectors. Numbers 1, 2, 3, 4, 6, 7, 8, and 9 are field vectors. Furthermore, in FIG. 31, the right field is shown as the odd field, and the left field is shown as the even field.
In frame prediction, when using forward prediction, the vector that is selected is number 5, and when using backward prediction, the vector that is selected is the number 10 vector. When using bi-directional prediction, both the number 5 and 10 vectors are used.
Conversely, in field prediction, a predictive frame is generated using a motion vector in each of the odd and even fields.
In the case of forward prediction, the B-picture odd field generates a predictive frame using either the number 1 or 2 motion vector, and the even field generates a predictive frame using either the number 3 or 4 motion vector.
Similarly, in the case of backward prediction, the odd field uses either the number 6 or 7 motion vector, and the even field uses either the number 8 or 9 motion vector. In the case of bi-directional prediction, the odd field generates a reference frame by combining a forward predictive frame, which is generated using either the number 1 or 2 motion vector, and a backward predictive frame, which is generated using either the number 6 or 7 motion vector.
The even field generates a reference frame by combining a forward predictive frame, which is generated using either the number 3 or 4 motion vector, and a backward predictive frame, which is generated using either the number 8 or 9 motion vector.
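The correspondence between prediction forms and the motion vector numbers of FIG. 31 described above can be tabulated as in the following sketch. The vector numbers are taken from the text; the table layout and the names `FRAME_VECTORS`, `FIELD_VECTORS`, `directions`, `frame_candidates`, and `field_candidates` are assumptions introduced for illustration.

```python
# Frame prediction: one vector per direction (numbers as in FIG. 31).
FRAME_VECTORS = {"forward": (5,), "backward": (10,)}

# Field prediction: each field chooses one of two candidate vectors
# per direction.
FIELD_VECTORS = {
    "odd": {"forward": (1, 2), "backward": (6, 7)},
    "even": {"forward": (3, 4), "backward": (8, 9)},
}


def directions(mode):
    """Directions used by a prediction form."""
    if mode == "bidirectional":
        return ("forward", "backward")
    if mode in ("forward", "backward"):
        return (mode,)
    raise ValueError("unknown prediction form: " + mode)


def frame_candidates(mode):
    """Candidate vector numbers for frame prediction."""
    return {d: FRAME_VECTORS[d] for d in directions(mode)}


def field_candidates(mode, field):
    """Candidate vector numbers for one field ('odd' or 'even')."""
    return {d: FIELD_VECTORS[field][d] for d in directions(mode)}


print(frame_candidates("bidirectional"))
# {'forward': (5,), 'backward': (10,)}
print(field_candidates("bidirectional", "odd"))
# {'forward': (1, 2), 'backward': (6, 7)}
```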
The above is a method of motion prediction utilized in current MPEG-2 and other moving pictures encoding schemes. However, when a scene change occurs between the odd and even fields within an interlaced frame, the algorithms of current moving pictures encoding schemes only perform frame structure encoding, and in field prediction it is not possible for one field to perform forward prediction while the other field performs backward prediction.
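The restriction just described can be made concrete with a short sketch. It models the situation stated in the text, in which the prediction form chosen for field prediction applies to both fields of a macro-block; the names `MODES` and `selectable_field_modes` are hypothetical, and the assignment of forward prediction to the odd field and backward prediction to the even field is only an example of the combination that would suit a scene change falling between the two fields.

```python
MODES = ("forward", "backward", "bidirectional")


def selectable_field_modes():
    """(odd field form, even field form) pairs that can be selected when
    one prediction form applies to both fields of a macro-block."""
    return [(mode, mode) for mode in MODES]


# Combination that would suit a scene change between the two fields
# (example assignment): the field belonging to the old scene is predicted
# forward, and the field belonging to the new scene is predicted backward.
wanted = ("forward", "backward")

print(selectable_field_modes())
# [('forward', 'forward'), ('backward', 'backward'),
#  ('bidirectional', 'bidirectional')]
print(wanted in selectable_field_modes())  # False
```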
Consequently, in the relevant picture, encoding efficiency drops excessively, and picture quality deterioration becomes conspicuous. Further, since inter-picture prediction is then not effectively applied, if there is an increase in macro-blocks for which intra-picture encoding is performed within the B-picture, the significance of making the reference picture immediately thereafter an I-picture or an intra-picture coded picture is also lost.