This invention relates to an apparatus for detecting a scene change point in a compressed moving-picture, and to related art.
Recently, the need to handle compressed moving-pictures, such as MPEG (Moving Picture Experts Group) or DV (Digital Video), has increased because of the increased use of digital video cameras and the appearance of digital broadcasting. Additionally, a great many analog images of the past are preserved as digital compressed moving-pictures, which again requires the handling of compressed moving-pictures. A technique has come into use for editing such coded compressed moving-pictures without decoding them.
In editing, it is necessary to be able to quickly and automatically detect a scene change (more specifically, an image change point or a scene change point) in a compressed moving-picture, i.e., a bit stream. The reason is that the position information of the first image of each detected scene, or a representative image of each scene cut out by scene change detection, is useful as an index of the scene content and is an important aid to searching or editing the content.
The encoding of MPEG2, which is widely used as the format of compressed moving-pictures, uses motion vectors and DCT (Discrete Cosine Transform), like its predecessors MPEG1 and H.261. In a frame, data are divided into luminance (Y) and color difference (Cb, Cr). These data are encoded in macro block units of 16×16 pixels.
In encoding each macro block, either motion compensation prediction, in which prediction is performed from a reference image, or intra-encoding, in which encoding is performed using only the data of the macro block itself, is selected.
Motion compensation prediction is a method whose encoding efficiency rises when the temporal correlation between frames is high. A prediction error signal is derived as the difference between the data of the macro block to be encoded and the data of the macro block obtained by motion prediction from the reference image, thereby compressing the information in both time and space. In motion compensation prediction, the prediction error signal is converted into the spatial frequency domain by DCT for each block of 8×8 pixels.
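The derivation of the prediction error signal and its conversion into the spatial frequency domain can be sketched as follows. This is a minimal pure-Python illustration, not part of any MPEG2 implementation; the function names are chosen here for clarity.

```python
import math

def prediction_error(current, predicted):
    """Residual between the current block and its motion-compensated
    prediction, computed element by element."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]

def dct2(block):
    """2-D DCT-II of an 8x8 block, as applied to each 8x8 prediction
    error block (direct O(N^4) form, for illustration only)."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            out[u][v] = cu * cv * s
    return out
```

For a residual that is constant over the block, only the DC coefficient (position [0][0]) is nonzero, which is why a good prediction yields a highly compressible error signal.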
On the other hand, intra-encoding is a method in which the block data to be encoded is itself divided into blocks of 8×8 pixels, and DCT encoding is simply performed for each block.
In MPEG2, an interlaced image is also an object of encoding. In addition, a frame structure and a field structure each constitute an image-encoding unit.
In the frame structure, two interlaced fields, an odd field and an even field, are subjected to encoding. In the field structure, one field of either the odd field or the even field is subjected to encoding.
In this specification, an image encoded in the frame structure is referred to as “frame structure image”, and an image encoded in the field structure is referred to as “field structure image”.
Next, motion compensation will be described. As mentioned above, MPEG2 has the frame structure and the field structure. Motion compensation prediction of the frame structure image includes frame prediction, field prediction, and dual-prime prediction. Motion compensation prediction of the field structure image includes field prediction, 16×8 MC prediction, and dual-prime prediction. In predictions other than frame prediction, a selection can be made as to whether the reference field is the odd field or the even field.
Referring now to FIGS. 15(a) and 15(b), the encoding method for the frame structure image uses two kinds of DCT, a frame DCT and a field DCT. As best seen in FIG. 15(a), the frame DCT divides the luminance signal of a macro block into four blocks, each of which is formed from frame lines, and applies DCT to each block.
On the other hand, as shown in FIG. 15(b), the field DCT divides the luminance signal of a macro block into four blocks, each of which is formed from the lines of one field, and thereafter applies DCT to each block.
In encoding, either of the two DCTs can be used. Generally, it is known that, when the image data difference between the odd field and the even field is large, the efficiency of encoding is improved by using the field DCT. In particular, when two scenes exist together in one frame, the use of the field DCT improves compressibility.
However, the field DCT requires a frame structure to be divided into two fields, which decreases processing speed compared to the frame DCT. Accordingly, the encoding efficiency of the frame structure image (interlaced image) can be improved by using the two DCTs appropriately in accordance with the aforementioned characteristics. For the color-difference signal in the 4:2:0 format, the frame DCT is always used. In the field structure image, the macro block is constructed from the signal of only one field, and therefore the field DCT is always performed.
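The difference between the two block divisions can be sketched as follows: the frame DCT groups consecutive frame lines, whereas the field DCT first separates the lines of the two fields. This is an illustrative sketch (the helper names are ours); a macro block is modeled as a 16×16 list of lists of luminance samples.

```python
def frame_dct_blocks(mb):
    """Frame DCT split: four 8x8 blocks of consecutive frame lines
    (top-left, top-right, bottom-left, bottom-right)."""
    return [[row[c:c + 8] for row in mb[r:r + 8]]
            for r in (0, 8) for c in (0, 8)]

def field_dct_blocks(mb):
    """Field DCT split: the even-numbered (top-field) lines and
    odd-numbered (bottom-field) lines are separated first, so each
    8x8 block holds lines of a single field."""
    top = mb[0::2]      # top-field lines 0, 2, ..., 14
    bottom = mb[1::2]   # bottom-field lines 1, 3, ..., 15
    return [[row[c:c + 8] for row in half]
            for half in (top, bottom) for c in (0, 8)]
```

When the two fields differ strongly (e.g., a scene change between fields), each field-DCT block is internally uniform and therefore compresses better than a frame-DCT block that mixes lines of both fields.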
Based on the above description, conventional scene change detection techniques use feature quantities such as:
(1) a histogram of image colors,
(2) the data size of a compressed moving-picture, and
(3) the block data difference between images of two frames at the same position.
(1) In the technique using the histogram of image colors, the colors used in an image of one frame are tallied in a histogram, for the whole frame or for each region into which the frame is divided. With the histogram as the feature quantity of the frame, a degree of similarity is calculated by comparison with the feature quantities of the frame images before and after that frame (see Japanese Unexamined Patent Publication No. Hei-7-59108, for example).
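The histogram comparison described above can be sketched as follows. This is a simplified illustration, not the method of the cited publication: a frame is modeled as a flat sequence of 8-bit samples, and histogram intersection serves as the degree of similarity (the bin count and threshold are assumptions chosen for the example).

```python
def color_histogram(frame, bins=16):
    """Count the pixel values (0-255) of one frame into 'bins' buckets,
    forming the feature quantity of the frame."""
    hist = [0] * bins
    for v in frame:
        hist[v * bins // 256] += 1
    return hist

def similarity(h1, h2):
    """Histogram-intersection similarity, normalized to [0, 1] for
    equal-size frames."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)

def is_scene_change(prev_frame, cur_frame, threshold=0.8):
    """Judge a scene change when the similarity between adjacent
    frames falls below the threshold."""
    return similarity(color_histogram(prev_frame),
                      color_histogram(cur_frame)) < threshold
```

Identical frames give a similarity of 1.0; frames with disjoint color distributions give 0.0 and are judged to be a scene change.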
(2) In the technique using the data size of a compressed moving-picture, the data sizes of adjacent frames are compared, exploiting the tendency that compressibility is low at a scene change part; when the difference exceeds a predetermined threshold, a scene change is judged to have occurred (see Japanese Unexamined Patent Publication No. Hei-7-121555, for example).
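The data-size comparison can be sketched in a few lines. This is an illustrative simplification of the idea, not the cited publication's method; the threshold value is an assumption.

```python
def detect_by_size(frame_sizes, threshold):
    """Report the indices of frames whose compressed data size differs
    from the preceding frame's by more than 'threshold' bytes, on the
    premise that compressibility drops at a scene change."""
    return [i for i in range(1, len(frame_sizes))
            if abs(frame_sizes[i] - frame_sizes[i - 1]) > threshold]
```

For example, a sequence of compressed frame sizes that jumps from roughly 1 kB to 5 kB at one frame flags only that frame as a candidate scene change.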
In the techniques of (1) and (2), a scene change can be detected only in units of frames. Therefore, if the scene change occurs between the odd field and the even field of one frame (i.e., between two fields), it cannot be accurately detected.
To resolve this problem, Japanese Unexamined Patent Publication No. Hei-9-322120 has proposed a method of detecting a scene change, without decoding, from encoded image data that uses the field prediction method. According to this proposal, in frames to be predicted, a plurality of degrees of similarity between fields are calculated based on a reference field selection signal, by which either the odd field or the even field of the reference frame is selected for prediction, and the scene change is detected from the result. However, because this technique depends on the field prediction method, it cannot be applied to pictures in which the field prediction method (between-frames prediction method) is not used, or to sequences in which pictures using the field prediction method and pictures using other prediction methods exist together.
(3) In the technique using the block data difference, positional correspondence cannot be established when only the difference of the DC coefficients of the DCT blocks at the same position is used as the data. The reason is that, since the two kinds of DCT, the frame DCT and the field DCT, can both be used as the encoding method in the frame structure image, when block data are compared without performing inverse DCT, 8×8-pixel frame data in one image end up being compared with 8×8-pixel data drawn from only the odd field or only the even field of 8×16-pixel data, whenever one of the compared blocks is encoded by the frame DCT and the other by the field DCT.
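The positional mismatch can be made concrete by noting that the DC coefficient of each 8×8 block is proportional to the block's mean pixel value. The following hypothetical sketch (helper names are ours) computes those per-block means under both splits for the same macro block; when the two fields differ, the values at "the same position" disagree even though the pixel data are identical.

```python
def dc_means_frame(mb):
    """Mean pixel value of each 8x8 luminance block under the frame DCT
    split; the DC coefficient is proportional to this mean."""
    return [sum(mb[r + i][c + j] for i in range(8) for j in range(8)) / 64
            for r in (0, 8) for c in (0, 8)]

def dc_means_field(mb):
    """Mean pixel value of each 8x8 block under the field DCT split:
    blocks 0-1 hold even (top-field) lines, blocks 2-3 odd
    (bottom-field) lines."""
    top, bottom = mb[0::2], mb[1::2]
    return [sum(half[i][c + j] for i in range(8) for j in range(8)) / 64
            for half in (top, bottom) for c in (0, 8)]
```

For a macro block whose top-field lines are all 0 and whose bottom-field lines are all 100, the frame split gives a mean of 50 in every block, while the field split gives 0 and 100, so comparing the DC coefficients block-by-block reports a large difference despite identical content.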
In order to improve this, a comparison must be made between a frame structure image of one frame and field structure images corresponding to one frame (i.e., an odd field image and an even field image). However, a problem resides in that, according to this, a comparison can be made only after the data of two field structure images have been prepared; therefore, the processing becomes complex and the processing speed decreases.