1. Field of the Invention
The present invention relates to a moving image processing device automatically detecting a scene change from a moving image that is compressed with inter-frame prediction encoding, and a method thereof.
2. Description of the Related Art
In recent years, inter-frame prediction encoding methods such as H.261, ISO/IEC MPEG1, ISO/IEC MPEG2, etc. have been internationally standardized to realize the storage and the transmission of a digital moving image having an enormous amount of information. Moving image recording/reproducing devices that encode and decode a moving image have been developed with these methods. Additionally, moving image data conforming to the Video CD standard, which uses ISO/IEC MPEG1, has become popular on a worldwide scale. Furthermore, ISO/IEC MPEG2 is used to record digital video signals onto a DVD.
In the meantime, the capacity of storage media (such as a hard disk, a magneto-optical disk, etc.) for recording a moving image has been increasing, so that a long-duration moving image can be stored onto a storage medium and processed. Specific applications include moving image editing, video-on-demand, etc.
To edit a moving image, capabilities for assisting image search or editing, such as index generation from a moving image, are essential. For index generation, scene change detection is effective. Since a stream of a moving image for which inter-frame prediction encoding is performed is a bit string of encoded data, a scene change cannot be detected directly from the bit stream. Accordingly, a variety of methods and devices for detecting a scene change have conventionally been proposed.
The scene change detecting methods are typified by a method using differential image information between frames, a method using discrete cosine transform coefficient (DCT coefficient) information, a method using color information, a method using data encoding amount information, a method using motion vector information, a method using macroblock number information, and a method combining these items of information. With these methods, however, if the scale of a particular circuit for detecting a scene change becomes large, the cost of a moving image reproducing device increases.
Few conventional methods detect a scene change quickly. The following methods can be cited as examples.
(1) A method examining the amount of encoding of a motion vector.
(2) A method detecting a scene change after once decoding encoded data completely and restoring a moving image. Examples of such a method include a method using differential image information between frames or color information, and a method obtaining a motion vector aside from a motion vector for motion compensation and using the obtained motion vector.
(3) A method partially decoding a moving image after being encoded, and quickly detecting a scene change with the partial data. Examples of this method include a method using discrete cosine transform coefficient information, data encoding amount information, motion vector information, and macroblock information.
With the above described method (1), the scale applied to all motion vectors within a frame and the magnitude of each individual vector are encoded separately. The amount of encoding of a motion vector alone therefore does not accurately reflect the motion vectors themselves, which causes a scene change to be erroneously detected.
With the above described method (2) requiring encoded data of a moving image to be completely decoded, a storage device for storing data after being decoded, and an arithmetic operation circuit for performing an arithmetic operation between pixels within a frame are necessary, which leads to an increase in the scale and the cost of circuitry. Furthermore, since at least a processing time equivalent to a reproduction process is required to decode encoded data, it is difficult to speed up scene change detection.
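The cost of method (2) can be made concrete with a minimal sketch. It assumes that each frame has already been completely decoded into a flat list of 8-bit luminance values. The function names, the threshold of 40.0, and the sample frames are hypothetical and are not taken from any of the cited methods. The sketch only illustrates that every decoded frame must be held in storage and that one arithmetic operation per pixel is required.

```python
from typing import List, Sequence


def mean_abs_difference(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Mean absolute luminance difference between two fully decoded frames."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def detect_cuts_by_pixel_difference(
    frames: Sequence[Sequence[int]], threshold: float = 40.0
) -> List[int]:
    """Return indices of frames that differ strongly from the preceding frame."""
    cuts = []
    for i in range(1, len(frames)):
        # Both frames i-1 and i must be available in decoded form.
        if mean_abs_difference(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts


# Hypothetical 4-pixel frames: two dark frames followed by two bright frames.
frames = [
    [10, 12, 11, 10],
    [11, 12, 10, 10],
    [200, 198, 201, 199],
    [199, 198, 200, 199],
]
cuts = detect_cuts_by_pixel_difference(frames)
```

Even in this reduced form, the loop cannot start until decoding of each frame has finished, which is why the processing time cannot fall below that of reproduction.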
Although the detection processing can be made faster with the above described method (3) than that with the above described method (2), the following problems are posed.
Firstly, with the method using discrete cosine transform coefficient information, a discrete cosine transform coefficient is information that is possessed by each constituent element of an image. Therefore, the image must be decoded up to the step just before the restoration of the image, and a considerable amount of time is required for decoding.
With the method using data encoding amount information, a frame with a large amount of data encoding is regarded as a scene change, and the processing can be made fast because only the amount of data encoding is used. However, the amount of data encoding for a frame becomes large not only when a scene changes but also when the motion of a subject is active. Therefore, a scene change is prone to be erroneously detected.
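This weakness can be illustrated with a minimal sketch. It assumes that only the encoded size of each frame is known and that all frames are of the same picture type, so that their sizes are comparable. The function name, the ratio of 2.0, and the sample sizes are hypothetical.

```python
from typing import List, Sequence


def detect_cuts_by_encoded_size(sizes: Sequence[int], ratio: float = 2.0) -> List[int]:
    """Flag frames whose encoded size exceeds `ratio` times the mean encoded size."""
    if not sizes:
        return []
    mean_size = sum(sizes) / len(sizes)
    return [i for i, size in enumerate(sizes) if size > ratio * mean_size]


# Hypothetical encoded sizes in bytes.
# Frame 3 is an actual scene change; frame 7 only contains active motion.
sizes = [1000, 1100, 900, 6000, 1000, 1050, 950, 5800, 1000, 1000]
flagged = detect_cuts_by_encoded_size(sizes)
```

Both frame 3 and frame 7 are flagged, because the amount of data encoding by itself cannot distinguish a scene change from active motion of a subject.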
The method using motion vector information focuses only on the magnitudes of motion vectors. Accordingly, even if there is an image constituent element having a motion vector whose magnitude is 0, this information is not reflected. Therefore, information effective for detecting a scene change is not fully utilized. For this reason, this method lacks accuracy in scene change detection, and motion vector information must be combined with other information, leading to an increase in the processing time required for detecting a scene change.
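The loss of information can be illustrated with a minimal sketch of one way a magnitude-only measure might be computed. It assumes that the motion vectors of a frame are available as (dx, dy) pairs, one per image constituent element. The function name and the sample vectors are hypothetical and do not reproduce any particular conventional method.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[int, int]


def magnitude_score(vectors: Sequence[Vector]) -> float:
    """Sum of motion vector magnitudes for one frame; zero vectors add nothing."""
    return sum(math.hypot(dx, dy) for dx, dy in vectors)


# Frame A: every element moves slightly.
frame_a = [(3, 4), (3, 4), (3, 4), (3, 4)]
# Frame B: one element moves a lot, and three elements have zero-magnitude vectors.
frame_b = [(12, 16), (0, 0), (0, 0), (0, 0)]

score_a = magnitude_score(frame_a)
score_b = magnitude_score(frame_b)

# The number of zero-magnitude vectors differs, but the score does not show it.
zero_count_a = sum(1 for v in frame_a if v == (0, 0))
zero_count_b = sum(1 for v in frame_b if v == (0, 0))
```

The two frames receive the same score although their motion vector distributions are quite different, so a measure based on magnitude alone has to be supplemented with other information.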
Additionally, for a long-duration moving image, the number of scene changes included increases. However, most conventional scene change detection methods aim at optimizing encoding. To achieve this aim, all detected scene changes must be presented. If the number of detected scene changes is very large, viewing all of the detected scene changes as auxiliary information decreases operational efficiency when a moving image is searched or edited.
Furthermore, if scene changes are used as auxiliary information for searching or editing a moving image, either a number of scene changes appropriate to the reproduction time of the moving image or only the scene changes of great importance must be presented. Moreover, the degree of importance must be varied depending on the contents (genre) of the moving image. However, no conventional method presents such information.