1) Field of the Invention
The present invention relates to an image processing device and an image processing method which detect a scene change based on inputted moving-image data, and in particular to an image processing device and an image processing method which capture data of pictures with the pixels of the pictures thinned out, and use the captured data for detection of a scene change.
2) Description of the Related Art
In recent years, it has become common practice to handle moving images in the form of digital data by using compressive encoding such as MPEG (Moving Picture Experts Group). For example, video recorders that record the contents of television broadcasts on recording media such as optical discs in the form of digital data have come into widespread use.
Moving-image data are normally displayed by switching still images tens of times per second, where each of the still images is called a frame (or a field in the interlaced mode). When moving-image data are encoded in accordance with the MPEG standard, one of the attributes I, B, and P is assigned to each frame. The I frame is a frame constituted independently of the other frames, without information on time-dependent variations. The P frame is a frame produced by predictive encoding based on a preceding frame. The B frame is a frame produced by predictive encoding based on both a preceding frame and a following frame.
Since successive frames in moving images are generally highly correlated with each other, the B frames can achieve higher compression ratios than the I or P frames. However, it is necessary to arrange an I frame at the leading position of moving-image data. In addition, the I frames have the advantages that they make editing of images easy and that they can be quickly displayed in fast-forward or reverse reproduction. Therefore, I frames are normally inserted at intervals of tens to hundreds of frames. In the normal encoding procedures, the structure of the GOP (group of pictures) is prescribed as an independent reproduction unit containing at least one I frame, and I, B, and P frames are successively produced in the order prescribed in the GOP structure, so that both high compression ratios and high convenience, such as high random-access performance, are realized.
Nevertheless, when moving images are encoded in accordance with the GOP structure, in some cases a high compression ratio cannot be achieved at a boundary at which a scene change occurs. For example, a scene change occurs when video recording is temporarily stopped and thereafter restarted, when a television program switches to a commercial, or on other similar occasions, and each frame immediately following a scene change has very low correlation with the preceding frame. Therefore, when a B or P frame is produced immediately following a scene change, data compression becomes less effective and the image quality deteriorates. In order to overcome this problem, a technique of detecting a scene change and controlling the attribute of the picture to be encoded accordingly has been considered.
FIG. 9 is a schematic diagram illustrating a construction of a conventional MPEG encoding circuit containing a scene-change detection circuit, and shows data flows in the case where the encoded frames in the prescribed GOP structure are P frames. In FIG. 9, a difference detection unit 501 calculates a difference between a first frame which is to be encoded and a second frame which immediately precedes the first frame. A scene-change detection unit 502 detects a scene change based on the first frame and the second frame. A selector 503 selects one of data representing the first frame and data representing the difference outputted from the difference detection unit 501. A data compression unit 504 compressively encodes the data selected by the selector 503.
In the above MPEG encoding circuit of FIG. 9, when no scene change is detected by the scene-change detection unit 502, the selector 503 selects the data from the difference detection unit 501. Thus, compressive encoding is performed based on data representing the difference between the first frame to be encoded and the second frame immediately preceding the first frame, so that a P frame is outputted from the MPEG encoding circuit. On the other hand, when the scene-change detection unit 502 detects a scene change in the first frame to be encoded, the selector 503 selects the data representing the first frame, and compressive encoding is performed by using only the data representing the first frame, so that an I frame is outputted from the MPEG encoding circuit.
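The selection performed by the selector 503 can be illustrated with a minimal sketch. It assumes frames are given as lists of rows of integer pixel values and models the difference detection unit 501 as a plain per-pixel subtraction, which is a simplification of actual MPEG predictive encoding. The function name select_encoder_input and the returned labels are hypothetical and do not appear in FIG. 9.

```python
def select_encoder_input(cur_frame, prev_frame, scene_change):
    """Return (attribute, data) to pass on to the data compression stage.

    Hypothetical model of the selector in FIG. 9:
    - scene change detected -> the frame itself is selected (I frame)
    - otherwise             -> the inter-frame difference is selected (P frame)
    """
    if scene_change:
        # Only the data of the frame to be encoded is used.
        return "I", cur_frame
    # Simplified difference detection: per-pixel subtraction of the
    # immediately preceding frame from the frame to be encoded.
    diff = [
        [c - p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(cur_frame, prev_frame)
    ]
    return "P", diff
```

In this sketch the scene_change flag is supplied by the caller, in the same way that the scene-change detection unit 502 controls the selector 503.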
According to the above construction of the MPEG encoding circuit, when a scene change occurs, the frame immediately following the scene change is an I frame regardless of the GOP structure, and inter-frame predictive encoding, which is ineffective and useless in data compression immediately after a scene change, is not performed. Thus, the image quality can be improved. In addition, it becomes possible to reliably search for a leading frame following a scene change by fast-forward reproduction or the like, or easily generate a thumbnail image of the leading frame for indexing.
In order to detect a scene change, generally, the so-called pixel difference method is used. According to the pixel difference method, a pixel difference between frames is obtained. Specifically, an absolute value of a difference in the value of a pixel at each position between frames is obtained, and the absolute values obtained at the respective pixel positions are accumulated. When the accumulated value is greater than a preset threshold value, it is determined that a scene change occurs.
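The pixel difference method described above can be sketched as follows. This is a minimal illustration, assuming each frame is a list of rows of integer pixel values (for example, luminance) and that both frames have the same dimensions. The function names and the threshold used in the usage example are hypothetical.

```python
def sum_abs_diff(prev_frame, cur_frame):
    """Accumulate the absolute pixel differences over all pixel positions."""
    if len(prev_frame) != len(cur_frame):
        raise ValueError("frames must have the same number of rows")
    total = 0
    for prev_row, cur_row in zip(prev_frame, cur_frame):
        if len(prev_row) != len(cur_row):
            raise ValueError("rows must have the same number of pixels")
        for p, c in zip(prev_row, cur_row):
            total += abs(c - p)
    return total


def is_scene_change(prev_frame, cur_frame, threshold):
    """A scene change is assumed when the accumulated value exceeds the threshold."""
    return sum_abs_diff(prev_frame, cur_frame) > threshold


# Usage with tiny 2x2 frames and an arbitrary, illustrative threshold.
frame_a = [[10, 10], [10, 10]]
frame_b = [[12, 9], [10, 10]]    # nearly identical to frame_a
frame_c = [[200, 200], [0, 0]]   # very different from frame_a
print(is_scene_change(frame_a, frame_b, threshold=100))
print(is_scene_change(frame_a, frame_c, threshold=100))
```

Because every pixel of both frames is read, the cost of this method grows in proportion to the number of pixels supplied per frame.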
Further, a technique that applies the above pixel difference method has been disclosed, for example, in Japanese Unexamined Patent Publication No. 2000-324499 (Paragraphs <0020> to <0027> and FIG. 2). According to the disclosed technique, a first image correlation value, which corresponds to the accumulated value, is obtained. In addition, a second image correlation value, which corresponds to an inter-frame difference of the first image correlation value, is obtained. When the second image correlation value exceeds a threshold value, it is determined that a scene change occurs. Thus, only a true scene change in a moving image is detected, with higher reliability, even when the motion in the moving image is rapid.
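The idea of the second image correlation value can be sketched as follows. This is only an illustration of the principle described above, not the implementation of the cited publication. It assumes that the first image correlation values (the accumulated absolute differences) have already been computed for successive frames, and that the second value is the signed increase from one first value to the next. Both the signed form and the function name are assumptions.

```python
def detect_by_second_value(first_values, threshold):
    """Return indices at which the second correlation value exceeds the threshold.

    first_values[i] is the accumulated absolute pixel difference obtained
    for frame i. The second value is modeled here (as an assumption) as the
    increase of the first value relative to the preceding frame.
    """
    detected = []
    for i in range(1, len(first_values)):
        second = first_values[i] - first_values[i - 1]
        if second > threshold:
            detected.append(i)
    return detected


# Rapid but steady motion: the first values are large yet change slowly.
rapid_motion = [500, 520, 540, 560]
# A jump in the first value at index 3 suggests a true scene change.
with_cut = [500, 520, 510, 900, 530]
print(detect_by_second_value(rapid_motion, threshold=200))
print(detect_by_second_value(with_cut, threshold=200))
```

In the first sequence, a fixed threshold applied directly to the first values could flag every frame, whereas the inter-frame difference of those values stays small.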
Further, the motion vector method is known as another method for detecting a scene change. According to the motion vector method, motion in an image from one frame to another is detected. Specifically, the presence or absence of an image which is identical between the frames is detected in each of small rectangular sections. When absence is detected in a majority of the small rectangular sections, it is determined that a scene change occurs.
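A simplified sketch of this block-based decision is shown below. It assumes frames are lists of rows of integer pixel values, searches exhaustively within a small window of the preceding frame, and treats a block as present when the sum of absolute differences is at or below a tolerance. The block size, search range, tolerance, and all function names are hypothetical; practical motion estimation is considerably more elaborate.

```python
def block_sad(prev, cur, py, px, cy, cx, size):
    """Sum of absolute differences between a block of cur and a block of prev."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - prev[py + dy][px + dx])
    return total


def block_has_match(prev, cur, cy, cx, size, search, tolerance):
    """True if a sufficiently similar block exists in prev near (cy, cx)."""
    height, width = len(prev), len(prev[0])
    for py in range(max(0, cy - search), min(height - size, cy + search) + 1):
        for px in range(max(0, cx - search), min(width - size, cx + search) + 1):
            if block_sad(prev, cur, py, px, cy, cx, size) <= tolerance:
                return True
    return False


def is_scene_change_by_blocks(prev, cur, size, search, tolerance):
    """Scene change when more than half of the blocks have no match in prev."""
    height, width = len(cur), len(cur[0])
    blocks = 0
    unmatched = 0
    for cy in range(0, height - size + 1, size):
        for cx in range(0, width - size + 1, size):
            blocks += 1
            if not block_has_match(prev, cur, cy, cx, size, search, tolerance):
                unmatched += 1
    return unmatched * 2 > blocks


# Usage: a 4x4 frame, the same content shifted right by one pixel,
# and an unrelated uniform frame.
prev_frame = [[0, 10, 20, 30], [40, 50, 60, 70],
              [80, 90, 100, 110], [120, 130, 140, 150]]
panned = [[0, 0, 10, 20], [40, 40, 50, 60],
          [80, 80, 90, 100], [120, 120, 130, 140]]
unrelated = [[255] * 4 for _ in range(4)]
print(is_scene_change_by_blocks(prev_frame, panned, 2, 1, 20))
print(is_scene_change_by_blocks(prev_frame, unrelated, 2, 1, 20))
```

The nested search over candidate positions for every block indicates why this approach requires more processing than a single accumulation of pixel differences.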
Although the motion vector method enables detection of a scene change with higher accuracy than the pixel difference method, the motion vector method requires more complicated processing. At the same time, there are currently strong demands for suppressing the processing load, the circuit size, and the manufacturing cost in image encoding. Therefore, it is desired to increase the detection accuracy while still using the pixel difference method.
Nevertheless, even when the pixel difference method is used, it is necessary to supply image data for a plurality of frames to the circuit illustrated in FIG. 9 in order to perform both the compressive encoding of the data and the detection of a scene change. Therefore, a heavy load is imposed on the system bus through which the data are transferred, and the transfer rate of the system bus must be increased. However, the need to increase the transfer rate of the system bus impedes reduction of the circuit size and the manufacturing cost.