The present invention relates to a technique for detecting a change point between video cuts (a video cut being a continuous section of video photographed by a single video camera) in a video image stored on a video tape, a disk or the like, and also to a technique for playing and editing the video image by using this detecting technique. In particular, the present invention concerns a change-point-detection control method for a video image suitable for carrying out editing work on the video image with high efficiency, a play-stop control method based on this change-point-detection control method, and a video image editing system employing these control methods.
In recent years, video image (moving image) information has come to be utilized in digital form, since high-speed computers and storage devices with large memory capacities have become available. In particular, in the field of video image editing, digitized video images can now be handled in photographing apparatuses and editing apparatuses, with which television broadcast programs, video programs, and hypermedia moving pictures are produced. For instance, "Media Composer", "Video Shop" and the like are marketed by Avid Technology in the U.S.A. as such video image editing apparatuses.
To proceed effectively with editing work on video images, the images are preferably played cut by cut when the contents of the video images are confirmed. This is because the content of each video cut can then be examined without being affected by the contents of other video cuts. Also, it is preferable to extract only a necessary video cut and store this extracted video cut in order to utilize a storage device with high efficiency.
However, the above-described products merely provide the editing work steps. They cannot automatically extract the individual video cuts and play only an extracted video cut, even when a plurality of video cuts are contained in the picture elements (the photographed source material). Therefore, in order to find a video cut desired by a user among a large number of video cuts contained in the picture elements by using such an editing apparatus, the user himself must confirm the contents of the video images by repeatedly carrying out the fast forward operation and the rewind operation. Also, to extract a desired video cut from the picture elements, the operator must manually find the first frame (IN frame) and the last frame (OUT frame) of the video cut by utilizing the jog-shuttle dial, the scroll bar on the computer screen, and the like, while confirming the images and the sounds frame by frame. This work imposes a heavy work load on beginners and inexperienced editors, and lowers the editing work efficiency.
As techniques capable of easily confirming the contents of picture elements based on the video cuts, there are, for example, Japanese Laid-open Patent Application No. 4-111181 (i.e., "CHANGE POINT DETECTING METHOD OF MOVING PICTURES", Japanese Patent Application No. 2-230930, U.S. Pat. No. 5,083,860), and Japanese Patent Application No. 7-32027 ("METHOD FOR DETECTING CHANGE POINTS IN MOVING PICTURES AND APPARATUS"), invented by the inventors of the present invention. These techniques can automatically segment the video image into video cuts based upon the feature amounts of the images and the sounds of the video image, and can form a list of still images serving as the typical images of the respective video cuts. Since this list of typical images indicates the contents of the video image in the manner of the table of contents or the index of a book, the operators need not confirm the video image of the picture elements by frequently performing the fast forward operation and the rewind operation. Since the video image can be played in segment units called video cuts, these techniques are useful for grasping the structure of the video image, and allow a rough structure to be conceived during the editing work.
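The formation of the list of typical images described above may be sketched as follows. This is a minimal illustration only, not the method of the cited applications. It assumes that the change points have already been detected and are given as frame indices, and that the first frame of each video cut serves as its typical image. The names `Cut` and `build_cut_list` are hypothetical.

```python
from typing import List, NamedTuple


class Cut(NamedTuple):
    in_frame: int       # first frame of the video cut (IN frame)
    out_frame: int      # last frame of the video cut (OUT frame)
    typical_frame: int  # frame shown in the list of typical images


def build_cut_list(change_points: List[int], total_frames: int) -> List[Cut]:
    """Turn change points (index of the first frame of each new cut) into a cut list."""
    if total_frames <= 0:
        return []
    # Frame 0 always starts a cut; duplicates and out-of-range indices are dropped.
    starts = sorted({0, *[p for p in change_points if 0 < p < total_frames]})
    cuts = []
    for i, start in enumerate(starts):
        # A cut ends one frame before the next cut starts, or at the last frame.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else total_frames - 1
        # Assumption: the first frame of the cut is used as its typical image.
        cuts.append(Cut(start, end, start))
    return cuts


# Example: 450 frames with change points detected at frames 120 and 300.
for cut in build_cut_list([120, 300], 450):
    print(cut)
```

Each entry of the resulting list carries the IN frame and the OUT frame of one video cut, so the video image can be played in units of video cuts from such a list.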
However, similar scenes are often photographed several times as picture elements. In that case, generally speaking, the typical images are similar to each other. Furthermore, since the photographing angle of the camera may be varied during the photographing operation and the subject being photographed may move, the contents of the pictures may change within a video cut. Therefore, it is rather difficult to correctly grasp the contents of the video cuts from the typical images (characteristic images) alone. Thus, the contents must still be confirmed by actually observing the many video cuts that are likely to be used in the editing work.
Although the video cuts are segmented automatically, the picture elements must be segmented into the video cuts in advance in order to form the list. To segment the picture elements precisely, frame by frame, into the video cuts, the frame images contained in the video image must be compared with each other one by one. As a result, the list forming process by the automatic video segmentation requires as much time as, or more time than, playing the overall video image. Thus, the preparation time until the user can commence the editing work is prolonged. For instance, more than 1 hour is needed when picture elements of 1 hour are segmented into video cuts. This problem is not negligible for editing work that must be carried out quickly, e.g., the editing of news programs. According to a rule of thumb from actual editing work, it is believed that "picture elements actually used in editing works are less than 10% of entire picture elements". Since the video cuts actually used by the user normally make up only a small part of the photographed picture elements, it is not always preferable to segment all of the picture elements into the video cuts.
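The frame-by-frame comparison mentioned above may be sketched as follows, to show why the processing time grows with the length of the video image. This is a minimal illustration under stated assumptions and is not the method of the cited applications. The feature amount is assumed to be a coarse luminance histogram, each frame is assumed to be a flat sequence of 8-bit pixel values, and the threshold of 0.5 is arbitrary. The names `histogram` and `detect_change_points` are hypothetical.

```python
from typing import List, Sequence, Tuple


def histogram(frame: Sequence[int], bins: int = 8, max_value: int = 256) -> List[int]:
    """Count the pixels of a frame in `bins` equal-width luminance ranges."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts


def detect_change_points(
    frames: Sequence[Sequence[int]], threshold: float = 0.5, bins: int = 8
) -> Tuple[List[int], int]:
    """Return (change point indices, number of frame comparisons performed)."""
    change_points = []
    comparisons = 0
    prev = None
    for index, frame in enumerate(frames):
        hist = histogram(frame, bins)
        if prev is not None:
            comparisons += 1
            # Normalized histogram difference: 0.0 (identical) to 1.0 (disjoint).
            diff = sum(abs(a - b) for a, b in zip(hist, prev)) / (2 * max(len(frame), 1))
            if diff > threshold:
                change_points.append(index)  # this frame starts a new cut
        prev = hist
    return change_points, comparisons


# Example: three dark frames followed by two bright frames.
frames = [[10] * 16] * 3 + [[200] * 16] * 2
print(detect_change_points(frames))
```

In this sketch every frame is read once and compared with its predecessor, so a video image of N frames requires N - 1 comparisons regardless of how few of its video cuts are eventually used.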
The problems of the conventional techniques to be solved are as follows. First, to extract a desired video cut from the picture elements, the operator must manually find the first frame (IN frame) and the last frame (OUT frame) of the video cut by utilizing the jog-shuttle dial, the scroll bar on the computer screen, and the like, while confirming the images and the sounds frame by frame. Second, the list forming process by the automatic video cut segmentation requires a time longer than, or equal to, that required to play the overall video image, so that the preparation time until the user can commence the editing work is prolonged. In other words, the automatic video cut extraction from the video image cannot be limited to the video cut desired by the user.