Video programs are generally formed from a compilation of different video segments, which are known as "shots" in the film and video industry. Each shot consists of a sequence of consecutive frames (i.e., images) generated during a continuous (uninterrupted) operating interval from a single camera. For example, in motion pictures, a shot is a continuous series of frames recorded on film by a single camera from the time it begins recording until it stops.
In live television broadcasts, a shot constitutes those images seen on the screen from the time a single camera is broadcast over the air until it is replaced by another camera.
Shots can be joined together either in an abrupt mode (i.e., butt-edit, or switch), in which the boundary between two consecutive shots (known as a "cut") is well-defined, or through one of many other editing modes, such as fade or dissolve, which result in a gradual transition from one shot to the next. The particular transition mode that is employed is generally chosen by the director to provide clues about changes in time and place which help the viewer follow the progress of events.
There are known automatic video indexing methods which detect abrupt transitions between different shots. An example of such a method, which can detect abrupt as well as gradual transitions, has been disclosed in patent application Ser. No. 08/171,136, filed Dec. 21, 1993, entitled "Method and Apparatus for Detecting Abrupt and Gradual Scene Changes In Image Sequences", the contents of which are hereby incorporated by reference. In the context of automatic video program indexing, the shots separated by these abrupt transitions are often referred to as "scenes" and the detected boundaries (i.e., cuts) are referred to as "scene boundaries". A "scene", however, is commonly considered to be a sequence of frames with closely related contents conveying substantially similar information. If video programs consisted only of "still shots" (i.e., shots in which the camera is motionless), each shot would contain only a single scene. However, in general, video programs are composed not only of still shots but also of "moving shots" (i.e., shots in which the camera undergoes operations such as pan, tilt and zoom). Consequently, because of camera motion, the contents of a series of frames over an individual shot may change considerably, resulting in the existence of more than one scene in a given shot. Therefore, while boundaries between different shots are scene boundaries, such boundaries may be only a subset of all the scene boundaries that occur in a video program, since camera motion may produce intra-shot scene changes.
Known scene change detection methods are deficient because they can only detect scene changes that occur at the boundary between shots, not scene changes that occur within an individual shot.