The present invention relates to a method and apparatus for use in processing and recording audio plus video (herein AV) data streams and in particular, although not exclusively, to the automated detection and logging of scene changes.
A distinction is drawn here between what has been referred to by the terms scene change or scene cut in some prior publications and the meaning of these terms as used herein. In these prior publications, the term scene change (also variously referred to as edit point or shot cut) has been used to refer to any discontinuity in the video stream arising from editing of the video or a change in camera shot during a scene; where appropriate, such instances are referred to herein as shot changes or shot cuts. As used herein, scene changes (or scene cuts) are those points accompanied by a change of context in the displayed material. For example, a scene may show two actors talking, with repeated shot changes between two cameras focused on the respective actors' faces and perhaps one or more additional cameras giving wider or differently angled shots. A scene change only occurs when there is a change in the action location or time. Systems for detecting shot cuts by comparison of the contents of successive video fields or frames are known in the fields of video printers (U.S. Pat. No. 4,920,423, which refers to the breaks as "scene transitions") and format conversion systems (EP-A 0 685 968, which performs edit point detection and correction to avoid mismatches in field combination).
An example of a further use for discontinuity detection, in video tape logging, is described in International patent application WO94/11995 of Dubner et al. The apparatus comprises a videotape recorder for recording both video data and the accompanying audio soundtrack, together with a data processing system coupled to simultaneously receive the video and arranged to generate an index of detected cut points by capturing video frames from those cut points and storing them separately from the video tape, for example on the hard disc of a personal computer hosting the system. The Dubner system is described as particularly suited for logging video signals from surveillance cameras, particularly those liable to record long sequences of images of generally uninhabited areas. Once again, comparison of pixel values from successive frames is used to identify "break" points of potential interest, although in this instance it is changes in scene content which are flagged (for example instances of sudden movement occurring within the field of vision). To simplify the data handling requirements, comparison is only made between small portions of successive frames, with a numerical value being derived for changes between portions in successive frames and a value exceeding a predetermined threshold being taken as indicative of a shot cut or notable change in scene content.
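For comparison, the kind of region-based frame differencing attributed to Dubner can be sketched roughly as below. This is an illustrative reconstruction from the summary above, not the published method: the sum-of-absolute-differences measure, the rectangular sample regions and the names `region_difference` and `break_points` are assumptions.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]       # assumed: greyscale frame as rows of pixel values
Region = Tuple[int, int, int, int]    # hypothetical sample patch: (top, left, height, width)


def region_difference(a: Frame, b: Frame, regions: Sequence[Region]) -> int:
    """Sum of absolute pixel differences, evaluated over a few small regions only."""
    total = 0
    for top, left, h, w in regions:
        for r in range(top, top + h):
            for c in range(left, left + w):
                total += abs(a[r][c] - b[r][c])
    return total


def break_points(frames: Sequence[Frame], regions: Sequence[Region],
                 threshold: int) -> List[int]:
    """Indices of frames whose sampled difference from the previous frame exceeds threshold."""
    return [k for k in range(1, len(frames))
            if region_difference(frames[k - 1], frames[k], regions) > threshold]
```

Because only the sampled regions are compared, a change confined to pixels outside them goes unnoticed; this is the data-handling trade-off the summary describes.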
In various described embodiments, the Dubner system provides an on-screen display formed from a number of reduced-scale versions of grabbed frames, with facilities for scrolling through all or a selection of these frames. In addition to the time stamp indicating the location of the original frame on the video tape, selected frames may be annotated with typed comments, which are recalled when that frame is next selected. A further annotation technique provides a graphic representation of the audio waveform accompanying a sequence of captured frames, on which the user can place markers to indicate additional cut points to be recorded by the system.
Whilst the Dubner system provides a number of useful features for the monitoring and editing of video streams, it is unable to distinguish between scene cuts and shot cuts, treating both alike. We have recognised that there is a desire for such a distinguishing system and in particular one which may be employed in domestic video recording equipment in addition to more complex video editing and recording suites.
It is accordingly an object of the present invention to provide means for detection of scene changes in a video stream and operable to distinguish them from shot cuts.
It is a further object to provide video recording means capable of detecting and identifying scene changes in a recorded video stream.
In accordance with a first aspect of the present invention there is provided a method for detecting video scene changes in a video signal with accompanying audio soundtrack, the method comprising the steps of:
filtering the audio soundtrack to periodically determine a background audio signal level;
comparing current and previously determined background audio signal levels to identify discontinuities in background level; and
flagging, as the first frame of a new video scene, a video signal frame commencing at or shortly after the background audio discontinuity.
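The three method steps can be illustrated with a minimal Python sketch. The document does not specify the filter used to obtain the background level, so this sketch assumes one: each period of audio is cut into short blocks, and the background level is taken as a low percentile of the block RMS levels, on the reasoning that foreground sounds such as dialogue are intermittent while the background persists. The names `background_levels` and `discontinuities`, the block and period sizes, the percentile and the threshold are all hypothetical.

```python
import math
from typing import List, Sequence


def block_rms(block: Sequence[float]) -> float:
    """Root-mean-square level of one short block of audio samples."""
    return math.sqrt(sum(s * s for s in block) / len(block)) if block else 0.0


def background_levels(samples: Sequence[float], block_len: int,
                      blocks_per_period: int, percentile: float = 0.1) -> List[float]:
    """Filtering step: one background level per period (assumed low-percentile filter).

    Each period is cut into blocks; the background is a low percentile of
    the block RMS values, so brief loud foreground sounds are ignored.
    """
    period_len = block_len * blocks_per_period
    levels = []
    for start in range(0, len(samples) - period_len + 1, period_len):
        period = samples[start:start + period_len]
        rms = sorted(block_rms(period[i:i + block_len])
                     for i in range(0, period_len, block_len))
        levels.append(rms[min(len(rms) - 1, int(percentile * len(rms)))])
    return levels


def discontinuities(levels: Sequence[float], threshold: float) -> List[int]:
    """Comparison step: periods whose background differs from the previous one."""
    return [i for i in range(1, len(levels))
            if abs(levels[i] - levels[i - 1]) > threshold]
```

For example, with `block_len=4` and `blocks_per_period=10`, a period of quiet ambience containing a short loud burst still yields the quiet background level, while a following period with louder ambience is reported as a discontinuity; the video frame starting at or shortly after that period would then be flagged.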
By detecting discontinuities in the background audio level, scene changes may generally be identified and distinguished from mere shot changes, across which the background audio level will generally remain fairly constant.
The invention also provides video scene change detection apparatus comprising an input for a video signal with accompanying audio soundtrack, and means for detecting scene changes in the video signal received via said input; characterised in that the means for detecting scene changes comprises a filtering arrangement coupled to receive and filter said audio soundtrack to periodically determine a background audio signal level, first storage means arranged to maintain a record of the last determined background level, and comparator means arranged to flag a scene change when the current background level differs from the stored last background level by more than a predetermined amount.
To tolerate miscalculation of individual background level values, the first storage means may be arranged to maintain a record of the last N determined background levels, where N is two or more, with the apparatus further comprising averaging means coupled with the first storage means and arranged to generate an average background audio signal level from those last N determined background levels. With this averaging of previously calculated levels, the comparator means may then be arranged to flag a scene change when the current background level differs from the average background level by more than the said predetermined amount.
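The comparator with an N-level history might be sketched as below. The class name, the plain arithmetic mean and the absolute-difference threshold are assumptions; the document only requires that the current level be compared with an average of the last N stored levels. Setting `n_history=1` gives the simpler single-record arrangement of the basic apparatus.

```python
from collections import deque
from typing import Deque, Optional


class BackgroundComparator:
    """Hypothetical comparator: flags a scene change when the current
    background level differs from the mean of the last N stored levels
    by more than `threshold` (arithmetic mean assumed)."""

    def __init__(self, n_history: int, threshold: float) -> None:
        if n_history < 1:
            raise ValueError("n_history must be at least 1")
        # Plays the role of the "first storage means": oldest level drops out.
        self.history: Deque[float] = deque(maxlen=n_history)
        self.threshold = threshold

    def average(self) -> Optional[float]:
        """Averaging means: mean of the stored levels, or None before any input."""
        return sum(self.history) / len(self.history) if self.history else None

    def update(self, level: float) -> bool:
        """Compare `level` with the stored average, then record it."""
        avg = self.average()
        changed = avg is not None and abs(level - avg) > self.threshold
        self.history.append(level)
        return changed
```

With N greater than one, a sustained step in background level may be flagged on several consecutive periods until the stored average catches up; the sketch leaves any merging of such repeated flags to the caller.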
The detection apparatus may further include means operable to identify individual frames in the video signal and arranged to flag, as scene change frames, those frames meeting predetermined criteria relative to detected discontinuities in the audio background level. Where the video data is encoded according to MPEG standards, a suitable criterion is that a scene change frame is the first I-frame detected following a discontinuity in audio background level.
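For MPEG-encoded video, the first-I-frame criterion could be applied as follows. This sketch assumes frames are supplied in display order as (presentation time, picture type) pairs, with audio discontinuities expressed on the same time base; the function name and the frame representation are hypothetical, and parsing an actual MPEG stream is outside its scope.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Frame = Tuple[float, str]  # assumed: (presentation time in seconds, picture type 'I', 'P' or 'B')


def scene_change_frames(frames: Sequence[Frame],
                        discontinuity_times: Sequence[float]) -> List[int]:
    """Index of the first I-frame at or after each audio discontinuity.

    Frames must be in increasing time order. Discontinuities that map to
    the same I-frame are reported once; one after the last I-frame is dropped.
    """
    i_idx = [k for k, (_, kind) in enumerate(frames) if kind == "I"]
    i_times = [frames[k][0] for k in i_idx]
    result: List[int] = []
    for t in sorted(discontinuity_times):
        pos = bisect_left(i_times, t)  # first I-frame whose time is >= t
        if pos < len(i_idx) and (not result or result[-1] != i_idx[pos]):
            result.append(i_idx[pos])
    return result
```

Starting each scene on an I-frame means playback can begin there without reference to earlier pictures, which is presumably why that criterion suits MPEG material.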
In accordance with a further aspect of the present invention there is provided video recording apparatus comprising scene change detection apparatus as recited above, together with recording means operable to record received audio and video signals on a removable record carrier.
In such a video recorder, processor means may be provided coupled with the scene change detection apparatus and recording means, with such a processor being operable to generate a list of the scene change frames and their respective recorded locations on the record carrier. Preferably, the processor would be further operable to store the list of scene change frame locations on the record carrier.
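One way to hold the scene change list and write it to the record carrier is sketched below. The document does not define a storage format: the JSON encoding, the `version` field, the `sector` address and the `locate` callback (standing in for whatever the recording means reports as a frame's location) are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SceneEntry:
    frame_number: int  # index of the scene change frame in the recording
    sector: int        # hypothetical: recorded location of that frame on the carrier


def build_scene_list(scene_frames: List[int],
                     locate: Callable[[int], int]) -> List[SceneEntry]:
    """Pair each scene change frame with its recorded location, in frame order."""
    return [SceneEntry(f, locate(f)) for f in sorted(scene_frames)]


def encode_scene_list(entries: List[SceneEntry]) -> bytes:
    """Serialise the list for storage on the record carrier (assumed JSON format)."""
    payload = {"version": 1, "scenes": [asdict(e) for e in entries]}
    return json.dumps(payload).encode("utf-8")


def decode_scene_list(blob: bytes) -> List[SceneEntry]:
    """Recover the list when the carrier is later loaded for playback."""
    payload = json.loads(blob.decode("utf-8"))
    if payload.get("version") != 1:
        raise ValueError("unsupported scene list version")
    return [SceneEntry(**item) for item in payload["scenes"]]
```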
The above-mentioned video recording apparatus may be further operable to play back audio and video signals from a record carrier. In such a case, the apparatus may be arranged to identify whether the record carrier contains a list of scene change frames and, if so, to identify one or more scene change frames under user control and present the or each said frame, via display means, to the user. In order to accomplish such presentation, the apparatus may further comprise image data processing means arranged to extract from the record carrier, reduce in size, and display in a predetermined arrangement, a sequence of scene change frames as a static menu screen. In order to enhance the functionality of such a menu screen arrangement, the apparatus may further comprise user-operable input means (for example a mouse or joystick controller) by which a user may select one from a plurality of scene change frames on the menu screen, with the apparatus being arranged to play back stored video from the record carrier commencing from the selected scene change frame.
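The menu screen and selection behaviour might be sketched as follows, assuming greyscale frames held as lists of pixel rows, reduction by simple pixel decimation (a real device would more likely filter before subsampling) and a pointer position reported by the mouse or joystick. All names, the margin, the layout rule and the example locations are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]  # assumed: greyscale frame as rows of pixel values


def shrink(frame: Image, factor: int) -> Image:
    """Reduce a frame by keeping every `factor`-th pixel in each direction."""
    return [row[::factor] for row in frame[::factor]]


def grid_positions(count: int, cols: int, thumb_w: int, thumb_h: int,
                   margin: int = 8) -> List[Tuple[int, int]]:
    """Top-left screen coordinates for `count` thumbnails, filled row by row."""
    return [(margin + (k % cols) * (thumb_w + margin),
             margin + (k // cols) * (thumb_h + margin))
            for k in range(count)]


def hit_test(positions: Sequence[Tuple[int, int]], thumb_w: int, thumb_h: int,
             x: int, y: int) -> Optional[int]:
    """Index of the thumbnail under the pointer at (x, y), or None if none."""
    for k, (px, py) in enumerate(positions):
        if px <= x < px + thumb_w and py <= y < py + thumb_h:
            return k
    return None


# Usage: map a pointer click to a playback start location.
scene_sectors = [1036, 1720, 2500]  # hypothetical recorded locations of three scenes
pos = grid_positions(len(scene_sectors), cols=2, thumb_w=100, thumb_h=60)
choice = hit_test(pos, 100, 60, x=120, y=10)
start_sector = scene_sectors[choice] if choice is not None else None
```

The index returned by `hit_test` selects the corresponding entry in the scene change list, whose recorded location is then used as the playback start point.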
Still further in accordance with the present invention there is provided a record carrier having audio and video data and a list of scene change frames recorded thereon by apparatus as described above. In a preferred embodiment, to take account of the faster access times possible in comparison with video tape, the record carrier would be an optical disc according to DVD-RAM or equivalent standards.