Video footage captured by a wearable video camera provides a continuous, unedited record of the wearer's experiences, recorded in anticipation of capturing memorable or interesting events which the wearer may wish to review later. However, since the wearable camera may capture many hours of continuous footage, reviewing the footage later in order to find the interesting events is a time-consuming process.
The review and/or editing of previously unedited footage is quite different from reviewing pre-edited or published material such as a movie distributed on a DVD-video disk. A finished product such as a DVD movie presents the footage in an edited and readily-reviewable format whereas unedited footage may be many hours in length with no easy means of identifying events of interest.
Straight playback and review of the captured video in real time is a certain but laborious way of finding the interesting events. It is advantageous in that the wearer of the camera can perform the review themselves, since they will most effectively identify the interesting events. The disadvantage is that reviewing the video takes at least as long as the recorded experience itself. Straight playback faster than real time means that the audio track cannot be simultaneously reviewed, so important events relating to the audio track may be missed.
Video summarisation is a known process in which continuous footage is summarised into a short video summary, maintaining a sense of the “story”. Key frames (i.e. stills “markers”) are identified, and short segments based around each stills marker are then put together to make a continuous short video summary of the whole footage. Key frames on their own should provide a fast review mechanism. However, with continuous video footage, such as that from a wearable camera, the key frames may well be very similar to one another. If this is the case, the key frames can only be distinguished by looking at the video itself, since a key frame does not provide a good clue as to what will happen in the corresponding video clip or segment.
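By way of illustration only, the assembly of a summary from key frames can be sketched as follows. The sketch assumes that key frames have already been identified as frame indices and that each segment is a fixed window centred on its stills marker; the function name `summary_segments` and the parameters `half_window` and `total_frames` are hypothetical and are not taken from any particular summarisation system.

```python
def summary_segments(key_frames, half_window, total_frames):
    """Return merged (start, end) frame ranges, end exclusive.

    Assumption: each segment spans half_window frames on either side
    of its key frame; real systems may choose segment bounds differently.
    """
    segments = []
    for k in sorted(key_frames):
        # Clamp the window around the key frame to the footage bounds.
        start = max(0, k - half_window)
        end = min(total_frames, k + half_window + 1)
        if segments and start <= segments[-1][1]:
            # Overlapping or touching windows are joined into one segment.
            prev_start, prev_end = segments[-1]
            segments[-1] = (prev_start, max(prev_end, end))
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Two nearby key frames collapse into a single segment.
    print(summary_segments([100, 120, 500], 25, 1000))
```

Playing the returned ranges back to back gives the continuous short summary. The sketch does nothing to address the problem noted above, namely that the key frames themselves may be nearly indistinguishable in continuous footage.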
Furthermore, if the key frames are automatically selected, human input in deciding what should be considered interesting will be absent. An automatically generated video summarisation may be fast, but may miss more subtle interesting moments or, at the other extreme, return too many false alarms. The decision as to what is interesting should ideally be made by a person, and preferably by the wearer of the video camera which captured the footage, as their familiarity with the footage contributes significantly to the effectiveness of the review.
U.S. Pat. No. 5,805,733 (Wang) describes a method and system of summarising scenes in a video sequence by detecting scene changes and then comparing scenes in a moving window to determine their similarity. Similar scenes are consolidated and represented by a representative frame. Multiple representative frames are displayed to the user, who can then select which set of consolidated related scenes to view. This method may be difficult to apply to footage obtained from a wearable camera, where there may be no distinct scene changes.
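The general approach of detecting scene changes and consolidating similar scenes within a moving window can be sketched as follows. This is a generic illustration under stated assumptions and is not the method claimed in the Wang patent: each frame is assumed to be reduced to a feature vector (for example a colour histogram), a scene change is assumed wherever consecutive frames differ by more than a threshold, and scenes are compared by their first frames only. All function names and thresholds are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_scene_starts(frames, cut_threshold):
    """Return the index of the first frame of each detected scene."""
    if not frames:
        return []
    starts = [0]
    for i in range(1, len(frames)):
        # Assumption: a large jump between consecutive frames marks a cut.
        if l1_distance(frames[i - 1], frames[i]) > cut_threshold:
            starts.append(i)
    return starts


def consolidate_scenes(frames, starts, window, similarity_threshold):
    """Assign each scene a group id, reusing the id of a similar scene
    found among the preceding `window` scenes."""
    group_of = []
    for i, s in enumerate(starts):
        assigned = None
        for j in range(max(0, i - window), i):
            if l1_distance(frames[starts[j]], frames[s]) <= similarity_threshold:
                assigned = group_of[j]
                break
        group_of.append(assigned if assigned is not None else i)
    return group_of


if __name__ == "__main__":
    # Edited-style footage: scene A, scene B, then back to scene A.
    edited = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 3
    starts = detect_scene_starts(edited, cut_threshold=0.5)
    print(starts, consolidate_scenes(edited, starts, 2, 0.1))

    # Wearable-style footage: content drifts gradually, no abrupt cut.
    drifting = [[1 - i / 10, i / 10] for i in range(11)]
    print(detect_scene_starts(drifting, cut_threshold=0.5))
```

In the second example the content changes completely between the first and last frames, yet no single step exceeds the cut threshold, so the whole sequence is treated as one scene. This illustrates why an approach that relies on distinct scene changes may be difficult to apply to wearable camera footage.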
It is therefore an object of the invention to provide a method and apparatus for reviewing video which seeks to alleviate the above-mentioned disadvantages.