As electronic still and video cameras are progressively developed, they are becoming smaller and easier to use, with improving imaging capabilities. At the same time, the media on which the camera signals are stored are also becoming smaller and cheaper. With increasing battery lifetimes, it is now very easy to capture a very large amount of audio and/or visual data over a relatively short period, possibly a single session or trip, before any downloading of data is necessary. A potential outcome is the use of wearable camcorders which are continuously recording.
Not all of the recorded material will have the same degree of interest to the user, particularly when recording continuously. Accordingly, the material will commonly require editing to retain selections of it, and possibly to re-order the selected material or edit it in other ways, such as by control of video reproduction speed, selection of key frames therefrom, or the duration of a still shot. However, the amount of time that a user will want to spend on editing the recorded material is not expected to increase in proportion to the storage capability of the camera, and is more likely to remain essentially constant.
In the past, despite the time and effort involved, such a problem has been accommodated by manual editing of the captured material to produce photo albums or edited home videos. During the editing process it is necessary to bear in mind the purpose for which the edited material is being produced, and different sets of edited material may be required for different purposes. This renders the editing process even more difficult and time consuming, and in practice, multiple edits of the same source material are made rarely, if ever.
Alternatively, the problem has been avoided by judicious recording, as would have been the case when recording capacity was relatively limited, as in early electronic still cameras, or relatively expensive, as in photographic film cameras. Nevertheless, as will be appreciated, a choice in real time of what to record is often difficult, and particularly interesting or desirable “magic moments” are easily missed, which is why continuous recording for later editing is so attractive in principle.
Therefore there is a need for an aid to the editing process to shorten the time and to reduce the effort required. Prior art aids may be described as:
(a) Manual editing tools for providing one or more edits of the same source material. These tools include paper based photo albums, their electronic equivalents such as PictureIt (Microsoft), electronic slideshow tools such as ACDSee, and video editing tools such as Adobe Premiere, which are incorporated by reference herein.
(b) Automated video summarization or abstraction systems, on which much work has been done. In this context, “summarization” generally refers to the generation of a set of key stills which represent the video, and “abstraction” generally refers to the generation of a shorter video from parts of the source video. An example is the system provided by FXPAL, as described by A. Girgensohn et al., “A Semi-Automatic Approach to Home Video Editing”, UIST '00 Proceedings, ACM Press, pp 81-89, 2000, and incorporated by reference herein. This uses a fully automatic heuristic measure of “unsuitability” to break up long video shots into shorter clips. There is also the possibility of breaking clips on the basis of the audio commentary by automatic identification of sentence boundaries. While the user can specify the overall duration of the edited video, the user must also specify manually which clips are to be used and the order in which they are to be viewed. The specified duration apparently controls the threshold of “unsuitability” used to determine in/out points for each clip. Another exemplary system is that of Intel, as described by R. Lienhart in “Dynamic Video Summarization of Home Video”, Proc. of IS&T/SPIE, vol 3972, pp 378-389, January 2000, and incorporated by reference herein, which groups shots in time based on the time stamp from a digital video camera. Using a technique in which the number of clips required by a fixed sampling rate is estimated, with in/out points being based on the audio content, long shots are sampled or subdivided to generate shorter clips. Again the user can specify the length of the edited video. Based on a hypothesis that all clips are equally important, the system is arranged to select clips in a “controlled random” manner. Depending on the ratio of the specified duration to the duration of the raw material, the system chooses a few “events” at random, and then picks a sequence of clips for each “event” at random.
(c) The use of professionally constructed interactive video material to control content and detail. U.S. Pat. No. 6,278,446 (Liou), incorporated by reference herein, describes a “System for Interactive Organization and Browsing of Video” which assumes an unknown, professionally edited video source. This is broken into shots which are then clustered into scenes or some other grouping, and in this instance an interactive method is used to correct an automated shot detection system and to organize the shots into a hierarchic arrangement which can be interactively viewed. The shot boundary detection system assumes that detecting explicit edit points in the source video is sufficient, which might or might not be true for material which has already been edited professionally, but is most unlikely to be the case for raw home video, which typically will consist of very long shots which need to be broken up or reduced in some way. The clustering is designed to cater for situations which do not normally occur in home video, such as alternating shots between two camera views of the same event. U.S. Pat. No. 6,038,367 (Abecassis), “Playing a Video Responsive to a Comparison of Two Sets of Content Preferences”, incorporated by reference herein, discloses an example of a system which selects the displayed content on the basis of user preference. It is arranged for processing professionally produced material where the producer has already identified a profile consisting of one or more attributes for each segment of video material, and the viewer specifies a preference profile which is then matched against the profile of each video segment to determine whether or not that segment should be included in the version provided to the viewer. A typical use would be to allow a viewer to control the degree of sex and/or violence which they are shown from the source material.
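By way of illustration only, the profile matching described in (c) may be sketched as follows. The sketch is not taken from the cited patent: the names Segment, is_acceptable and select_segments, the use of integer levels for each attribute, and the rule that an attribute absent from the viewer's preference profile is tolerated only at level 0 are all assumptions made for the purpose of the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    """A segment of video material with a producer-assigned profile.

    The profile maps an attribute name (e.g. "violence") to an integer
    level. Both the attribute names and the integer scale are hypothetical.
    """
    name: str
    profile: Dict[str, int] = field(default_factory=dict)


def is_acceptable(segment: Segment, preferences: Dict[str, int]) -> bool:
    """Return True if every attribute level of the segment is within the
    maximum level the viewer will accept.

    Assumption: an attribute missing from the viewer's preferences is
    treated as having a tolerance of 0.
    """
    return all(
        level <= preferences.get(attribute, 0)
        for attribute, level in segment.profile.items()
    )


def select_segments(segments: List[Segment],
                    preferences: Dict[str, int]) -> List[Segment]:
    """Keep, in their original order, the segments the viewer accepts."""
    return [s for s in segments if is_acceptable(s, preferences)]


if __name__ == "__main__":
    source = [
        Segment("opening", {}),
        Segment("argument", {"violence": 1}),
        Segment("fight", {"violence": 3}),
    ]
    viewer = {"violence": 2}
    print([s.name for s in select_segments(source, viewer)])
```

In this sketch a segment is included only if it passes for every attribute in its profile, so a single attribute above the viewer's tolerance excludes the segment. A matching rule of this kind presupposes that each segment already carries a producer-assigned profile, which is not the case for raw home video.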