Home video cameras are increasingly common and can capture image and sound data over ever longer periods. Recent developments indicate that wearable cameras capable of recording continuously over very long periods will soon be available, and it is to be expected that this will in turn lead to the capture of extended events involving interactions between different people, for example a discussion between two or more people, or the telling of a tale by one person to one or more others whose expressions change as the tale unfolds.
It is known to use a single camera to record such events, as in U.S. Pat. No. 5,844,599 (Hildin), incorporated herein by reference, where the voice of an active speaker causes the camera pan and tilt to be adjusted to bring the speaker into the field of view, and in U.S. Pat. No. 5,434,617, incorporated herein by reference, where a camera tracks a person talking to an audience. However, with a single camera at a single location it is not possible to switch rapidly between two or more different viewpoints in the resulting raw or edited video. This can make the video tedious to watch, for example when a conversation is followed from a single viewpoint. The sound quality may also vary considerably when speakers are at different distances from the camera microphone.
To add to the difficulty for the amateur user, editing home videos is not at present a particularly easy process, and commonly the whole recording must be viewed to decide which parts are interesting enough to be retained. Systems for the automated editing of outputs from single cameras are becoming available; an exemplary system is described in GB-A-2380348, incorporated herein by reference.
In principle, of course, the situation could be improved by allocating a camera to each of a plurality of participants, with the intention of manually editing the plural camera signals into a single acceptable signal at a later date. Each camera could be operated and controlled by a participant, so that, for example, the direction of its field of view could follow the conversation; but it is to be expected that a hand-held camera would be intrusive and a nuisance to the participant, to the extent that it would fail to be operated correctly. Alternatively, each camera could be mounted separately from the participants, for example to view an allocated participant, but then the view obtained is always the same, and the sound quality will be inferior because the microphone is further from the allocated participant and will also pick up more sound from the other participants and other noises.
Furthermore, the sheer difficulty, complexity and length of the subsequent editing process is enough to deter people from adopting this approach unless it is a necessity. When more than one camera is used to record the same event, decisions are required as to which of two or more simultaneously recorded camera image outputs (video streams) is selected at any time, in addition to a decision as to how to deal with the plural accompanying sound signals (audio streams). Nevertheless, it is known to edit multiple recordings, as in U.S. Pat. No. 5,206,929 (Langford), incorporated herein by reference, where, although there is a degree of automation, the user still needs to view all of the recordings in full to decide on the edited result.
International Patent Application No. WO 00/27125 (Telecordia), incorporated herein by reference, relates to a system in which video signals from different sources are automatically combined or selected to provide an edited result, and which is described in relation to recording an auditorium type of presentation for display to a remote audience. There is also an article, “Videography for Telepresentations” by Yong Rui et al, Technical Report MSR-TR-2001-92, Feb. 10, 2001, Microsoft Research, Microsoft Corporation, incorporated herein by reference, which deals with the question of recording both a presenter and members of an audience at a presentation. International Patent Application No. WO 99/56214 (Sensormatic), incorporated herein by reference, discloses an apparatus in which audio and video streams are analysed for various purposes, including an application in which parts of a stored sequence are highlighted for selective playback.
By contrast, of course, professionally produced material commonly uses a plurality of cameras which are carefully placed, manoeuvred and operated by skilled operators, and the edited result consists of a number of clips from each camera interspersed with clips from the others, so as to retain and stimulate the viewer's interest by showing what is regarded as the most interesting view at all times. In addition, portable microphones can optionally be used for superior sound quality, and the sound can be mixed by conventional, well-known techniques. In many cases several takes of a scene are required to achieve the requisite result, and the flow of conversation must be repeated during each take. This is an expensive option, and one not altogether suited to more informal or domestic productions, where it is desired, for example, to record a spontaneous unscripted conversation.