In professional television production, it is conventional practice to use a first medium for recording the video pictures and a second medium for recording the sound. Subsequently, editing may be performed separately upon the recorded images and again separately upon the recorded sound, often with additional sounds being added to the sound track recorded during the initial production. Thereafter, the vision and the sound are brought together to produce the final result.
As used herein, pictures recorded or transmitted in electrical form will be referred to as video signals, sound transmitted or recorded in electrical form will be referred to as audio signals, and the combination of the two will be referred to as television signals. Thus, as previously described, the final editing step involves combining edited video with edited audio to produce a television signal in which the sound is synchronised with the pictures.
The majority of editing procedures consist of initially editing the video signals and, thereafter, adding the audio track to the previously edited video track. This may involve combining audio signals from many different sources, and audio mixing facilities for combining such source material are known. However, a problem with editing an audio track to a previously edited video track is that it can take a significant amount of time to locate the positions within the video track at which changes to the audio track are to occur. Usually, these changes take place when the video track cuts between scenes, or when something significant happens in the video pictures.
It is an object of the present invention to provide a method of detecting the position of scene changes in video sequences.