Computer systems that play back digital audio synchronized with digital video, as digital movies, from data stored on fixed or removable disks or downloaded over the Internet as streaming media, are well known. In addition, computers with a sound card are generally capable of recording sound input from a microphone. Computer systems with “full duplex” sound cards are capable of recording sound input whilst simultaneously playing back sound and video signals from a pre-recorded computer video and audio file.
In FIG. 1 of the accompanying drawings a conventional computer system 100 is shown which consists of a computer 110 with a CPU (Central Processing Unit) 112, RAM (Random Access Memory) 118, user interface hardware typically including a pointing device 120 such as a mouse, a keyboard 125, and a display screen 130, an internal storage device 140 such as a hard disk or further RAM, a device 160 for accessing data on fixed or removable storage media 165 such as a CD ROM or DVD ROM, and optionally a modem or network interface 170 to provide access to the Internet 175. The pointing device 120 controls the position of a displayed screen cursor (not shown) and the selection of functions displayed on the screen 130.
The computer 110 may be any conventional home or business computer such as a PC or Apple Macintosh, or alternatively one of the latest dedicated “games machines” such as a Microsoft® Xbox™ or Sony PlayStation 2™, in which case the pointing device 120 is a game controller device. Some components shown in FIG. 1 may be absent from a particular games machine. FIG. 2 illustrates software that may be installed in the computer 110.
In the following descriptions, the terms “mouse” and “clicking” are used for convenience as generic terms for a screen cursor control device and a screen object selection operation.
A user may obtain from a CD ROM, the Internet, or other means a digital data file 115 containing an audio and video clip, which could, for example, be in a common format such as the AVI or QuickTime® movie format and which is, for example, copied to and stored on the hard disk 140 or in RAM. The computer 110 has a known operating system 135, such as any of the available versions of Microsoft® Windows® or Mac® OS, and audio software and hardware in the form of a sound card 150, or equivalent hardware on the computer's motherboard, containing an ADC (Analogue to Digital Converter), to which a microphone 159 is connected for recording, and a DAC (Digital to Analogue Converter), to which one or more loudspeakers 156 are connected for playing back audio. As illustrated in FIG. 2, such an operating system 135 is generally shipped with audio recording and editing software 180 that supports audio recording via the sound card 150 and editing functions, such as the “Sound Recorder” application program shipped with Windows®. The recording program can use the sound card 150 to convert an incoming analog audio signal into digital audio data and record that data in a computer file on the hard disk drive 140. Audio/video player software 190, such as Windows Media Player shipped with Windows®, is used for playing composite digital video and audio files, or audio-only files, through the sound card 150, further built-in video hardware and software, the display screen 130 and the speakers 156. Composite video and audio files consist of video data and one or more parallel synchronized tracks of audio data. Alternatively, audio data may be held as separate files allocated to store multiple streams of audio data. The audio data may be voice data such as dialog or singing, instrumental music, or “sound effects”, or any combination of these three types.
Most current games systems do not provide facilities to make sound recordings. Moreover, even where such facilities exist, a user would be unable, in a simple manner, to replace the audio signal in the composite video and audio file with audio data recorded on the hard disk and synchronized with the video.
There also exist commercially available computer-based digital audio and video editing programs which can be installed in a conventional computer system such as the system 100 and provide the functions of both the Audio Recording and Editing Software 180 and the Audio/Video Player Software 190. Representative examples of such programs are Digidesign's Pro Tools® system, the Sound Forge® program from Sony Pictures Digital, and Syntrillium Software Corporation's Cool Edit Pro (now Adobe Audition from Adobe Systems Incorporated). These known editing programs enable a skilled user to import a digital composite audio-video file into the editing program, play the video track and original dialog signals, and optionally play any music and sound effects tracks at the same time.
With sufficient practice, the skilled user can set up and execute the recording of a new voice while the video track alone is playing. The editing program can then play back the new audio data produced by this recording with the video track, but only with the timing achieved when it was recorded against the video playback. Before doing so, the user typically must manually mute the original dialog track in the file and enable the new dialog track to play.
It is well known that it is difficult for an actor to perform an exact repetition of a line of dialog in sufficient synchronization with a pre-recorded video representation of the line being spoken, and that an audio track recorded in such circumstances is very unlikely to have its start and detailed acoustic properties synchronized with those of the original audio track.
Synchronization requires a further step: either manually editing the detail of the waveform of the newly recorded audio, or obtaining, configuring and applying specialised automatic audio synchronization software, such as that described in GB2117168 and U.S. Pat. No. 4,591,928 (Bloom et al.), to create a third audio signal providing a new, aligned audio track. However, even in the latter case, the skilled user must perform the further steps of muting the original audio signal and enabling the new aligned track. To view the final combination of new synchronized audio with the video, the user must control the playback of the editing program so that it starts before the new audio recording and stops after its end. This procedure is painstaking, complex and time-consuming, requires skill and specialist knowledge, and must be repeated for each audio sequence being replaced, for example each line in a video or song. To play back an entire scene with the new recordings, each of the final synchronized recordings must be manually selected and enabled for playback in the editing program, while playback of the original or intermediate recordings is disabled. If there are several alternative recordings, typically kept on different tracks in the editing program, the selected one of each must be manually moved to a further track or tracks to enable uninterrupted playback of the selected edited and synchronized audio recordings. Furthermore, the user must enable playback of the original audio in the sections where there is no replacement audio, or where the user has chosen the original audio instead of a replacement. Lastly, to achieve the desired objective there must be a means for switching between the multiple desired sources, mixing all of these selected signals with the original background audio, and feeding this mix to the audio output system while the video is played back in sync with the audio.
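The switching and mixing requirement described in the preceding paragraph can be illustrated with a minimal sketch. It assumes mono audio held as lists of float samples at one common sample rate, an original dialog track held separately from the background (music and effects) track, dialog lines identified by hypothetical segments with known start and end sample indices, and replacement recordings that have already been aligned to exactly the length of the segment they replace. The names `Segment` and `build_dialog_mix` are hypothetical and do not correspond to any of the editing programs mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """Hypothetical dialog line: sample range [start, end) in the original dialog track."""
    name: str
    start: int
    end: int


def build_dialog_mix(
    original: List[float],
    background: List[float],
    segments: List[Segment],
    selected: Dict[str, Optional[List[float]]],
) -> List[float]:
    """Switch between original and selected replacement dialog, then mix with background.

    Assumes each selected replacement is already synchronized and has exactly
    (end - start) samples. A segment whose selection is None, or which is absent
    from `selected`, keeps the original dialog, as do samples outside every segment.
    """
    if len(original) != len(background):
        raise ValueError("original and background tracks must be the same length")
    dialog = list(original)  # start from the original dialog everywhere
    for seg in segments:
        replacement = selected.get(seg.name)
        if replacement is None:
            continue  # user kept the original recording for this line
        expected = seg.end - seg.start
        if len(replacement) != expected:
            raise ValueError(f"replacement for {seg.name} must have {expected} samples")
        dialog[seg.start:seg.end] = replacement  # switch source for this line only
    # Sum dialog and background sample by sample, clipping to the range [-1.0, 1.0].
    return [max(-1.0, min(1.0, d + b)) for d, b in zip(dialog, background)]
```

For example, with an original dialog track of six samples of 0.25, a background of `[0.0, 0.0, 0.5, 0.5, 0.0, 0.0]`, a replacement `[0.5, 0.5]` selected for the first line and `None` for the second, the first two output samples come from the replacement and the rest from the original plus background. A real player would do this per buffer while streaming rather than on whole tracks in memory.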
Even in a professional studio equipped to provide specialised automatic dialog replacement and synchronization services for the film or video industry, most of the above manual procedures must take place, and there is no generally convenient method of selecting a sequence of a plurality of the desired synchronized replacement recordings and playing these back in sequence with the video and, when a replacement recording is not selected, playing the original audio instead.
There exists a need for a system capable of automatically creating, for an audio/video programme with existing dialog or singing, a series of new audio recordings that replace the original audio recordings, the new audio recordings being those selected by a user from edited versions of the user's recordings synchronized with the video.
It is an object of the present invention to provide, for example, a computer program which runs on a PC or games system and provides simple and immediate means to create and play a digital video simultaneously with user-selected replacement voice recordings which have been automatically edited and sequenced to playback with accurate lip-synchronization with the images in the digital video. To provide the required simplicity, the program firstly should be a single integrated program, rather than having multiple components that its user needs to obtain separately and then assemble together, and secondly should employ familiar and simple controls. To provide further simplicity, there should be an automatic means for indicating to the user when to record the replacement signal and for providing visual cues to the timing of the main acoustical events such as words. It is a further object to provide an end user with means for: a) creating a new replacement audio signal by automatically editing a newly recorded audio signal to synchronize its main acoustic features to the corresponding features in the original pre-recorded audio signal; b) selecting automatically the correct new replacement audio signals; c) automatically switching to and playing the selected new signals at the right times with the digital video; and d) playing any desired background audio such as a music backing or sound effects track in sync with the video. In other words, there should be no need for the end user to manipulate the video and audio signals other than having to take the steps of: selecting the clip of interest in the video programme; operating a few simple and familiar controls to select and optionally rehearse the lines from a part or all of this clip; recording the replacement audio for that section; and playing the selected section or the entire clip of the audio video programme with the automatically edited and synchronized replacement audio clip or clips.
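Object a) above, automatically editing a new recording so that its main acoustic features line up with the corresponding features of the original, is commonly approached with dynamic time warping (DTW). The sketch below is an illustrative stand-in only; it is not presented as the method of this invention or of the cited Bloom et al. patents. It assumes each signal has already been reduced to a hypothetical one-dimensional feature sequence (for example, short-term energy per frame), computes a minimum-cost alignment path by dynamic programming, and then picks, for every frame of the original, a frame of the new recording to use. The function names `dtw_path` and `warp_to_original` are hypothetical.

```python
from typing import List, Tuple


def dtw_path(original: List[float], new: List[float]) -> List[Tuple[int, int]]:
    """Return a minimum-cost DTW path pairing original frame i with new frame j.

    Local cost is the absolute difference between feature values; each step may
    advance the original, the new recording, or both by one frame.
    """
    n, m = len(original), len(new)
    if n == 0 or m == 0:
        raise ValueError("feature sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimum cumulative cost of aligning original[:i+1] with new[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local = abs(original[i] - new[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    cost[i - 1][j] if i > 0 else inf,
                    cost[i][j - 1] if j > 0 else inf,
                    cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            cost[i][j] = local + best_prev
    # Backtrack from the final cell to (0, 0), always taking the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


def warp_to_original(new: List[float], path: List[Tuple[int, int]], n_original: int) -> List[float]:
    """Resample the new feature sequence onto the original's timeline.

    For each original frame i, use the first new-recording frame j paired with it
    on the path, so the output has the original's length and timing.
    """
    mapping = {}
    for i, j in path:
        mapping.setdefault(i, j)
    return [new[mapping[i]] for i in range(n_original)]
```

With an original feature sequence `[0.0, 1.0, 0.0]` and a new take `[0.0, 0.0, 1.0, 0.0]` in which the same event arrives one frame late, the path pairs the late peak with the original peak and the warped result matches the original's timing. This full-matrix version costs O(nm) time and memory; practical aligners usually restrict the search to a band around the diagonal, and applying the frame mapping to the actual audio (for example by time-stretching) is outside this sketch.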