The post-synchronization system used throughout most of the world is based on what is called a “beep-and-wipe” system. In a recording studio, the actor is given earphones, through which the dialog is fed.
An audible beep signals the beginning of the line to be re-recorded. A visual indicator, called a wipe, is superimposed on the screen to show when to begin and when to stop. A series of takes is recorded, sometimes as many as 24, and given to the editor, who verifies them by eye or by trying to match the sound waves of the original production take with those of the newly recorded ones. Most of the editing is, in the end, totally dependent on the experienced eye and ear of the human operators. The method used for film dubbing in the greater part of the world is the same, except in the United States, where the voice of the translator is fed into one of the earphones while the other carries the mixed track of dialog from the original language. The norm for recording dialog using this method is between ten and twelve lines of text per hour of studio time.
The system used in France, Quebec, and South Africa consists of taking the film that is to be post-synchronized (or dubbed) and transferring it to either three-quarter-inch or half-inch video tape. The video is fed from a VCR to a special machine, called a detection machine, that links a roll of white 35 mm leader film with the VCR so that the two run synchronously. A detection of the scene cuts and of all the lip movements and dialog of the original language is then performed. A highly skilled craftsperson, called a detector, writes in pencil on the strip of white leader. The detector copies the original-language dialog of the film, following the precise movements of the lips and matching them to the spoken words. During this process, particular emphasis is laid on precise matching of the labials and semi-labials. A calligrapher then runs a strip of clear 35 mm leader on top, matched sprocket to sprocket with the white strip underneath. The two rolls are run simultaneously on a small geared table. After the rolls are locked, the calligrapher copies the detection onto the clear leader using a special pen and India ink. When this is completed, a typist enters the calligraphed dialog into a computer, and copies of the text are printed for the director, the recording engineer, and the actors. The problem inherent in this system is that it is inefficient in its consumption of time and “man hours”. Approximately 150 “man hours” are needed to complete all the operations for a “feature length film” (i.e. a film ranging from 90 to 100 minutes in running time). Since these operations depend upon a number of hands, they are open to errors and inaccuracies in the detection process and the calligraphy. After the recording sessions are completed, an editor works on the dialog tracks, adjusting the synchronization.
When that is completed to everyone's satisfaction, a final mix of the tracks is done, and the script is re-conformed and is tabled for distribution.
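The workload figure quoted above can be put on a per-minute basis with a short calculation. The following is only an illustrative sketch: the function name is hypothetical, and the only inputs are the 150 “man hours” and the 90 to 100 minute running time stated in the text.

```python
from fractions import Fraction

MAN_HOURS_PER_FEATURE = 150   # figure quoted in the text
FEATURE_MINUTES = (90, 100)   # running-time range quoted in the text


def man_hours_per_screen_minute(total_man_hours, running_minutes):
    """Return preparation man-hours spent per minute of finished film."""
    return Fraction(total_man_hours, running_minutes)


if __name__ == "__main__":
    for minutes in FEATURE_MINUTES:
        rate = man_hours_per_screen_minute(MAN_HOURS_PER_FEATURE, minutes)
        # Prints 1.67 for a 90-minute film and 1.50 for a 100-minute film.
        print(f"{minutes}-minute film: {float(rate):.2f} man-hours per screen minute")
```

On these figures, roughly one and a half hours or more of manual preparation are spent for every minute of finished film, before any recording or editing takes place.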
U.S. Pat. No. 5,732,184 teaches a system for editing video and audio sequences. It relates only to editing video clips, or small portions of video, and sound clips based on short sections of sound waves displayed on a video screen. The cursor is able to display no more than three frames of video and sound at the same time in one direction or the other. The cursor therefore serves only as an aid to identifying the material.
Published GB Patent application GB2,101,795 relates to dubbing translation of soundtracks on film. This invention depends upon an ability to provide histograms, or a digital representation, of the sound amplitude. This is somewhat difficult for the actors, as it is like asking them to learn a whole new alphabet. The invention also suggests that recorded material can be electronically shaped to fit the lip movement in order to produce more natural speech. Unfortunately, it is known, in light of the current technology, that any reshaping that is not minimal will only distort the sound and will therefore not provide a natural sound. Each section, or loop of film, must be operated manually by a trained user.
French patent publication 2,765,354 discloses a system that allows dubbing into French from other languages. This invention is also used to match the new French dialog to the images. Unfortunately, the system disclosed is slow and time consuming, as it is not automatic and requires manual input. It provides a maximum of 6 usable lines on a timeline. Furthermore, it does not allow any modifications to be made, since the dialog has already been permanently embedded in the picture. It also requires the performers to learn a whole new language of symbols, different from the symbols normally used in the standard manual form of operation.
The international publication WO98/101860 provides a fairly simple device that attempts to use computerized calligraphy of the dialogs. Its primary market is the home-entertainment or classroom games market. This device allows the player to substitute his or her voice for the one on the screen, using a basic recording device.
The “beep-and-wipe” system (in ADR, or Automatic Dialog Replacement) that is currently used throughout the world must be learned by performers, who then must develop a proficiency in it. Otherwise, it becomes tedious, frustrating, and time consuming. Actors must do it instinctively, i.e. they must learn to anticipate when to begin, taking into account the fact that it takes the human brain 1/20th of a second to decode what the eyes have seen; the additional time it takes for the actor to respond to what he or she has just seen puts the synchronization out by approximately 1½ frames. The amount of text that the actor can say at one time is limited because it depends on the individual actor's retentive powers. An actor who begins a line late realizes it and tries to catch up by the end of the sentence, which makes the take very difficult to edit. This means that many takes have to be recorded, causing the editor to spend large amounts of time piecing together the final take. The time required not only of the actor but also of the director, the studio engineer, and the editor, plus the cost of the studio itself, creates a greater expense of both time and money, an expense that could be avoided.
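The timing figures above can be related to one another by converting between seconds and frames. The sketch below is illustrative only and assumes a frame rate of 24 frames per second, which the text does not state; the function names are hypothetical. Exact fractions are used so the conversions carry no rounding error.

```python
from fractions import Fraction

ASSUMED_FPS = 24  # assumption: the text does not name a frame rate


def seconds_to_frames(seconds, fps=ASSUMED_FPS):
    """Convert a duration in seconds to a (possibly fractional) frame count."""
    return Fraction(seconds) * fps


def frames_to_milliseconds(frames, fps=ASSUMED_FPS):
    """Convert a (possibly fractional) frame count to milliseconds."""
    return Fraction(frames) / fps * 1000


if __name__ == "__main__":
    decode_delay = Fraction(1, 20)        # 1/20th of a second, from the text
    total_offset_frames = Fraction(3, 2)  # approximately 1.5 frames, from the text

    # At the assumed 24 fps, the decoding delay alone spans 1.2 frames.
    print(float(seconds_to_frames(decode_delay)))
    # At the assumed 24 fps, a 1.5-frame offset lasts 62.5 milliseconds.
    print(float(frames_to_milliseconds(total_offset_frames)))
```

Under that assumed frame rate, the decoding delay accounts for most of the quoted offset, which suggests why an actor who waits for the visual cue rather than anticipating it starts audibly late.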
Spot editing is editing done in the studio by the studio engineer, who tries to match or tailor the waveforms of the newly recorded dialog to those of the original. While some spot editing can be done in the studio by matching waveforms, it has two drawbacks: it requires training and knowledge in reading waveforms so that they can be matched properly, and too much variation in the tailoring of the waveforms ultimately distorts the sound.
The human factor is very important in the post-synchronization methods currently used around the world. Operators must be highly trained and experienced, because these methods rely on the capacity of the operators to interact with and react to the system; as a result, the quality of the post-synchronization may vary from time to time. Furthermore, these methods are very time consuming and therefore very costly.
Accordingly, there is a need for a method and apparatus that will overcome the above-mentioned drawbacks in post-synchronization.