1. Field of the Invention
This invention relates in general to a method for synchronizing elementary audio and video streams and in particular to a software system executed on a computer for synchronizing audio and video streams during a video editing process.
2. Description of Related Art
The rapid development of electronic hardware and software has spawned a digital revolution. Video and audio production and transmission are technologies that have certainly benefitted from the effects of the digital age. By converting audio and video files to a digital format, the files can be easily transferred and copied many times with little or no degradation of the original recording quality.
Both audio and video files require large amounts of data to accurately represent the audio or video associated with the files. Since file transfer speed and computer processing speed are usually a concern, it is desired to reduce the file size of audio and video files as much as possible. File reduction is accomplished by using the process of compression. Compression saves storage space and transmission time. Compression processes take advantage of the fact that information exhibits order and patterning. When order and patterning can be extracted from a group of information, the information can be represented and transmitted using less data than needed for the original information.
One of the most straightforward ways to compress data is to recognize pattern structures within the data set and replace the patterns with shorter data sets that express the pattern structure. The most common compression of this sort is called “run-length encoding.” Certain types of data, and in particular visual data, often include long strings of ones (or zeroes) that express an unvarying condition. Run-length encoding searches for “runs” of a single bit value and creates a code that expresses the length of the run as well as the value of the bits in it. As an oversimplified example, the data set “0000 0000” could be compressed as “8 0,” signifying eight bits with a value of zero, while the data set “1111 1111” could be compressed as “8 1.”
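The oversimplified example above can be sketched in a few lines of Python. This is only an illustration of the general idea; the function names `run_length_encode` and `run_length_decode` are hypothetical and do not come from any particular compression standard.

```python
from itertools import groupby


def run_length_encode(bits: str) -> list[tuple[int, str]]:
    """Collapse each run of identical bits into a (run length, bit value) pair."""
    return [(len(list(run)), bit) for bit, run in groupby(bits)]


def run_length_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the original bit string from (run length, bit value) pairs."""
    return "".join(bit * length for length, bit in pairs)


print(run_length_encode("00000000"))  # [(8, '0')]
print(run_length_encode("11111111"))  # [(8, '1')]
```

Decoding the pairs reproduces the input exactly, which is why run-length encoding discards no information.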
Video data can also be compressed by recognizing patterns that naturally occur because of the way video is formatted. For example, sometimes video includes scenes where the visual image is unchanged for several frames or more. The data representing the repeated video frame may be too complex for run-length or other forms of compression within the frame, but substantial compression can still be obtained by writing the frame data once, and adding code to represent the number of times the frame is repeated.
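As a rough sketch of the repeated-frame idea, the following Python treats each frame as an opaque block of bytes and stores consecutive identical frames once, together with a repeat count. The helper names and the toy frame contents are assumptions made for illustration only.

```python
def collapse_repeated_frames(frames: list[bytes]) -> list[tuple[bytes, int]]:
    """Store each run of identical consecutive frames once, with a repeat count."""
    collapsed: list[tuple[bytes, int]] = []
    for frame in frames:
        if collapsed and collapsed[-1][0] == frame:
            # Same image as the previous frame: just bump the counter.
            collapsed[-1] = (frame, collapsed[-1][1] + 1)
        else:
            collapsed.append((frame, 1))
    return collapsed


def expand_frames(collapsed: list[tuple[bytes, int]]) -> list[bytes]:
    """Reverse collapse_repeated_frames by writing each frame out count times."""
    return [frame for frame, count in collapsed for _ in range(count)]


still_scene = [b"FRAME-A"] * 3 + [b"FRAME-B"]
print(collapse_repeated_frames(still_scene))  # [(b'FRAME-A', 3), (b'FRAME-B', 1)]
```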
Another form of video compression takes advantage of the tendency in video (especially on a frame-by-frame level) to avoid abrupt changes in the visual image that is generated. Rather, each frame is in most cases very similar to the frame that came before and to the one that will follow. Video compression can be achieved by fully representing a first frame and then appending data to represent each bit of data that changed in the next frame. This can be continued for each frame, perhaps until noting the changes in a frame requires more data than writing the frame out fully, at which point the compression process can begin again with the new frame as a starting point.
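The frame-difference scheme described above might be sketched as follows. This is a toy model under stated assumptions, not how MPEG-2 or any real codec works: frames are equal-length byte strings, a change is recorded as an (index, new value) pair, and a change is assumed to cost twice as much as one raw byte. The names `encode_frames` and `decode_frames` are hypothetical.

```python
def encode_frames(frames: list[bytes]) -> list[tuple[str, object]]:
    """Encode each frame as a full ("key") frame or a list of changes ("delta")."""
    encoded: list[tuple[str, object]] = []
    previous = None
    for frame in frames:
        if previous is None:
            encoded.append(("key", frame))
        else:
            changes = [(i, new) for i, (old, new) in enumerate(zip(previous, frame)) if old != new]
            # Assumed cost model: one change costs two bytes (index + value).
            # When the changes cost at least as much as the frame, start over.
            if 2 * len(changes) >= len(frame):
                encoded.append(("key", frame))
            else:
                encoded.append(("delta", changes))
        previous = frame
    return encoded


def decode_frames(encoded: list[tuple[str, object]]) -> list[bytes]:
    """Rebuild every frame by applying deltas to the most recent decoded frame."""
    frames: list[bytes] = []
    current = bytearray()
    for kind, payload in encoded:
        if kind == "key":
            current = bytearray(payload)
        else:
            current = bytearray(current)
            for index, value in payload:
                current[index] = value
        frames.append(bytes(current))
    return frames
```

For example, encoding `[b"aaaaaaaa", b"aaaabaaa", b"zzzzzzzz"]` would store the first frame in full, the second as a single change, and the third in full again because every byte changed.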
Compression is often described as being “lossless” or “lossy.” Lossless compression removes redundant information. An example of lossless compression is run-length encoding. As mentioned previously, no information is discarded in run-length encoding; rather, information is just rearranged and represented in a more efficient manner.
The goal of lossy compression is to remove irrelevant information. Lossy compression relies on the fact that some information in an original video stream cannot be perceived by a person viewing the video. A lossy compression algorithm will remove these imperceptible pieces of information. Lossy compression will sometimes also remove information that is “close to irrelevant” if it is determined that the benefit of the data savings outweighs the detriment caused by the perceived loss in quality.
A common compression format for video files is the MPEG-2 standard, which was developed by the Moving Picture Experts Group. FIG. 1 schematically illustrates a segment of an MPEG-2 video file 11 and an AC-3 audio file 13. Video file 11 and its corresponding audio file 13 are representative of elementary video and audio files that have undergone compression. Video files, such as file 11, that have been compressed by the MPEG-2 standard are variable bit rate files. Variable bit rate files are files that may have different amounts of data associated with each second of video. When the frames in a portion of a video stream are very similar to surrounding frames, less memory is needed to accurately represent those frames than when the frames are very different from the surrounding frames. Hence, the allocation of bits to a particular segment of video can vary at different places in the video. Audio compression is accomplished using a constant bit rate process, wherein the same number of bits is allocated to each second of audio.
Video file 11 has twelve seconds of video stored in 1,000,000 bytes. Audio file 13, which is a constant bit rate file, has nine seconds of audio stored in the same 1,000,000 bytes. In order to properly play video file 11 and audio file 13, it is desired to have the sound of the audio file “synchronized” with the video of the video file. Therefore, any sound at the sixth second of the audio file should be played simultaneously with any video at the sixth second of the video file.
As long as both files are started from the beginning, the video and audio are synched. The problem arises when a user attempts to “jump” to a particular portion of the audio and video. Jumping to a particular point in the audio and video files is necessary in any non-linear editing environment. Users attempting to edit video commonly need to fast-forward to a given point in the audio and video streams and play from that point.
Most applications currently available for non-linear editing assume that a given file size yields a given number of seconds of video and audio. When a user attempts to fast-forward to a desired in-point in the video and audio files, the user generally indicates the desired in-point by entering a time position, which represents the amount of time elapsed in the video or audio file. The application then uses a formula to calculate the desired in-point in terms of bytes. The formula takes the time position entered by the user and multiplies it by the length (in bytes) of each second of audio and video. This formula-based approach works fine for uncompressed audio and video files and for compressed audio files, where the files are arranged with a constant bit rate. However, a formula-based approach does not work properly on compressed video files, which are variable bit rate.
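The formula-based calculation amounts to a single multiplication. A minimal sketch follows, using the audio file 13 figures given earlier (1,000,000 bytes holding nine seconds); the function name `formula_in_point` is hypothetical and is not taken from any particular editing application.

```python
def formula_in_point(time_position_s: float, total_bytes: int, duration_s: float) -> int:
    """Byte offset for a time position, assuming every second occupies the same number of bytes."""
    return int(time_position_s * total_bytes / duration_s)


# Audio file 13: 1,000,000 bytes at a constant bit rate over 9 seconds.
print(formula_in_point(6, 1_000_000, 9))  # 666666
```

The result is only meaningful when the assumption in the docstring holds, which is the case for constant bit rate files.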
Referring still to FIG. 1, an arrow 15 illustrates the result of using the formula-based approach to calculate an in-point for a user-defined time position. The prior art software represented in FIG. 1 has attempted to fast-forward both video file 11 and audio file 13 to an in-point just prior to the ninth second of audio and video. Since audio file 13 is a constant bit rate file, the calculation quickly identifies the correct byte location for the ninth second of audio. However, since video file 11 has a variable number of bytes associated with each second of video, the calculation wrongly identifies the eleventh second of video as being the correct in-point. If the files were played from the in-points represented by arrow 15, the ninth second of audio would play simultaneously with the eleventh second of video. As can be appreciated by those persons skilled in the art, this is not a desired result. The playback of the audio and video from these in-points is “unsynched.” Specifically, the video represented in FIG. 1 would appear to be slightly ahead of the audio, which would result in any spoken dialogue lagging behind the movements of a person's mouth.
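The mismatch can be reproduced numerically. In the sketch below, only the totals (twelve seconds of video and nine seconds of audio, each in 1,000,000 bytes) come from the description; the per-second video sizes in `VIDEO_BYTES_PER_SECOND` are invented for illustration and are not read from FIG. 1.

```python
from bisect import bisect_right
from itertools import accumulate

TOTAL_BYTES = 1_000_000
AUDIO_SECONDS = 9

# Hypothetical variable bit rate layout: 12 seconds summing to 1,000,000 bytes.
VIDEO_BYTES_PER_SECOND = [100_000] * 7 + [60_000] * 5
assert sum(VIDEO_BYTES_PER_SECOND) == TOTAL_BYTES

# Byte offset at which each second of video ends.
video_second_ends = list(accumulate(VIDEO_BYTES_PER_SECOND))


def video_second_containing(byte_offset: int) -> int:
    """Return the 1-based second of video in which byte_offset falls."""
    return bisect_right(video_second_ends, byte_offset) + 1


# Formula-based in-point for the start of the ninth second (8 s elapsed),
# computed from the constant audio rate and applied to both files.
formula_offset = int(8 * TOTAL_BYTES / AUDIO_SECONDS)

# Where the ninth second of video really starts in this hypothetical layout.
true_video_offset = sum(VIDEO_BYTES_PER_SECOND[:8])

print(formula_offset)                           # 888888
print(video_second_containing(formula_offset))  # 11
print(true_video_offset)                        # 760000
```

With these assumed sizes, the formula lands in the eleventh second of video even though the ninth second of audio was requested, which matches the unsynched playback described above.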
A need exists, therefore, for a method for synchronizing elementary video and audio streams, where the video stream is represented by a variable bit rate file. A need also exists for software to organize and process the video stream prior to a first playing of the video stream so that the video stream can be easily synchronized with the audio stream.