This invention relates to the field of digital signal processing, and in particular to the field of audio signal synchronization and mixing.
The use of digital encoding of analog signals has increased significantly over the past decade. Laser discs (CDs, DVDs, etc.) are used for the storage of audio and video information in digital form. Digital audio tapes (DATs) are also used to store audio information on magnetic tape. Digital transfer protocols, such as MIDI (Musical Instrument Digital Interface) and others, are used to transfer audio information among equipment such as music synthesizers, as well as to communicate audio recordings via the Internet. Computers commonly contain audio processing systems, such as MWAV, for processing and reproducing audio signals that are recorded in a digital form.
The evolution of digital encoding of audio signals has been diverse. As a result, a number of differing sampling rates are commonly used to encode audio signals. Digital telephone systems, for example, typically sample speech at 8 kHz. European long-haul microwave communications links use a 32 kHz sampling rate. Typically CD recordings have a sampling rate of 44.1 kHz, derived originally from television system frequency relationships. Computer audio processing systems typically support the use of 11.025, 22.05 and 44.1 kHz sampling. Professional audio processing and mixing equipment use a 48 kHz sampling rate.
A combination, or mixing, of audio information from multiple sources requires that the information be synchronized to a common time base. The most straightforward means of effecting such mixing is to decode the digital encodings into audio signals, mix the audio signals as required with an analog audio frequency mixer, then encode the composite result into a digital form. Such a mixer, however, requires a decoder for each digital signal being decoded, and requires that each decoder operate at the appropriate sampling frequency. Also, any noise that is introduced by the analog audio frequency mixer will degrade the quality of the resultant composite signal.
An alternative method of mixing audio information that is encoded in digital form is to convert each of the digital encodings to a common time base by modifying the differing sampling rates to a common sampling rate. Each encoding is up-converted to the highest sampling rate supported by the mixer, because a down-conversion of an encoding to a lower sampling rate results in a loss of high frequency information in the encoding. Each digital encoding that is mixed in a professional audio system, for example, is upsampled to 48 kHz. With each encoding having the same sampling rate, the mixing of signals is effected by a weighted arithmetic sum of the samples from each encoding. Consider, for example, the mixing of an 8 kHz sampled encoding with an 11.025 kHz sampled encoding to produce a 48 kHz sampled composite. Each of the 8 kHz samples will result in 6 samples at the 48 kHz sampling rate. Each of the 11.025 kHz samples will result in 4.3537 samples at the 48 kHz sampling rate. In principle, the 6 samples from the 8 kHz sampled signal and the 4.3537 samples from the 11.025 kHz sampled signals will be added together to produce 6 samples at 48 kHz. However, as is evident to one of ordinary skill in the art, fractional samples are a misnomer. Conventionally, the input and output streams are synchronized to the shortest time period in which they each provide an integer number of samples. This synchronization period is termed a frame period. The frame period is typically the least common multiple of the periods required of each input to produce an integer number of output samples. To allow for the synchronization of the streams at periodic intervals, each of the input sampling functions and the output sampling function must periodically produce an integer number of samples at the same time. In the above example of 8 kHz, 11.025 kHz and 48 kHz sampling functions, 40 milliseconds is the shortest time period in which each of these functions produce an integer number of samples. In 40 milliseconds, 1920 samples at 48 kHz are produced. That is, 1920 is the smallest number of samples at 48 kHz that can be produced by an integer number of samples at 8 kHz and an integer number of samples at 11.025 kHz:
320 samples@8 kHz=1920 samples@48 kHz.
441 samples@11.025 kHz=1920 samples@48 kHz.
This relationship is shown in FIG. 1. Each vertical arrow in FIG. 1 represents a sample. Line 1A represents the samples at 8 kHz, line 1B represents the samples at 11.025 kHz, and line 1C represents the samples at 48 kHz. The frame size of the 8 kHz samples is 320 samples; the frame size of the 11.025 kHz samples is 441 samples; and the frame size of the 48 kHz samples is 1920 samples. The frame period of each of these 8 kHz, 11.025 kHz and 48 kHz frames is 40 milliseconds. As can be seen, at the beginning 100 and end 110 of the 40 millisecond frame period, the 8 kHz and 11.025 kHz input samples, and the 48 kHz output sample are synchronous (occur at the same time). Elsewhere throughout the frame period, the 8 kHz input and the 11.025 kHz input samples are not synchronous. The synchronization among the inputs and output is maintained by defining the number of samples of each input and output corresponding to an equal frame period, and thereafter assuring that each input and output frame begin at the same time.
Conventional mixers include buffers that allow for the collection and processing of input and output samples on a per-frame basis. Each input source in the above example requires, for example, a buffer that is sufficient to hold the incoming samples of a frame, as well as a buffer that is sufficient to hold the 1920 samples that are produced for the frame. The storage of thousands of samples can be cost prohibitive, and can substantially affect the cost and/or feasibility of integrating audio synchronization and mixing techniques into integrated circuits.
To allow for the use of less memory, the output sampling rate can be reduced. In the above example, the buffer requirements can be reduced by half if the output sampling rate is reduced to 24 kHz. However, such an encoding will result in a loss of quality from those inputs that have a sampling rate greater than 24 kHz. By the Nyquist theorem, a sampling rate of 24 kHz can be used to sample an input having a highest frequency of 12 kHz, which is below the conventionally acceptable professional standard of 20 kHz for music and other audio recordings.
Therefore, a need exists for a synchronization method and apparatus that allows for the synchronization and mixing of signals that uses a minimal amount of memory without adversely affecting the processing efficiency.