1. Field of the Invention
This invention relates broadly to telecommunications equipment. More particularly, this invention relates to a system which performs automatic gain control on the audio channels of multimedia telecommunications equipment.
2. State of the Art
With the increase of throughput (data rate) available in the telecommunications industry, and in association with the improvement of compression and decompression algorithms, the number of telecommunication applications available to individuals and businesses has increased dramatically. One of these applications is called "multimedia communications" which permits video, audio, and in some cases other data to be transported from one party to another or others. Multimedia communications can be utilized for a number of applications, and in different configurations. One configuration of recent interest has been multimedia conferencing, where several parties can communicate in a conference style.
In multimedia conferencing, the audio and video data are handled such that each party can see and hear one, several, or all of the other parties. In fact, various telecommunications recommendations and standards are presently being adopted by the ITU-T, ISO, and Bellcore which govern the protocols of multimedia conferencing (see, e.g., ITU-T T.120). In the multimedia conferencing systems of the art (as represented by prior art FIG. 1), the audio, video, and other data streams generated by a user's system 12a are multiplexed together directly in the encoder section of a multimedia encoder/decoder (codec) 14 located at the source/terminal 16, and transported together through the transport network 20 (now proposed in ATM format) to a similar "peer" codec at a remote location. The peer codec is another codec 14 at the remote user site for a point-to-point conference, and/or a codec/switch 24 at a multimedia bridge 26 (also called a multimedia multipoint server or MMS) for a multipoint conference. The MMS 26, which typically includes a codec/switch 24, a controller 28, an audio processing unit (APU) 30, and a video processing unit (VPU) 31, provides conference control (e.g., it determines the signal to be sent to each participant), audio mixing (bridging) and multicasting, audio level detection for conference control, audio level switching, video mixing (e.g., a quad split, or "continuous presence device" which combines multiple images for display together) when available and/or desirable, and video multicasting. Specifically, the audio processing unit (APU) 30 controls the audio level detection for audio mixing, audio multicasting, and voice activated video switching. The audio and video data exiting the MMS are multiplexed, and continue through the transport network 20 to the desired multimedia source terminals 12b, 12c.
As stated above, multimedia systems are often provided with voice activated video switching. Voice activated video switching operates to display on a monitor a particular video signal based upon the power level of a speech-based audio signal multiplexed with the video signal. When a source provides an audio signal having a higher power level than the currently active signal, the APU 30 of the MMS 26 automatically switches which video signal is displayed at the other terminals. As a result, the party speaking loudest typically has his or her image displayed on the monitor at the other terminals.
While audio level, or voice activated, video switching is commonplace in the art, it suffers from several problems. First, audio signals from different sources do not share a common reference power level and, as such, the voice activated switching can be prevented from correctly switching to the appropriate party. Second, the power level of a channel varies according to several factors including signal attenuation due to analog transport losses (caused by the distance between a source terminal, i.e., a microphone, and its digital encoding station), and microphone sensitivity. Thus, switching based on an audio level received at the switch can be flawed due to power level variation. Third, the APU 30 of the multimedia bridge 26 is typically provided with a channel selection system which compares the power level of each of the channels joined in a multimedia teleconference with an experimental threshold power level. When the power level of a channel exceeds the threshold power level for a period of time (e.g., for greater than three seconds) and exceeds the power level of all of the other channels, then that channel is selected as the "loudest speaker" and the other parties to the teleconference in voice activated video switching mode receive the video associated with that loudest speaker. However, a problem occurs when the noise on one or more channels is particularly high, i.e., higher than the threshold power level. In such a case, the channel selection system is prevented from correctly selecting the appropriate channel, as the noisiest channel will always be the selected channel. As a result, it is evident that the power level of an audio channel is not, by itself, a sufficiently reliable value by which to identify the loudest speaker.
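By way of illustration only, the channel selection rule described above, and its failure on a noisy channel, can be sketched as follows. This is a minimal sketch and not the implementation of any particular APU: the function name select_loudest, the frame-based representation of time, the linear power units, and all numeric values are assumptions introduced here for the example.

```python
from typing import List, Optional, Sequence


def select_loudest(
    frames: Sequence[Sequence[float]],
    threshold: float,
    hold_frames: int,
) -> List[Optional[int]]:
    """Return the selected channel after each frame (None until one qualifies).

    frames[t][c] is the (hypothetical) power level of channel c in frame t.
    A channel is selected once it has been above the threshold AND strictly
    louder than every other channel for hold_frames consecutive frames.
    """
    n_channels = len(frames[0]) if frames else 0
    run = [0] * n_channels          # consecutive qualifying frames per channel
    selected: Optional[int] = None  # currently selected "loudest speaker"
    history: List[Optional[int]] = []

    for levels in frames:
        for c in range(n_channels):
            louder_than_rest = all(
                levels[c] > levels[o] for o in range(n_channels) if o != c
            )
            if levels[c] > threshold and louder_than_rest:
                run[c] += 1
            else:
                run[c] = 0
            if run[c] >= hold_frames:
                selected = c
        history.append(selected)
    return history


# Assumed example values: channel 0 carries speech for four frames and then
# falls silent; channel 1 carries only background noise.
THRESHOLD = 1.0
HOLD = 3

quiet_line = [[5.0, 0.1]] * 4 + [[0.1, 0.1]] * 6  # noise below the threshold
noisy_line = [[5.0, 6.0]] * 4 + [[0.1, 6.0]] * 6  # noise above the threshold

print(select_loudest(quiet_line, THRESHOLD, HOLD))
# [None, None, 0, 0, 0, 0, 0, 0, 0, 0]
print(select_loudest(noisy_line, THRESHOLD, HOLD))
# [None, None, 1, 1, 1, 1, 1, 1, 1, 1]
```

In the second call the noise on channel 1 exceeds both the threshold and the speech level on channel 0, so the rule selects the noisy channel throughout, even though the only actual speaker is on channel 0.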
Microphone calibration has been used to attempt to reduce the effect of differing microphone sensitivity. Calibrated microphones do generally provide a more consistent power level on the several channels. However, calibration should be done using the actual circuit connecting each microphone to the MMS so that a predetermined audio signal cue reaches the MMS at a given power level. This process is laborious and error-prone. Depending on the technicians performing the calibrations, some microphones may be calibrated to a different sensitivity than others, so that calibration fails to adequately correct even the portion of the unreliability that is rooted in the microphones.