In a communication network, it is desirable to provide conference arrangements whereby many participants can be bridged together on a conference call. A conference bridge is a device or system that allows several communication endpoints to be connected together to establish a communications conference. Modern conference bridges can accommodate both voice and video data, thereby allowing, for example, collaboration on documents by conference participants.
Historically, however, the conferencing experience has been less than adequate, especially for conferences with many attendees. Each attendee of a conference call is typically connected with the conference bridge by a different type of endpoint. Some endpoints may be older phones that do not utilize any relatively new technologies. Other endpoints might be state-of-the-art communication devices. This disparity between communication devices leads to a lesser quality communication experience for the endpoints that could otherwise have received higher quality signals.
Devices and methods exist that can transform analog voice signals into digital signals suitable to pass across a communication network and then convert the digital signals back into an analog signal for another person to hear. The device or program that is capable of performing these encoding/decoding steps, along with any associated compression/decompression, is known as a codec (or endec in the case of hardware). Codecs can both put the stream or signal into an encoded form (often for transmission, storage, or encryption) and retrieve, or decode, that form for viewing or manipulation in a format more appropriate for these operations.
Codecs are generally used in teleconferencing and videoconferencing and streaming media applications. In a videoconferencing environment, a video codec converts analog video signals from a video camera to digital signals for transmission over digital circuits. It then converts a digital signal back to an analog signal for display. In an audio conferencing environment, an audio codec converts an analog audio signal to a digital signal for transmission over a digital circuit. It then converts the digital signals back to analog signals for reproduction. Codecs may also be used in both cases to further compress the signal for transmission. This additional compression saves bandwidth on a communication network for other signals to pass across. The purpose of this type of codec is to reduce the size of digital audio samples and video frames to speed up transmission and save storage space.
Codec algorithms may be implemented entirely in software, in which case a server, or the like, does all of the processing, or they may be implemented in hardware/firmware for faster processing. As audio and video signal processing techniques have developed, so too have the codec algorithms used to transmit audio and video signals. The result is that many different codecs exist, and each of these codecs has unique operating parameters that affect the quality of a signal as heard by an end user.
One example of these codecs is the G.711 codec, which samples at 8 kHz and is a standard for representing either 14 bit or 13 bit linear samples, depending on whether the mu-law or A-law variant is used, respectively, as 8 bit compressed Pulse Code Modulation (PCM). At 8,000 samples per second and 8 bits per sample, the G.711 codec creates a 64 kbit/s bitstream. The G.711 codec has been the industry standard, and signals are typically transcoded to the G.711 codec because almost all network components are G.711 compatible.
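The bit rate arithmetic and the companding idea described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the companding shown is the continuous mu-law curve with mu = 255 followed by simple rounding to a signed 8 bit value. It is an assumption-laden illustration of the principle only and does not reproduce the segmented, bit-exact encoding of the G.711 standard.

```python
import math

SAMPLE_RATE_HZ = 8_000   # G.711 sampling rate, from the text
BITS_PER_SAMPLE = 8      # size of each compressed PCM word, from the text
MU = 255                 # mu-law companding constant (assumed for this sketch)


def bitrate_bps(sample_rate_hz, bits_per_sample):
    """Bit rate of an uncompressed stream of fixed-size samples."""
    return sample_rate_hz * bits_per_sample


def mu_law_compress(x):
    """Map a linear sample in [-1.0, 1.0] onto the companded range [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Quantize the companded sample to a signed 8 bit code (illustrative layout)."""
    return round(mu_law_compress(x) * 127)


def decode_8bit(code):
    """Recover an approximate linear sample from the signed 8 bit code."""
    return mu_law_expand(code / 127)


if __name__ == "__main__":
    print(bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))  # 64000 bit/s
    quiet = 0.01
    print(quiet, decode_8bit(encode_8bit(quiet)))
```

Because the companding curve is logarithmic, quiet samples are quantized with finer steps than loud ones, which is why 8 bits per sample can stand in for a wider linear range.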
Other examples of codecs include the G.722 codec, which operates at about 64 kbit/s and samples audio data at a rate of 16 kHz; the G.729 codec, which operates at 8 kbit/s and samples at 8 kHz; and the G.723.1 codec, which samples at 8 kHz and operates at a bit rate of 5.3 or 6.3 kbit/s. Many codecs have several variants that provide different types of compression/decompression schemes. By way of example, there are extensions to the G.729 codec that also provide 6.4 kbit/s and 11.8 kbit/s rates for marginally worse and better speech quality, respectively. As can be appreciated, many other types of codecs are available for commercial use, including speech codecs like GSM, DV Audio, G.728, G.726, ACELP.net, and ACELP.wide; audio codecs like AAC, WMA, MP3, ACELP.live, and AIFF; and video codecs like MPEG, AVI, WMV, H.261, H.263, and H.264.
The quality of most speech codecs is rated according to a Mean Opinion Score (MOS). MOS values range from 1 to 5, with 5 being the highest possible quality of a voice signal. According to their MOS values, the G.711 codec is better than the G.729 codec, and both are better than the G.723.1 codec. The G.722 codec is an extremely high quality codec due to its wide bandwidth and higher sampling frequency. The G.722 codec is not yet rated by a MOS, but its quality is considered superior to all of the above listed codecs.
Typical conference bridges support multiple types of communication devices by setting the conference quality equal to the lowest quality codec or by transcoding each signal to the G.711 codec. The bridge essentially requires every communication device to agree on a common operating codec. This ensures that every communication device can hear and be heard during the conference call. Unfortunately, this may require the participants whose devices support the highest quality codecs to participate using a lower quality codec. Essentially, the new technology in the higher quality communication device is superfluous during a conference that includes a communication device enabled only with lower quality codecs.
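The common-codec behavior of a typical bridge can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular bridge: the function name, the endpoint names, and the data layout are hypothetical, the quality ranking is only the ordinal ordering given in this background (G.722 above G.711 above G.729 above G.723.1) and not actual MOS values, and the fallback to G.711 stands in for the transcoding case.

```python
# Ordinal quality ranking taken from the ordering in the text; higher is better.
# These are ranks for illustration only, not MOS values.
QUALITY_RANK = {"G.723.1": 1, "G.729": 2, "G.711": 3, "G.722": 4}
FALLBACK_CODEC = "G.711"  # assumed transcoding target when nothing is shared


def select_conference_codec(endpoints):
    """Pick one codec for the whole conference.

    endpoints: dict mapping a (hypothetical) endpoint name to the set of
    codec names that endpoint supports.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # Codecs that every endpoint supports.
    common = set.intersection(*(set(codecs) for codecs in endpoints.values()))
    # Ignore codecs this sketch has no ranking for.
    common = {codec for codec in common if codec in QUALITY_RANK}
    if not common:
        # No shared codec: model the bridge transcoding everyone to G.711.
        return FALLBACK_CODEC
    return max(common, key=QUALITY_RANK.__getitem__)


if __name__ == "__main__":
    conference = {
        "older_desk_phone": {"G.711"},
        "softphone": {"G.722", "G.711", "G.729"},
        "room_system": {"G.722", "G.711"},
    }
    # The single G.711-only endpoint caps the codec for all three endpoints.
    print(select_conference_codec(conference))
```

In this sketch, two of the three endpoints support G.722, yet the selection falls to G.711 because one endpoint supports nothing better.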
The real problem is that one low quality communication device can degrade the conference call experience for every other participant of the conference, even if every other participant can use a higher quality communication device. Under typical conference call setups, if a participant speaks through a communication device using a high quality codec, and one participant in the conference call is using a lower quality codec, all other participants, regardless of the codec their communication device uses, must listen to the signal at the quality of the lowest quality codec. There exists a need to allow participants of a conference call to experience the conference call according to the best available codec, rather than according to the codec of the participant with the worst quality codec or a common lower quality codec to which every signal is transcoded.
There have been some attempts to address this problem. Namely, in U.S. Pat. No. 6,731,734 to Shaffer et al., which is herein incorporated by this reference, a multipoint control unit is described that allows for dynamic codec selection. Essentially, codecs are assigned to each endpoint in a conference call based upon optimizing objectives as determined by the multipoint control unit or its operator. When a conference call is set up, the multipoint control unit forces every participant to participate using a particular codec. If the objectives change, or another caller joins the conference, the multipoint control unit initiates a renegotiation of codecs for every single endpoint. This configuration may lead to a misuse of processing capabilities, because every time a new person joins the conference, the multipoint control unit is required to re-compute the codec that every endpoint should use based upon the objectives of the multipoint control unit. This can become quite cumbersome if there are many participants and several of them are joining and leaving the conference at random. This configuration does not decrease the computational load placed on the conference bridge, and not every participant is guaranteed to receive the highest quality voice signal that it can.
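The growth in renegotiation work described above can be made concrete with a short counting sketch. It is a simplified model under stated assumptions and not a description of the referenced patent's actual procedure: it assumes participants join one at a time, that nobody leaves, and that each join triggers exactly one codec renegotiation for every endpoint then in the conference, including the new one. The function name is hypothetical.

```python
def renegotiations_with_full_recompute(num_joins):
    """Count per-endpoint codec renegotiations when every join renegotiates all endpoints."""
    total = 0
    participants = 0
    for _ in range(num_joins):
        participants += 1
        # Every endpoint currently in the conference is renegotiated once.
        total += participants
    return total


if __name__ == "__main__":
    for n in (5, 10, 100):
        # Under these assumptions the total equals n * (n + 1) / 2.
        print(n, renegotiations_with_full_recompute(n))
```

Under these assumptions the total grows quadratically with the number of joins, so a conference that reaches 100 participants would have performed 5,050 per-endpoint renegotiations, and departures and rejoins would add to that count.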