Prior to the use of packet-based voice communications, telephone conferences were a service option available within standard non-packet-based telephone networks such as Pulse Code Modulation (PCM) telephone networks. As depicted in FIG. 1A, a standard telephone switch 15 is coupled to a plurality of telephone terminals 16 to be included within a conference session as well as a conference bridge 17. It is noted that these telephone terminals 16 are coupled to the telephone switch 15 via numerous other telephone switches (not shown). The telephone switch 15 forwards any voice communications received from the terminals 16 to the conference bridge 17, which then utilizes a standard algorithm to control the conference session.
One such algorithm used to control a conference session, referred to as a “party line” approach, comprises the steps of mixing the voice communications received from each telephone terminal 16 within the conference session and further distributing the result to each of the telephone terminals 16 for broadcasting. A problem with this algorithm is the amount of noise that is combined during the mixing step, this noise comprising a background noise source corresponding to each of the telephone terminals 16 within the conference session.
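The mixing step of the "party line" approach can be illustrated with a minimal sketch. It assumes 16-bit linear PCM samples and equal-length frames, neither of which is specified above, and the function name `party_line_mix` is hypothetical. The example shows how the background noise of every terminal ends up in the mixed result.

```python
def party_line_mix(frames):
    """Sum one PCM frame from every terminal, sample by sample.

    Assumes 16-bit linear PCM and equal-length frames (illustrative
    assumptions, not taken from the description above).
    """
    if not frames:
        return []
    mixed = []
    for i in range(len(frames[0])):
        total = sum(frame[i] for frame in frames)
        # Clamp to the 16-bit range so the sum cannot overflow.
        mixed.append(max(-32768, min(32767, total)))
    return mixed


# One talker plus low-level background noise from two silent terminals.
talker = [1000, -2000, 3000]
noise_a = [12, -8, 5]
noise_b = [-7, 9, 11]
mixed = party_line_mix([talker, noise_a, noise_b])
```

Every noise source contributes to every output sample, so the noise floor grows with the number of terminals in the session.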
An improved algorithm for controlling a conference session is disclosed within U.S. patent application Ser. No. 08/987,216 entitled “Method of Providing Conferencing in Telephony” by Dal Farra et al., filed on Dec. 9, 1997, assigned to the assignee of the present invention, and herein incorporated by reference. This algorithm comprises the steps of selecting primary and secondary talkers, mixing the voice communications from these two talkers and forwarding the result of the mixing to all the participants within the conference session except for the primary and secondary talkers. The primary and secondary talkers receive the voice communications corresponding to the secondary and primary talkers respectively. The selection and mixing of only two talkers at any one time can reduce the background noise level within the conference session when compared to the “party line” approach described above.
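The routing rule of this two-talker algorithm can be sketched as follows. This is an illustration only, not the method of the referenced application: the names `mix` and `route_two_talkers` are hypothetical, 16-bit linear PCM frames are assumed, and the primary and secondary talkers are taken as already selected.

```python
def mix(a, b):
    """Mix two equal-length PCM frames, clamping to the 16-bit range."""
    return [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]


def route_two_talkers(frames, primary, secondary):
    """Return the frame each terminal should play.

    frames maps a terminal identifier to its current PCM frame.
    """
    mixed = mix(frames[primary], frames[secondary])
    out = {}
    for terminal in frames:
        if terminal == primary:
            out[terminal] = frames[secondary]  # primary hears only the secondary
        elif terminal == secondary:
            out[terminal] = frames[primary]    # secondary hears only the primary
        else:
            out[terminal] = mixed              # everyone else hears the mix
    return out


frames = {"A": [100, 200], "B": [10, 20], "C": [1, 2]}
routed = route_two_talkers(frames, primary="A", secondary="B")
```

Terminal C's own frame never enters any output, which is how the background noise of non-talking participants is kept out of the conference.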
In a standard PCM telephone network as is depicted in FIG. 1A, all of the voice communications are in PCM format when being received at the conference bridge 17 and when being sent to the individual telephone terminals 16. Hence, in this situation, the mixing of the voice communications corresponding to the primary and secondary talkers is relatively simple with no conversions of format required.
Currently, packet-based voice communications are being utilized more frequently as Voice-over-Internet Protocol (VoIP) becomes increasingly popular. In these standard VoIP communications, voice data in PCM form is encapsulated with a header and footer to form voice data packets; the header in these packets includes, among other things, a Real-time Transport Protocol (RTP) header that contains a time stamp corresponding to when the packet was generated. One area that requires considerable improvement is the use of packet-based voice communications to provide telephone conferencing capabilities.
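As a rough illustration of the encapsulation described above, the sketch below prepends the 12-byte fixed RTP header to a voice payload. The function name `build_rtp_packet` and the example values (a 160-byte payload, payload type 0) are assumptions chosen for illustration; the lower-layer headers and any footer added by the network are omitted.

```python
import struct


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=0):
    """Prepend a fixed 12-byte RTP header to a voice payload (sketch only)."""
    first_byte = 2 << 6                 # version 2, no padding, no extension, no CSRCs
    second_byte = payload_type & 0x7F   # marker bit clear
    header = struct.pack(
        "!BBHII",                       # network byte order
        first_byte,
        second_byte,
        seq & 0xFFFF,                   # 16-bit sequence number
        timestamp & 0xFFFFFFFF,         # 32-bit time stamp
        ssrc & 0xFFFFFFFF,              # 32-bit synchronization source identifier
    )
    return header + payload


# Hypothetical example: 160 bytes of voice data stamped with time 160.
packet = build_rtp_packet(b"\x00" * 160, seq=1, timestamp=160, ssrc=0x1234)
```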
As depicted within FIG. 1B, a plurality of packet-based voice communication terminals, terminals A,B,C 22,24,26 in this case, are coupled to a packet-based network 20. Currently, in order for the users of these terminals 22,24,26 to communicate within a voice conference, a packet-based voice communication central bridge 28 must be coupled to the packet-based network 20. This conference bridge 28 has a number of problems. These problems include the latency inherently created within the conference bridge 28, the considerable amount of signal processing power required, the cost of the conference bridge, the limited input/output capacity of the conference bridge, and the maintenance and management of the conference bridge that is required. It should be noted that the high signal processing power required is partially due to the conference bridge 28 having to compensate for a variety of problems that typically exist within current packet-based networks. These problems include possible variable delays, out-of-sequence packets, lost packets, and/or unbounded latency.
FIG. 2 is a logical block diagram of a well-known conference bridge design that could be implemented within the network of FIG. 1B. In this design, the conference bridge 28 comprises an inputting apparatus 30, an energy detection, talker selection and mixing block 32 and an outputting apparatus 34. Typically all three of these blocks are implemented in software.
The inputting apparatus 30 performs a number of functions on the packets that are received at the conference bridge 28 from the terminals within a voice conference. These functions include protocol stack, jitter buffer and decompression operations. During the protocol stack operation, the inputting apparatus 30 receives packets comprising compressed voice signals, hereinafter referred to as voice data packets, and strips off the packet overhead required for transmitting the voice data packets through the packet-based network 20. During the jitter buffer operation, the inputting apparatus 30 receives the compressed voice signals, ensures that the compressed voice signals are within the proper sequence (i.e. time ordering signals), buffers the compressed voice signals to ensure smooth playback and ideally implements packet loss concealment. During the decompression operation, the inputting apparatus 30 receives the buffered compressed voice signals, converts them into standard PCM format and outputs the resulting voice signals (that are in Pulse Code Modulation) to the energy detection, talker selection and mixing block 32.
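The jitter buffer operation can be sketched as below. This is a minimal illustration under stated assumptions, not the design of the inputting apparatus 30: the class name `JitterBuffer`, the `depth` parameter and the use of a per-packet sequence number are all hypothetical, and sequence number wraparound is ignored. The sketch reorders packets, discards late arrivals and duplicates, and marks a lost packet with `None` so that a later stage could apply packet loss concealment.

```python
class JitterBuffer:
    """Reorder packets by sequence number and flag gaps (illustrative sketch)."""

    def __init__(self, depth=3):
        self.depth = depth      # packets to hold before playback starts
        self.pending = {}       # sequence number -> payload
        self.next_seq = None    # next sequence number due for playback

    def push(self, seq, payload):
        if self.next_seq is not None and seq < self.next_seq:
            return              # arrived after its playback slot; discard
        self.pending.setdefault(seq, payload)   # keep the first copy of a duplicate

    def pop(self):
        """Return (seq, payload) for the next playback slot.

        A payload of None marks a lost packet. Returns None while the
        buffer is still filling or has nothing left to play.
        """
        if self.next_seq is None:
            if len(self.pending) < self.depth:
                return None
            self.next_seq = min(self.pending)
        if not self.pending:
            return None
        seq = self.next_seq
        self.next_seq += 1
        return seq, self.pending.pop(seq, None)


# Packets 1-3 arrive out of order and packet 4 is lost in the network.
buf = JitterBuffer(depth=3)
for seq, payload in [(2, "b"), (1, "a"), (3, "c"), (5, "e")]:
    buf.push(seq, payload)

played = []
while (item := buf.pop()) is not None:
    played.append(item)
```

Holding `depth` packets before playback begins is what smooths out variable network delay, and it is also one source of the latency attributed to the bridge.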
The energy detection, talker selection and mixing block 32 performs almost identical functionality to the conference bridge 17 within FIG. 1A. The key to the design of a conference bridge 28 as depicted in FIG. 2 is the inputting block 30 transforming the packet-based voice communications into PCM voice communications so the well-known conferencing algorithms can be utilized within the block 32. As described previously, in one conferencing algorithm, primary and secondary talkers are selected for transmission to the participants in the conference session to reduce the background noise level from participants who are not talking and to simplify the mixing algorithm required. The selection of primary and secondary talkers is performed with an energy detection operation to determine the voice conference participants that are speaking, followed by a talker selection operation to choose the primary and secondary talkers and a mixing operation to mix the voice communications received from the primary and secondary talkers. The resulting output from the block 32 is a voice communication consisting of a mix between the voice communications received from the primary and secondary talkers. Further outputs from the block 32 include the unmixed voice communications of the primary and secondary talkers that are to be forwarded, as described previously, to the secondary and primary talkers respectively.
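The energy detection and talker selection operations can be illustrated with the sketch below. The selection criteria are not specified above, so this example makes simple assumptions: energy is the mean squared sample value of the current frame, a fixed threshold separates speech from silence, and the two most energetic active terminals become the primary and secondary talkers. The names `frame_energy` and `select_talkers` and the threshold value are hypothetical.

```python
def frame_energy(frame):
    """Mean squared sample value of one PCM frame."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def select_talkers(frames, threshold=1000.0):
    """Pick (primary, secondary) from the terminals whose energy meets the threshold.

    frames maps a terminal identifier to its current PCM frame. Either
    result may be None when fewer than two terminals are active.
    """
    energies = {t: frame_energy(f) for t, f in frames.items()}
    active = [t for t, e in energies.items() if e >= threshold]
    active.sort(key=lambda t: energies[t], reverse=True)
    primary = active[0] if len(active) > 0 else None
    secondary = active[1] if len(active) > 1 else None
    return primary, secondary


frames = {
    "A": [1000, -1000],   # loud talker
    "B": [10, -10],       # background noise only
    "C": [300, -300],     # quieter talker
}
primary, secondary = select_talkers(frames)
```

A practical selector would likely smooth the energy over several frames and add hysteresis so that the talkers do not switch on every frame; that refinement is left out here.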
The outputting apparatus 34 performs a number of functions on the outputs from the block 32, these functions including compression and transmission operations. During the compression operation, the outputting apparatus 34 receives and compresses respective ones of the three outputs from the energy detection, talker selection and mixing block 32. During the transmission operation, the outputting apparatus 34 performs a protocol stack operation on the compressed voice signals, encapsulates the compressed voice signals within the packet-based format required for transmission on the packet-based network 20 and transmits voice data packets comprising the compressed voice signals to the appropriate terminals 22,24,26 within the conference session. It is noted that, in the case of the talker selection algorithm described above, the mixed voice signal is forwarded to all the terminals with the exception of the primary and secondary talkers while the primary and secondary talkers are sent the appropriate unmixed voice signals.
One problem with the setup depicted within FIG. 2 is the degradation of the voice signals as the voice signals are converted from PCM format to compressed format and vice versa within the conference bridge 28, these conversions together being referred to generally as transcoding. A further problem results from the considerable latency that the processing within the conference bridge 28 introduces. The latency of this processing can result in a significant delay between when the talker(s) speaks and when the other participants in the conference session hear the speech. This delay can be noticeable to the participants if it is beyond the perceived real-time limits of human hearing. This could result in participants talking while not realizing that another participant is speaking. Yet another key problem with the design depicted in FIG. 2 is the considerable amount of signal processing power that is required to implement the conference bridge 28. As stated previously, each of the components shown within FIG. 2 is normally simply a software algorithm being run on DSP component(s). This considerable amount of required signal processing power is expensive. Even further, another key problem within current conference bridge designs is their limited input/output capacity. This limited capacity is not always significant but could be exceeded in cases where there are large numbers of participants within the conference session. As well, a large number of participants within a conference session could put a strain on the capacity of the packet-based network 20 itself due to the concentration of traffic that occurs with the use of packet-based conference bridges.
Hence, a new design within a packet-based voice communication network is required to implement voice conferencing functionality. In this new design, a reduction in transcoding, latency and/or required signal processing power within the conferencing network is needed.