Prior to the use of packet-based voice communications, telephone conferences were a service option available within standard non-packet-based telephone networks such as Pulse Code Modulation (PCM) telephone networks. As depicted in FIG. 1, a standard telephone switch 20 is coupled to a plurality of telephone handsets 22 to be included within a conference session as well as a central conference bridge 24. It is noted that these telephone handsets 22 are coupled to the telephone switch 20 via numerous other telephone switches (not shown). The telephone switch 20 forwards any voice communications received from the handsets 22 to the central conference bridge 24, which then utilizes a standard algorithm to control the conference session.
One such algorithm used to control a conference session, referred to as a “party line” approach, comprises the steps of mixing the voice communications received from each telephone handset 22 within the conference session and further distributing the result to each of the telephone handsets 22 for broadcasting. A problem with this algorithm is the amount of noise that is combined during the mixing step, this noise comprising a background noise source corresponding to each of the telephone handsets 22 within the conference session.
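The mixing step of the "party line" approach can be illustrated with a minimal sketch. It assumes 16-bit linear PCM samples and equal-length frames per handset; the function name `mix_party_line` and the clipping behaviour are illustrative assumptions and are not taken from any particular conference bridge.

```python
from typing import Dict, List

# Assumption: samples are 16-bit signed linear PCM values.
PCM_MIN, PCM_MAX = -32768, 32767


def mix_party_line(frames: Dict[str, List[int]]) -> List[int]:
    """Sum every handset's frame sample by sample, clipping to the 16-bit range.

    Every handset contributes to the sum, so the background noise of silent
    handsets is added into the result along with the speech.
    """
    length = min(len(frame) for frame in frames.values())
    mixed = []
    for i in range(length):
        total = sum(frame[i] for frame in frames.values())
        mixed.append(max(PCM_MIN, min(PCM_MAX, total)))
    return mixed
```

Because the sum runs over all handsets, the noise floor of the mixed signal grows with the number of participants, which is the drawback noted above.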
An improved algorithm for controlling a conference session is disclosed within U.S. patent application Ser. No. 08/987,216 entitled “Method of Providing Conferencing in Telephony” by Dal Farra et al., filed on Dec. 9, 1997, assigned to the assignee of the present invention, and herein incorporated by reference. This algorithm comprises the steps of selecting primary and secondary talkers, mixing the voice communications from these two talkers, and forwarding the result of the mixing to all the participants within the conference session except for the primary and secondary talkers. The primary talker instead receives the voice communications of the secondary talker, and the secondary talker receives those of the primary talker. The selection and mixing of only two talkers at any one time can reduce the background noise level within the conference session when compared to the “party line” approach described above.
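The select, mix, and forward steps can be sketched as follows. The text above does not state how the primary and secondary talkers are chosen, so this sketch assumes, purely for illustration, that the two frames with the highest energy win; the names `frame_energy`, `select_talkers`, and `route` are hypothetical, and the referenced application may use a different criterion.

```python
from typing import Dict, List, Tuple


def frame_energy(frame: List[int]) -> int:
    """Sum of squared samples; an assumed stand-in for the real selection rule."""
    return sum(s * s for s in frame)


def select_talkers(frames: Dict[str, List[int]]) -> Tuple[str, str]:
    """Return (primary, secondary) as the two highest-energy participants."""
    ranked = sorted(frames, key=lambda p: frame_energy(frames[p]), reverse=True)
    return ranked[0], ranked[1]  # requires at least two participants


def mix_two(a: List[int], b: List[int]) -> List[int]:
    """Mix two 16-bit PCM frames with clipping."""
    return [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]


def route(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Decide which frame each participant should receive."""
    primary, secondary = select_talkers(frames)
    mixed = mix_two(frames[primary], frames[secondary])
    out = {}
    for participant in frames:
        if participant == primary:
            out[participant] = frames[secondary]  # primary hears only the secondary
        elif participant == secondary:
            out[participant] = frames[primary]    # secondary hears only the primary
        else:
            out[participant] = mixed              # everyone else hears the mix
    return out
```

Only two frames enter the mix regardless of the size of the conference, so the background noise of the remaining handsets never reaches the output.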
In a standard PCM telephone network as is depicted in FIG. 1, all of the voice communications are in PCM format when being received at the central conference bridge 24 and when being sent to the individual telephone handsets 22. Hence, in this situation, the mixing of the voice communications corresponding to the primary and secondary talkers is relatively simple with no conversions of format required.
Currently, packet-based voice communications are being utilized more frequently as Voice-over-Internet Protocol (VoIP) becomes increasingly popular. In standard VoIP voice communications, voice data in PCM form is encapsulated with a header and footer to form voice data packets; the header in these packets includes, among other things, a Real-time Transport Protocol (RTP) header that contains a time stamp corresponding to when the packet was generated. One area that requires considerable improvement is the use of packet-based voice communications to provide telephone conferencing capabilities.
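For illustration, the 12-byte fixed RTP header that carries the time stamp can be sketched as below. This is a minimal sketch only: it covers the RTP header alone and omits the lower-layer headers and any footer, it sets no padding, extension, marker, or contributing-source fields, and the helper names `build_rtp_packet` and `parse_rtp_timestamp` are hypothetical. The default payload type of 0 is the static assignment for G.711 mu-law audio.

```python
import struct

RTP_VERSION = 2  # current RTP version number


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0) -> bytes:
    """Prefix a voice payload with a fixed 12-byte RTP header."""
    first_byte = RTP_VERSION << 6        # version 2, no padding/extension, CSRC count 0
    second_byte = payload_type & 0x7F    # marker bit 0, 7-bit payload type
    header = struct.pack("!BBHII",
                         first_byte,
                         second_byte,
                         seq & 0xFFFF,            # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,  # 32-bit time stamp
                         ssrc & 0xFFFFFFFF)       # 32-bit source identifier
    return header + payload


def parse_rtp_timestamp(packet: bytes) -> int:
    """Read the 32-bit time stamp from bytes 4..7 of the RTP header."""
    return struct.unpack("!I", packet[4:8])[0]
```

A receiver can use the sequence number and time stamp recovered this way to reorder packets and schedule playback.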
As depicted within FIG. 2, a plurality of packet-based voice communication terminals, VoIP handsets 26 in this case, are coupled to a packet-based network, an IP network 28 in this case. Currently, in order for the users of these VoIP handsets 26 to communicate within a voice conference, a packet-based voice communication central bridge, in this case a VoIP central conference bridge 30, must be coupled to the IP network 28. This VoIP central conference bridge 30 has a number of problems, the key problems being the latency inherently created within the conference bridge 30 and the considerable amount of signal processing power required. It should be noted that the high signal processing power required is partially due to the conference bridge having to compensate for a variety of problems that typically exist within current IP networks, these problems including possible variable delays, out-of-sequence packets, lost packets, and/or unbounded latency.
FIG. 3A is a logical block diagram of a well-known VoIP central conference bridge design while FIG. 3B is a logical block diagram of a well-known VoIP handset design. In the design of FIG. 3A, the conference bridge 30 comprises an inputting block 32, a talker selection and mixing block 34, and an outputting block 36. Typically all three of these blocks are implemented in software.
The inputting block 32 comprises, for each participant within the voice conference, a protocol stack (P.S.) 38 coupled in series with a jitter buffer (J.B.) 40 and a decompression block (DECOMP.) 42, each of the decompression blocks 42 further being coupled to the talker selection and mixing block 34. The protocol stacks 38 in this design perform numerous functions including receiving packets comprising compressed voice signals, hereinafter referred to as voice data packets; stripping off the packet overhead required for transmitting the voice data packet through the IP network 28; and outputting the compressed voice signals contained within the packets to the respective jitter buffer 40. The jitter buffers 40 receive these compressed voice signals; ensure that the compressed voice signals are within the proper sequence (i.e. time ordering signals); buffer the compressed voice signals to ensure smooth playback; and ideally implement packet loss concealment. The output of each of the jitter buffers 40 is a series of compressed voice signals within the proper order that are then fed into the respective decompression block 42. The decompression blocks 42 receive these compressed voice signals, convert them into standard PCM format and output the resulting voice signals (that are in Pulse Code Modulation) to the talker selection and mixing block 34.
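The reordering and buffering performed by a jitter buffer can be sketched as follows. This is a simplified illustration under stated assumptions: packets are keyed by an integer sequence number that never wraps around, a missing packet is reported as `None` so that the caller can apply packet loss concealment, and the class name `JitterBuffer` and its `depth` parameter are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Reorders packets by sequence number and releases them one slot at a time."""

    def __init__(self, depth: int = 3, first_seq: int = 0) -> None:
        self.depth = depth          # packets to accumulate before playback starts
        self.next_seq = first_seq   # sequence number of the next playout slot
        self._heap: List[Tuple[int, bytes]] = []

    def push(self, seq: int, payload: bytes) -> None:
        """Store a packet unless its playout slot has already passed."""
        if seq >= self.next_seq:
            heapq.heappush(self._heap, (seq, payload))

    def ready(self) -> bool:
        """True once enough packets are buffered to start smooth playback."""
        return len(self._heap) >= self.depth

    def pop(self) -> Optional[bytes]:
        """Return the payload for the next slot, or None if that packet is missing."""
        # Discard duplicates of packets that were already played out.
        while self._heap and self._heap[0][0] < self.next_seq:
            heapq.heappop(self._heap)
        payload = None
        if self._heap and self._heap[0][0] == self.next_seq:
            payload = heapq.heappop(self._heap)[1]
        self.next_seq += 1
        return payload
```

The buffering depth trades smoothness for delay: each additional buffered packet adds one packet interval of latency before the decompression block receives any signal.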
The talker selection and mixing block 34 preferably performs almost identical functionality to the central conference bridge 24 within FIG. 1. The key to the design of a VoIP central conference bridge 30 as depicted in FIG. 3A is the inputting block 32 transforming the packet-based voice communications into PCM voice communications so the well-known conferencing algorithms can be utilized within the block 34. As described previously, in one conferencing algorithm, primary and secondary talkers are selected for transmission to the participants in the conference session to reduce the background noise level from participants who are not talking and to simplify the mixing algorithm required. Hence, the resulting output from the talker selection and mixing block 34 is a voice communication consisting of a mix between the voice communications received from a primary talker and a secondary talker; the primary and secondary talkers being determined within the block 34. Further outputs from the talker selection and mixing block 34 include the unmixed voice communications of the primary and secondary talkers that are to be forwarded, as described previously, to the secondary and primary talkers respectively.
The outputting block 36 comprises three compression blocks 44 and a plurality of transmitters 46. The compression blocks 44 receive respective ones of the three outputs from the talker selection and mixing block 34, compress the received voice signals, and independently output the results to the appropriate transmitters 46. In this case, the mixed voice signals, after being compressed, are forwarded to all the transmitters 46 with the exception of the transmitters directed to the primary and secondary talkers. The transmitters directed to the primary and secondary talkers receive the appropriate unmixed voice signals. Each of the transmitters 46, after receiving a compressed voice signal, subsequently encapsulates this compressed voice signal within the packet-based format required for transmission on the IP network 28 and transmits a voice data packet comprising the compressed voice signal to the appropriate VoIP handset 26 within the conference session.
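The fan-out performed by the outputting block can be sketched as below. The codec is represented by a caller-supplied `compress` function, which is a placeholder assumption and not a real codec API; `fan_out` and its parameter names are likewise hypothetical.

```python
from typing import Callable, Dict, List


def fan_out(participants: List[str], primary: str, secondary: str,
            mixed_pcm: bytes, primary_pcm: bytes, secondary_pcm: bytes,
            compress: Callable[[bytes], bytes]) -> Dict[str, bytes]:
    """Compress the three outputs once each and assign one to every transmitter."""
    mixed_c = compress(mixed_pcm)
    primary_c = compress(primary_pcm)
    secondary_c = compress(secondary_pcm)
    out = {}
    for participant in participants:
        if participant == primary:
            out[participant] = secondary_c   # unmixed secondary talker
        elif participant == secondary:
            out[participant] = primary_c     # unmixed primary talker
        else:
            out[participant] = mixed_c       # mix of both talkers
    return out
```

In this sketch the compression work is fixed at three operations per frame regardless of conference size, whereas the inputting block must decompress one stream per participant.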
The well-known handsets 26, as depicted in FIG. 3B, each comprise a protocol stack 47 coupled in series with a jitter buffer 48 and a decompression block 49, these blocks typically being implemented in software. Voice data packets sent from the central conference bridge 30 are received at the protocol stack 47 which subsequently removes the packet overhead from the received voice data packets, leaving only the compressed voice signal sent from the packet-based central conference bridge 30. The jitter buffer 48 next performs numerous functions similar to those performed by the jitter buffers 40 including ensuring that the compressed voice signals are within the proper sequence, buffering the compressed voice signals to ensure smooth playback, and ideally implementing packet loss concealment. Subsequently, the decompression block 49 receives the compressed voice signals, decompresses them into PCM format, and forwards the voice signals to the speaker within the particular handset 26 for broadcasting the voice signals audibly.
One key problem with the setup depicted within FIGS. 3A and 3B is the degradation of the voice signals as the voice signals are converted from PCM format to compressed format and vice versa, these conversions together being referred to generally as transcoding. A further problem results from the considerable latency that the processing within the VoIP central conference bridge 30 and the processing within the individual handsets 26 create. The combined latency of this processing can result in a significant delay between when the talker(s) speaks and when the other participants in the conference session hear the speech. This delay can be noticeable to the participants if it is beyond the perceived real-time limits of human hearing, and could result in participants talking without realizing that another participant is already speaking. Yet another key problem with the design depicted in FIGS. 3A and 3B is the considerable amount of signal processing power that is required to implement the conference bridge 30. As stated previously, each of the components shown within FIG. 3A is normally simply a software algorithm being run on DSP component(s). This considerable amount of required signal processing power is expensive.
Hence, a new design within a packet-based voice communication network is required to implement voice conferencing functionality. In this new design, a reduction in transcoding, latency, and/or required signal processing power within the central conference bridge is needed.