The present invention relates to an apparatus and method for implementing a packet-based teleconferencing bridge and, more particularly, to a teleconferencing bridge that avoids vocoding for packet-based communications.
In teleconferencing applications, a teleconferencing bridge is typically used to combine all received audio sources into a single audio source prior to transmission to the destination points (i.e., the end users participating in a teleconference call). In the past, most teleconferencing bridges have received audio sources from the public switched telephone network (PSTN) in a compressed pulse code modulated (PCM) format. With the rapid deployment of mobile and Internet telephony, however, an increasingly large number of conference call participants present their audio source to the teleconferencing bridge in the form of speech packets rather than in a compressed PCM format. Conventional bridges simply decode all packets received from mobile or voice over IP (VoIP) users and sum them to produce a single audio source. Moreover, if any PSTN landline callers are connected to the teleconferencing bridge, their compressed PCM speech samples are expanded and added to that single audio source to produce a final single audio source, which is then either encoded for transmission to a packet-based destination or PCM compressed for a landline destination point. The packet-based destination points (e.g., mobile or Internet users), however, experience compromised speech quality due to the decode-encode-decode process, also known as tandem vocoding. Additionally, this tandem vocoding increases the round trip delay for the transmission of speech packets.
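The conventional decode, sum and re-encode path described above may be illustrated by the following minimal sketch. It is offered only as an illustration under stated assumptions: the coarse quantizer `toy_encode`/`toy_decode`, the step size `STEP` and the function name `conventional_bridge` are hypothetical stand-ins and do not correspond to any particular vocoder or bridge implementation.

```python
from typing import List, Sequence

STEP = 256                        # quantizer step of the toy codec (assumption)
PCM_MIN, PCM_MAX = -32768, 32767  # 16-bit linear PCM range


def toy_encode(samples: Sequence[int]) -> List[int]:
    """Lossy stand-in for a vocoder encoder: coarse uniform quantization."""
    return [round(s / STEP) for s in samples]


def toy_decode(codes: Sequence[int]) -> List[int]:
    """Stand-in for the matching vocoder decoder."""
    return [c * STEP for c in codes]


def clip(sample: int) -> int:
    """Saturate a mixed sample to the 16-bit PCM range."""
    return max(PCM_MIN, min(PCM_MAX, sample))


def conventional_bridge(packets: Sequence[Sequence[int]]) -> List[int]:
    """Decode every received packet, sum the frames, then re-encode the mix.

    Audio reaching a packet-based destination has therefore been encoded at
    the source, decoded and encoded again here, and is decoded once more at
    the destination (tandem vocoding).
    """
    decoded = [toy_decode(p) for p in packets]              # decode at bridge
    mixed = [clip(sum(frame)) for frame in zip(*decoded)]   # sum all sources
    return toy_encode(mixed)                                # second encoding
```

In this sketch, every frame delivered to a packet-based destination passes through two encoders and two decoders in series, and the bridge must hold each frame until its own decode and encode stages are complete, which is where the additional round trip delay arises.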
Attempts in the prior art to mitigate this compromised speech quality due to tandem vocoding have included utilizing a state machine that, from a multitude of received speech packets, gives priority to active speech packets based on the speech rate of each packet. An underlying assumption of this methodology is that only one conference participant will be active over a span of several speech frames. If all or a number of the parties are talking simultaneously, however, the methodology will introduce packet loss. Additionally, the discontinuity in the respective background noises when switching from one priority party to another may be annoying and lead to listener fatigue.
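The rate-based priority selection described above may be sketched as follows, to show why simultaneous talkers lead to packet loss. The sketch rests on assumptions: the `Packet` fields, the numeric rate values (8 for a full-rate frame, 1 for an eighth-rate frame) and the tie-breaking rule that favors the current talker are all hypothetical, since the details of the prior art state machine are not given here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    source: str          # participant identifier (hypothetical field)
    rate: int            # illustrative: 8 = full rate, 1 = eighth rate
    payload: bytes = b""


class PrioritySelector:
    """Forwards one packet per frame interval and drops the rest."""

    def __init__(self) -> None:
        self.current: Optional[str] = None   # state: current priority talker

    def select(self, frame: List[Packet]) -> Tuple[Packet, List[Packet]]:
        best_rate = max(p.rate for p in frame)
        candidates = [p for p in frame if p.rate == best_rate]
        # Assumed tie-break: keep the current talker if still a candidate.
        chosen = next(
            (p for p in candidates if p.source == self.current),
            candidates[0],
        )
        self.current = chosen.source
        dropped = [p for p in frame if p is not chosen]
        return chosen, dropped


selector = PrioritySelector()

# One active talker: only low-rate background frames are dropped.
chosen, dropped = selector.select(
    [Packet("A", 8), Packet("B", 1), Packet("C", 1)]
)

# Two active talkers: B's full-rate speech frame is dropped as well.
chosen2, dropped2 = selector.select(
    [Packet("A", 8), Packet("B", 8), Packet("C", 1)]
)
lost_speech = [p.source for p in dropped2 if p.rate == 8]
```

Because the packet is forwarded without being decoded, tandem vocoding is avoided, but in the second frame the active speech from participant B never reaches the listeners, and the background noise of B and C is heard only when one of them becomes the priority party.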
Another technique known in the prior art attempts to alleviate the problems associated with the above-described methodology. This technique consists of running multiple decoders at the destination point and tying each decoder to a separate mobile supplemental channel (i.e., one decoder per supplemental channel). If a new participant wants to join a conference call, however, the infrastructure must free up an additional supplemental channel to add to the existing pool of supplemental channels, thereby consuming greater system resources. Furthermore, all vocoders are required to be of the same type. Hence, this prior art technique is very taxing on the resources of the infrastructure and is less flexible, because all decoders must be of the same type.