The present invention relates, generally, to the transmission of voice over packet networks and, more particularly, to techniques for improving voice-over-IP (VoIP) conference bridges and transcoders.
The explosive growth of the Internet has been accompanied by a growing interest in using this traditionally data-oriented network for voice communication in accordance with voice-over-packet (VoP) or voice-over-IP (VoIP) technology.
In traditional switched networks, conference calls, where multiple participants engage in simultaneous conversation with each other, are enabled by a conference bridge which typically resides within the central office. In a switched network, all conference participants are simply connected to the conference bridge, which mixes the speech from the various speakers and feeds the mixed signal back to the participants.
In the context of packet networks, the various packets from the participants are routed to the IP-based conference bridge. The speech information from the speakers is obtained, de-packetized, decoded, and mixed. The mixed speech is then re-encoded, packetized, and sent back over the packet network to the conference call participants.
Known conference bridge solutions are inadequate in a number of respects. For example, the decoding and re-encoding of the speech signal (a "tandem" process) reduces the quality of the speech. More particularly, the tandem operation of the post-filter, common in low bit-rate speech decoders, generates objectionable spectral distortion. This is especially noticeable in cases where different speech coding standards are used for the various input speech channels.
Known conference bridge solutions are also inadequate due to the limitations of the mixing scheme used to combine the multiple input channels. Conventional systems sum the decoded speech signals and then re-encode the mixed speech for output. This can be a problem in cases where several participants attempt to talk at the same time, as the limited order of the representation is typically not suited to mixed speech. Furthermore, even in the case of a single speaker, the re-estimation of the spectrum during re-encoding generates a significant degradation in the second encoding. The re-estimation of the spectrum also requires additional buffering of speech samples, resulting in additional speech delay at the conference bridge.
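By way of illustration only, the conventional summing step described above may be sketched as follows. The sketch assumes that each channel has already been decoded to a frame of 16-bit linear PCM samples and that all frames have the same length; the function name `mix_frames` and the saturation to the 16-bit range are assumptions made for this example and are not taken from any particular bridge implementation.

```python
def mix_frames(frames):
    """Sum equal-length frames of decoded 16-bit PCM samples.

    `frames` is a list of lists of integer samples, one list per
    participant (hypothetical layout, chosen for this example).
    The sum is saturated to the signed 16-bit range so that several
    simultaneous talkers cannot overflow the output sample format.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")

    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Clamp to the int16 range [-32768, 32767].
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

In a conventional bridge the output of such a step would then be passed to the speech encoder, which is where the second (tandem) encoding and its associated spectral re-estimation take place.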
Known bridge designs are also unsatisfactory in that, while the background noise level from a single participant may be relatively low, the addition of multiple channels, each having its own noise component, can result in a combined noise level that is intolerable.
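The accumulation of noise across channels can be illustrated with a short calculation. The sketch below assumes that the noise components of the channels are mutually uncorrelated, so that their powers add; the function name `combined_noise_db` and the example levels are hypothetical and serve only to show the arithmetic.

```python
import math


def combined_noise_db(noise_levels_db):
    """Combine uncorrelated noise sources given as power levels in dB.

    Assumption for this example: the sources are uncorrelated, so the
    total power is the sum of the individual powers.
    """
    total_power = sum(10 ** (level / 10) for level in noise_levels_db)
    return 10 * math.log10(total_power)


# Four channels, each with background noise at -50 dB (illustrative values).
level = combined_noise_db([-50.0, -50.0, -50.0, -50.0])
```

Under this assumption, N channels with equal noise power raise the combined noise level by 10 log10(N) dB over a single channel, which is approximately 6 dB for four channels.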
Typical conference bridge systems are also inadequate in that the speech of each participant is mixed without any priority assignment. When a number of participants attempt to speak at the same time, the resulting output can be unintelligible. Furthermore, handling returned echo from multiple participants can be a major problem in conference bridges operating in a frame-based packet network environment.
Systems and methods are therefore needed to overcome these and other limitations of the prior art.
The present invention provides a conference bridge or transcoder configured to intelligently handle multiple speech channels in the context of a packet network, wherein the various speech channels may adhere to a variety of speech encoding standards. In general, the conference bridge establishes framing and alignment of multiple incoming speech channels associated with multiple participants, extracts parameters from the speech samples, mixes the parameters, and re-encodes the resulting speech samples for transmission back to the participants. In accordance with other aspects of the present invention, priority assignment and speech enhancement (e.g., noise reduction, reshaping, etc.) are performed.