In multipoint conferencing, three or more endpoint terminals communicate with each other across a network. The H.323 standard of the International Telecommunication Union (ITU) describes two methods for multipoint conferencing in a packet network: the “centralized” mode and the “de-centralized” mode. In the centralized mode, a central resource, often referred to as a Multipoint Control Unit (MCU), provides the control and processing functions to each of the endpoint terminals in the conference. The MCU is an H.323 endpoint that provides the services necessary for three or more endpoints to participate in a multipoint conference known as a “conference call.” An MCU comprises a Multipoint Controller (MC), which is required, and may optionally comprise one or more Multipoint Processors (MP). The MC provides the call control functionality required for a multipoint conference, including the negotiation of common endpoint capabilities. The MP, if present, processes (mixes or switches) the media streams (i.e., audio, video and/or data streams).
In the centralized mode, the MCU unicasts the processed information streams to each endpoint. Because a separate unicast stream is sent to every endpoint, the bandwidth consumed by the MCU's outbound streams grows in proportion to the number of participants in the conference. Moreover, in prior art systems many of the unicast streams sent from the MCU contain the same audio or video information.
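As a rough illustration of this scaling, the Python sketch below counts the MCU's outbound unicast load and the number of distinct mixes actually needed. It assumes a hypothetical mixing policy (not taken from any cited patent) in which each active talker receives a mix of everyone except itself and every silent endpoint receives the same full mix; the function names and the 64 kbit/s per-stream rate are likewise assumptions chosen for illustration.

```python
def centralized_unicast_load(num_endpoints: int, stream_kbps: float) -> float:
    """Outbound MCU bandwidth when every endpoint receives its own unicast stream."""
    return num_endpoints * stream_kbps


def distinct_mixes(num_endpoints: int, num_active: int) -> int:
    """Number of different mixes under the assumed policy.

    Each active talker hears everyone except itself (one mix per talker);
    all silent endpoints hear the same full mix (one shared mix).
    """
    if num_active >= num_endpoints:
        # Everyone is talking: every endpoint needs its own mix.
        return num_endpoints
    return num_active + 1


# Assumed example figures: 10 endpoints, 3 active talkers, 64 kbit/s streams.
n, k, rate = 10, 3, 64.0
print(centralized_unicast_load(n, rate))  # 640.0 (kbit/s sent by the MCU)
print(distinct_mixes(n, k))               # 4
```

Under these assumptions the MCU sends ten unicast streams, yet only four distinct mixes exist: the seven silent endpoints all receive identical content.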
In the de-centralized mode, each endpoint terminal multicasts its information streams directly to all other endpoint terminals rather than sending them through an MCU. Each endpoint terminal is then responsible for selecting among the incoming streams and performing its own processing functions. If an MCU is included in a de-centralized system, it handles only the MC function and serves as a bridge between the multicast environment and a separate unicast environment; the MP function is left to the endpoints.
Although the H.323 standard states that the MC and MP functionality may also be incorporated into other H.323 entities (i.e., terminals, gateways, or gatekeepers), it does not specify how to implement this. In addition, the method used to bridge the multicast and unicast environments is proprietary and differs from one implementation to another.
An apparatus and method described in U.S. Pat. No. 6,404,745 attempt to solve the bandwidth efficiency problems associated with a centralized mode multipoint conferencing arrangement. In this approach, endpoint terminals transmit multimedia streams to an MCU using unicast transmission. The MCU processes the multimedia streams and transmits them back to the endpoint terminals using multicast transmission. However, as noted in U.S. Pat. No. 6,404,745, this method also requires the central resource to unicast other multimedia streams to selected endpoints. In the selected endpoint terminals, either processing of the multicast stream is inhibited in favor of the unicast streams, or an additional control command from the central resource to the selected endpoint terminals is required to inhibit processing of the multicast streams. This method reduces the required bandwidth only modestly, and it requires every participating endpoint to implement the additional control processing.
U.S. Pat. No. 6,418,125 describes a de-centralized, receiver-based audio packet management system that intelligently selects which packets to mix. In such a system, each endpoint transmits packets (by multicast or unicast) that are received by all other endpoints participating in the conference. Each receiver uses a speaker management scheme that identifies the speakers associated with the audio packets and determines which speakers are currently “active.” Each receiver then independently decides which of the active speakers to mix together to produce an output audio stream. In addition, each receiver must perform “adaptive jitter buffer processing” for the streams of all other participants. Although this method may significantly reduce the bandwidth requirement, it greatly increases the processing overhead in each endpoint. Also, as in the previous case, this implementation requires every participating endpoint to be capable of speaker management and mixing.
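The per-receiver work in such a system can be sketched in Python. This is a generic loudest-N selection with a fixed energy threshold, assumed for illustration; it is not the speaker management scheme of U.S. Pat. No. 6,418,125, and it omits the adaptive jitter buffering each receiver would also need. The names `receiver_output`, `max_active` and `threshold`, and the 16-bit PCM sample format, are hypothetical.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def frame_energy(samples: List[int]) -> float:
    """Mean-square energy of one audio frame (linear PCM samples)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def select_active(frames: Dict[str, List[int]], max_active: int,
                  threshold: float) -> List[str]:
    """Pick up to max_active sources whose energy meets the threshold, loudest first."""
    ranked = sorted(frames, key=lambda sid: frame_energy(frames[sid]), reverse=True)
    return [sid for sid in ranked if frame_energy(frames[sid]) >= threshold][:max_active]


def mix(frames: Dict[str, List[int]], sources: List[str]) -> List[int]:
    """Sum the selected frames sample by sample, saturating to the 16-bit range."""
    length = max((len(frames[s]) for s in sources), default=0)
    out = []
    for i in range(length):
        total = sum(frames[s][i] for s in sources if i < len(frames[s]))
        out.append(max(INT16_MIN, min(INT16_MAX, total)))
    return out


def receiver_output(own_id: str, frames: Dict[str, List[int]],
                    max_active: int = 2, threshold: float = 5000.0) -> List[int]:
    """What one receiving endpoint plays out for the current frame interval."""
    # Drop the endpoint's own stream (e.g. multicast loopback) so it never hears itself.
    others = {sid: f for sid, f in frames.items() if sid != own_id}
    active = select_active(others, max_active, threshold)
    return mix(others, active)


# One frame interval as seen by endpoint "D" (assumed sample values).
frames = {"A": [100, 100], "B": [2000, -2000], "C": [50, 50], "D": [3000, 3000]}
print(receiver_output("D", frames))  # [2100, -1900]: B and A selected, C too quiet
```

Every endpoint runs this selection and mix over all of its incoming streams in every frame interval, which is where the per-endpoint processing overhead comes from.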
Consider, for example, the prior art network topology 100 illustrated in FIG. 1. Prior art network topology 100 comprises a multipoint control unit (MCU) 150 and a plurality of endpoints that are separate units. Prior art network topology 100 may include several local area networks connected to a wide area network or the Internet through routers. In the network topology 100 shown in FIG. 1, network 120, network 130, and network 140 are coupled to and communicate through network 110 (e.g., Internet 110). Network 120 comprises endpoints (EP) 122, 124, 126 and router 128. Network 130 comprises endpoints (EP) 132, 134, 136 and router 138. Network 140 comprises endpoints (EP) 142, 144, 146, router 148, and multipoint control unit (MCU) 150.
The networks 120, 130 and 140 each comprise a plurality of endpoint terminals (EP) that are capable of sourcing and receiving information streams. This may be done, for example, by establishing logical channels in accordance with the ITU-T H.245 standard and the H.323 standard.
In addition to the endpoint terminals in local network 140, a multipoint control unit (MCU) 150 is connected to local network 140. MCU 150 provides the capability for two or more endpoint terminals to communicate in a multipoint conference. MCU 150 provides conference control and centralized processing of audio, video and data streams, including mixing and/or switching of the streams.
In a typical prior art system, the endpoints receive a mix of audio sources selected from among the conference endpoints by comparing the voice levels of the conference endpoints; the selected endpoints are designated audio broadcasters. The broadcast audio is then transmitted by the multipoint control unit (MCU) on a single multicast address to all of the endpoints in the conference. The problem is that this broadcast mix cannot be used by the endpoints that are themselves designated as broadcasters: such an endpoint would hear a delayed version of its own speaker's voice, resulting in an echo-like effect. U.S. Pat. No. 6,404,745 avoids this problem by sending a separate and distinct unicast audio stream to each endpoint for which the broadcast mix is not appropriate. Each such endpoint, however, must then include a decision-making mechanism to choose either the multicast stream or the unicast stream. Some methods for doing this are also presented in U.S. Pat. No. 6,404,745. The problem with this approach is that a delay in the arrival of the unicast stream could adversely affect the decision-making mechanism.
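The echo problem can be made concrete with a short Python sketch. It assumes the broadcasters are simply the k endpoints with the highest reported voice level, and it builds each broadcaster's separate stream as a "mix-minus" (the common mix with that endpoint's own contribution subtracted). This is one common way to construct such a stream, not necessarily the method of U.S. Pat. No. 6,404,745. The function names, the level values, and the assumption of equal-length integer frames are hypothetical, and sample saturation is ignored.

```python
from typing import Dict, List, Tuple


def choose_broadcasters(levels: Dict[str, float], k: int) -> List[str]:
    """Designate the k endpoints with the highest voice level as broadcasters."""
    return sorted(levels, key=lambda ep: levels[ep], reverse=True)[:k]


def plan_audio(frames: Dict[str, List[int]], levels: Dict[str, float],
               k: int) -> Tuple[List[str], List[int], Dict[str, List[int]]]:
    """Return (broadcasters, common multicast mix, stream each endpoint should play)."""
    broadcasters = choose_broadcasters(levels, k)
    length = len(next(iter(frames.values())))  # frames assumed equal length
    # Common mix, sent once on the multicast address (saturation ignored here).
    common = [sum(frames[b][i] for b in broadcasters) for i in range(length)]
    playout: Dict[str, List[int]] = {}
    for ep, own in frames.items():
        if ep in broadcasters:
            # Mix-minus: remove this endpoint's own voice to avoid the echo-like effect.
            playout[ep] = [c - s for c, s in zip(common, own)]
        else:
            # Non-broadcasters can play the shared multicast mix unchanged.
            playout[ep] = common
    return broadcasters, common, playout


frames = {"A": [10, 20], "B": [1, 2], "C": [100, 200]}
levels = {"A": -20.0, "B": -40.0, "C": -10.0}  # assumed voice levels
b, common, playout = plan_audio(frames, levels, k=2)
print(b)             # ['C', 'A']
print(common)        # [110, 220]
print(playout["A"])  # [100, 200]: A hears only C
```

Endpoint B can take the single multicast mix, while broadcasters A and C each need their own stream, which is the extra unicast traffic and endpoint-side stream selection that the prior art approach incurs.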
There is therefore a need in the art for an improved system and method for providing a multipoint control unit (MCU) for use in a multipoint audio conference in a packet network. There is also a need in the art for an improved system and method for implementing a multipoint control unit (MCU) in an endpoint that is participating in and managing a multipoint audio conference in a packet network.