This invention is related to multimedia communications systems, and in particular to a method for controlling a multipoint multimedia teleconference and a controller therefor.
Multimedia multipoint conferences, commonly called multimedia teleconferences, are becoming more and more widespread. A multimedia teleconference allows three or more participants at a plurality of locations to establish bi-directional multimedia communication while sharing the audio-visual environment, in order to give the impression that the participants are all at the same place.
Typical prior art multipoint conferences use a Multipoint Control Unit (MCU). Prior art MCUs are typically complex and require significant computational power because the MCU functionality typically requires transcoding, including decoding and re-encoding, of the incoming encoded media streams, including both audio and video when both are used. The decoding and re-encoding are typically carried out to create the mixing effects, that is, to create new content to send to each participant while meeting the bandwidth requirements. Thus there is a need for an alternative to prior art MCUs that include transcoding.
The invention is described herein using International Telecommunication Union (ITU, ITU-T) Recommendations H.323 and H.320 as an example. The invention, however, is not limited to H.323 or H.320.
ITU-T Recommendation H.323 titled “Packet-based multimedia communications systems” (International Telecommunication Union, Geneva, Switzerland) describes the technical requirements for multimedia communications services in a packet-switched network. The packet-switched networks may include local area networks (LANs), wide area networks (WANs), public networks, and internetworks such as the Internet, as well as point-to-point dial-up connections over PPP or some other packet-switched protocol.
H.323 specifies four major components: Terminals, Gateways, Gatekeepers, and Multipoint Control Units (MCUs). Terminals, Gateways, and MCUs are classified as Endpoints. Endpoints are devices that can initiate and receive calls. Other components associated with H.323 are the codecs used to encode, i.e., compress, and to decode, i.e., decompress, audio and video transmissions.
H.323 terminals use codecs to encode (compress) audio and/or video signals in order to reduce the network bandwidth required for communication. Codecs differ in a number of characteristics, including speech or picture quality, bandwidth required for signal transmission, and processor (CPU) utilization. According to H.323, all endpoints must support the G.711 voice codec standard (ITU-T Recommendation G.711 titled “Pulse code modulation (PCM) of voice frequencies”). Most endpoints also support the G.723.1 low-bandwidth voice codec standard (ITU-T Recommendation G.723.1 titled “Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s”). An H.323 endpoint may also include, but does not need to include, video capabilities. If video is provided, the endpoint must support the H.261 video codec standard (ITU-T Recommendation H.261 titled “Video codec for audiovisual services at p×64 kbit/s”). Support for other standards such as H.263 (ITU-T Recommendation titled “Video coding for low bit rate communication”) may be included, but is not required. Most commercial video conferencing systems today support H.263.
H.323 specifies a call setup process that includes negotiating the capabilities of the participants, including for example which codec(s) will be used by each participant.
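The capability negotiation described above may be sketched, in outline only, as selecting the first codec in one endpoint's preference-ordered list that the other endpoint also advertises. The function and data below are illustrative assumptions for this sketch, not part of Recommendation H.323 or H.245:

```python
def negotiate_codec(local_caps, remote_caps):
    """Return the first locally preferred codec also supported remotely.

    local_caps is assumed to be in order of preference; remote_caps is
    an unordered capability list. Returns None when no codec is shared.
    """
    remote = set(remote_caps)
    for codec in local_caps:        # iterate in local preference order
        if codec in remote:
            return codec
    return None                     # no common codec; the call cannot proceed

# G.711 is mandatory for all H.323 endpoints, so audio negotiation between
# two compliant endpoints always yields at least one match.
caller = ["G.723.1", "G.711"]       # prefers low-bandwidth G.723.1
callee = ["G.711"]                  # supports only the mandatory codec
print(negotiate_codec(caller, callee))   # -> G.711
```

In this toy model the mandatory G.711 codec acts as the guaranteed fallback, which is why H.323 requires it of every endpoint.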
Terminals: H.323 terminals are client endpoints that provide real-time, two-way communications. A terminal provides at least real-time audio communications. A terminal may also provide video and/or data conferencing. Data conferencing provides capabilities such as text chat, shared white boarding, and data exchange. If data conferencing is included, such data conferencing needs to conform to ITU-T Recommendation T.120 titled “Data protocols for multimedia conferencing.”
A terminal may be a stand-alone device, or implemented in software, including an “H.323 stack,” running on a computer such as a personal computer (PC). Stand-alone devices include video telephones and Internet telephones. Today, the vast majority of terminals are PCs running terminal software programs that include an H.323 stack. While not specifically addressed by Recommendation H.323, PC-based terminals typically use a sound card, typically a full-duplex sound card, and a microphone with speakers, or a headset.
Gateways: An H.323 gateway is an endpoint that provides a real-time, two-way connection between an H.323 network and a non-H.323 network. A gateway thus provides a connection between H.323 terminals and other ITU terminals, e.g., telephones, or between H.323 terminals and another H.323 gateway. An H.323 gateway performs the translation of call control and call content necessary to convert a call from a packet-switched format, e.g., H.323, to another format such as a circuit-switched format, e.g., the PSTN or a private voice network, and vice versa. Gateways are optional components in an H.323 network. They are only needed when connecting to other types of terminals such as telephones or H.320 (ISDN videoconference) terminals (ITU-T Recommendation H.320 titled “Narrow-band visual telephone systems and terminal equipment”).
Gatekeepers: A gatekeeper is an optional H.323 component that provides several important services. Most H.323 networks typically include a gatekeeper. When present, a gatekeeper provides services such as zone management, call-routing services, bandwidth management, and admissions control to limit conferencing bandwidth to some fraction of the total available bandwidth so that other data services such as e-mail and file transfers can still function. Additionally, gatekeepers provide address translation services between LAN aliases for terminals and gateways and IP or IPX addresses. Gatekeepers also provide accounting, billing, and charging services, when needed.
Multipoint Control Units: The Multipoint Control Unit (MCU) is an optional H.323 endpoint that provides the services necessary for three or more terminals to participate in a multipoint conference, also called a conference call or a teleconference. All terminals participating in the conference establish communication with the MCU. The MCU ensures that multipoint conference connections are properly set up and released, that audio and video streams are properly switched and/or mixed, and that the data are properly distributed among the conference participants. In a centralized multipoint topology, each terminal, at its respective location, sends its data to the MCU. The MCU negotiates with the terminals to determine which codec(s) the MCU needs to use, and then may handle the media streams. After processing all the data, the MCU sends the mixed and switched data back to each participant.
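The centralized fan-in/fan-out pattern just described can be illustrated for audio with a toy model: each participant sends one stream to the MCU, and receives back a mix of all the other participants' audio (an “N-1” mix, so that no participant hears an echo of itself). The function below operates on plain lists of samples and is an assumption made for illustration, not an implementation of any H.323 interface:

```python
def n_minus_one_mix(streams):
    """streams: dict mapping participant -> list of audio samples
    (all lists of equal length). Returns a dict mapping each participant
    to the sum of all *other* participants' samples.
    """
    # Sum all streams sample-by-sample once...
    total = [sum(samples) for samples in zip(*streams.values())]
    # ...then subtract each participant's own contribution from the total,
    # giving that participant's N-1 mix without re-summing per recipient.
    return {
        p: [t - s for t, s in zip(total, samples)]
        for p, samples in streams.items()
    }

mix = n_minus_one_mix({"A": [1, 2], "B": [10, 20], "C": [100, 200]})
print(mix["A"])   # -> [110, 220]  (B + C only; A does not hear itself)
```

The subtract-from-total trick keeps the per-conference cost linear in the number of participants; a real MCU must additionally decode before mixing and re-encode afterwards, which is the source of the computational burden discussed below.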
The function(s) of a MCU may be handled by a central multi-media conference server (centralized MCU), or alternately by a network of conference servers that operate co-operatively to act like one central multi-media conference server (distributed MCU). The MCU functions may be integrated in other H.323 components.
An MCU includes a Multipoint Controller (MC) and optionally one or more Multipoint Processors (MPs). An MC takes care of the required call setup messages and the required messages that are used to set up the terminal media capability and to negotiate the functions for audio and video processing. Such messages and negotiations conform to H.245 (ITU-T Recommendation H.245 titled “Control Protocol for multimedia communication”). Each MP, when present in the MCU, switches, mixes, and translates video, audio, and data streams. Thus, each MP in an MCU receives media streams from one or more conference participants, and processes and distributes the media streams to the terminals in a conference. The MC controls resources by determining what data flows are to be transmitted by the MP(s) in the MCU.
Switching ensures that a particular data flow is selected for sending when several data flows are available, for example switching to the matching video sequence when the speaker in a conference changes, as identified by an audio signal, or when a change is requested via H.245. Mixing allows several data flows to be combined. Mixing and switching may include splitting a created image into several segments and re-coding, so that each party to the conference may be continuously present.
The one or more MPs 207 of MCU 203 each handle the required video and audio mixing and switching. The mixing typically requires transcoding. Transcoding typically includes decoding all the incoming video signals from every video conferencing terminal, scaling the signals for all the other terminals, reformatting the signals for all the terminals, organizing each of the images and mixing them into designated positions, then re-encoding the mixed audio and video signals and sending the encoded audio and video streams to each of the terminals in communication with the MCU.
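The per-round transcoding pipeline just described can be sketched schematically. The decode, scale, compose, and encode functions below are placeholder stubs (assumptions for illustration, not real codec calls); the sketch is intended only to make the cost structure visible: the work grows with the number of terminals on both the decode side and the per-recipient re-encode side.

```python
def decode(stream):
    """Stand-in for a codec-specific decoder of one incoming stream."""
    return {"frame": stream["payload"], "src": stream["src"]}

def scale(frame, size):
    """Stand-in for resampling a decoded image to one tile size."""
    return {"frame": frame["frame"], "size": size, "src": frame["src"]}

def compose(tiles):
    """Stand-in for placing each tile at its designated position."""
    return [(position, tile) for position, tile in enumerate(tiles)]

def encode(image, codec):
    """Stand-in for re-encoding the composite for one terminal."""
    return {"codec": codec, "image": image}

def mcu_video_round(incoming, terminal_codecs):
    frames = [decode(s) for s in incoming]            # decode every input
    tiles = [scale(f, (176, 144)) for f in frames]    # e.g., QCIF tiles
    mosaic = compose(tiles)                           # mix into one image
    # One re-encode per terminal, in that terminal's negotiated codec:
    return {t: encode(mosaic, c) for t, c in terminal_codecs.items()}
```

Even in this skeleton, every incoming stream is decoded once and the composite is re-encoded once per recipient, which is precisely the processing load, and the source of quantization error and delay, that the present invention seeks to avoid.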
Such processing is typically computationally complex, particularly when video is included, and requires a significant amount of processing power. Prior art MCU architectures thus have several disadvantages, including the following:

- An MCU is a relatively complicated device that requires significant processing power to operate. For example, more and more new video compression standards continue to emerge. Because an MCU needs to be able to handle all such standards, performance deteriorates dramatically as more and more terminals adopt the emerging video compression standards.

- An MCU needs to transcode video and audio streams. Transcoding includes decoding, scaling, reformatting, and re-coding the incoming video signal into the different formats of the different output video signals that are required for the connected terminals. The decoding and re-coding of transcoding introduce additional quantization error that causes picture quality to deteriorate.

- Because of the time required for switching, matching, and transcoding, an MCU may introduce a significant amount of delay into the incoming signals.
Thus, there is a need in the art for an improved MCU that does not require the switching, matching, and/or transcoding of streams such as media streams. Such processing is usually carried out by one or more MPs; thus there is a need in the art for an improved MCU that does not require any MPs. There also is a need in the art for an MCU that does not require the computational power of prior art MCUs, that does not introduce as much delay as do prior art MCUs, and that does not deteriorate picture quality as might a prior art MCU. There further is a need in the art for a method of controlling a multipoint conference that, when possible, avoids any need for the media streams sent by a participant to be decoded and re-encoded en route to the other participants.