Many commercially available video conferencing systems, including video units that use the H.320, H.323 and H.324 envelope protocols for call setup, call control, and audio and video coding-decoding (codec) formats, provide only point-to-point video conferencing. (H.320 is the protocol for ISDN networks, H.323 for LAN networks, and H.324 for standard telephone, or POTS, connections.) Multi-point video conferencing requires the use of an MCU (multi-point control unit or multi-point conference unit). An MCU can operate in either a switched presence mode or a continuous presence mode. In switched presence mode, only one video stream is selected and transmitted to all the participants, based either on the audio signal or on a "chairman" switch control. In continuous presence mode, the MCU receives video signals from each participant in a video conference, combines the signals to produce a single hybrid signal, and sends the hybrid signal back to each participant. The hybrid signal enables each participant to view on one split screen, in real time, the pictures of the other participants along with his or her own picture. The sophisticated structure of an MCU and the large computing power it needs presently ordinarily require that it reside on a central server. Some providers of MCU systems claim that their MCU software can be operated on a desktop personal computer (PC). However, such MCU systems apparently either support only switched presence multi-point operation or produce a video stream in a proprietary format that requires each participant to install special video conferencing software or apparatus.
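By way of illustration only, the selection made in switched presence mode can be sketched as follows. The sketch assumes that "based on the audio signal" means choosing the participant with the highest current audio level, and that a chairman selection, when present, overrides it. The function name, the parameter names, and the audio-level scale are hypothetical and are not taken from any particular MCU.

```python
from typing import Dict, Optional


def select_switched_stream(
    audio_levels: Dict[str, float],
    chairman_choice: Optional[str] = None,
) -> str:
    """Pick the single stream that is sent to every participant.

    audio_levels maps a (hypothetical) participant id to a current
    audio level; chairman_choice is an optional manual selection.
    """
    # Assumption: a valid chairman selection takes priority.
    if chairman_choice is not None and chairman_choice in audio_levels:
        return chairman_choice
    # Assumption: otherwise the loudest participant is shown.
    return max(audio_levels, key=audio_levels.get)
```

Whichever rule is used, only the selected stream is forwarded, so no combining of video signals is needed in this mode.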
Some of the factors that have made conventional MCU systems complicated follow:
1. The H.263 codec format permits the continuous presence mode. In the continuous presence mode, an MCU receives four video streams from the participants, makes some header changes, and sends them back without combining them. The computer or other apparatus of each participant needs to decode and display all four video streams to see the pictures of all the participants. The H.261 codec format does not, however, permit the continuous presence mode. H.261 is the required codec format for the H.323 video unit; H.263 is an optional codec format. In addition, some existing systems that run H.263 do not support the continuous presence mode, which is optional in H.263.
2. Most existing video conferencing systems provide only point-to-point video conferencing.
3. An MCU system can provide continuous presence multi-point video conferencing only if it can combine several incoming video channels into a single outgoing video stream that can be decoded by the equipment which receives the outgoing video stream.
4. When an MCU system combines several incoming video channels, difficulties arise:
    a. Incoming streams may use different codec formats, e.g., H.261 or H.263.
    b. Even if incoming streams have the same codec format, they may have different picture types, e.g., I picture or P picture.
    c. Even if incoming streams have the same codec format and the same picture type, they each may have or utilize different quantizers. This makes the adjustment of the DCT coefficients necessary and at the same time introduces errors.
    d. Video frames in each of the video channels ordinarily arrive at different times. When the MCU awaits the arrival of a frame or frames from each video channel, a time delay results.
    e. If the MCU waits for the arrival of a frame or frames from each video channel, operation of the MCU is, in substance, controlled by the channel with the slowest frame rate.
    f. An existing technique for solving the non-synchronized frame rate problem mentioned above is to substitute the slower channels with the previous images, so that the faster channels are updated while the slower ones remain the same. But this practice takes a significant amount of memory for buffering the images, and it may mean each image has to be fully decoded and encoded.
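The timing behavior described in items d through f can be illustrated with a simplified model. The sketch below is not any MCU's actual scheduler; it only compares the instants at which a combined picture could be emitted under the two strategies. Each channel is represented by a list of frame arrival times in milliseconds, and the function names and sample values are assumptions chosen for illustration.

```python
from typing import List


def wait_for_all(arrivals: List[List[int]]) -> List[int]:
    """Emit combined picture k only when every channel has
    delivered its k-th frame (items d and e)."""
    frames = min(len(channel) for channel in arrivals)
    return [max(channel[k] for channel in arrivals) for k in range(frames)]


def substitute_previous(arrivals: List[List[int]]) -> List[int]:
    """Emit a combined picture at every frame arrival, reusing the
    previous image of any channel that has nothing new (item f)."""
    # Nothing can be combined until each channel has sent one frame.
    start = max(channel[0] for channel in arrivals)
    times = {t for channel in arrivals for t in channel if t >= start}
    return sorted(times)


# Hypothetical example: channel 0 delivers a frame every 100 ms,
# channel 1 every 300 ms.
arrivals = [[0, 100, 200, 300], [0, 300, 600]]
```

With these sample values, `wait_for_all(arrivals)` yields `[0, 300, 600]`, so the output follows the slower channel, while `substitute_previous(arrivals)` yields `[0, 100, 200, 300, 600]`. The second strategy updates more often, but it requires the last image of every channel to be kept in memory, which is the buffering cost noted in item f.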
Accordingly, it would be highly desirable to provide an improved video conferencing system which could, in essence, provide continuous presence multi-point video conferencing while avoiding some or all of the various problems in prior art MCU systems.
Therefore, it is a principal object of the instant invention to provide an improved video conferencing system.
A further object of the invention is to provide an improved method and apparatus for providing a continuous presence multi-point video conferencing system.
Another object of the invention is to provide an improved continuous presence multi-point video conferencing system which significantly simplifies and reduces the expense of existing multi-point video conferencing systems.
During a video conference, video data are segmented into packets before they are sent through the network. A packet is an individual object that travels through the network and contains one picture frame or a fraction of a picture frame. The header of each packet provides information about that packet, such as whether the packet contains the end of a frame. With this end-of-frame packet and the packets that preceded it, if any, the MCU has all the data for a new picture frame. Therefore, an MCU can tell whether a new frame has been received in a video channel just by reading the packet header. Also, at the very beginning of a video conference, before any video packet can be sent, there is a call setup process which checks each participant's capabilities, such as what kind of video codec is used. Once the call setup is done, each video channel carries video streams only in a certain standard codec format, i.e., H.261 or H.263.
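The frame detection just described can be sketched as follows. This is a minimal illustration, assuming that each packet header exposes a flag marking the end of a frame; in RTP-based systems the marker bit commonly serves this purpose for video. The class names and field names are hypothetical. Note that the payload is only buffered and concatenated, never decoded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Packet:
    channel: int        # hypothetical identifier of the video channel
    end_of_frame: bool  # header flag: this packet closes a frame
    payload: bytes      # coded video data, left undecoded here


@dataclass
class FrameAssembler:
    # Payloads buffered per channel until an end-of-frame packet arrives.
    pending: Dict[int, List[bytes]] = field(default_factory=dict)

    def push(self, packet: Packet) -> Optional[bytes]:
        """Return the complete coded frame if this packet ends one,
        otherwise return None."""
        parts = self.pending.setdefault(packet.channel, [])
        parts.append(packet.payload)
        if not packet.end_of_frame:
            return None
        frame = b"".join(parts)
        self.pending[packet.channel] = []  # start collecting the next frame
        return frame
```

A caller would feed every received packet to `push` and treat a non-`None` result as the arrival of a new frame on that packet's channel.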