1. Field of the Invention
The invention relates to a video conferencing system which allows participants to hold a video conference through their respective terminals, and more specifically, to a multi-location video conferencing system in which video conferencing terminals at a plurality of locations are linked together by a multi-location video conferencing control unit.
2. Description of the Related Art
FIG. 1 illustrates an arrangement of a multi-location video conferencing system in which video conferencing terminals at a plurality of locations are linked to a multi-location video conferencing control unit through a network. In this figure, a plurality of video conferencing terminals 1 are linked through an ISDN 2 to a multi-location video conferencing control unit (MCU) 3, which, in turn, is connected to a multi-image combiner 4.
The system of FIG. 1 is arranged such that video images sent from the video conferencing terminals 1 at the plurality of locations are combined by the multi-image combiner 4, and the resulting composite image is then returned to the video conferencing terminals 1 through the MCU 3 and the ISDN 2, thereby allowing participants to hold a video conference while watching the composite image.
FIGS. 2A and 2B show examples of composite images in such a video conferencing system. FIG. 2A shows an example of a composite image in a video conference among nine locations. Of the participants in the conference, the participant at location 6 is speaking, and so the image from that location is displayed larger than the images from the other locations to emphasize the speaker.
FIG. 2B shows an example of a composite image in a conference among four locations. In this case, the fact that the participant at location 4 is speaking is emphasized by making the image from location 4 contrast with the others. Conventionally, video conferences are held in such a way that participants at all locations watch the same composite image.
FIG. 3 is a circuit block diagram of the multi-image combiner 4 in the conventional system in which the participants at all locations watch the same composite image as described above. The multi-image combiner 4 is constructed from an MCU interface 10 that exchanges image signals with the multi-location video conferencing control unit (MCU) 3; a controller 11 that exchanges control signals with the MCU 3 to control the combination of images; reduced image creation units 12, each of which corresponds to a respective one of the video conferencing terminals and creates a reduced image that forms a part of a composite image; and a readout mediation unit 13 that reads the reduced images for the respective locations to create a composite image.
The n reduced image creation units 12 are identical in arrangement. Each of the reduced image creation units 12 comprises a CODEC (CODER/DECODER) 15 which decodes an image from the corresponding terminal and encodes the composite image to be sent to that terminal, two image reduction units (1) 16 and (2) 17 for reducing the image sent from the corresponding terminal, and two frame memories FM (1) 18 and FM (2) 19 for storing the reduced images output from the respective image reduction units (1) 16 and (2) 17.
As described previously in conjunction with FIG. 2A, an image from the terminal at the site where the current speaker is located is displayed larger than images from the other terminals, and so the image reduction units (1) 16 and (2) 17 have different ratios of reduction. Suppose that the image reduction unit (1) 16 reduces an image from a terminal on the speaker side and the image reduction unit (2) 17 reduces an image from a terminal on the non-speaker side. Then an image reduced by the image reduction unit (1) 16 is larger than an image reduced by the image reduction unit (2) 17. The images reduced by the image reduction units (1) 16 and (2) 17 are stored in the frame memories FM (1) 18 and FM (2) 19, respectively. The reading of images from the frame memories is controlled by the readout mediation unit 13.
For example, suppose that the reduced image creation unit #1 corresponds to location 6 shown in FIG. 2A. Then, the reduced image of the speaker at location 6 is read from the frame memory FM (1) 18 so that it will be brought into a specific position in the composite image. The reduced image stored in the frame memory FM (2) 19 is not used in the composite image. The composite image is encoded by the CODEC 15 in the corresponding reduced image creation unit 12. The composite images corresponding to the n terminals are multiplexed by the MCU interface 10, and the multiplexed signal is in turn sent to the MCU 3.
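The selection performed by the readout mediation unit 13 can be illustrated by a minimal Python sketch. The class and function names, the reduction factors of 2 and 4, and the use of simple subsampling for reduction are all assumptions made for illustration; the description above does not specify them. Placement of each reduced image at its position in the composite image is omitted.

```python
def reduce_image(image, factor):
    """Subsample a 2-D list of pixels, keeping every `factor`-th row and column.

    Subsampling is an assumed stand-in for the image reduction units.
    """
    return [row[::factor] for row in image[::factor]]


class ReducedImageCreationUnit:
    """Hypothetical model of one reduced image creation unit 12."""

    def __init__(self, speaker_factor=2, other_factor=4):
        # Assumed ratios: unit (1) 16 reduces less than unit (2) 17.
        self.speaker_factor = speaker_factor
        self.other_factor = other_factor
        self.fm1 = None  # frame memory FM (1) 18: speaker-size image
        self.fm2 = None  # frame memory FM (2) 19: non-speaker-size image

    def store(self, decoded_image):
        """Reduce the decoded image at both ratios and store both results."""
        self.fm1 = reduce_image(decoded_image, self.speaker_factor)
        self.fm2 = reduce_image(decoded_image, self.other_factor)


def read_out(units, speaker_index):
    """Hypothetical readout mediation: FM (1) for the speaker, FM (2) otherwise."""
    return [
        unit.fm1 if index == speaker_index else unit.fm2
        for index, unit in enumerate(units)
    ]


# Three terminals, each sending an 8x8 image filled with its own index.
units = [ReducedImageCreationUnit() for _ in range(3)]
for terminal, unit in enumerate(units):
    unit.store([[terminal] * 8 for _ in range(8)])

tiles = read_out(units, speaker_index=1)
tile_sizes = [(len(tile), len(tile[0])) for tile in tiles]
```

In this sketch each unit always fills both frame memories, but only one of the two is read for any given composite image, which mirrors the unused frame memory FM (2) 19 in the example above.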
As an alternative, there is a system in which an image from a broadcasting terminal is distributed to all other locations by the multi-location video conferencing control unit 3 instead of combining images from the locations by the image combiner as described in conjunction with FIG. 1. FIG. 4 shows an arrangement of such a multi-location video conferencing system.
In this figure, three video conferencing terminals A, B and C are linked to the MCU 3, and an image from the terminal A is distributed to the terminals B and C. In such a system, the MCU 3 may detect the level of a voice signal from each terminal and determine the terminal having the maximum voice level to be the speaker end, or the speaker end may be determined by a suitable command or control from the chairperson's terminal. Further, each terminal is allowed to select an image from a terminal other than the speaker end. Such control is implemented by image switching, distribution and control by the MCU 3.
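The determination of the speaker end and the distribution of its image can be sketched as follows. This is a minimal illustration, not the MCU 3's actual control logic: the function names, the dictionary of voice levels, and the rule that a chairperson's command takes precedence over voice-level detection are assumptions.

```python
def select_speaker_end(voice_levels, chair_command=None):
    """Return the terminal treated as the speaker end.

    voice_levels: dict mapping terminal name to its measured voice level.
    chair_command: terminal named by the chairperson's terminal, if any.
    Assumption: a chairperson's command overrides voice-level detection.
    On equal levels, the terminal listed first is chosen (also an assumption).
    """
    if chair_command is not None:
        return chair_command
    return max(voice_levels, key=voice_levels.get)


def distribution_targets(terminals, speaker_end):
    """Return the terminals that receive the speaker end's image."""
    return [terminal for terminal in terminals if terminal != speaker_end]


levels = {"A": 0.9, "B": 0.2, "C": 0.4}
by_voice = select_speaker_end(levels)
by_chair = select_speaker_end(levels, chair_command="C")
targets = distribution_targets(list(levels), by_voice)
```

With the levels shown, terminal A has the maximum voice level, so its image is distributed to terminals B and C, as in FIG. 4.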
Thus, conventional multi-location video conferencing systems include systems in which a multi-image combiner is used to combine images from a number of terminals at different locations and the resulting composite image is returned to the terminals, and systems in which a multi-location video conferencing control unit (MCU 3) is used to distribute only an image from the location where the current speaker is located to all other locations. Problems with the systems in which the multi-image combiner is used will be described first.
In the multi-image-combiner based systems, images from all the terminals of participants in the conference are combined and the same composite image is distributed to all the terminals. For this reason, a problem arises in that a participant in the conference is generally unable to specify the participant or participants he or she wants to watch, and so cannot watch a composite image in which the images are arranged as he or she desires.
It is by no means impossible with the conventional systems to fulfill the demand from each participant in the conference to watch a participant or participants he or she wants to watch in a desired image arrangement. To fulfill such a demand, however, a huge amount of hardware would be required. That is, for n locations, n sets of hardware would be required, each set consisting of n reduced image creation units 12 and one readout mediation unit 13.
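The growth in hardware can be made concrete with a short calculation. The sketch below assumes the reading that each of the n sets contains n reduced image creation units 12 and one readout mediation unit 13; the function names are hypothetical.

```python
def shared_composite_hardware(n):
    """Unit counts when all n locations watch the same composite image (FIG. 3)."""
    return {"reduced_image_creation_units": n, "readout_mediation_units": 1}


def per_participant_composite_hardware(n):
    """Unit counts when each of the n locations gets its own composite image.

    Assumes one full set (n creation units, 1 mediation unit) per location.
    """
    return {"reduced_image_creation_units": n * n, "readout_mediation_units": n}


shared = shared_composite_hardware(9)
individual = per_participant_composite_hardware(9)
```

For the nine-location conference of FIG. 2A, this reading gives 81 reduced image creation units and 9 readout mediation units, compared with 9 and 1 for the shared composite image.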
With the conventional systems using the multi-image combiner, as described in conjunction with FIGS. 2A and 2B, a speaker and other participants are displayed in a distinguishable manner by displaying the speaker's image a little larger than the images of the other participants or by making the frame of the speaker's image more noticeable. However, the portion in which the speaker is displayed is not clearly separated from the portions in which the other participants are displayed. Thus, a problem arises in that a change of speaker is not necessarily easy to recognize intuitively.
In the system that does not use the multi-image combiner, as shown in FIG. 4, display is made on the basis that an image from a broadcasting terminal at the site where the current speaker is located is distributed to the other terminals. Thus, images sent to the multi-location video conferencing control unit (MCU 3) from terminals other than the terminal of the speaker are not used by the MCU 3 and are consequently discarded, so that the image transmission from terminals other than the broadcasting terminal is wasted.