(a) Field of the Invention
The present invention relates to a multipoint video-meeting control system and, more particularly, to a multipoint video-meeting control system capable of smoothly switching video data and voice data without causing a sense of incongruity in the switching. The present invention also relates to a method for operating such a multipoint video-meeting control system.
(b) Description of the Related Art
A multipoint video-meeting control system is known in the art which allows participants in a plurality of remote locations (local points) to hold a meeting with one another by using TV monitors or display units. A conventional technique for switching video data in the video-meeting control system is described in JP-A-2000-83229. The video-meeting control system generally includes a plurality of meeting terminals disposed in the remote locations, and a multipoint control unit (MCU) for controlling transmission of video data and voice data between the MCU and the meeting terminals.
The meeting terminal generally includes an image (video data) input/output block, a voice (voice data) input/output block, and a document data input/output block. The MCU is connected to each of the meeting terminals in the remote locations via a public communication line, thereby receiving video, voice and document data from the meeting terminals, processing these data therein and transmitting the processed data to the meeting terminals.
In the video-meeting control system, the MCU generally specifies a speaker (or speaker terminal) by detecting the voice data of a meeting terminal having a maximum volume level among all the voice data received from the meeting terminals. The MCU then transmits, to the other terminals, the video data received from the meeting terminal that transmits the voice data having the maximum volume level, thereby displaying the video data of the speaker on the screens of the meeting terminals.
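The maximum-volume selection described above can be illustrated by the following minimal sketch. It is not taken from any cited publication. The function names `rms_level` and `select_speaker`, the use of a root-mean-square level as the volume measure, and the representation of each terminal's voice data as a list of samples are all assumptions made for illustration.

```python
import math


def rms_level(samples):
    """Root-mean-square level of one frame of voice samples (assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_speaker(frames):
    """Return the id of the terminal whose current frame has the highest level.

    frames: dict mapping a hypothetical terminal id to a list of samples.
    """
    return max(frames, key=lambda terminal: rms_level(frames[terminal]))


# Hypothetical frames from three meeting terminals.
frames = {
    "A": [0.1, -0.1, 0.1, -0.1],
    "B": [0.5, -0.5, 0.5, -0.5],
    "C": [0.0, 0.0, 0.0, 0.0],
}
speaker = select_speaker(frames)  # the terminal whose video data would be distributed
```

In this sketch the selection is made on a single frame, which is the source of the two problems discussed next: momentary noise and unequal input gains.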
It is difficult, however, to display without a delay the video data of a speaker that speaks for only a short time interval, because the MCU should have a marginal time between the time instant of the maximum volume detection and the time instant of switching of the video data so that the video data is not switched based on a wrong input voice such as a cough of another participant.
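The trade-off introduced by the marginal time can be sketched as follows, under the assumption that the margin is expressed as a number of consecutive frames during which a new terminal must remain the loudest before the video data is switched. The class name `HoldOffSwitcher` and the parameter `hold_frames` are hypothetical and are used only for illustration.

```python
class HoldOffSwitcher:
    """Switch the displayed terminal only after a candidate has been the
    loudest terminal for `hold_frames` consecutive frames (assumed margin)."""

    def __init__(self, hold_frames):
        self.hold_frames = hold_frames
        self.current = None    # terminal whose video data is currently shown
        self.candidate = None  # terminal waiting out the marginal time
        self.count = 0         # consecutive frames the candidate was loudest

    def update(self, loudest):
        """Feed the loudest terminal of one frame; return the displayed terminal."""
        if loudest == self.current:
            # The displayed terminal is loudest again: discard any candidate.
            self.candidate, self.count = None, 0
            return self.current
        if loudest != self.candidate:
            # A different terminal became loudest: restart the margin.
            self.candidate, self.count = loudest, 0
        self.count += 1
        if self.count >= self.hold_frames:
            self.current = loudest
            self.candidate, self.count = None, 0
        return self.current


switcher = HoldOffSwitcher(hold_frames=3)
# A single loud frame from "B" (e.g. a cough) is ignored, while three
# consecutive frames from "B" cause the switch, two frames late.
history = [switcher.update(t) for t in ["A", "A", "A", "B", "A", "A", "B", "B", "B"]]
```

With a margin of three frames, a speaker who talks for fewer than three frames is never displayed, which is the delay problem described above.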
In addition, since the input gain of the voice input block is not uniform among the meeting terminals, the detection of the maximum volume level does not necessarily specify the speaker terminal. More specifically, for example, the voice data from the meeting terminal having the highest input gain of the voice input block may be incorrectly specified as the voice data of the speaker.
In another scheme, the MCU transmits the video data received from the meeting terminals to display the video data in divided areas of the screen of each meeting terminal. It is expected that the participants judge the speaker on the screen by recognizing the movement of the mouth of each participant in the divided areas of the screen.
In this scheme, however, it is relatively difficult for the participants to recognize the speaker on the screen because the divided areas of the screen reduce the image size of the participants in the remote locations.
In a modification of this scheme, the MCU may display an indicator on each divided area to show the volume level of that divided area, thereby allowing the participants to easily recognize the speaker terminal based on the volume level thus displayed. In this modification, however, there still remains the problem of the reduced image size of the speaker, whereby the facial expression of the speaker is not conveyed to the participants, thereby losing most of the merits of the video meeting.
In another scheme, an operator determines, based on the mimic voice of the speaker, which video data from the meeting terminals should be transmitted, and thereby switches the video data on the screens by hand. In this scheme, however, the switching may not be achieved in a timely and accurate manner, depending on the skill of the operator, especially when participants have similar mimic voices.
In view of the problems in the conventional techniques, it is important for the MCU to transmit the voice data having a suitable volume level to the meeting terminals, as well as to detect the speaker and switch the video data of the speaker without a delay.
Patent Publication JP-A-9-168058 describes a technique for transmitting the voice data in a multipoint video-meeting control system, wherein the MCU transmits voice data to a meeting terminal after mixing a plurality of voice data received from the other terminals, in order to allow the voice data to be exchanged between the plurality of terminals.
One of the video data switching techniques described above can be used in the control system described in the above publication, JP-A-9-168058, given that it is useless to simply mix the video data from a plurality of terminals during transmission of data between the plurality of terminals. The MCU in the publication uses one of the above switching techniques, wherein the video data of the meeting terminal having the maximum volume level is transmitted to all the meeting terminals.
In the described technique, the MCU normalizes the volume levels of the voice data from the meeting terminals before the MCU mixes the voice data and transmits the mixed voice data. This solves the above problem caused by the different input gains. In addition, a correct speaker can be detected to switch the video data while using the normalized volume level.
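The effect of normalization on speaker detection can be illustrated by the following sketch. The publication's actual normalization procedure is not reproduced here. The sketch assumes, purely for illustration, that each terminal has a known positive reference level reflecting its input gain, and that normalization divides the measured level by that reference. The names `normalize_levels`, `levels` and `reference` are hypothetical.

```python
def normalize_levels(levels, reference_levels):
    """Divide each terminal's measured level by its own reference level.

    Both arguments are dicts keyed by a hypothetical terminal id.
    The reference levels are an assumed stand-in for per-terminal input gain.
    """
    normalized = {}
    for terminal, level in levels.items():
        ref = reference_levels[terminal]
        if ref <= 0:
            raise ValueError(f"reference level for {terminal!r} must be positive")
        normalized[terminal] = level / ref
    return normalized


# Terminal "A" has a high input gain, so its raw level is highest even
# though the participant at terminal "B" is the one speaking.
levels = {"A": 0.8, "B": 0.3}
reference = {"A": 2.0, "B": 0.5}

normalized = normalize_levels(levels, reference)
loudest_raw = max(levels, key=levels.get)                # selected without normalization
loudest_normalized = max(normalized, key=normalized.get) # selected after normalization
```

Under these assumed values, the raw comparison selects terminal "A" while the normalized comparison selects terminal "B".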
However, the improvements by the technique described in the publication, JP-A-9-168058, are not sufficient for switching the video data of the speaker at a suitable timing and for transmitting the mixed voice data having suitable mixing levels to the meeting terminals.
In view of the problems of the conventional technique, it is an object of the present invention to provide a multipoint video-meeting control system which is capable of transmitting mixed voice data having a suitable volume level to the meeting terminals and switching the video data on the screens of the meeting terminals without causing a sense of incongruity.
The present invention provides a multipoint video-meeting control system including: a plurality of meeting terminals each including a voice data input/output block for inputting/outputting voice data, a video data input/output block for inputting/outputting video data, and a local data transceiver connected to the voice data input/output block and the video data input/output block; and a multipoint control unit (MCU) including a central data transceiver, connected to each local data transceiver via a network, for transmitting/receiving voice data and video data to/from each of the meeting terminals through the respective local data transceiver, a speaker terminal selection block for selecting one of the meeting terminals as a speaker terminal, and a voice data mixer for mixing the voice data from the meeting terminals at specified mixing ratios to generate mixed voice data, the specified mixing ratios allocating at least a predetermined mixing ratio to the voice data from the speaker terminal, the central data transceiver transmitting the video data of the speaker terminal and the mixed voice data to the meeting terminals.
The present invention also provides a method for controlling a video meeting, including the steps of: receiving voice data from a plurality of meeting terminals; selecting a speaker terminal among the meeting terminals based on volume levels of the voice data from the meeting terminals; mixing the voice data from the meeting terminals at specified mixing ratios to generate mixed voice data, the specified mixing ratios allocating at least a predetermined mixing ratio to the voice data from the speaker terminal; and transmitting the video data of the speaker terminal and the mixed voice data to the meeting terminals.
In accordance with the multipoint video-meeting control system and the method of the present invention, since at least a predetermined mixing ratio is allocated to the voice data from the speaker terminal, the mixed voice data transmitted to the meeting terminals has a suitable mixing level.
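The allocation of at least a predetermined mixing ratio to the speaker terminal may be sketched as follows. This is only one possible reading, offered under stated assumptions: the ratios sum to one, the speaker terminal receives the larger of its level-proportional share and the predetermined ratio, and the remainder is divided among the other terminals in proportion to their volume levels. The function names `mixing_ratios` and `mix`, the parameter `min_speaker_ratio`, and its value of 0.5 are hypothetical.

```python
def mixing_ratios(levels, speaker, min_speaker_ratio=0.5):
    """Return per-terminal mixing ratios that sum to 1.0.

    The speaker terminal gets at least `min_speaker_ratio` (assumed value);
    the other terminals share the remainder in proportion to their levels.
    """
    total = sum(levels.values())
    if total <= 0:
        # All inputs silent: allocate everything to the speaker terminal.
        return {t: (1.0 if t == speaker else 0.0) for t in levels}
    speaker_ratio = max(levels[speaker] / total, min_speaker_ratio)
    others_total = total - levels[speaker]
    ratios = {}
    for terminal, level in levels.items():
        if terminal == speaker:
            ratios[terminal] = speaker_ratio
        elif others_total > 0:
            ratios[terminal] = (1.0 - speaker_ratio) * level / others_total
        else:
            ratios[terminal] = 0.0
    return ratios


def mix(frames, ratios):
    """Weighted sum of the terminals' sample frames, truncated to the shortest frame."""
    length = min(len(frame) for frame in frames.values())
    return [sum(ratios[t] * frames[t][i] for t in frames) for i in range(length)]


# Terminal "A" is the selected speaker although "C" is currently louder.
levels = {"A": 1.0, "B": 1.0, "C": 2.0}
ratios = mixing_ratios(levels, speaker="A", min_speaker_ratio=0.5)
mixed = mix({"A": [1.0, 1.0], "B": [0.0, 0.0], "C": [0.0, 0.0]}, ratios)
```

In this example the level-proportional share of terminal "A" would be one quarter, so the predetermined ratio raises it to one half and the voice of the speaker is not buried under the louder terminal "C".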
In a preferred embodiment of the present invention, the video data of the speaker terminal can be switched at a suitable timing without a sense of incongruity.