The present invention relates to a multi-point video conference system for holding a video conference by linking multiple points and, more particularly, to a multi-point video conference system in which the voice levels of the respective entrant CODECs as participants in a video conference are adjusted.
With the use of a video conference system, each participant in a video conference can see pictures of other participants through a television set even if the participants are in different places. A great deal of attention has therefore been given to a video conference system as an effective means for realizing smooth communication between people in remote places. In a video conference system designed to hold a video conference between two points, smooth communication can be realized when each participant transmits video and voice data from the home station to the distant station. In a multi-point video conference system designed to hold a video conference among three or more points, a video conference is generally performed by using an MCU (Multi-point Control Unit) with the respective entrant CODECs being connected to the MCU in the form of a star connection configuration.
In this case, the MCU is an apparatus for collecting video and voice data from the entrant CODECs and distributing the video and voice data to the respective entrant CODECs such that the respective participants can feel as if they were in the same conference room. In many cases, a picture from the CODEC to which the last talker belongs (to be referred to as the last talk CODEC hereinafter) is transmitted to the remaining entrant CODECs. As for voice data, voice data from talk CODECs are mixed and distributed to the respective entrant CODECs. In this case, if the mixed voice data are returned to all the entrant CODECs without any change, the voice data from each talk CODEC is output from its own speaker and input again to its own microphone. As a result, howling occurs.
In such a conventional multi-point video conference system, the following processing is performed to prevent howling. After mixing voice data from the talk CODECs to obtain a mixed voice signal, the MCU subtracts, from the mixed voice signal, the voice signal received from the CODEC serving as the transmission destination, and transmits the resultant signal. That is, the MCU removes the voice signal received from a given talk CODEC from the mixed voice signal, and transmits the resultant signal to that talk CODEC. In this manner, each entrant CODEC receives a mixed voice signal that contains the voice data from the remaining entrant CODECs but not its own voice data. For example, such a multi-point video conference system is disclosed in "Multi-point Control Unit", NEC technical report, Vol. 44, No. 6/1991, pp. 32-38 and Japanese Patent Laid-Open No. 60-94572.
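The howling-prevention processing described above can be illustrated with a minimal sketch. It assumes that each CODEC's voice data is already available as a list of linear samples of equal length; the function name `mix_minus` and the sample representation are hypothetical and are not taken from the cited references. A real MCU would also have to decode, synchronize, and limit the signals, which is omitted here.

```python
def mix_minus(frames):
    """Return, for each CODEC, the mixed signal with its own voice removed.

    frames: dict mapping a CODEC id to a list of linear samples.
            All lists are assumed to have the same length.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Mixed voice signal: sample-wise sum over all CODECs.
    mixed = [sum(samples[k] for samples in frames.values()) for k in range(length)]
    # For each transmission destination, subtract that CODEC's own signal.
    return {
        codec: [mixed[k] - samples[k] for k in range(length)]
        for codec, samples in frames.items()
    }


frames = {"A": [1, 2], "B": [10, 20], "C": [100, 200]}
outputs = mix_minus(frames)
```

In this sketch, CODEC A receives only the sum of the samples from B and C, so its own voice never returns to its speaker.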
The levels of voice data transmitted from the respective entrant CODECs to the MCU vary greatly depending on the loudness of the voices of the respective talkers at the talk CODECs, the distances between the talkers and the microphones, and system environments including the amount of amplification of the voice signals. If, therefore, voice data from a given talk CODEC were mixed with voice data from the remaining entrant CODECs and the mixed data were returned to the talk CODEC, the voice output level of the talk CODEC could be adjusted by comparing its voice level with the voice levels of the remaining entrant CODECs.
In the conventional multi-point video conference system, however, voice data from each talk CODEC is not returned thereto in consideration of howling. Such a comparison therefore cannot be made at the talk CODEC, and voice signals input from the respective talk CODECs to the MCU vary considerably in level.
To switch pictures, the MCU specifies the last talk CODEC among a plurality of talk CODECs. Assume that an entrant CODEC A talked first, and an entrant CODEC B then talked in response to the talk given by the entrant CODEC A. In this case, the entrant CODEC B is the last talk CODEC. The last talk CODEC is specified by identifying the talk CODEC that outputs the maximum voice level at each time point, and detecting when the identified talk CODEC changes. As described above, however, voice data input from talk CODECs to the MCU may vary considerably in level.
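The specification of the last talk CODEC by maximum voice level can be sketched as follows. This is only an illustration under stated assumptions: the voice level of each CODEC is assumed to be available as one number per time point, and the function name `last_talker_sequence` and the `speech_threshold` parameter (used to ignore silence) are hypothetical.

```python
def last_talker_sequence(levels_per_tick, speech_threshold=0.0):
    """Track the last talk CODEC over time.

    levels_per_tick: list of dicts, each mapping a CODEC id to its
                     voice level at one time point.
    speech_threshold: hypothetical level above which a CODEC is
                      considered to be talking.
    Returns the last talk CODEC after each time point (None until
    some CODEC has talked).
    """
    last = None
    history = []
    for levels in levels_per_tick:
        if levels:
            # Talk CODEC that outputs the maximum voice level at this time point.
            loudest = max(levels, key=levels.get)
            # A change of the loudest talking CODEC means the talker was replaced.
            if levels[loudest] > speech_threshold and loudest != last:
                last = loudest
        history.append(last)
    return history


ticks = [
    {"A": 0.8, "B": 0.0},  # CODEC A talks first
    {"A": 0.7, "B": 0.1},
    {"A": 0.0, "B": 0.6},  # CODEC B talks in response
    {"A": 0.0, "B": 0.0},  # silence: the last talk CODEC is kept
]
history = last_talker_sequence(ticks)
```

With these illustrative levels, the picture source would switch from CODEC A to CODEC B at the third time point and remain on B during the subsequent silence.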
If, therefore, a talk CODEC with a low voice level starts talking when a talk CODEC with a high voice level stops talking, the MCU may not recognize the replacement of the talk CODEC. As a result, the MCU may specify the wrong CODEC as the last talk CODEC. In this case, a picture of the participant at the entrant CODEC with the high voice level may be erroneously displayed as a picture of the participant at the current talk CODEC even though this CODEC is not the last talk CODEC.
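The erroneous recognition can be reproduced with a small numerical illustration. The levels below are invented for the example, and the mechanism shown (a high-level CODEC whose residual input level after the talker stops still exceeds the speech level of a low-level CODEC) is an assumption made for illustration; the text above does not state the exact cause.

```python
def loudest_codec(levels):
    """Return the CODEC id with the maximum voice level at one time point."""
    return max(levels, key=levels.get)


# Illustrative levels as seen at the MCU (hypothetical values).
ticks = [
    {"A": 0.90, "B": 0.01},  # high-level CODEC A is talking
    {"A": 0.15, "B": 0.10},  # A has stopped; low-level CODEC B is talking
]
picks = [loudest_codec(levels) for levels in ticks]
```

Here CODEC A is selected at both time points, so the replacement of the talk CODEC by B goes undetected and the picture from A would continue to be distributed.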