Backed by broadband networks and by small, high-performance video/audio input/output devices, multi-channel voice-interface conference systems that connect multiple points, as illustrated in FIG. 1, have become popular.
In such a conference system, the number of connecting points in mobile environments, such as mobile phones, is anticipated to increase in the future; accordingly, channels of varying quality become mixed together and background noise also increases.
Consider a case where, in such an environment, four-point communication is conducted using a receiver device including two stereo speakers 203-1 and 203-2, as illustrated in FIG. 2. In this case, receiving units 201-1, 201-2 and 201-3 receive the voices S1, S2 and S3, respectively, from the other three points. A channel allocation/mixing unit 202 then mixes the three received voice channels S1, S2 and S3 and allocates them to the speakers 203-1 and 203-2.
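The allocation/mixing step above can be sketched as follows. The patent does not specify the mixing rule, so constant-power panning and all names here (`mix_to_stereo`, the pan positions) are illustrative assumptions.

```python
import math

def mix_to_stereo(sources, azimuths):
    """Mix N mono voice channels into a stereo pair using
    constant-power panning. `azimuths` are pan positions in
    [-1.0 (full left), +1.0 (full right)].
    This is an illustrative sketch, not the patent's method."""
    n = len(sources[0])
    left = [0.0] * n
    right = [0.0] * n
    for src, az in zip(sources, azimuths):
        theta = (az + 1.0) * math.pi / 4.0  # map [-1, 1] -> [0, pi/2]
        gl, gr = math.cos(theta), math.sin(theta)
        for i, s in enumerate(src):
            left[i] += gl * s
            right[i] += gr * s
    return left, right

# Three voices S1, S2, S3 panned left, right and centre,
# mirroring the three-source / two-speaker case of FIG. 2.
s1, s2, s3 = [1.0, 0.5], [0.2, 0.1], [0.4, 0.4]
L, R = mix_to_stereo([s1, s2, s3], [-1.0, 1.0, 0.0])
```

With these pan positions, S1 appears only in the left channel, S2 only in the right, and S3 with equal gain in both, which is exactly the centred positioning discussed next.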
In this case, when the channel allocation/mixing unit 202 allocates the voices S1, S2 and S3 without taking their quality into account, a poor-quality voice S3 may, for example, be localized between and heard from the two loudspeakers 203-1 and 203-2. As a result, the articulation of the other, good-quality voices S1 and S2 is degraded by the poor-quality voice S3, which is a problem.
In other words, receiving quality varies under the influence of the distortion of the CODEC mounted on each terminal and, depending on the mixing method, the factors that degrade one received voice affect the quality of the voices from the other points, which is a problem.
The following Patent document 1 discloses a technique that compares, for each point, the number of voice-data transmitting devices with the number of output speakers on the receiving side, and mixes voices when the number of output speakers is smaller than the number of voice-data transmitting devices. However, this known example does not take the quality of a received signal into consideration.
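The count comparison above can be illustrated with a minimal sketch. The document, as summarized here, only states the comparison condition, so the round-robin rule used below to distribute surplus voices is an assumption.

```python
def allocate_to_speakers(sources, n_speakers):
    """If there are at least as many output speakers as voice-data
    transmitting devices, each voice gets its own speaker; otherwise
    voices must share (be mixed on) speakers, here distributed
    round-robin (an assumed rule for illustration)."""
    outputs = [[] for _ in range(n_speakers)]
    for i, voice in enumerate(sources):
        outputs[i % n_speakers].append(voice)
    return outputs
```

For example, `allocate_to_speakers(["S1", "S2", "S3"], 2)` places S1 and S3 on the same speaker; mixing occurs exactly when the speakers are fewer than the transmitting devices, as in the comparison described above.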
The following Patent document 2 discloses a technique for sound-image localization control in a voice conference that focuses on the frequency bands that produce directional auditory perception. More specifically, in this technique the frequency band of each audio signal is divided into a certain plural number of bands for every channel. For the bands from which directional perception can be obtained (the second and third frequency bands), a sound image is localized using a plurality of speakers, while for the bands from which directional perception cannot be obtained (the first and fourth frequency bands), the sound is reproduced by a single speaker. This known example aims at maximizing the sound-image localization effect and applies the same frequency-band processing to every input channel; it likewise does not take the quality of a received signal into consideration.
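A minimal sketch of this band-split routing follows, for one input channel. The four equal-width bands, the crude DFT-based split, and the equal gain split between the two speakers for the directional bands are all assumptions for illustration; the actual band edges and localization gains are not specified here.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform (illustrative, O(n^2))."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(X):
    """Inverse DFT; returns the real part for real input signals."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def band_route(signal, n_bands=4):
    """Split one channel into n_bands equal-width frequency bands.
    The middle bands (second and third), which carry directional
    cues, go to both speakers for localization (assumed 0.5/0.5
    split); the outer bands (first and fourth) go to one speaker."""
    n = len(signal)
    X = dft(signal)
    half = n // 2 + 1  # bins up to Nyquist define the bands
    left = [0.0] * n
    right = [0.0] * n
    for b in range(n_bands):
        Xb = [0j] * n
        for k in range(b * half // n_bands, (b + 1) * half // n_bands):
            Xb[k] = X[k]
            if 0 < k < n - k:        # mirror bin for a real signal
                Xb[n - k] = X[n - k]
        xb = idft(Xb)
        if b in (1, 2):              # directional bands: both speakers
            for t in range(n):
                left[t] += 0.5 * xb[t]
                right[t] += 0.5 * xb[t]
        else:                        # non-directional bands: one speaker
            for t in range(n):
                left[t] += xb[t]
    return left, right
```

A tone falling in the first band is reproduced entirely by the single (here, left) speaker, while a tone in the second band is shared by both speakers, mirroring the routing described above.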
The following Patent document 3 discloses a technique that, in a television conference system, generates in advance a correspondence table between horizontal coordinate positions in an image window and sound-image localization areas, and, on the basis of this table, distributes the voice corresponding to a target image to each speaker according to the speaker output ratio peculiar to each localization area. This known example determines a sound-image localization position by table conversion based on the display position (horizontal coordinates) of simultaneously transmitted image data, and it likewise does not take the quality of a received signal into consideration.
Patent document 1: Japanese Laid-open Patent Publication No. 2004-208051
Patent document 2: Japanese Laid-open Patent Publication No. 02-059000
Patent document 3: Japanese Laid-open Patent Publication No. 06-311511
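The table-based distribution of Patent document 3 might be sketched as follows; the coordinate ranges, the three localization areas and the output ratios below are purely illustrative assumptions, not values from the document.

```python
# Hypothetical correspondence table: ranges of horizontal pixel
# coordinates in the image window mapped to localization areas,
# each with its own (left, right) speaker output ratio.
LOCALIZATION_TABLE = [
    ((0, 213),   (0.9, 0.1)),   # left area
    ((213, 427), (0.5, 0.5)),   # centre area
    ((427, 640), (0.1, 0.9)),   # right area
]

def speaker_ratio(x):
    """Return the (left, right) speaker output ratio for the
    localization area containing horizontal coordinate x.
    All values are illustrative, not from the patent."""
    for (lo, hi), ratio in LOCALIZATION_TABLE:
        if lo <= x < hi:
            return ratio
    raise ValueError("coordinate outside image window")
```

A voice whose associated image is displayed near the left edge would thus be output mostly from the left speaker, and so on, with no reference to the received signal's quality.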