Three-dimensional video technologies can provide pictures with depth information that complies with the stereoscopic principle. Three-dimensional audio technologies use a microphone array to pick up a sound, and can obtain an enhanced sound together with information about the direction and distance of the sound by using a beam-forming method. A speaker array is used to replay the sound, and methods such as wave field synthesis are used to replay the sound with a sense of direction and a sense of distance. Some experimental systems for three-dimensional video or three-dimensional audio are already available in the prior art.
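For illustration only, the beam-forming idea mentioned above can be sketched as a narrowband delay-and-sum scan over a uniform linear microphone array. This is a minimal sketch under stated assumptions, not the method of any particular prior-art system: the function names, the far-field plane-wave model, the single-frequency signal, and the array parameters are all hypothetical. Note that this far-field sketch recovers only the direction of the sound; estimating distance would require a near-field model, which is not shown.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_phasors(angle_deg, num_mics, spacing_m, freq_hz):
    """Narrowband signal seen at each microphone for a far-field plane wave.

    Hypothetical model: uniform linear array, angle measured from broadside.
    """
    sin_a = math.sin(math.radians(angle_deg))
    return [
        cmath.exp(-2j * math.pi * freq_hz * m * spacing_m * sin_a / SPEED_OF_SOUND)
        for m in range(num_mics)
    ]


def beam_power(phasors, steer_deg, spacing_m, freq_hz):
    """Delay-and-sum output power when the array is steered to steer_deg."""
    sin_s = math.sin(math.radians(steer_deg))
    total = sum(
        x * cmath.exp(2j * math.pi * freq_hz * m * spacing_m * sin_s / SPEED_OF_SOUND)
        for m, x in enumerate(phasors)
    )
    return abs(total / len(phasors)) ** 2


def estimate_direction(phasors, spacing_m, freq_hz):
    """Scan integer steering angles and return the one with the highest power."""
    return max(
        range(-90, 91),
        key=lambda a: beam_power(phasors, a, spacing_m, freq_hz),
    )


if __name__ == "__main__":
    # Assumed example: 8 microphones, 5 cm apart, 1 kHz tone arriving from 30 degrees.
    received = mic_phasors(30, num_mics=8, spacing_m=0.05, freq_hz=1000.0)
    print(estimate_direction(received, spacing_m=0.05, freq_hz=1000.0))
```

When the steering angle matches the arrival angle, the compensated microphone signals add in phase, so the output power peaks there; this is the sense in which beam-forming yields both an enhanced sound and a direction estimate.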
FIG. 1A is a horizontal view corresponding to an original site layout diagram in the prior art. As shown in FIG. 1A, seven persons attend the conference. Participant P1 is seated in the first row, and participant P2 is seated in the last row. FIG. 1B illustrates the scene of the site shown in FIG. 1A as displayed on a screen at a reproduction site in the prior art. If a participant is seated at point O at the reproduction site, point O, P1, and P2 are located exactly on the same straight line. During the reproduction of the sound field, if the distance of the sound at the reproduction site is not processed or is poorly processed, the voices of P1 and P2 do not match the positions of P1 and P2. In this case, when P1 and/or P2 speaks, the participant seated at point O is unable to distinguish who is speaking. A similar problem occurs when the scene is reproduced by using a three-dimensional video.
FIG. 2 is a plan view of a site layout in the prior art. According to the stereoscopic imaging and display principles, when an object at site 1 is displayed at site 2 by using the three-dimensional display technology, the object may appear to participants at site 2 to be located in front of display 21, for example, at position C, or behind display 21, for example, at position B. Suppose the object in FIG. 2 is a participant who is seated at position A at site 1. When the object is reproduced at site 2, if it is displayed at a position in front of the display, for example, position C, but the sound is emitted from position B, communications and discussions between participants at site 2 and participants at site 1 may be affected.
During the implementation of the present invention, the inventor discovered the following problems in the prior art: To obtain more accurate information about the direction and distance of a sound, the number of microphones and/or the spacing between microphones in the microphone array is generally increased. For a microphone array, the greater the number of microphones and the larger the spacing between them, the more accurate the judgment about the direction and distance of the sound. However, the size of the microphone array also increases. Conversely, if the number of microphones and the spacing between microphones are decreased, the accuracy of the direction and distance of the sound obtained by the microphone array may be reduced. Thus, when the sound is replayed in a scene where the distance of the sound needs to be considered, for example, in a scene where the person speaking is allowed to move freely, in a conferencing system with multiple rows as shown in FIG. 1A, or in a three-dimensional video display system as shown in FIG. 1B, listeners are unable to determine the position of the person speaking immediately and accurately. As a result, the eye-to-eye effect of the communication is affected.
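The trade-off between array size and accuracy described above can be illustrated with a rough calculation. The following is a simplified sketch under stated assumptions, not a description of any specific prior-art array: it assumes two adjacent microphones, a far-field source near broadside, and a time difference of arrival measured only to the nearest whole sample (no sub-sample interpolation). The function names, the sampling rate, and the spacings are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def broadside_angle_step_deg(spacing_m, sample_rate_hz):
    """Smallest angle change near broadside that shifts the arrival-time
    difference between two microphones by one whole sample.

    Assumes the time difference is d * sin(theta) / c and is quantized to
    1 / sample_rate_hz. A smaller result means finer direction resolution.
    """
    path_per_sample_m = SPEED_OF_SOUND / sample_rate_hz
    # Clamp to 1.0: if the spacing is shorter than one sample of travel,
    # the pair cannot resolve any angle change under this model.
    ratio = min(1.0, path_per_sample_m / spacing_m)
    return math.degrees(math.asin(ratio))


def array_length_m(num_mics, spacing_m):
    """Physical length of a uniform linear array."""
    return (num_mics - 1) * spacing_m


if __name__ == "__main__":
    # Assumed example: 16 kHz sampling, 8 microphones, three candidate spacings.
    for spacing in (0.05, 0.10, 0.20):
        step = broadside_angle_step_deg(spacing, sample_rate_hz=16000.0)
        length = array_length_m(8, spacing)
        print(f"spacing={spacing:.2f} m  length={length:.2f} m  step={step:.1f} deg")
```

Under these assumptions, widening the spacing shrinks the resolvable angle step but lengthens the array in proportion, which is the size penalty noted above.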