There has been conventionally known a wavefront synthesis technique for reproducing a sound field by use of a planar speaker array or linear speaker array. Such a wavefront synthesis technique can be used for next-generation bidirectional communication or the like as illustrated in FIG. 1, for example.
In FIG. 1, next-generation bidirectional communication is made between a space P11 in which a talker W11 is present and a space P12 in which a talker W12 is present.
Specifically, in the space P11, a sound field A made mainly of the audio issued by the talker W11 is picked up by a linear microphone array MCA11 configured of a plurality of longitudinally-arranged microphones as illustrated, and a resultant sound source signal is transmitted to the space P12.
In the example, the illustrated arrow indicates the direction in which the audio of the talker W11 as a sound source propagates, and the audio of the talker W11 arrives at an angle θ viewed from the linear microphone array MCA11 and is picked up. In the following, the angle θ, that is, the angle formed between the direction in which the audio propagates from the sound source and the direction in which the microphones configuring the microphone array are arranged, will be denoted as the arrival angle θ.
In the space P12, a speaker drive signal for reproducing the sound field A is generated from the sound source signal transmitted from the space P11. Then, the sound field A is reproduced, on the basis of the generated speaker drive signal, by a linear speaker array SPA11 configured of a plurality of longitudinally-arranged speakers in the space P12 as illustrated.
In the example, the illustrated arrow indicates a direction in which the audio output from the linear speaker array SPA11 and directed to the talker W12 propagates. An angle formed between the propagation direction and the linear speaker array SPA11 is the same as the arrival angle θ.
Incidentally, though not illustrated here, a linear microphone array is provided also in the space P12, a sound field B configured of the audio mainly issued by the talker W12 is picked up by the linear microphone array and a resultant sound source signal is transmitted to the space P11. Further, in the space P11, a speaker drive signal is generated from the sound source signal transmitted from the space P12 and the sound field B is reproduced by the linear speaker array (not illustrated) on the basis of the resultant speaker drive signal.
Incidentally, when a sound field is reproduced by use of a microphone array or speaker array in this way, an infinite number of speakers and microphones need to be arranged in order to reproduce the sound field in a physically accurate manner. For example, when the speakers or microphones are discretely arranged as in the example illustrated in FIG. 1, spatial aliasing is caused.
The highest spatial frequency free of spatial aliasing (which will be denoted as the upper limit spatial frequency below) klim is determined by the lower of the spatial Nyquist frequencies calculated from the interval of the speakers configuring the speaker array and the interval of the microphones configuring the microphone array.
That is, assuming the interval of the microphones as dmic and the interval of the speakers as dspk, the upper limit spatial frequency klim is found by the following Equation (1).
[Mathematical formula 1]

$$k_{\mathrm{lim}} = \min\left(\frac{\pi}{d_{\mathrm{mic}}},\ \frac{\pi}{d_{\mathrm{spk}}}\right) \tag{1}$$
The thus-acquired upper limit spatial frequency klim has an effect on localization of a sound image, and a higher value is generally preferable.
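Equation (1) can be illustrated with a short Python sketch. Note that the sketch and the 5 cm/8 cm intervals used in it are illustrative assumptions, not part of the original description:

```python
import math

def upper_limit_spatial_frequency(d_mic: float, d_spk: float) -> float:
    """Upper limit spatial frequency k_lim of Equation (1).

    d_mic: interval of the microphones in metres.
    d_spk: interval of the speakers in metres.
    Returns the lower of the two spatial Nyquist frequencies, in rad/m.
    """
    return min(math.pi / d_mic, math.pi / d_spk)

# Illustrative intervals: 5 cm microphones, 8 cm speakers.
# The coarser (speaker) interval dominates, so k_lim = pi / 0.08.
print(upper_limit_spatial_frequency(0.05, 0.08))
```

Because the coarser of the two intervals sets the limit, narrowing only the finer array does not raise klim.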
Further, a relationship between the frequency of a sound source signal (which will be denoted as the temporal frequency below) f and the spatial frequency k is as indicated in the following Equation (2). Note that c indicates the sound speed in Equation (2).
[Mathematical formula 2]

$$f = \frac{c}{2\pi}\,k \tag{2}$$
Therefore, when no particular measure is taken, the highest temporal frequency free of spatial aliasing (which will be denoted as the upper limit temporal frequency below) flim can be found from Equation (2). The upper limit temporal frequency flim has an effect on sound quality, and a higher value generally yields higher reproducibility, that is, high fidelity (HiFi).
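Substituting klim of Equation (1) into Equation (2) yields flim = c·klim/(2π). The following Python sketch is an illustration only; the sound speed of 343 m/s and the array intervals are assumed values:

```python
import math

C = 343.0  # assumed sound speed in m/s

def upper_limit_temporal_frequency(d_mic: float, d_spk: float) -> float:
    """Upper limit temporal frequency f_lim, free of spatial aliasing.

    Combines Equation (1), k_lim = min(pi/d_mic, pi/d_spk),
    with Equation (2), f = (c / (2*pi)) * k.
    """
    k_lim = min(math.pi / d_mic, math.pi / d_spk)
    return C / (2.0 * math.pi) * k_lim

# Illustrative 5 cm microphone and 8 cm speaker intervals:
# f_lim = 343 / (2 * 0.08) = 2143.75 Hz, i.e. only the band up to
# about 2.1 kHz is reproduced free of spatial aliasing.
print(upper_limit_temporal_frequency(0.05, 0.08))
```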
The spatial aliasing will be described herein. FIG. 2 illustrates a spatial spectrum based on a difference in arrival angle of a planar wave of the audio from the sound source, which is also called an angle spectrum since the position of a spectrum peak of the spatial spectrum changes depending on the arrival angle of the planar wave. Note that, in FIG. 2, the vertical axis indicates the temporal frequency f and the horizontal axis indicates the spatial frequency k. Further, a line L11 to a line L13 each indicate a spectrum peak.
The left side in the Figure illustrates the angle spectrum of the original planar wave at the arrival angle θ=0, before spatial sampling is performed, that is, before the planar wave is picked up by the microphone array. In the example, the spectrum peak appears in the positive direction of the spatial frequency k as indicated by the line L11.
To the contrary, the right side in the Figure illustrates the angle spectrum of the sound source signal acquired by performing spatial sampling on the planar wave at the arrival angle θ=0, that is, by picking up the planar wave with the microphone array configured of discretely-arranged microphones.
In the example, the line L12 corresponds to the line L11 and indicates the spectrum peak that should essentially appear. Further, the line L13 indicates a spectrum peak appearing due to the spatial aliasing; in the example, the spatial aliasing is markedly caused in the area in which the temporal frequency f is higher than the upper limit temporal frequency flim and the spatial frequency k is negative.
In the absence of spatial aliasing, a spectrum peak should essentially appear in the area in which the spatial frequency k is negative only when the arrival angle θ of the planar wave satisfies π/2≤θ≤π.
Therefore, in the example illustrated on the right side in the Figure, when the picked-up planar wave (sound field) is reproduced, a sound image is localized as if planar waves arriving at various different angles were mixed, due to the effect of the spectrum peak caused by the spatial aliasing.
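The folding that produces the line L13 can be sketched numerically. The Python example below is an illustration, not part of the original description: it assumes the spatial frequency of a planar wave along the array is k = (2πf/c)·cos θ (an assumed geometry convention) and wraps that value into the band sampled at the microphone interval, reproducing the behavior in FIG. 2 whereby a wave above flim reappears at a negative spatial frequency:

```python
import math

def aliased_spatial_frequency(f: float, theta: float, d_mic: float,
                              c: float = 343.0) -> float:
    """Spatial frequency observed after sampling at interval d_mic.

    k_true is the planar wave's spatial frequency along the array
    (assumed geometry: k = 2*pi*f/c * cos(theta)); discrete sampling
    folds it into the principal interval [-pi/d_mic, pi/d_mic).
    """
    k_true = (2.0 * math.pi * f / c) * math.cos(theta)
    k_nyq = math.pi / d_mic  # spatial Nyquist frequency
    return (k_true + k_nyq) % (2.0 * k_nyq) - k_nyq

d_mic = 0.05  # 5 cm interval -> f_lim = 343 / (2 * 0.05) = 3430 Hz
# Below f_lim the peak stays at its true, positive spatial frequency:
print(aliased_spatial_frequency(2000.0, 0.0, d_mic))
# Above f_lim the peak wraps to a negative spatial frequency,
# corresponding to the line L13 in FIG. 2:
print(aliased_spatial_frequency(5000.0, 0.0, d_mic))
```

The folded peak at negative k would, absent aliasing, correspond to an arrival angle in π/2≤θ≤π, which is why the listener perceives planar waves seemingly mixed in from wrong angles.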
Thus, even when the speaker drive signal for wavefront synthesis is generated from the sound source signal acquired by the sound pickup and the planar wave is reproduced by the speaker array on the basis of the speaker drive signal, a listener cannot perceive the correct planar wave. Additionally, when the talkers approach each other in the next-generation bidirectional communication, for example, not a planar wave but a spherical wave is caused; however, since a spherical wave is locally similar to a planar wave, a similar problem occurs.
As described above, a sound image cannot be accurately localized when spatial aliasing is caused. Thus, there is proposed a technique for further enhancing the upper limit temporal frequency flim free of spatial aliasing by use of two speaker arrays, including a high-tone speaker unit and a low-tone speaker unit having mutually-different speaker intervals, for reduction in the spatial aliasing (see Patent Document 1, for example). With the technique, it is possible to accurately reproduce a signal having a higher temporal frequency.