An audio reproduction technique (hereinafter referred to as the “virtual acoustic image localization technique”) is conventionally known that uses only two speakers and makes a listener perceive a sound source as if it were present at any desired position in space.
A method of carrying out the virtual acoustic image localization technique is described in Non-Patent Document 1 listed below, for example. FIG. 9 shows its configuration.
According to Non-Patent Document 1, the virtual acoustic image localization technique measures (or estimates) in advance the transfer characteristic from a desired position in a space to the ears of a person located at any desired position in the same space, and generates the signals expected to reach the ears by convolving the input sound source with that transfer characteristic.
The signals thus generated are called “binaural signals”. Providing the binaural signals to the ears through a reproduction device such as headphones makes the listener feel as if the sound source were present at the given position.
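As an illustrative sketch only, the generation of binaural signals by convolution can be written as follows. The function names and the toy impulse responses are hypothetical and are not taken from Non-Patent Document 1; measured transfer characteristics would be far longer.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def make_binaural(source, h_left, h_right):
    """Return (left, right) ear signals for one source position."""
    return convolve(source, h_left), convolve(source, h_right)


# Hypothetical toy responses for a source on the listener's right:
# the right ear hears it first and louder, the left ear later and weaker.
h_right = [1.0, 0.0, 0.0]
h_left = [0.0, 0.0, 0.6]
source = [1.0, 0.5, -0.25]

left, right = make_binaural(source, h_left, h_right)
```

In this toy case the right-ear signal is the source itself, while the left-ear signal is the same waveform delayed by two samples and scaled by 0.6.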
However, when the reproduction device is speakers, the following cross-talk cancellation processing becomes necessary to bring the binaural signals to the ears properly.
For example, in speaker reproduction, if the signal to be provided to a first ear (the right ear, for example) is reproduced directly by a first speaker (the right speaker, for example), “crosstalk” occurs: the sound produced by the right speaker reaches not only the right ear via a space transfer function G11 but also the left ear via a space transfer function G12. As a result, the binaural signals cannot be provided to both ears properly.
When the binaural signals cannot be provided to both ears properly, the acoustic image is not localized at the target position.
To solve the problem, the virtual acoustic image localization technique based on the speaker reproduction generally carries out cross-talk cancellation processing to suppress the crosstalk.
In the example shown in FIG. 9, the cross-talk cancellation processing is carried out using filters H11, H12, H21 and H22 so that the audio signals z1 and z2 received at the listener's ears agree with the dummy head outputs x1 and x2. This makes it possible to provide the right and left binaural signals accurately.
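The idea behind such filters can be sketched at a single frequency, under the assumption that the four speaker-to-ear paths form a 2x2 matrix and that the cancellation filters are its inverse. The matrix layout, the symmetric path values and the helper names below are hypothetical; FIG. 9 may assign the indices of H11 to H22 differently.

```python
import cmath


def inverse_2x2(m):
    """Invert a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("acoustic path matrix is singular at this frequency")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(p, q):
    """Multiply two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical complex gains at one frequency bin.
# paths[e][s] is the transfer function from speaker s to ear e.
direct = 1.0 + 0.0j                   # speaker to the ear on its own side
cross = 0.5 * cmath.exp(-1j * 0.8)    # speaker to the opposite ear
paths = [[direct, cross], [cross, direct]]

cancel = inverse_2x2(paths)           # cancellation filters at this frequency
overall = matmul_2x2(paths, cancel)   # filters followed by the acoustic paths
```

Because `overall` is the identity matrix to within rounding error, each ear receives only the signal intended for it at this frequency. A practical design would repeat this for every frequency and handle frequencies where the matrix is nearly singular.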
However, the foregoing cross-talk cancellation processing often degrades the sound quality: components that should be localized at the center (such as speech or vocal components) are perceived as being pulled back, and hence cannot be heard clearly or are perceived as having echoes.
In addition, since the processing weakens the low frequency components, it detracts from the impact of the low frequencies.
In-phase components are dominant in both the center-localized components and the low frequency components. The reason is described below.
When binaural signals are generated to localize a particular sound source at the center, they are naturally generated on the assumption that the sound source is placed in front of the listener.
When the sound source is placed in front of the listener, its sound arrives at the left ear and the right ear of the listener almost simultaneously. This is because a human face is almost symmetrical, so the transfer characteristic from the frontal sound source position to the right ear is nearly equal to that from the frontal sound source position to the left ear.
Not only in binaural signals but also in an ordinary stereo sound source, the center-localized components are recorded nearly in phase between the right and left channels.
Accordingly, in both binaural signals and ordinary stereo signals, in-phase components are dominant in the center-localized components. In many cases they are completely in-phase signals.
Next, when binaural signals are generated to localize a sound source 90 degrees to the right of the listener, they are generated on the assumption that the sound source is placed 90 degrees to the right of the listener.
When the sound source is placed 90 degrees to the right of the listener, its sound arrives at the right ear first, and then at the left ear with a delay corresponding to the width of the face (the difference in path length to the right and left ears).
It is known that low frequency components diffract more readily than middle to high frequency components. Thus, low frequency sound bends around the head and arrives at the left ear with little attenuation in amplitude compared with the sound arriving at the right ear.
In other words, the binaural signals are signals in which the signal for the right ear is output first and the signal for the left ear is output after a fixed time period. For the low frequency components, the amplitude difference between right and left is small.
Here, the fixed time period is the time a sound wave takes to travel about the width of the face. In a DVD audio signal sampled at 48000 Hz, for example, it corresponds to a delay of about 20 to 30 samples.
Consider a low frequency signal of 100 Hz or less. One period of such a signal is 480 samples or more.
Accordingly, even if a low frequency signal of 100 Hz or less is delayed by 30 samples, corresponding to the delay time of the face width, its phase is delayed by only λ/16 or less (where λ is the wavelength), so the delayed signal can be regarded as almost in phase with the original without any problem.
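The arithmetic above can be checked with a short sketch. The speed of sound and the 0.18 m path difference are assumed values chosen for illustration; only the 48000 Hz sampling rate, the 100 Hz frequency and the 30-sample delay come from the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C
SAMPLE_RATE_HZ = 48000      # sampling rate quoted in the text


def delay_in_samples(path_difference_m):
    """Interaural delay, in samples, for a given path length difference."""
    return path_difference_m / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ


def period_in_samples(frequency_hz):
    """Length of one period of a sinusoid, in samples."""
    return SAMPLE_RATE_HZ / frequency_hz


def phase_fraction(delay_samples, frequency_hz):
    """Delay expressed as a fraction of one period (of the wavelength)."""
    return delay_samples / period_in_samples(frequency_hz)


face_delay = delay_in_samples(0.18)    # assumed 0.18 m path difference
period_100 = period_in_samples(100.0)  # 480 samples
fraction = phase_fraction(30, 100.0)   # 30 / 480 = 1/16
```

For frequencies below 100 Hz the period is longer, so the same 30-sample delay is an even smaller fraction of the wavelength.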
At angles other than 90 degrees to the right, the phase delay between the right and left signals naturally becomes smaller than that.
Thus, in binaural signals, the low frequency components can be regarded as nearly in-phase components. In ordinary stereo sources, although the low frequency components are sometimes recorded with an amplitude difference between right and left, they are usually recorded as nearly in-phase components.
For the foregoing reasons, in-phase components are dominant in both the center-localized components and the low frequency components.
Here, a case where in-phase component signals are input to the foregoing cross-talk cancellation processing will be described.
FIG. 10 shows diagrams illustrating time responses of signals output from the audio device when the in-phase component signals are input to the cross-talk cancellation processing.
These are schematic diagrams in which a transfer characteristic Hd is approximated by an impulse and a transfer characteristic Hx is approximated by an impulse with delay and attenuation. Even without such approximation, the general trend of the time response is the same.
When in-phase components are input, the output signals of the audio device have the same time response for the right and left channels, as shown in FIG. 10: the sign is inverted at fixed time intervals, and the response continues while attenuating.
In FIG. 10, each positive side impulse at time zero (see (a)) is a component arriving at an ear closer to the speaker, and the entire response portion following (a) (see (b)) operates as a signal for cancellation.
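A response of this kind can be reproduced with a minimal sketch, assuming a symmetric arrangement in which Hd is a unit impulse and Hx is an impulse with gain g and delay D. Under that assumption the canceller output for an in-phase input reduces to 1/(1 + g z^-D). The gain of 0.5 and the delay of 4 samples are hypothetical values, not read from FIG. 10.

```python
def in_phase_canceller_response(cross_gain, cross_delay, length):
    """Impulse response of 1 / (1 + g z^-D).

    Computed with the recursion y[n] = x[n] - g * y[n - D],
    where x is a unit impulse at n = 0.
    """
    response = [0.0] * length
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        feedback = response[n - cross_delay] if n >= cross_delay else 0.0
        response[n] = x - cross_gain * feedback
    return response


# Hypothetical parameters: cross path half as strong, 4 samples later.
resp = in_phase_canceller_response(cross_gain=0.5, cross_delay=4, length=13)
nonzero = [(n, v) for n, v in enumerate(resp) if v != 0.0]
```

The nonzero samples fall at multiples of the cross-path delay with alternating sign and decaying amplitude: the sample at time zero plays the role of (a), and the train that follows plays the role of (b).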
At the ears of a listener sitting at the position assumed in the design stage of the cross-talk cancellation processing (hereinafter referred to as the “standard position”), the response portions (b) cancel each other out, and the crosstalk is canceled completely.
However, when the listener shifts from the standard position even slightly, the response portions (b) no longer cancel each other out, so the listener perceives a deterioration in the sound quality accompanied by echoes.
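The effect of a shifted listener can be sketched with the same simplified impulse model. Here a shift is represented, as an assumption, solely by a one-sample change in the cross-path delay seen at the ear; in practice both paths change. All parameter values and function names are hypothetical.

```python
def canceller_response(g, d, length):
    """Speaker signal for an in-phase unit impulse: 1 / (1 + g z^-d)."""
    out = [0.0] * length
    for n in range(length):
        out[n] = (1.0 if n == 0 else 0.0) - (g * out[n - d] if n >= d else 0.0)
    return out


def ear_signal(speaker_signal, cross_gain, cross_delay):
    """Direct path (unit impulse) plus cross path (gain and delay) at one ear."""
    out = list(speaker_signal)
    for n in range(cross_delay, len(out)):
        out[n] += cross_gain * speaker_signal[n - cross_delay]
    return out


def residual(ear):
    """Total magnitude of everything after the intended impulse at n = 0."""
    return sum(abs(v) for v in ear[1:])


speaker = canceller_response(g=0.5, d=4, length=16)  # designed for delay 4

standard = ear_signal(speaker, cross_gain=0.5, cross_delay=4)  # as designed
shifted = ear_signal(speaker, cross_gain=0.5, cross_delay=5)   # listener moved

residual_standard = residual(standard)
residual_shifted = residual(shifted)
```

At the designed delay the cancellation train is removed exactly and only the impulse at time zero remains. With the delay off by one sample, the train and the crosstalk no longer coincide, leaving pairs of opposite-signed pulses that correspond to the echoes described above.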
In an actual listening environment, a listener seldom sits exactly at the standard position, so in many cases the center-localized component signals are heard with echoes. Thus, the acoustic image is pulled back, and the sound quality deteriorates as well.
FIG. 11 is a diagram showing a result of the frequency analysis of FIG. 10.
As shown in FIG. 11, the frequency characteristics of the output signals of the cross-talk cancellation processing to which in-phase components are input have a peak in the middle range of about 1000 Hz to 3000 Hz. The low frequency components are therefore greatly attenuated relative to the peak portion.
FIG. 11 shows that a low frequency signal of 100 Hz is attenuated by about 18 dB compared with a middle to high frequency signal of 2000 Hz.
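The same trend appears in the idealized impulse model, whose response to in-phase input is 1/(1 + g e^(-jωD)). In the sketch below, the cross-path gain of 0.78 and the delay of 12 samples are hypothetical values picked so that the model peaks near 2000 Hz at a 48000 Hz sampling rate; they are not read from FIG. 11, and the model ignores the actual transfer characteristics.

```python
import cmath
import math


def in_phase_gain_db(freq_hz, cross_gain, cross_delay, sample_rate):
    """Gain in dB of 1 / (1 + g * exp(-j * omega * D)) at one frequency."""
    omega = 2.0 * math.pi * freq_hz / sample_rate
    denominator = 1.0 + cross_gain * cmath.exp(-1j * omega * cross_delay)
    return -20.0 * math.log10(abs(denominator))


# Hypothetical model parameters (assumptions, not measured values).
G, D, FS = 0.78, 12, 48000

gain_100 = in_phase_gain_db(100.0, G, D, FS)
gain_1000 = in_phase_gain_db(1000.0, G, D, FS)
gain_2000 = in_phase_gain_db(2000.0, G, D, FS)
gain_3000 = in_phase_gain_db(3000.0, G, D, FS)

low_vs_peak_db = gain_2000 - gain_100
```

At low frequencies the direct and cross terms add nearly in phase, so the inverse gain is small; where the cross-path delay equals half a period they nearly cancel, so the inverse gain peaks. With these assumed values, `low_vs_peak_db` is of the same order as the attenuation quoted above.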
As described above, in the conventional cross-talk cancellation processing, the center-localized components are pulled back as an inherent consequence of the processing, which deteriorates the sound quality in two ways: echoes are added, and the low frequency signal is weakened.
Besides Non-Patent Document 1, cross-talk cancellation processing is also disclosed in Patent Documents 1 and 2 listed below, for example.
However, since the cross-talk cancellation processing of these documents behaves in exactly the same way when in-phase signals are input, the center-localized components are likewise pulled back, which deteriorates the sound quality through added echoes and a weakened low frequency signal.
Non-Patent Document 1: “Acoustic System and Digital Processing”, Corona Publishing Co., Ltd., March 1995, p. 233-p. 237.
Patent Document 1: Japanese Patent Laid-Open No. 2000-506691.
Patent Document 2: Japanese Patent Laid-Open No. 7-46700/1995.
With the foregoing configuration, when the reproduction device consists of speakers, the conventional audio device can bring the binaural signals to the ears properly by carrying out the cross-talk cancellation processing. However, it has the problem of deteriorating the sound quality of the center-localized components and the low frequency components.
The present invention has been made to solve the foregoing problem. It is therefore an object of the present invention to provide an audio device capable of achieving high quality cross-talk cancellation processing that does not deteriorate the sound quality of the center-localized components or the low frequency components.