The present invention relates generally to a system for reproducing input audio signals via a plurality of speakers after having applied predetermined delay-involving signal processing to the audio signals, to thereby localize sound images of direct sounds in a desired range including areas outside a space surrounded by the speakers. More particularly, the present invention relates to a technique that, while realizing a good sound image localization effect, achieves a spatial impression and a feeling of depth as if sound images were in a real sound field space.
Sound image localization techniques are generally intended to allow sound images to be localized freely, beyond the positional restrictions of speakers. One such known technique is based on cancellation of the so-called "cross talks" between the two ears of a listener (inter-ear cross talk cancellation method; e.g., U.S. Pat. No. 4,118,599 and U.S. Pat. No. 5,384,851), as will be described below.
According to the conventional stereophonic reproduction, as shown in FIG. 2, sound images are localized in a sectorial plane extending from speakers 10 and 12 away from a listener 14 within an included angle α (i.e., the range denoted by hatching in the figure). The reason why the sound image localization is limited to the range within the included angle α is the presence of inter-ear cross talk components. Namely, as shown in FIG. 3, the sound output from the right speaker 12 reaches the right ear of the listener 14 and also reaches the listener's left ear slightly later than the right ear. In this case, the part or component of the right-speaker sound reaching the left ear is called the inter-ear cross talk. Similarly, the sound output from the left speaker 10 has a cross talk component reaching the listener's right ear.
In the example of FIG. 3, it is possible to cancel the cross talk component and localize the sound image outside the right speaker 12, by outputting via the left speaker 10 a reverse-phase signal at appropriate timing to cancel out the sound reaching the left ear from the right speaker 12, as shown in FIG. 4. Complete cancellation of the cross talk component permits a sound image to be localized just on the right-hand side of the listener 14 as depicted at R'. If the listener 14 is in the middle between the speakers 10 and 12, the distances between the ears and the speakers 10, 12 are equal, and the time delay of the cross talks with respect to the main sounds falls, at the most, within a time corresponding to the inter-ear distance. Thus, assuming that the listener's inter-ear distance is 20 cm, the cross talk time delay will be about 0.6 ms. This means that the cross talks can be cancelled out by generating reverse-phase cancelling signals 0.6 ms later than the original or main signals.
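The delay figure above can be checked with a short calculation. The following Python sketch is illustrative only: the speed of sound (340 m/s), the 48 kHz sample rate, and all function names are assumptions that do not appear in the text, and the left-ear model is idealized (unit gain, no head shadowing, and the secondary cross talk of the cancelling signal itself is ignored).

```python
# Illustrative sketch only; constants and names are assumptions, not from the text.
SPEED_OF_SOUND_M_S = 340.0  # assumed speed of sound in air


def crosstalk_delay_s(inter_ear_distance_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Upper bound on the cross talk delay: inter-ear distance / speed of sound."""
    return inter_ear_distance_m / speed_m_s


def delay_in_samples(delay_s, sample_rate_hz):
    """Round a delay in seconds to the nearest whole number of samples."""
    return round(delay_s * sample_rate_hz)


def cancelling_signal(main, delay_samples, gain=1.0):
    """Reverse-phase copy of the main signal, emitted delay_samples later."""
    return [0.0] * delay_samples + [-gain * x for x in main]


def left_ear_residual(main, delay_samples, gain=1.0):
    """Idealized sum at the left ear: delayed cross talk plus cancelling signal."""
    crosstalk = [0.0] * delay_samples + [gain * x for x in main]
    cancel = cancelling_signal(main, delay_samples, gain)
    return [a + b for a, b in zip(crosstalk, cancel)]


delay_s = crosstalk_delay_s(0.20)                # 20 cm inter-ear distance
delay_samples = delay_in_samples(delay_s, 48_000)  # assumed 48 kHz sample rate
residual = left_ear_residual([1.0, 0.5, -0.25], delay_samples)
print(f"delay = {delay_s * 1000:.2f} ms, {delay_samples} samples")
print("max residual at left ear:", max(abs(x) for x in residual))
```

Under these assumptions the delay comes out slightly below 0.6 ms, consistent with the rounded figure in the text, and the idealized residual at the left ear is zero. A real system would also have to account for the attenuation and filtering introduced by the listener's head.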
Various sound localization techniques other than the above-mentioned are also known, such as one simulating a transfer function between the ears of a listener and left and right loudspeakers (disclosed in, for example, U.S. Pat. No. 5,046,097 and U.S. Pat. No. 5,105,462), and another simulating an auditory frequency sensitivity in a vertical direction so as to localize a sound image in a position above a speaker.
Although the known sound image localization control can localize a sound image of a direct sound outside a space surrounded by a plurality of speakers, spatially reflected sounds of the localized sounds cannot be produced by such control alone. The localized sounds would therefore unavoidably present some unnaturalness, as if only one sound were in a non-acoustic room, and a feeling of a sound field could never be obtained in the past. Theoretically, it may be possible to impart the sound field effect by providing a multiplicity of sound image localization control systems to localize reflected sound images in different positions, to thereby produce multiple spatially reflected sounds around the listener. However, this approach requires an increased size and cost of the device employed and never allows a multiplicity of like sounds to be aurally differentiated from one another, thus making it unrealistic to attain the effect of causing the listener to feel spatially reflected sounds through processes based on the above-mentioned principle. This is because any cross talk signals must be completely removed in order to achieve cancellation of the inter-ear cross talks for a sound image localization effect. Namely, there arises no problem with signals to be used for localization of a single sound source. Also, a good localization effect can be obtained even with signals to be used for two or more sound sources as long as they are sufficiently different in nature, because these signals are so independent of each other that they cause no significant interference therebetween. However, where sound images of a plurality of signals of similar nature are to be localized simultaneously, the respective cross talk signals would inevitably resemble each other and bring about unwanted interference therebetween, thus increasing the possibility of impairing the cross talk cancellation effect.
Further, where a plurality of spatially reflected sounds originating from a given sound source are to be localized one by one on the principle of the above-mentioned sound image localization processing, the reflected sounds tend to be generally similar in nature since they are from the same original sound. Consequently, the cancelling signals, which respond to subtle differences in time and direction, are highly correlated with each other, so that they interfere with one another and impair the cross talk cancellation effect.
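The similarity argument above can be illustrated numerically. The following Python sketch is a simplification made for illustration and is not taken from the text: it uses pure tones as stand-ins for real sounds, a normalized correlation coefficient as the similarity measure, and assumed values for the tone frequencies, the 28-sample delay, and the 48 kHz sample rate. All names are hypothetical.

```python
import math

# Illustrative sketch only; signals, constants, and names are assumptions.
SAMPLE_RATE_HZ = 48_000  # assumed sample rate
N_SAMPLES = 4_800        # 0.1 s, a whole number of periods for both tones


def tone(freq_hz, n_samples, sample_rate_hz, delay_samples=0):
    """Sine tone, optionally delayed by a whole number of samples."""
    return [
        math.sin(2.0 * math.pi * freq_hz * (n - delay_samples) / sample_rate_hz)
        for n in range(n_samples)
    ]


def normalized_correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den


source = tone(200.0, N_SAMPLES, SAMPLE_RATE_HZ)
# Stand-in for a reflected sound: the same tone, slightly delayed.
reflection = tone(200.0, N_SAMPLES, SAMPLE_RATE_HZ, delay_samples=28)
# Stand-in for a sound of a different nature: a tone at another frequency.
unrelated = tone(310.0, N_SAMPLES, SAMPLE_RATE_HZ)

similar = normalized_correlation(source, reflection)
different = normalized_correlation(source, unrelated)
print(f"source vs. delayed copy:   {similar:.3f}")
print(f"source vs. different tone: {different:.3f}")
```

In this toy setting the delayed copy remains strongly correlated with the source, while the tone at a different frequency is essentially uncorrelated with it. This mirrors the distinction drawn above between signals of similar nature and signals that are sufficiently different, although real reflected sounds are broadband and would behave less cleanly than pure tones.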