The present invention relates to a sound localization control apparatus which controls a sound-image location in a sound field in which several kinds of artificial sounds are sounded.
Conventionally, several kinds of sound localization methods have been proposed in order to obtain a desired sound-field effect which simulates the sound-field effect of a theater or an auditorium. FIG. 1 shows one of the measuring methods by which the sound-field effect of the theater to be simulated is experimentally measured by use of a dummy head DH. On the basis of the results of the measurements, sounding data are processed so as to obtain a sound localization effect which is similar to that of the real theater. The dummy head DH shown in FIG. 1 has a predetermined shape which is similar to the shape of a human head. At the positions where the right and left ears are located in the human head, microphones MR and ML are respectively attached to the dummy head DH.
In FIG. 1, a location of a sound source can be defined by a horizontal angle .phi., a vertical angle .theta. and a distance D (which is fixed at 1 m, for example). The dummy head DH detects the sounds produced from the above sound source in form of the waveforms which are transmitted to the left and right ears, thus measuring a difference between the waveform detected and an original waveform representing the sound produced from the sound source. Such measurement is carried out with respect to the sounds to be respectively produced from the sound sources which are respectively arranged in a virtual space as shown in FIG. 1. On the basis of data representing the results of the measurements, a so-called head-related transfer function is computed with respect to each of the locations of the sound sources. Herein, the head-related transfer function is used to convert the waveform of the sound produced from the sound source into another waveform corresponding to the sound which is transmitted to the right ear or left ear of the dummy head DH.
Next, an electronic configuration of a finite-impulse response filter (i.e., FIR filter) is determined responsive to the head-related transfer function computed. Then, acoustic data corresponding to the sound produced is applied to the FIR filter corresponding to a desired sound-image localization (hereinafter, referred to as a target sound-image location). In the FIR filter, the acoustic data is processed and is subjected to digital filtering. When hearing the sound which is created from the output of the FIR filter, a person (i.e., listener) who listens to the sound produced may feel as if the sound is actually produced from the target sound-image location.
The FIR filter can be configured from the head-related transfer function which is computed as described above. Alternatively, an impulse (or tone burst) is produced from the sound source, and then, the amplitudes of its impulse-response waveform are used as the coefficients by which the FIR filter is configured.
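The filtering described above can be illustrated by a minimal sketch, which assumes that the measured impulse-response amplitudes are directly used as the FIR coefficients. The function names `fir_filter` and `binaural_pair` and the coefficient values are hypothetical and are given for illustration only; they are not measured data.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def binaural_pair(samples, coeffs_left, coeffs_right):
    """Apply one FIR filter per ear and return (left, right) outputs."""
    return fir_filter(samples, coeffs_left), fir_filter(samples, coeffs_right)


# Hypothetical coefficients (NOT measured): the left-ear response arrives
# two samples later and weaker, as for a source on the listener's right.
COEFFS_LEFT = [0.0, 0.0, 0.6, 0.2]
COEFFS_RIGHT = [1.0, 0.3]

# Feeding a unit impulse returns the coefficients themselves.
left, right = binaural_pair([1.0, 0.0, 0.0, 0.0, 0.0], COEFFS_LEFT, COEFFS_RIGHT)
print(left)
print(right)
```

Since the input in this sketch is a unit impulse, each output simply reproduces the coefficients of its filter, which corresponds to the impulse-response measurement mentioned above.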
According to an example of the sound localization control apparatus which employs the aforementioned method of measuring the sounding effects, a mixing ratio of reverberation sounds is controlled so as to simply control the sound-image localization.
FIG. 2 is a block diagram showing a schematic configuration of an example of the sound localization control apparatus. In FIG. 2, a numeral 1 designates an input terminal to which the acoustic data is applied; and numerals 2a and 2b designate multipliers to which the acoustic data is supplied through the input terminal 1. The multipliers 2a and 2b function to divide the acoustic data by use of multiplication coefficients 2ak and 2bk which are supplied from a control portion (not shown). These multiplication coefficients 2ak and 2bk are determined such that their sum is equal to "1". Thus, a part of the acoustic data is outputted from the multiplier 2a and is supplied to multipliers M1 to M12, while another part of the acoustic data is outputted from the multiplier 2b and is supplied to a reverberation circuit RV.
Incidentally, a mixing ratio by which the acoustic data is mixed with reverberation data is set small when the target sound-image location is relatively close to the listener, while it is set large when the target sound-image location is relatively far from the listener.
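The division of the acoustic data by the multipliers 2a and 2b can be sketched as follows. The function name `split_direct_reverb` and the parameter `reverb_ratio` are assumptions introduced for illustration; the only property taken from the description is that the two coefficients sum to "1".

```python
def split_direct_reverb(samples, reverb_ratio):
    """Divide the input between the direct path and the reverberation path.

    reverb_ratio plays the role of coefficient 2bk; coefficient 2ak is
    derived as 1 - reverb_ratio so that the two always sum to 1.
    """
    if not 0.0 <= reverb_ratio <= 1.0:
        raise ValueError("reverb_ratio must lie between 0 and 1")
    k_2a = 1.0 - reverb_ratio   # share sent to the multipliers M1 to M12
    k_2b = reverb_ratio         # share sent to the reverberation circuit RV
    direct = [k_2a * x for x in samples]
    to_reverb = [k_2b * x for x in samples]
    return direct, to_reverb


# A small ratio corresponds to a target location close to the listener.
direct, to_reverb = split_direct_reverb([1.0, -2.0], 0.25)
print(direct, to_reverb)
```

How the ratio is derived from the target distance is not specified in the description, so the sketch takes the ratio itself as the input.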
The reverberation circuit RV forms the reverberation data on the basis of the acoustic data which is supplied thereto through the multiplier 2b. The reverberation data is divided into two components, i.e., a right-channel component and a left-channel component. The right-channel component of the reverberation data is supplied to an adder 3R, while the left-channel component of the reverberation data is supplied to an adder 3L. On the basis of multiplication coefficients C1 to C12 given from the aforementioned control portion, the multipliers M1 to M12 respectively carry out multiplications on the acoustic data which is outputted from the multiplier 2a.
Symbols "dir1" to "dir12" designate sound-directing devices, which respectively perform convolution operations based on the head-related transfer function on the output data of the multipliers M1 to M12. Thus, each of the sound-directing devices eventually produces a right-channel component and a left-channel component with respect to the acoustic data. Then, the right-channel component of the acoustic data is supplied to the adder 3R, while the left-channel component of the acoustic data is supplied to the adder 3L. Each of the sound-directing devices is configured as shown in FIG. 3, in which two FIR filters are connected in parallel. Herein, the FIR filter can be embodied by an LSI circuit exclusively used for performing the convolution operation or by a digital signal processor (i.e., DSP), while a coefficient ROM storing the coefficients which are used for the convolution operation is externally provided.
In order to simplify the description, each of the sound-directing devices dir1 to dir12 is configured with respect to the horizontal direction only. For example, the sound-directing device dir1 corresponds to a front direction of the listener, in other words, the horizontal angle of the sound-directing device dir1 is set at 0.degree., while the sound-directing device dir2 corresponds to a certain right-side direction which deviates from the front direction of the listener by 30.degree., in other words, the horizontal angle of the sound-directing device dir2 is set at 30.degree.. Similarly, the horizontal angles of the adjacent sound-directing devices are deviated from each other by 30.degree.; therefore, the last sound-directing device dir12 corresponds to a certain left-side direction which deviates from the front direction of the listener by 30.degree., in other words, the horizontal angle of the sound-directing device dir12 is set at 330.degree.. Each of the sound-directing devices performs the convolution operation based on the head-related transfer function corresponding to the sound source whose sound-image location corresponds to the horizontal angle thereof.
Now, the acoustic data whose sound-image location is to be fixed at the location defined by the horizontal angle 30.degree. is applied to the input terminal 1, through which the acoustic data is supplied to the multipliers 2a and 2b. The multipliers 2a and 2b receive the multiplication coefficients 2ak and 2bk respectively, which correspond to the distance between the listener and the target sound-image location. By use of the multiplication coefficients 2ak and 2bk, the multipliers 2a and 2b respectively perform the multiplications on the acoustic data. The results of the multiplications are delivered to the multipliers M1 to M12 and the reverberation circuit RV as described before. In this case, a direction in which the sound corresponding to the acoustic data is to be localized (hereinafter, simply referred to as a target sound-image direction) corresponds to the horizontal angle 30.degree.. Thus, the aforementioned control portion automatically selects the sound-directing device dir2 performing the convolution operation based on the head-related transfer function corresponding to the sound source which is located in a direction of horizontal angle 30.degree.. In other words, only the multiplication coefficient C2 which is supplied to the multiplier M2 is set at "1", while the other multiplication coefficients for the multipliers M1 and M3 to M12 are all set at "0".
In the sound-directing device dir2 to which the acoustic data outputted from the multiplier M2 is only supplied, the convolution operation is performed on the acoustic data so as to produce the right-channel component and left-channel component for the acoustic data, which are respectively supplied to the adders 3R and 3L.
Meanwhile, the output data of the multiplier 2b is converted into the reverberation data by the reverberation circuit RV, so that the right-channel component and left-channel component for the reverberation data are respectively supplied to the adders 3R and 3L.
Thereafter, a sum of the acoustic data outputted from the sound-directing device dir2 and the reverberation data outputted from the reverberation circuit RV is outputted from the sound localization control apparatus shown in FIG. 2.
In the meantime, when locating the sound image in a direction of horizontal angle 45.degree., the multiplication coefficients C2 and C3 for the multipliers M2 and M3 are set at the same value, while the other multiplication coefficients for the multipliers M1 and M4 to M12 are all set at "0". Since the multipliers M2 and M3 are only activated, the sound-directing devices dir2 and dir3 which correspond to the horizontal angles 30.degree. and 60.degree. respectively are only activated.
More specifically, the acoustic data is supplied to the multiplier 2a in which the multiplication using the multiplication coefficient 2ak is performed, and then, the output data of the multiplier 2a is delivered to the multipliers M1 to M12. In this case, however, only the sound-directing devices dir2 and dir3 receive the acoustic data through the multipliers M2 and M3 which are activated, while the other sound-directing devices do not receive the acoustic data. In the sound-directing device dir2, the convolution operation is performed on the acoustic data on the basis of the head-related transfer function corresponding to the sound source which is located in a direction of horizontal angle 30.degree.. In another sound-directing device dir3, another convolution operation is performed on the acoustic data on the basis of another head-related transfer function corresponding to another sound source which is located in a direction of horizontal angle 60.degree.. Then, the right-channel components for the acoustic data respectively outputted from the sound-directing devices dir2 and dir3 are supplied to the adder 3R, while the left-channel components for the acoustic data respectively outputted from the sound-directing devices dir2 and dir3 are supplied to the adder 3L.
On the other hand, the multiplier 2b performs the multiplication using the multiplication coefficient 2bk on the acoustic data, so that the output data of the multiplier 2b is supplied to the reverberation circuit RV. In the reverberation circuit RV, the right-channel component and left-channel component for the reverberation data are computed, and then, they are respectively supplied to the adders 3R and 3L.
In the adders 3R and 3L, the acoustic data outputted from the sound-directing devices dir2 and dir3 are added with the reverberation data outputted from the reverberation circuit RV; and finally, two-channel data corresponding to the original acoustic data are obtained.
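The selection of the multiplication coefficients C1 to C12 in the two cases described above (30.degree. and 45.degree.) can be sketched as follows. The description states only that C2 and C3 take the same value for 45.degree.; the linear weighting used here for other intermediate angles, the function name `direction_coefficients` and the normalization to a sum of 1 are assumptions made for illustration.

```python
def direction_coefficients(angle_deg, num_devices=12):
    """Return the coefficients C1..C12 as a list (index 0 holds C1).

    Device i + 1 corresponds to the horizontal angle i * 30 degrees.
    Angles between two devices are shared linearly (an assumption).
    """
    step = 360.0 / num_devices
    pos = (angle_deg % 360.0) / step
    lower = int(pos) % num_devices
    upper = (lower + 1) % num_devices
    frac = pos - int(pos)
    coeffs = [0.0] * num_devices
    coeffs[lower] += 1.0 - frac
    coeffs[upper] += frac
    return coeffs


c30 = direction_coefficients(30.0)   # only C2 is non-zero
c45 = direction_coefficients(45.0)   # C2 and C3 share the data equally
print(c30)
print(c45)
```

For an angle between dir12 (330.degree.) and dir1 (0.degree.), the index wraps around so that the data is shared between the last and the first devices.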
In the sound localization control apparatus described above, a distance between the listener and the sounding point (i.e., sound source) is controlled by the mixing ratio with respect to the reverberation sounds. Therefore, the listener may obtain at most a weak impression, as if the size of the room were changed in response to the above mixing ratio. However, the distance between the listener and the sound source cannot be controlled precisely, so that the sound-image location cannot be fixed clearly.
The above-mentioned drawback may be eliminated by changing the aforementioned distance D (which has been previously fixed at 1 m) and re-designing the electronic configuration of the apparatus such that the sound-directing devices are further provided with respect to the predetermined distances as well as the predetermined directions. In such case, however, a large number of the sound-directing devices would be required, with the result that the system size of the apparatus becomes extremely large.
According to the results of the experiments which are carried out with respect to sampling frequencies ranging from 40 kHz to 50 kHz, when embodying the head-related transfer function with respect to each of the distances as well as each of the directions, the FIR filter must be configured by hundreds, or even thousands, of operational circuits, and such a large-scale FIR filter should be provided for each of the right channel and left channel.
It is also required that the sound localization control apparatus utilizing the above-mentioned large-scale FIR filter should cover the space having a semi-spherical shape as shown in FIG. 1, the radius of which is set at 10 m, for example. In this case, the apparatus should control the sound-image localization with respect to twelve directions (i.e., every 30-degree direction in 360.degree.) as well as one-hundred distance stages (i.e., every 100 mm distance in 10 m). In order to do so, the apparatus should have an operating capacity by which the multiplications and additions can be performed one-hundred and twenty million times per second, wherein such number of "one-hundred and twenty million" is calculated as follows: 2 (representing a number of the FIR filters to be required).times.12 (representing a number of the directions).times.100 (representing a number of the distance stages).times.50000 (Hz).
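The above calculation can be reproduced by the following short sketch. All of the figures are taken from the description; only the variable names are introduced for illustration.

```python
fir_filters = 2                      # right-channel and left-channel filters
directions = 360 // 30               # every 30-degree direction -> 12
distance_stages = 10_000 // 100      # every 100 mm within 10 m -> 100
sampling_frequency_hz = 50_000

operations_per_second = (
    fir_filters * directions * distance_stages * sampling_frequency_hz
)
print(operations_per_second)
```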
As the method which controls the sound-image location to be moved arbitrarily by use of the sound-directing devices, there are provided two methods, i.e., a coefficient time-varying method and a virtual speaker method, for example. FIG. 4 is a block diagram showing an example of the sound localization control apparatus employing the coefficient time-varying method. In FIG. 4, acoustic data S1 (e.g., digital data representing the sounds of the car running) is supplied to a time-varying sound-directing portion 1S.sub.1 and is divided into the left-channel component and right-channel component, which are respectively supplied to sound-directing devices 2L and 2R.
A control portion 3 outputs a pair of the coefficients, corresponding to the target sound-image location, which are respectively supplied to the sound-directing devices 2L and 2R. Thus, the acoustic data S1 is subjected to signal processing corresponding to the convolution operation using a pair of coefficients. Then, the right-channel component and left-channel component for the acoustic data S1 are respectively produced. Incidentally, a pair of the coefficients to be respectively supplied to the sound-directing devices 2L and 2R is read from a coefficient memory 4 in response to the target sound-image location by the control portion 3.
If there exists any other acoustic data (e.g., digital data representing the musical sounds produced from the musical instrument such as the trumpet) the sound image of which is to be localized, another time-varying sound-directing portion can be provided, in other words, a plurality of time-varying sound-directing portions can be provided in the apparatus. If another acoustic data S2 is supplied to another time-varying sound-directing portion 1S.sub.2, it is subjected to the signal processing as described above. Thereafter, the left-channel component of the acoustic data S1 and the left-channel component of the acoustic data S2 are added together by an adder 5L, while the right-channel component of the acoustic data S1 and the right-channel component of the acoustic data S2 are added together by an adder 5R. Thus, added data for the left channel is obtained from a terminal "L", while another added data for the right channel is obtained from a terminal "R".
Under the operation of the above-mentioned apparatus, it may be possible to smoothly move the target sound-image location with respect to the acoustic data S1 so that the listener may feel as if the car is running away. In this case, however, every time the target sound-image location is changed, the control portion 3 must read out a pair of coefficients corresponding to the changed target sound-image location from the coefficient memory 4 so as to supply the coefficients to the sound-directing devices 2L and 2R respectively. In such case, there is a possibility that noises may occur each time the coefficients read from the coefficient memory 4 are changed. In order to avoid the occurrence of such noises, the coefficient memory 4 should store a large number of coefficients, each pair of which corresponds to one of the locations which are arranged to cover the predetermined space as a whole. If the number of the coefficient pairs, each of which corresponds to one of the sound-image locations actually measured in the predetermined space, is limited, it is necessary to perform an interpolation operation on plural pairs of the coefficients when computing a pair of coefficients corresponding to a sound-image location which is not actually measured. Incidentally, the control portion 3 is designed to change a pair of coefficients at each sampling period.
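The interpolation operation mentioned above can be sketched as follows. The description does not specify the kind of interpolation; a simple linear interpolation between two measured coefficient sets is assumed here, and the function name `interpolate_coefficients` and the coefficient values are hypothetical.

```python
def interpolate_coefficients(coeffs_a, coeffs_b, t):
    """Linearly interpolate between two measured coefficient sets.

    t = 0.0 returns coeffs_a, t = 1.0 returns coeffs_b. Linear
    interpolation is an assumption, not the method of the description.
    """
    if len(coeffs_a) != len(coeffs_b):
        raise ValueError("coefficient sets must have the same length")
    return [(1.0 - t) * a + t * b for a, b in zip(coeffs_a, coeffs_b)]


# Hypothetical coefficient sets for two measured locations.
measured_a = [1.0, 0.0]
measured_b = [0.0, 1.0]

# Coefficients for an unmeasured location one quarter of the way from A to B.
intermediate = interpolate_coefficients(measured_a, measured_b, 0.25)
print(intermediate)
```

In the apparatus of FIG. 4, such an operation would be performed for each of the sound-directing devices 2L and 2R at each sampling period.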
The above-mentioned coefficient time-varying method works accurately in accordance with the principle of the sound localization. Thus, it is expected that the sound image obtained is accurately and clearly localized at the target sound-image location. However, in order to obtain an ability to sufficiently control the sound localization, hundreds or thousands of coefficients are required for each of the sound-directing devices 2L and 2R. In other words, it is necessary to provide a super-high-speed processor which can change over the hundreds or thousands of coefficients while performing the interpolation operations at each sampling period (e.g., 20 .mu.s if the sampling frequency is 50 kHz). Further, such a super-high-speed processor must be provided for each of the sounds whose sound images are respectively localized at different locations. Since such a super-high-speed processor is relatively expensive, the system cost required for the apparatus becomes extremely high. For this reason, the apparatus employing the coefficient time-varying method has not been manufactured.
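The processing load described above can be roughly estimated by the following sketch. The sampling frequency of 50 kHz is taken from the description; the value of 1000 coefficients per filter is merely an assumed example within the range of "hundreds or thousands", and the variable names are introduced for illustration.

```python
SAMPLING_FREQUENCY_HZ = 50_000
COEFFS_PER_FILTER = 1_000    # assumed example value, not from the description
FILTERS_PER_SOUND = 2        # sound-directing devices 2L and 2R

# Length of one sampling period in microseconds.
sampling_period_us = 1_000_000 / SAMPLING_FREQUENCY_HZ

# Coefficients to be changed over per second for one sound.
coefficient_updates_per_second = (
    COEFFS_PER_FILTER * FILTERS_PER_SOUND * SAMPLING_FREQUENCY_HZ
)
print(sampling_period_us, coefficient_updates_per_second)
```

Under these assumptions, the estimate applies to a single sound; it is multiplied by the number of the sounds to be localized at different locations.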
Different from the above-mentioned coefficient time-varying method, the virtual speaker method does not vary the coefficients in real time; it uses fixed coefficients, but requires a large number of sound-directing devices. Each of the sound-directing devices corresponds to one of the locations which are densely arranged in the predetermined space. Thus, instead of varying a large number of coefficients in each sampling period, the virtual speaker method switches over the sound-directing device to which the acoustic data is supplied.
FIG. 5 is a block diagram showing an example of the sound localization control apparatus employing the virtual speaker method. Herein, twelve locations are determined in advance, so that twelve pairs of the sound-directing devices (i.e., 9L.sub.1, 9R.sub.1, . . . , 9L.sub.12, 9R.sub.12) are provided. The acoustic data (S1, S2, . . . ) are supplied to the sound-directing devices in which they are subjected to signal processing corresponding to the convolution operation using a selected pair of the coefficients, so that two-channel data are eventually produced. When hearing the sounds corresponding to the two-channel data, the listener may feel as if the sounds are actually produced from a speaker which is located at a desired location corresponding to the selected pair of the coefficients. This speaker is called a virtual speaker, which does not actually exist but from which the sounds are virtually produced.
When using two virtual speakers, the acoustic data can be allocated to the virtual speakers respectively by a predetermined ratio so that the sound-image location can be fixed at a desired point which exists between two virtual speakers. If the same amount of the acoustic data is allocated to each of the virtual speakers, the sound-image location can be fixed at a mid-point between two virtual speakers. Under the consideration of the above operating principle, by changing an allocation ratio by which the acoustic data is allocated to the virtual speakers respectively, it is possible to smoothly move the sound-image location between the virtual speakers.
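The allocation between two virtual speakers can be sketched as follows. The description states only that an equal allocation places the sound image at the mid-point; the linear relation between the allocation ratio and the position, as well as the function names `allocation_weights` and `sweep`, are assumptions made for illustration.

```python
def allocation_weights(position):
    """Return the weights (to speaker A, to speaker B) for a position.

    position = 0.0 means at virtual speaker A, 1.0 means at virtual
    speaker B. The linear law is an assumption.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0 and 1")
    return 1.0 - position, position


def sweep(steps):
    """Weights for moving the sound image from A to B in equal steps."""
    return [allocation_weights(i / steps) for i in range(steps + 1)]


mid_point = allocation_weights(0.5)   # equal allocation -> mid-point
path = sweep(4)
print(mid_point)
print(path)
```

Changing the weights gradually, as in `sweep`, corresponds to the smooth movement of the sound-image location between the two virtual speakers.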
In FIG. 5, an allocating unit 6S1 contains multipliers 7L1 to 7L12 and 7R1 to 7R12, each of which performs a weighted multiplication when allocating a series of acoustic data represented as acoustic data S1. Another allocating unit 6S2 has a configuration similar to that of the allocating unit 6S1, so that each multiplier performs a weighted multiplication when allocating another series of acoustic data represented as acoustic data S2. Then, each of the pieces of the acoustic data S1 outputted from the allocating unit 6S1 is added to the corresponding one of the pieces of the acoustic data S2 outputted from the allocating unit 6S2 by each of adders 8L1 to 8L12 and 8R1 to 8R12 which are respectively coupled with sound-directing devices 9L1 to 9L12 and 9R1 to 9R12. Each of the sound-directing devices 9L1 to 9L12 and 9R1 to 9R12 performs a convolution operation corresponding to a location of its virtual speaker. Thus, the sound-directing devices 9L1 to 9L12 eventually output left-channel components for the acoustic data S1 and S2 mixed together, while the sound-directing devices 9R1 to 9R12 eventually output right-channel components for the acoustic data S1 and S2 mixed together. Finally, those left-channel components are added together by an adder 10L, while the right-channel components are added together by an adder 10R. As a result, two-channel data are eventually outputted from the adders 10L and 10R.
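The signal flow of FIG. 5 can be sketched as follows. As a simplification, this sketch applies one allocation weight per virtual speaker to both channels, whereas FIG. 5 provides separate multipliers and adders for the left and right channels. The function names and all of the numerical values are hypothetical.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    return [
        sum(c * samples[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


def virtual_speaker_mix(sources, weights, speaker_filters):
    """Mix several sources through fixed virtual-speaker filters.

    sources:         list of equally long sample lists (S1, S2, ...)
    weights:         weights[s][v] = share of source s sent to speaker v
    speaker_filters: one (left_coeffs, right_coeffs) pair per speaker
    """
    length = len(sources[0])
    out_left = [0.0] * length
    out_right = [0.0] * length
    for v, (coeffs_l, coeffs_r) in enumerate(speaker_filters):
        # Adders 8: sum the weighted sources before the fixed filter.
        bus = [
            sum(weights[s][v] * sources[s][i] for s in range(len(sources)))
            for i in range(length)
        ]
        # Adders 10L and 10R: accumulate the filter outputs per channel.
        for i, y in enumerate(fir_filter(bus, coeffs_l)):
            out_left[i] += y
        for i, y in enumerate(fir_filter(bus, coeffs_r)):
            out_right[i] += y
    return out_left, out_right


# Two hypothetical virtual speakers: one heard only left, one only right.
filters = [([1.0], [0.0]), ([0.0], [1.0])]
s1 = [1.0, 0.0]
s2 = [0.0, 2.0]
# S1 is placed at speaker 1; S2 is placed midway between the two speakers.
out_l, out_r = virtual_speaker_mix([s1, s2], [[1.0, 0.0], [0.5, 0.5]], filters)
print(out_l, out_r)
```

In this arrangement, adding a further source requires only a further row of weights, while the number of the filters remains unchanged.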
However, even when performing the virtual speaker method, it is not possible to clearly fix the sound-image location at the desired location, because the virtual speaker method basically functions to merely adjust a tone-volume balance between the virtual speakers when determining the sound-image location. Although a delay-time difference between the right-channel sound and left-channel sound should be adjusted in connection with the target sound-image location, the virtual speaker method merely adjusts such delay-time difference between the adjacent virtual speakers. Therefore, in order to obtain a clear sound-image localization fixed between the virtual speakers, it is necessary to arrange two adjacent virtual speakers so closely to each other that the delay-time difference between them becomes negligible.
In order to do so, however, it is necessary to provide an extremely large number of sound-directing devices, which eventually raises the system cost for the apparatus. On the other hand, in the virtual speaker method, even if the number of the sounds to be localized (i.e., the number of the acoustic data applied) is increased, the sound localization control can be simply performed by merely increasing the number of the allocating units without increasing the number of the sound-directing devices. Thus, the virtual speaker method is advantageous in that the system cost is not increased so much when increasing the number of the sounds to be localized.
As described before, the coefficient time-varying method is not realistic because the super-high-speed processors are required, so that the system cost becomes extremely high.
Moreover, the virtual speaker method is not realistic because a very large number of the sound-directing devices (e.g., hundreds or thousands of sound-directing devices) is required in order to obtain a clear sound localization. If the number of the virtual speakers is reduced so that the density of the virtual speakers provided in the predetermined space is reduced, it is not possible to clearly fix the sound-image location at a desired location between the virtual speakers.