The audio signal in each of the left and right channels of a stereo music signal recorded on a record, compact disc, or similar medium is often made up of audio signals from multiple sound sources. Such sound sources are often recorded in the respective channels with level differences, so that the sound images of the multiple sound sources are localized between the speakers when the recording is played using two speakers.
For example, suppose there are five sound sources MS1 through MS5, with signals S1 through S5, which are to be recorded as audio signals SL and SR of the two channels left and right. The signals S1 through S5 of the sound sources MS1 through MS5 are each given a level difference between the left and right channels, and are then added and mixed into the audio signals of the respective channels, as shown here.
SL = S1 + 0.9 S2 + 0.7 S3 + 0.4 S4
SR = S5 + 0.4 S2 + 0.7 S3 + 0.9 S4
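The mixing described above can be sketched in a few lines of Python. This is only an illustration: the gain values are taken from the SL and SR expressions, while the names `PAN_GAINS` and `mix_stereo` and the list-of-samples representation are assumptions made for the example.

```python
# (left gain, right gain) for each source, taken from the SL and SR expressions.
# The dictionary and function names are hypothetical, chosen for this sketch.
PAN_GAINS = {
    "S1": (1.0, 0.0),
    "S2": (0.9, 0.4),
    "S3": (0.7, 0.7),
    "S4": (0.4, 0.9),
    "S5": (0.0, 1.0),
}


def mix_stereo(sources):
    """Mix equal-length sample lists, keyed by source name, into (SL, SR)."""
    length = len(next(iter(sources.values())))
    sl = [0.0] * length
    sr = [0.0] * length
    for name, samples in sources.items():
        gain_l, gain_r = PAN_GAINS[name]
        for i, x in enumerate(samples):
            sl[i] += gain_l * x  # contribution to the left channel
            sr[i] += gain_r * x  # contribution to the right channel
    return sl, sr


# One sample per source, with only S2 active.
demo_sl, demo_sr = mix_stereo(
    {"S1": [0.0], "S2": [1.0], "S3": [0.0], "S4": [0.0], "S5": [0.0]}
)
```

With only S2 active, the left channel receives it at 0.9 and the right channel at 0.4, which is the level difference that places its sound image left of center.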
When stereo audio signals in which the signals of the sound sources MS1 through MS5 have been panned between the two left and right channels by level difference are played through two speakers 1L and 1R, as shown in FIG. 32 for example, the listener 2 perceives sound images A, B, C, D, and E, corresponding to the sound sources MS1, MS2, MS3, MS4, and MS5. These sound images A, B, C, D, and E are known to be localized between the speaker 1L and the speaker 1R.
Also, in the event that the listener 2 wears a headphone set 3 as shown in FIG. 33, and plays the above stereo audio signals of the two left and right channels with a left speaker unit 3L and right speaker unit 3R of the headphone set 3, the listener 2 perceives the sound images A, B, C, D, and E, corresponding to the sound sources MS1, MS2, MS3, MS4, and MS5, as being within or near the head.
However, with such a playing method, sound images are localized only in a narrow area between the two speakers or speaker units, and further, sound images are often perceived to be overlapping each other.
In the case of FIG. 32, an arrangement may be conceived wherein the spacing between the two speakers 1L and 1R is widened in order to avoid overlapping sound images, but in such cases clear sound image localization cannot be obtained, with the sound image in the center area (sound image C in FIG. 32) becoming unclear. Moreover, the sound images corresponding to the sound sources cannot be localized at freely chosen positions, such as behind or to the side of the listener.
There has also been a problem in that in the event of playing the same stereo audio signals with the headphone set 3, the sound images A through E are localized within the head from nearby the left ear to nearby the right ear as shown in FIG. 33, leading to sound images being localized in a range even narrower than with speaker output, and furthermore in an overlapped state, resulting in an unnatural-sounding sound field.
With regard to such a problem, the three or more channels of audio signals from the original sound sources can be separated and synthesized from the two-channel stereo audio signals for example, and the separated and synthesized multi-channel audio signals played by speakers corresponding to each of the multiple channels, thereby yielding a natural sound field. This also enables sound images to be synthesized behind the listener and so forth, for example.
As for methods for achieving such an object, there is a method using a matrix circuit and directivity enhancing circuits. This principle will be described with reference to FIG. 34.
Signals L, C, R, and S, of four types of sound sources, are prepared, and these sound source signals are used to obtain two sound source signals Si1 and Si2 by encoding processing with the following synthesizing equations.
Si1 = L + 0.7 C + 0.7 S
Si2 = R + 0.7 C − 0.7 S
The two signals Si1 and Si2 (two channels) generated in this way are recorded on a recording medium such as a disk or the like, played from the recording medium, and input to input terminals 11 and 12 of a decoding device 10 shown in FIG. 34. The four channels of sound source signals L, C, R, and S are separated from the signals Si1 and Si2 at the decoding device 10.
Specifically, the input signals Si1 and Si2 from the input terminals 11 and 12 are supplied to an addition circuit 13 and a subtraction circuit 14, where they are added to and subtracted from each other, thereby generating an addition output signal Sadd and a subtraction output signal Sdiff, respectively. At this time, the signals Si1 and Si2, and the signals Sadd and Sdiff, are expressed as follows.
Si1 = L + 0.7 C + 0.7 S
Si2 = R + 0.7 C − 0.7 S
Sadd = 1.4 C + L + R
Sdiff = 1.4 S + L − R
Accordingly, the signal L in signal Si1, the signal R in signal Si2, the signal C in signal Sadd, and the signal S in signal Sdiff each have a level approximately 3 dB higher than the other sound source signals contained in the same signal, so each of these four signals best preserves the characteristics of its respective sound source. Thus, taking the signal Si1, signal Si2, signal Sadd, and signal Sdiff as the respective output signals enables the sound source signals L, C, R, and S, of the four original channels, to be separated and output.
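The encoding and the addition/subtraction decoding can be checked numerically with the short Python sketch below. The coefficients are those of the equations for Si1, Si2, Sadd, and Sdiff; the function names `encode`, `decode`, and `level_db`, and the idea of probing with one unit-amplitude source at a time, are assumptions made for illustration.

```python
import math


def encode(l, c, r, s):
    """Matrix-encode four source samples into the two channels (Si1, Si2)."""
    si1 = l + 0.7 * c + 0.7 * s
    si2 = r + 0.7 * c - 0.7 * s
    return si1, si2


def decode(si1, si2):
    """Return the four decoder outputs (Si1, Si2, Sadd, Sdiff)."""
    return si1, si2, si1 + si2, si1 - si2


def level_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(abs(ratio))


# Feed one unit-amplitude source at a time; each tuple then holds the
# coefficient with which that source appears in (Si1, Si2, Sadd, Sdiff).
coeffs = {
    "L": decode(*encode(1.0, 0.0, 0.0, 0.0)),
    "C": decode(*encode(0.0, 1.0, 0.0, 0.0)),
    "R": decode(*encode(0.0, 0.0, 1.0, 0.0)),
    "S": decode(*encode(0.0, 0.0, 0.0, 1.0)),
}

# Level of C relative to L within Sadd (index 2 of each tuple).
c_over_l_in_sadd_db = level_db(coeffs["C"][2] / coeffs["L"][2])
```

In this sketch C appears in Sadd with coefficient 1.4 while L and R appear with coefficient 1, a ratio of roughly 2.9 dB. The table also shows that L appears with coefficient 1 in Si1, Sadd, and Sdiff alike, which illustrates how much each source leaks into the outputs meant for the others.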
However, in this state, separation of sound image between the channels is insufficient. Accordingly, in the example shown in FIG. 34, the signal Si1, signal Si2, signal Sadd, and signal Sdiff, are output to output terminals 161, 162, 163, and 164, via directivity enhancing circuits 151, 152, 153, and 154 which increase the output levels.
Each of the directivity enhancing circuits 151, 152, 153, and 154 works to dynamically boost whichever of the signal Si1, signal Si2, signal Sadd, and signal Sdiff has a level greater than the other channel signals, so as to realize an apparent improvement in separation from the other channels.
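The behavior described for the directivity enhancing circuits can be illustrated with a toy gain rule in Python. The actual circuit design is not given here, so everything in this sketch is an assumption: the function name `enhancement_gains`, the use of one level value per channel, and the `margin_db` and `boost_db` parameters are hypothetical and chosen only to show the idea of boosting a channel that clearly dominates the others.

```python
import math


def enhancement_gains(levels, margin_db=1.0, boost_db=6.0):
    """Return one linear gain per channel.

    Hypothetical rule: the loudest channel is boosted by `boost_db` only if
    its level exceeds every other channel by at least `margin_db`.
    """
    gains = [1.0] * len(levels)
    top = max(range(len(levels)), key=lambda i: levels[i])
    runner_up = max(lv for i, lv in enumerate(levels) if i != top)
    if levels[top] <= 0.0:
        return gains  # silence: nothing to enhance
    if runner_up <= 0.0:
        dominant = True  # only one channel carries any signal
    else:
        dominant = 20.0 * math.log10(levels[top] / runner_up) >= margin_db
    if dominant:
        gains[top] = 10.0 ** (boost_db / 20.0)
    return gains


# Levels ordered as (Si1, Si2, Sadd, Sdiff).
one_dominant = enhancement_gains([1.0, 0.2, 0.3, 0.1])
all_equal = enhancement_gains([1.0, 1.0, 1.0, 1.0])
```

Under this assumed rule a clearly dominant channel is raised, while channels at equal levels are left untouched, so no additional separation is gained when there is no level difference among the channels.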
Next, another conventional example will be described with reference to FIG. 35 through FIG. 37D. In this example, as shown in FIG. 35, decorrelation processing units 171, 172, 173, and 174 are provided instead of the directivity enhancing circuits 151, 152, 153, and 154 in the example in FIG. 34.
The decorrelation processing units 171 through 174 are each configured of filters having properties such as shown in, for example, FIG. 36A, FIG. 36B, FIG. 36C, and FIG. 36D, or FIG. 37A, FIG. 37B, FIG. 37C, and FIG. 37D.
With FIG. 36A, FIG. 36B, FIG. 36C, and FIG. 36D, decorrelation of the channels is realized by mutually shifting the phase at the hatched frequency bands. With FIG. 37A, FIG. 37B, FIG. 37C, and FIG. 37D, decorrelation of the channels is realized by removing bands differing among the channels.
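The band-removal style of decorrelation can be sketched in Python as follows. The actual filter properties are those of the figures and are not reproduced here; the interleaved assignment of frequency bins to channels, the function names, and the use of a naive discrete Fourier transform are all assumptions made so that the example is self-contained.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a list of samples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def remove_bands(samples, channel, num_channels=4):
    """Zero the frequency bins assigned to `channel` (hypothetical assignment).

    Bin k and its mirror n - k are treated as one band, so the filtered
    signal stays real-valued.
    """
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        band = min(k, n - k)
        if band % num_channels == channel:
            spectrum[k] = 0.0
    return [value.real for value in idft(spectrum)]


# A unit impulse has a flat spectrum, which makes the removed bands easy to see.
impulse = [1.0] + [0.0] * 15
out_ch1 = remove_bands(impulse, channel=1)
out_ch2 = remove_bands(impulse, channel=2)
```

Because each channel loses a different set of bands, the outputs for different channels differ from each other even when the inputs are identical. Note that the same bands are removed whatever the input contains.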
When the pseudo 4-channel signals generated at the decoding device 10 shown in the example in FIG. 35 and output from the output terminals 161 through 164 are each played from a different speaker, the channels are decorrelated from one another, so sound field reproduction with a good spread can be realized.
The Patent Document to reference for this is PCT Japanese Translation Patent Publication No. 2003-515771.
However, with the method in FIG. 34 described above, while separation of sound sources of three or more encoded channels from the signals Si1 and Si2 can be realized to a certain extent, there are the following problems.
(1) While good separation can be obtained in a state where only one sound source is present, there is no difference in level among the channels in a state wherein all sound sources are present at generally the same level at the same time, so the directivity enhancing circuits 151 through 154 do not operate, and accordingly only 3 dB of separation can be ensured among the channels.
(2) The signal levels of the sound sources dynamically change due to the directivity enhancing circuits 151 through 154, and accordingly unnatural increases/decreases in sound readily occur.
(3) When two adjacent sound sources are present, one sound source may be dragged by the other.
(4) There is little separation effect except with sound sources that were encoded with separation in mind.
Also, the method described above with FIG. 35 has the following problem. That is to say, with the method using the decorrelation processing in the example in FIG. 35, frequency band phases are shifted or bands are removed regardless of the type of sound source, so while a sound field with a good spread can be obtained, sound sources cannot be separated, and accordingly a clear sound image cannot be formed.
In summary, in the event of attempting to separate sound sources from 2-channel stereo signals, the method using directivity enhancing circuits has problems in that separation among sound sources is insufficient when multiple sound sources are present at the same time, unnatural volume changes and unnatural sound source movements occur, and further, sufficient advantages cannot be easily obtained unless pre-encoded sound sources are prepared.
Also, with the pseudo-multi-channel method using decorrelation processing, there has been the problem that the sound image of a sound source is not clearly localized.
It is an object of the present invention to provide an audio signal processing device and method, whereby, from two systems of audio signals in which audio signals of multiple audio sources are included, the audio signals of the multiple audio sources can be suitably separated.