A signal demultiplexing method based on ICA (Independent Component Analysis) is one example of a method that analyzes input signals, which are plural sounds collected via a plurality of microphones, and demultiplexes the input signals into the individual sound source signals. The signal demultiplexing method based on ICA optimizes a demultiplexing matrix under the condition that the sound sources are statistically independent of each other, filters the input signals by use of the optimized demultiplexing matrix, and thereby demultiplexes the input signals into the individual sound source signals. Non-patent literature 1 discloses an example of art related to this signal demultiplexing method.
Non-patent literature 1 discloses a signal demultiplexing method that can track an environmental change, such as movement of a sound source, by carrying out a learning process on the demultiplexing matrix by use of the input signals of plural consecutive frames extending from the current frame back into the past.
FIG. 29 is a block diagram showing an exemplary configuration of a signal processing device based on the method described in non-patent literature 1. As shown in FIG. 29, the signal processing device includes a frequency transformation unit 100, a data memory unit 105, a demultiplexing matrix generation unit 102, a demultiplexed signal generation unit 103 and an inverse frequency transformation unit 104.
The signal processing device shown in FIG. 29, which is based on the method described in non-patent literature 1, operates as follows.
The frequency transformation unit 100 carries out a frequency transformation on the input signal in units of frames, each having a predetermined time length, and generates a frequency-domain input signal. The frequency transformation unit 100 outputs the generated frequency-domain input signal to the data memory unit 105 and the demultiplexed signal generation unit 103. A DFT (Discrete Fourier Transform) is used as the frequency transformation. The data memory unit 105 stores the frequency-domain input signals of plural frames. When the frequency-domain input signal of the current frame is inputted, the data memory unit 105 deletes the frequency-domain input signal of the oldest frame and stores that of the current frame. As a result, the data memory unit 105 holds the frequency-domain input signals of the plural consecutive frames extending from the current frame back into the past. The demultiplexing matrix generation unit 102 reads the frequency-domain input signals of the plural frames held by the data memory unit 105, learns and calculates the demultiplexing matrix by use of these signals, and outputs the calculated demultiplexing matrix to the demultiplexed signal generation unit 103. The demultiplexed signal generation unit 103 generates frequency-domain demultiplexed signals on the basis of the frequency-domain input signals and the demultiplexing matrix, and outputs the generated frequency-domain demultiplexed signals to the inverse frequency transformation unit 104. The inverse frequency transformation unit 104 transforms the frequency-domain demultiplexed signals into demultiplexed signals by carrying out an inverse frequency transformation. An IDFT (Inverse Discrete Fourier Transform) is used as the inverse frequency transformation.
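The data flow through units 100, 105, 102, 103 and 104 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of non-patent literature 1: the class and function names are hypothetical, the DFT is a naive one written out for self-containment, and the learning step of unit 102 is replaced by a placeholder that returns identity matrices, since the learning rule itself is not described here.

```python
import cmath
from collections import deque


def dft(frame):
    """Naive O(n^2) discrete Fourier transform of one time-domain frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


class DataMemory:
    """Stand-in for data memory unit 105: holds only the newest frames."""

    def __init__(self, num_frames):
        # Appending to a full deque silently drops the oldest entry.
        self.frames = deque(maxlen=num_frames)

    def store(self, spectra):
        self.frames.append(spectra)


def learn_matrices(memory, num_channels, num_bins):
    """Placeholder for unit 102 (assumption): one identity matrix per frequency bin.

    A real learning process would use every frame held in `memory`.
    """
    identity = [[1.0 if i == j else 0.0 for j in range(num_channels)]
                for i in range(num_channels)]
    return [identity for _ in range(num_bins)]


def demultiplex(spectra, matrices):
    """Unit 103: for each bin k, multiply the channel vector by matrices[k]."""
    m, bins = len(spectra), len(spectra[0])
    return [[sum(matrices[k][i][j] * spectra[j][k] for j in range(m)) for k in range(bins)]
            for i in range(m)]


def process_frame(frames, memory):
    """Process one time-domain frame per microphone and return separated frames."""
    spectra = [dft(f) for f in frames]                               # unit 100
    memory.store(spectra)                                            # unit 105
    matrices = learn_matrices(memory, len(frames), len(frames[0]))   # unit 102
    separated = demultiplex(spectra, matrices)                       # unit 103
    # Unit 104. Keeping only the real part assumes the matrices preserve the
    # conjugate symmetry of real-valued input, which the identity does.
    return [[v.real for v in idft(s)] for s in separated]
```

Using `deque(maxlen=...)` mirrors the described behavior of deleting the oldest frame whenever the current frame is stored.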
Moreover, patent literature 1 discloses a voice demultiplexing device that generates demultiplexed signals, each corresponding to one of plural sound sources, on the basis of plural mixed-voice signals which are inputted sequentially through a plurality of voice input means and in each of which the voice signals outputted by the plural sound sources are mixed.
The voice demultiplexing device described in patent literature 1 includes: an A/D (Analog/Digital) converter that converts the mixed-voice signals, which are inputted through a plurality of microphones and in which plural (n) sound source signals are mixed, into digital signals; a plurality of (n) DSPs (Digital Signal Processors) that receive the n digitized mixed-voice signals and carry out signal processing on them; and a D/A (Digital/Analog) converter that converts the n demultiplexed signals, which are outputted sequentially by one of the DSPs after the sound source demultiplexing process has been carried out, into analog signals. The voice demultiplexing device operates as follows.
The n DSPs carry out the discrete Fourier transform on the n time-domain input signals (frame signals), which are digitized by the A/D converter and have a predetermined time length, thereby transforming them into frequency-domain mixed-voice signals, and buffer the frequency-domain mixed-voice signals. In parallel with the transformation into the frequency domain, each of the n DSPs handles per-frequency-band signals, which are generated by dividing the mixed-voice signal by frequency band, and learns and calculates a demultiplexing matrix W(f) according to the FDICA (Frequency-Domain ICA) method. Furthermore, in parallel with the transformation into the frequency domain and the learning of the demultiplexing matrix, one DSP generates the demultiplexed signal corresponding to each of the sound sources from the buffered frequency-domain frame signals by a matrix calculation that uses the demultiplexing matrix W(f) updated through the learning process. Each DSP then carries out the inverse discrete Fourier transform on each of the generated demultiplexed signals.
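In the notation commonly used for frequency-domain ICA, the matrix calculation carried out for each frequency band can be written as below. Only W(f) appears in the description of patent literature 1; the symbols X and Y, the frame index τ, and the stated dimensions are notation assumed here for illustration.

```latex
\mathbf{Y}(f,\tau) = \mathbf{W}(f)\,\mathbf{X}(f,\tau), \qquad
\mathbf{X}(f,\tau),\ \mathbf{Y}(f,\tau) \in \mathbb{C}^{n}, \quad
\mathbf{W}(f) \in \mathbb{C}^{n \times n}
```

Here X(f,τ) stacks the n frequency-domain mixed-voice signals of frame τ at frequency f, and Y(f,τ) stacks the n demultiplexed signals.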
With regard to the learning process applied to the demultiplexing matrix W(f), an initial matrix is predetermined for the first learning process, which uses the signal of the first frame. The learning process that uses the signal of the second or any subsequent frame starts from the demultiplexing matrix W(f) updated by the learning process for the previous frame. The mixed-voice signal to which the sound source demultiplexing process is applied by use of the updated demultiplexing matrix may be the same as, or different from, the signal used in the learning process for the demultiplexing matrix.
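The frame-to-frame handover of the demultiplexing matrix described above can be sketched as follows. The update rule is an assumption: patent literature 1 is only stated to use FDICA, so the sketch substitutes the widely known natural-gradient ICA update with a tanh nonlinearity, and it works on real-valued signals with a single matrix rather than on complex signals with one W(f) per frequency band. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def natural_gradient_step(w, frame, step=0.1):
    """One update W <- W + step * (I - mean(tanh(y) y^T)) W (assumed rule).

    `frame` is a list of observation vectors, each of length len(w).
    Returns a new matrix; `w` is not modified.
    """
    m = len(w)
    ys = [[sum(w[i][k] * x[k] for k in range(m)) for i in range(m)] for x in frame]
    corr = [[sum(math.tanh(y[i]) * y[j] for y in ys) / len(ys) for j in range(m)]
            for i in range(m)]
    grad = [[(1.0 if i == j else 0.0) - corr[i][j] for j in range(m)] for i in range(m)]
    delta = matmul(grad, w)
    return [[w[i][j] + step * delta[i][j] for j in range(m)] for i in range(m)]


def learn_over_frames(frames, initial_w, step=0.1):
    """Use the predetermined initial matrix for the first frame only;
    every later frame starts from the matrix updated on the previous frame."""
    w = initial_w
    per_frame = []
    for frame in frames:
        w = natural_gradient_step(w, frame, step)
        per_frame.append(w)
    return per_frame
```

Starting each frame from the previous result, rather than from the initial matrix, lets the matrix keep adapting across frames.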
Patent literature 2 discloses a sound source demultiplexing system that demultiplexes and outputs N acoustic signals on the basis of a mixed signal. The mixed signal is generated by multiplying N mutually different acoustic signals, and an (N+1)-th acoustic signal different from the N acoustic signals, by weighting coefficients each equal to 1, and adding the weighted N acoustic signals and the weighted (N+1)-th acoustic signal together. The sound source demultiplexing system described in patent literature 2 includes an encoder and a decoder. The encoder includes a mixed signal generation means, a judgment means and an output means. The decoder includes a sorting means, a pseudo-mixed signal generation means and a demultiplexing means. The sound source demultiplexing system described in patent literature 2 operates as follows.
The mixed signal generation means of the encoder of the sound source demultiplexing system described in patent literature 2 generates a first mixed signal by multiplying the N mutually different acoustic signals, and the (N+1)-th acoustic signal different from the N acoustic signals, by weighting coefficients each equal to 1, and adding the weighted N acoustic signals and the weighted (N+1)-th acoustic signal together. Moreover, the mixed signal generation means generates a mixed signal by assigning a predetermined value (α), which is almost equal to 1, as the weighting coefficient to one acoustic signal selected from the N+1 acoustic signals, assigning weighting coefficients equal to 1 to the other N acoustic signals, multiplying the N+1 acoustic signals by their respective weighting coefficients, and adding the weighted N+1 acoustic signals together. The mixed signal generation means repeats this mixed signal generation process N times while changing the selected acoustic signal in turn, and thereby generates N kinds of mixed signals. Next, the judgment means carries out independent component analysis on the first mixed signal and the N mixed signals, and judges whether it is possible to demultiplex the N acoustic signals. When the judgment means judges that it is possible to demultiplex the N acoustic signals, the encoder makes the output means output the first mixed signal and the predetermined value (α).
The sorting means of the decoder of the sound source demultiplexing system described in patent literature 2 carries out the Fourier transform on the first mixed signal outputted by the encoder, and obtains the time-dependent change of its spectrum. Moreover, the sorting means analyzes the time-dependent change by auditory scene analysis and classifies the spectrum into N+1 groups. Next, the pseudo-mixed signal generation means selects one group out of the N+1 groups classified by the sorting means, and multiplies the amplitude of the spectrum belonging to the selected group by the predetermined value (α). After the multiplication, the pseudo-mixed signal generation means carries out the inverse Fourier transform on the spectrum belonging to each group, and generates a pseudo-mixed signal. The pseudo-mixed signal generation means carries out the multiplication and the pseudo-mixed signal generation N times while changing the selected group in turn, and thereby generates N kinds of pseudo-mixed signals. The demultiplexing means of the decoder then demultiplexes the N acoustic signals from the first mixed signal and the N kinds of pseudo-mixed signals.
When the judgment means of the encoder judges that it is possible to demultiplex the N acoustic signals, that is, when the demultiplexed signals coincide with the input signals, the demultiplexing matrix coincides with the inverse of the matrix that corresponds to the mixed signal generation process carried out by the mixed signal generation means and that includes α as a parameter. The demultiplexing means of the decoder calculates the demultiplexing matrix, which is this inverse matrix, on the basis of the predetermined value α transferred by the encoder, and demultiplexes the signal.
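The relation between the α-weighted mixing and its inverse can be illustrated with a small sketch. It rests on several assumptions: the function names are hypothetical, the first N signals are taken to be the ones that receive α in turn, and the decoder is handed the true mixed signals. In patent literature 2 the decoder receives only the first mixed signal and α and builds pseudo-mixed signals itself; that step is omitted here so that only the inverse relation is shown.

```python
def encode(sources, alpha):
    """Build the first mixed signal and the N alpha-weighted mixed signals.

    `sources` holds N+1 equal-length signals. Returns [x_0, x_1, ..., x_N], where
    x_0 is the plain sum and x_k weights source k-1 by alpha (assumed ordering).
    """
    length = len(sources[0])
    first = [sum(s[t] for s in sources) for t in range(length)]
    mixes = [first]
    for k in range(len(sources) - 1):
        # Weighting one source by alpha equals the plain sum plus (alpha - 1) * source.
        mixes.append([first[t] + (alpha - 1.0) * sources[k][t] for t in range(length)])
    return mixes


def decode(mixes, alpha):
    """Apply the inverse of the mixing in closed form: s_k = (x_k - x_0) / (alpha - 1)."""
    if alpha == 1.0:
        raise ValueError("alpha must differ from 1; otherwise the mixing matrix is singular")
    first = mixes[0]
    return [[(mix[t] - first[t]) / (alpha - 1.0) for t in range(len(first))]
            for mix in mixes[1:]]
```

For example, `decode(encode(sources, 0.9), 0.9)` returns the first N signals of `sources` up to rounding error. The closed form also shows why α must differ from 1: with α equal to 1 every mixed signal equals the first one and the mixing matrix has no inverse.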
Patent literature 3 discloses a sound signal processing device that optimizes a demultiplexing matrix by use of a mixed sound, in which a sound from a detection target sound source and a sound from a noise source are mixed, and that demultiplexes the mixed sound into the sound from the detection target sound source and the sound from the noise source by use of the optimized demultiplexing matrix.
The sound signal processing device described in patent literature 3 includes first and second framing units, first and second frequency analysis units, a demultiplexing processing unit, a demultiplexing matrix optimization calculation unit, an utterance period judgment unit, a demultiplexing process on/off control unit, and an optimization calculation on/off control unit, and operates as follows.
The first and second framing units sample the two-channel voice signals, which they receive through first and second microphones, at a predetermined time interval, generate frames each including a predetermined number of samples on the basis of the time division multiplexing method, and output the frames to the first and second frequency analysis units. The first and second frequency analysis units carry out an FFT (Fast Fourier Transform) on the voice signals, which are inputted frame by frame, to generate observation signals, and output the observation signals to the demultiplexing process on/off control unit.
When the utterance period judgment unit, which will be described later, judges that an utterance period is in progress, the demultiplexing process on/off control unit outputs the inputted observation signal to the demultiplexing processing unit. Otherwise, the demultiplexing process on/off control unit does not output the observation signal. The demultiplexing processing unit demultiplexes and extracts a demultiplexed signal from the observation signal by use of the demultiplexing matrix optimized by the demultiplexing matrix optimization calculation unit.
The utterance period judgment unit judges the utterance period on the basis of the degree of correlation of the input signals from the microphones, the degree of correlation of the signals framed by the first and second framing units, or a power spectrum or a cross spectrum of the observation signals generated by the frequency analysis units. For a judgment based on the degree of correlation or the power spectrum to be correct, noise must be included in both input signals and the uttered voice to be demultiplexed must be included in one of the input signals. For a judgment based on the cross spectrum, the uttered voice to be demultiplexed must be included in both input signals.
The demultiplexing matrix optimization calculation unit optimizes the demultiplexing matrix on the basis of the demultiplexed signal which is outputted by the demultiplexing processing unit.
When the utterance period judgment unit judges that an utterance period is in progress, the optimization calculation on/off control unit makes the demultiplexing matrix optimization calculation unit carry out the optimization process; otherwise, the optimization calculation on/off control unit makes the demultiplexing matrix optimization calculation unit suspend the optimization process.
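The on/off control of the demultiplexing process and of the optimization process can be sketched as a single gated loop. This is an illustration under assumptions, not the device of patent literature 3: the names are hypothetical, the utterance judgment is replaced by a simple frame-power threshold standing in for the correlation-based or spectrum-based criteria, and the demultiplexing and optimization steps are passed in as callables.

```python
def frame_power(channel):
    """Mean squared magnitude of one channel of an observation frame."""
    return sum(abs(v) ** 2 for v in channel) / len(channel)


def is_utterance(observation, threshold):
    """Stand-in judgment (assumption): utterance if any channel exceeds the threshold."""
    return max(frame_power(ch) for ch in observation) > threshold


def run_gated(observations, w, demix, optimize, threshold):
    """Demultiplex and optimize only while an utterance period is judged.

    `demix(w, observation)` and `optimize(w, separated)` are supplied by the caller.
    Returns the per-frame outputs (None for skipped frames) and the final matrix.
    """
    outputs = []
    for observation in observations:
        if is_utterance(observation, threshold):
            separated = demix(w, observation)   # demultiplexing process: on
            w = optimize(w, separated)          # optimization calculation: on
            outputs.append(separated)
        else:
            outputs.append(None)                # both processes suspended
    return outputs, w
```

Because the optimization is skipped outside utterance periods, the demultiplexing matrix is left unchanged by frames that are judged not to contain an utterance.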