In a conventional noise suppressing apparatus, an input signal consisting of a speech signal with superimposed noise is received. The noise, which is the non-object signal, is suppressed and removed from the input signal, and the speech signal, which is the object signal, is emphasized. Such a conventional noise suppressing apparatus is disclosed, for example, in Published Unexamined Japanese Patent Application No. 2000-347688. It operates according to the so-called spectral subtraction method, which is introduced in Steven F. Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction”, IEEE Trans. ASSP, Vol. ASSP-27, No. 2, April 1979. In that method, an average noise spectrum is estimated, and the estimated average noise spectrum is subtracted from the amplitude spectrum of the input signal to suppress noise.
FIG. 1 is a block diagram showing the configuration of the conventional noise suppressing apparatus disclosed in Published Unexamined Japanese Patent Application No. 2000-347688. In FIG. 1, 1 indicates an input terminal, 2 indicates a time-to-frequency converting unit, 3 indicates a noise-likeness analyzing unit, 4 indicates a noise spectrum estimating unit, 5 indicates a frequency band signal-to-noise ratio calculating unit, 6 indicates a perceptual weight calculating unit, 7 indicates a perceptual weight correcting unit, 8 indicates a spectrum subtracting unit, 9 indicates a spectrum suppressing unit, 10 indicates a frequency-to-time converting unit, and 11 indicates an output terminal. In the noise-likeness analyzing unit 3, 12 indicates a low pass filter, 13 indicates an inverse filter, 14 indicates an auto-correlation analyzing unit, 15 indicates a linear prediction analyzing unit, and 16 indicates an updating rate determining unit.
Next, the operation will be described.
An input signal s[t] containing noise is sampled at a prescribed sampling frequency (for example, 8 kHz), divided into frames at a prescribed frame cycle (for example, 20 ms), and input to the conventional noise suppressing apparatus. In the time-to-frequency converting unit 2, the input signal s[t] is frequency-analyzed, for example by a 256-point fast Fourier transform (FFT), and converted into an amplitude spectrum S[f] and a phase spectrum P[f]. Because the FFT is well known, its description is omitted here.
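The framing and transform step can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: it assumes a 160-sample frame (20 ms at 8 kHz) zero-padded to 256 points with no analysis window, because the text does not specify the window or overlap, and it uses a hand-written radix-2 FFT so that only the standard library is needed. The helper names `fft` and `analyze_frame` are hypothetical.

```python
import cmath
import math

# Example values from the text; zero padding without a window is an ASSUMPTION.
FS = 8000                       # sampling frequency [Hz]
FRAME_LEN = FS * 20 // 1000     # 20 ms frame cycle -> 160 samples
N_FFT = 256                     # FFT size


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def analyze_frame(frame):
    """Return (S[f], P[f]) for bins f = 0..N_FFT/2 of one zero-padded frame."""
    padded = list(frame) + [0.0] * (N_FFT - len(frame))
    half = fft(padded)[: N_FFT // 2 + 1]       # bins up to the Nyquist frequency
    amplitude = [abs(c) for c in half]         # amplitude spectrum S[f]
    phase = [cmath.phase(c) for c in half]     # phase spectrum P[f]
    return amplitude, phase
```

Each call handles one frame of FRAME_LEN samples and returns 129 amplitude and phase values, one per bin from DC to the Nyquist frequency.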
In the noise-likeness analyzing unit 3, the input signal s[t] is first filtered by the low pass filter 12 to obtain a low pass filtered signal sl[t]. Next, the linear prediction analyzing unit 15 performs linear predictive analysis on sl[t] and obtains, for example, tenth-order linear predictive coefficients (the so-called α parameters) and a frame power POWfr. The inverse filter 13 then filters sl[t] using these linear predictive coefficients and outputs a low pass linear predictive residual signal (hereinafter called the low pass residual signal) res[t]. Finally, the auto-correlation analyzing unit 14 performs auto-correlation analysis on res[t], finds the positive peak value in the auto-correlation coefficient sequence rac[t], and sets this value as RACmax.
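The chain of linear predictive analysis, inverse filtering and residual auto-correlation might be sketched as below. The low pass filter 12 is not shown (the sketch assumes `sl` is already filtered). Computing the coefficients with the autocorrelation method and the Levinson-Durbin recursion, measuring POWfr and POWres as mean-square values, and searching for RACmax only over lags 20 to 120 (pitch frequencies of roughly 67 to 400 Hz at 8 kHz) are all assumptions of this sketch, and the function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[k] = sum_t x[t] * x[t + k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion: predictor a[1..order] with x[t] ~ sum_k a[k] * x[t - k]."""
    a = [0.0] * (order + 1)              # a[0] is unused
    err = r[0]                           # prediction error power
    for i in range(1, order + 1):
        if err <= 0.0:                   # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                    # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def inverse_filter(x, a):
    """Whitening (inverse) filter: res[t] = x[t] - sum_k a[k] * x[t - k], zero initial state."""
    res = []
    for t in range(len(x)):
        pred = sum(a[k - 1] * x[t - k] for k in range(1, len(a) + 1) if t - k >= 0)
        res.append(x[t] - pred)
    return res


def noise_likeness_features(sl, order=10, lag_min=20, lag_max=120):
    """Return (a, POWfr, POWres, RACmax) for one low-pass-filtered frame sl.

    ASSUMPTIONS: mean-square powers and the lag_min..lag_max search range
    are illustrative choices, not values taken from the text.
    """
    a = levinson_durbin(autocorr(sl, order), order)
    res = inverse_filter(sl, a)
    pow_fr = sum(v * v for v in sl) / len(sl)
    pow_res = sum(v * v for v in res) / len(res)
    rac = autocorr(res, lag_max)
    if rac[0] == 0.0:
        return a, pow_fr, pow_res, 0.0
    peak = max(rac[k] / rac[0] for k in range(lag_min, lag_max + 1))
    return a, pow_fr, pow_res, max(peak, 0.0)   # keep only a positive peak
```

For a 160-sample pulse train with a 40-sample period, the short-term predictor is all zeros, the residual equals the input, and the normalized residual auto-correlation peaks at lag 40 with RACmax = 0.75. A strongly periodic (speech-like) frame therefore tends to give a high RACmax, while a noise-like frame tends to give a low one.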
In the updating rate determining unit 16, a noise-likeness signal Noise is determined, for example, from the positive peak value RACmax of the auto-correlation coefficient, the power POWres of the low pass residual signal res[t], and the frame power POWfr, and a noise spectrum updating rate coefficient r corresponding to the determined noise-likeness signal Noise is output. FIG. 2 is a view showing the relation between the noise-likeness signal Noise and the noise spectrum updating rate coefficient r. In the updating rate determining unit 16, the noise-likeness signal Noise is determined, for example, as one of the five levels shown in FIG. 2, and the corresponding noise spectrum updating rate coefficient r is output.
In the noise spectrum estimating unit 4, a noise spectrum N[f] is updated according to equation (1) by using the noise spectrum updating rate coefficient r output from the noise-likeness analyzing unit 3, the amplitude spectrum S[f] output from the time-to-frequency converting unit 2, and an average noise spectrum Nold[f] of the preceding noise spectra N[f], which is held internally.

$$N[f] = (1 - r) \times N_{\mathrm{old}}[f] + r \times S[f] \tag{1}$$
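Equation (1) is a first-order recursive (exponential) average of the amplitude spectrum. A minimal sketch, assuming the spectra are plain lists of per-bin magnitudes and that r lies between 0 and 1; the range check and the name `update_noise_spectrum` are illustrative assumptions, not part of the described apparatus.

```python
def update_noise_spectrum(n_old, s, r):
    """Equation (1): N[f] = (1 - r) * Nold[f] + r * S[f], applied bin by bin."""
    # ASSUMPTION: r is a smoothing weight in [0, 1]; the text gives no explicit range.
    if not 0.0 <= r <= 1.0:
        raise ValueError("r is expected to lie in [0, 1]")
    return [(1.0 - r) * no + r * sf for no, sf in zip(n_old, s)]
```

With r = 0 the estimate is left unchanged, and with r = 1 it is replaced by the current frame's spectrum; the returned list would be held as Nold[f] for the next frame.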
In the frequency band signal-to-noise ratio calculating unit 5, a signal-to-noise ratio for each frequency band f (the frequency band SN ratio) SNR[f] is calculated according to equation (2) from the amplitude spectrum S[f] output from the time-to-frequency converting unit 2 and the noise spectrum N[f] output from the noise spectrum estimating unit 4. Here, SNR[f] is set to zero wherever it would otherwise be negative.
$$
\mathrm{SNR}[f] =
\begin{cases}
20 \times \log_{10}\left( S[f] / N[f] \right)\ \text{(dB)}, & S[f] > N[f] \\
0\ \text{(dB)}, & \text{other cases}
\end{cases}
\tag{2}
$$
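A direct transcription of equation (2) is sketched below. The small floor `eps` on N[f], which avoids division by zero, is a hypothetical addition not found in the text, as is the function name.

```python
import math


def band_snr_db(s, n, eps=1e-12):
    """Equation (2): 20*log10(S[f]/N[f]) dB where S[f] > N[f], otherwise 0 dB."""
    out = []
    for sf, nf in zip(s, n):
        nf = max(nf, eps)    # ASSUMPTION: floor N[f] to avoid dividing by zero
        out.append(20.0 * math.log10(sf / nf) if sf > nf else 0.0)
    return out
```

A bin whose amplitude is ten times the noise estimate gives 20 dB, and any bin at or below the noise estimate gives 0 dB.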
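In the perceptual weight calculating unit 6, prescribed constants α and α′ (for example, α=1.2, α′=0.5), β and β′ (for example, β=0.8, β′=0.1), and γ and γ′ (for example, γ=0.25, γ′=0.4) are received, and a first perceptual weight αw(f), a second perceptual weight βw(f) and a third perceptual weight γw(f), each weighted along the frequency axis, are calculated according to equation (3). Here, fc in equation (3) denotes the Nyquist frequency, so each weight varies linearly from its unprimed constant at f=0 to its primed constant at f=fc.

$$
\begin{aligned}
\alpha_w(f) &= (\alpha' - \alpha) \times f / f_c + \alpha \\
\beta_w(f) &= (\beta' - \beta) \times f / f_c + \beta \\
\gamma_w(f) &= (\gamma' - \gamma) \times f / f_c + \gamma
\end{aligned}
\tag{3}
$$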
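Equation (3) can be evaluated per FFT bin as sketched below. The sketch assumes that f and fc are expressed as bin indices (fc = 128 for a 256-point FFT) and uses the example constants from the text; the function names are hypothetical.

```python
# Example constants from the text.
ALPHA, ALPHA_P = 1.2, 0.5     # α, α′
BETA, BETA_P = 0.8, 0.1       # β, β′
GAMMA, GAMMA_P = 0.25, 0.4    # γ, γ′


def linear_weight(w0, w_nyq, f, fc):
    """(w′ − w) × f / fc + w: equals w at f = 0 and w′ at f = fc."""
    return (w_nyq - w0) * f / fc + w0


def perceptual_weights(fc=128):
    """Return αw(f), βw(f), γw(f) for f = 0..fc (ASSUMPTION: f, fc are bin indices)."""
    bins = range(fc + 1)
    alpha_w = [linear_weight(ALPHA, ALPHA_P, f, fc) for f in bins]
    beta_w = [linear_weight(BETA, BETA_P, f, fc) for f in bins]
    gamma_w = [linear_weight(GAMMA, GAMMA_P, f, fc) for f in bins]
    return alpha_w, beta_w, gamma_w
```

With these constants αw(f) and βw(f) decrease toward the Nyquist frequency while γw(f) increases.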
In the perceptual weight correcting unit 7, the first perceptual weight αw(f) and the second perceptual weight βw(f) are corrected according to equation (4) by using the frequency band SN ratio SNR[f] output from the frequency band signal-to-noise ratio calculating unit 5, so that each weight is corrected according to the SN ratio of its frequency band. For example, where the frequency band SN ratio SNR[f] is low, αw(f) and βw(f) are corrected to low values, and as SNR[f] becomes higher, both corrected weights increase. A first corrected perceptual weight αc(f) and the third perceptual weight γw(f) are output to the spectrum subtracting unit 8, and a second corrected perceptual weight βc(f) is output to the spectrum suppressing unit 9.

$$
\begin{aligned}
\alpha_c(f) &= \alpha_w(f) \times \mathrm{SNR}[f] - \mathrm{MIN\_GAIN}_\alpha \\
\beta_c(f) &= \beta_w(f) \times \mathrm{SNR}[f] - \mathrm{MIN\_GAIN}_\beta
\end{aligned}
\tag{4}
$$

Here, in equation (4), MIN_GAINα and MIN_GAINβ denote prescribed constants: MIN_GAINα is the maximum suppression quantity [dB] for the first perceptual weight αw(f), and MIN_GAINβ is the maximum suppression quantity [dB] for the second perceptual weight βw(f).
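Equation (4) is transcribed literally below. Since the text gives no values for MIN_GAINα and MIN_GAINβ, they are caller-supplied parameters here, and the function name is hypothetical.

```python
def correct_weights(alpha_w, beta_w, snr_db, min_gain_alpha, min_gain_beta):
    """Equation (4) as printed: weight × SNR[f] − MIN_GAIN, bin by bin.

    ASSUMPTION: min_gain_alpha / min_gain_beta are supplied by the caller;
    the text does not give their values.
    """
    alpha_c = [aw * snr - min_gain_alpha for aw, snr in zip(alpha_w, snr_db)]
    beta_c = [bw * snr - min_gain_beta for bw, snr in zip(beta_w, snr_db)]
    return alpha_c, beta_c
```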
FIG. 3 is a view showing an example of frequency-directional weighting control of the first corrected perceptual weight αc(f) and the second corrected perceptual weight βc(f), which are used for the spectral subtraction and the spectral amplitude suppression described later. In FIG. 3, 101 indicates the spectral subtraction quantity αc(f) given by the first corrected perceptual weight, 102 indicates the spectral amplitude suppression quantity βc(f) given by the second corrected perceptual weight, 103 indicates a speech spectrum, and 104 indicates a noise spectrum. In the perceptual weight correcting unit 7, the weights are further controlled according to the average SN ratio SNRave of the current frame, defined by equation (5). When SNRave is high, the spectral subtraction quantity αc(f) is set so that the difference between αc(f) and αc(0) increases; that is, the slope of αc(f) in FIG. 3 becomes larger. At the same time, the spectral amplitude suppression quantity βc(f) is set so that the difference between βc(f) and βc(0) decreases; that is, the slope of βc(f) in FIG. 3 becomes smaller. Conversely, as SNRave becomes lower, the difference between αc(f) and αc(0) is made smaller, so the slope of αc(f) becomes smaller, while the difference between βc(f) and βc(0) is made larger, so the slope of βc(f) becomes larger.

$$\mathrm{SNR}_{\mathrm{ave}} = \frac{1}{f_c} \sum_{f=0}^{f_c} \mathrm{SNR}[f] \tag{5}$$
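Equation (5) is shown in code below, written exactly as printed: the sum runs over fc + 1 bins (f = 0 to fc) but is divided by fc. The rule that maps SNRave to the slopes of αc(f) and βc(f) is described only qualitatively in the text, so it is not sketched here; the function name is hypothetical.

```python
def average_snr(snr_db, fc):
    """Equation (5) as printed: (SNR[0] + ... + SNR[fc]) / fc."""
    return sum(snr_db[: fc + 1]) / fc
```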
In the spectrum subtracting unit 8, as formulated in equation (6), the noise spectrum N[f] is multiplied by the first corrected perceptual weight αc(f), and the product is subtracted from the amplitude spectrum S[f] to obtain a noise subtracted spectrum Ss[f], which is output. Where this difference would be zero or negative, the noise subtracted spectrum Ss[f] is instead replaced with, for example, the product of the amplitude spectrum S[f] of the input signal and the third perceptual weight γw(f). That is, back-filling processing is performed so that this product is used as the noise subtracted spectrum Ss[f].
$$
Ss[f] =
\begin{cases}
S[f] - \alpha_c(f) \times N[f], & S[f] > \alpha_c(f) \times N[f] \\
\gamma_w(f) \times S[f], & \text{other cases}
\end{cases}
\tag{6}
$$
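A sketch of equation (6), including the back-filling branch, follows. All four inputs are assumed to be per-bin lists of equal length, and the name `spectral_subtract` is hypothetical.

```python
def spectral_subtract(s, n, alpha_c, gamma_w):
    """Equation (6): S[f] − αc(f)·N[f] if positive, else back-fill with γw(f)·S[f]."""
    out = []
    for sf, nf, ac, gw in zip(s, n, alpha_c, gamma_w):
        sub = ac * nf                       # weighted noise estimate for this bin
        out.append(sf - sub if sf > sub else gw * sf)
    return out
```

The back-filling branch keeps a scaled copy of the input amplitude instead of letting a bin collapse to zero.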
In the spectrum suppressing unit 9, as formulated in equation (7), the noise subtracted spectrum Ss[f] is multiplied by a factor derived from the second corrected perceptual weight βc(f) to obtain a noise suppressed spectrum Sr[f] in which the noise amplitude is reduced. The noise suppressed spectrum Sr[f] is output.

$$Sr[f] = 10^{-\beta_c(f)} \times Ss[f] \tag{7}$$

Here, 10^{−βc(f)} denotes 10 raised to the power −βc(f).
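Equation (7) applied bin by bin is sketched below, transcribed as printed (βc(f) is used directly as the exponent); the function name is hypothetical.

```python
def spectral_suppress(ss, beta_c):
    """Equation (7): Sr[f] = 10**(−βc(f)) × Ss[f]."""
    return [10.0 ** (-bc) * x for x, bc in zip(ss, beta_c)]
```

A bin with βc(f) = 0 passes unchanged, while βc(f) = 1 scales it by 0.1.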
In the frequency-to-time converting unit 10, the inverse of the processing performed in the time-to-frequency converting unit 2 is carried out. For example, an inverse FFT converts the noise suppressed spectrum Sr[f], together with the phase spectrum P[f] output from the time-to-frequency converting unit 2, into a time signal, and the overlapping time signal component of the preceding frame is superimposed on part of this time signal to obtain a noise suppressed signal sr[t]. The noise suppressed signal sr[t] is output from the output terminal 11.
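The frequency-to-time conversion can be sketched as an inverse FFT followed by overlap-add. This sketch assumes that only bins 0 to N/2 of an N-point FFT are stored, so the upper half of the spectrum is rebuilt from the conjugate symmetry of a real signal, and that each frame contributes `hop` new output samples; the actual window and overlap used by the apparatus are not given in the text, and all function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out


def ifft(spec):
    """Inverse FFT using the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(spec)
    return [c.conjugate() / n for c in fft([c.conjugate() for c in spec])]


def synthesize_frame(sr, phase, prev_tail, hop):
    """Turn Sr[f], P[f] (bins 0..N/2) into `hop` output samples by overlap-add.

    ASSUMPTION: the hop size and the overlap-add layout are illustrative;
    the text does not specify them. Returns (output samples, new tail).
    """
    n = 2 * (len(sr) - 1)
    half = [a * cmath.exp(1j * p) for a, p in zip(sr, phase)]
    # Rebuild bins N/2+1..N-1 from conjugate symmetry of a real signal.
    full = half + [half[k].conjugate() for k in range(n // 2 - 1, 0, -1)]
    y = [c.real for c in ifft(full)]
    for i, v in enumerate(prev_tail):          # superimpose the preceding frame's tail
        y[i] += v
    return y[:hop], y[hop:]
```

Passing an unmodified spectrum (Sr[f] = S[f]) with a zero tail reproduces the input frame, which is a quick way to check an analysis/synthesis pair before any suppression is applied.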
As described above, in the conventional noise suppressing apparatus, the first corrected perceptual weight αc(f) and the second corrected perceptual weight βc(f), each weighted along the frequency axis, are obtained by correction according to the frequency band SN ratio SNR[f], and the spectral subtraction and the spectral amplitude suppression are performed on the amplitude spectrum S[f] of the input signal using these weights, controlled according to the average SN ratio SNRave of the current frame. That is, αc(f) and βc(f) are raised in frequency bands in which the frequency band SN ratio SNR[f] is high and lowered in frequency bands in which SNR[f] is low. Therefore, in the spectral subtraction processing, noise is subtracted strongly from the amplitude spectrum S[f] in frequency bands in which the SN ratio is high (mainly the low frequency band), and only slightly in frequency bands in which the SN ratio is low (mainly the high frequency band). Accordingly, noise generated while a motor vehicle is running, whose major component lies in the low frequency band, can be suppressed effectively, and excessive subtraction from the amplitude spectrum S[f] can be prevented. Also, in the spectral amplitude suppression, the amplitude is suppressed only slightly in the low frequency band, and the suppression becomes stronger toward the high frequency band. Accordingly, the occurrence of unnatural and unpleasant residual noise known as musical noise can be prevented.
Because the conventional noise suppressing apparatus is configured as described above, the first corrected perceptual weight αc(f) and the second corrected perceptual weight βc(f) are controlled independently; for example, even when the noise subtraction based on αc(f) exceeds a prescribed quantity, the apparatus has no mechanism for limiting the noise amplitude suppression based on βc(f). This causes the following problem: the total quantity of noise suppression based on both αc(f) and βc(f) (hereinafter called the total noise suppression quantity) is not kept constant from frame to frame, so the output signal sounds unstable over time and is unpleasant to listen to.
The present invention has been made to solve the above-described problem, and an object of the present invention is to provide a noise suppressing apparatus that suppresses noise in a perceptually pleasant manner with little deterioration of speech quality, even in a high-noise environment.