A noise suppressor is an apparatus which suppresses noise superposed on a desired speech signal. A noise suppressor operates to estimate the power spectrum of a noise component using an input signal that has been transformed into a frequency-domain signal, and subtracts the estimated noise power spectrum from the input signal thereby suppressing the noise mixed with the desired speech signal. A noise suppressor can be used to suppress nonstationary noise by detecting a silent section of speech and updating the power spectrum of a noise component.
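The subtraction principle described above can be illustrated with a minimal sketch of generic power spectral subtraction. It assumes NumPy, the complex spectrum of a single frame, and a noise power spectrum that has already been estimated. The function name `spectral_subtraction` is hypothetical. This is the textbook subtraction rule, not the MMSE-STSA gain of Reference 1 discussed below, and negative power differences are simply floored at zero.

```python
import numpy as np

def spectral_subtraction(noisy_spectrum, noise_power):
    """Subtract an estimated noise power spectrum from one frame's noisy power
    spectrum, floor the result at zero, and keep the noisy phase."""
    noisy_power = np.abs(noisy_spectrum) ** 2
    clean_power = np.maximum(noisy_power - noise_power, 0.0)   # no negative power
    return np.sqrt(clean_power) * np.exp(1j * np.angle(noisy_spectrum))
```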
A noise suppressor is described in IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, Vol. 32, No. 6, pp. 1109-1121, DECEMBER 1984 (Reference 1). The noise suppressor in this paper is based on a minimum mean-square error short-time spectral amplitude (MMSE-STSA) estimator. FIG. 1 shows the structure of the noise suppressor described in Reference 1. A signal consisting of a desired speech signal with noise mixed into it will hereinafter be referred to as a noisy speech signal.
The noise suppressor shown in FIG. 1 comprises input terminal 11, frame decomposition unit 1, windowing unit 2, Fourier transform unit 3, voice activity detector 4, noise estimation unit 51, frequency-dependent SNR (signal-to-noise ratio) calculator 6, a-priori SNR estimator 7, spectral gain generator 8, inverse Fourier transform unit 9, frame synthesis unit 10, output terminal 12, counter 13, and multiplexed multipliers 16, 17. Input terminal 11 is supplied with a noisy speech signal as a sequence of samples. These samples are supplied to frame decomposition unit 1, which divides the noisy speech signal into frames of K/2 samples each, where K is an even number. The framed samples are supplied to windowing unit 2, where they are multiplied by a window function $w(t)$. The signal $\bar{y}_n(t)$ obtained by windowing the nth frame of the input signal, $y_n(t)$ ($t = 0, 1, \ldots, K/2-1$), with $w(t)$ is expressed by the following equation:

$$\bar{y}_n(t) = w(t)\,y_n(t) \qquad (1)$$
In the noise suppressor, two successive frames are generally overlapped and windowed. If 50% of the frame length is used as the overlap length, windowing unit 2 outputs $\bar{y}_n(t)$ ($t = 0, 1, \ldots, K-1$) given by equations (2) and (3), where $t = 0, 1, \ldots, K/2-1$:

$$\bar{y}_n(t) = w(t)\,y_{n-1}(t) \qquad (2)$$

$$\bar{y}_n(t + K/2) = w(t + K/2)\,y_n(t) \qquad (3)$$
In the following description, 50% overlap is assumed. A Hanning window expressed by equation (4), for example, may be used as w(t):
$$w(t) = \begin{cases} 0.5 + 0.5\cos\left(\dfrac{\pi\,(t - K/2)}{K/2}\right), & 0 \le t < K \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
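The reason a Hanning window suits 50% overlap can be checked numerically. The first and second halves of the window of equation (4) sum to one at every sample, so overlap-adding windowed frames restores the input level without gain ripple. This is a minimal sketch assuming NumPy; the helper name `hanning_window` and the value K = 256 are illustrative only.

```python
import numpy as np

def hanning_window(K):
    """Window of equation (4): w(t) = 0.5 + 0.5*cos(pi*(t - K/2)/(K/2)), 0 <= t < K."""
    t = np.arange(K)
    return 0.5 + 0.5 * np.cos(np.pi * (t - K / 2) / (K / 2))

K = 256
w = hanning_window(K)
# w(t) + w(t + K/2) = 1 for t = 0..K/2-1, the property that makes the
# overlap-add of equation (7) reproduce the input when the gain is one.
print(np.allclose(w[:K // 2] + w[K // 2:], 1.0))  # True
```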
The windowed output $\bar{y}_n(t)$ is supplied to Fourier transform unit 3, which converts it into a noisy speech spectrum $Y_n(k)$. The noisy speech spectrum $Y_n(k)$ is separated into phase and amplitude. The noisy speech phase spectrum $\arg Y_n(k)$ is supplied to inverse Fourier transform unit 9, and the spectral amplitude of noisy speech $|Y_n(k)|$ is supplied to voice activity detector 4 and to multiplexed multipliers 16 and 17.
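Frame decomposition, windowing, and the Fourier transform (equations (1) to (4)) can be sketched together as follows. This assumes NumPy, a one-dimensional input array, an even K, a zero-valued frame $y_{-1}(t)$ before the first frame, and that trailing samples which do not fill a whole K/2-sample frame are dropped. The function name `analysis` is hypothetical.

```python
import numpy as np

def analysis(signal, K):
    """Frame, window, and Fourier-transform a signal per equations (1)-(4).

    Returns (amplitude, phase), each of shape (number_of_frames, K)."""
    half = K // 2
    t = np.arange(K)
    w = 0.5 + 0.5 * np.cos(np.pi * (t - half) / half)   # Hanning window, eq. (4)
    prev = np.zeros(half)                               # assumed y_{-1}(t) = 0
    amplitudes, phases = [], []
    for n in range(len(signal) // half):
        cur = signal[n * half:(n + 1) * half]           # y_n(t), K/2 samples
        windowed = w * np.concatenate([prev, cur])      # ybar_n(t), eqs. (2)-(3)
        Y = np.fft.fft(windowed)                        # Y_n(k), k = 0..K-1
        amplitudes.append(np.abs(Y))                    # |Y_n(k)|
        phases.append(np.angle(Y))                      # arg Y_n(k)
        prev = cur
    return np.array(amplitudes), np.array(phases)
```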
Voice activity detector 4 determines whether there is speech or not based on the spectral amplitude of noisy speech |Yn(k)|, and transmits a voice activity detection flag that is set in accordance with the determined result to noise estimation unit 51. Multiplexed multiplier 17 calculates a noisy speech power spectrum using the supplied spectral amplitude of noisy speech |Yn(k)|, and provides the calculated noisy speech power spectrum to noise estimation unit 51 and frequency-dependent SNR calculator 6.
Noise estimation unit 51 estimates a power spectrum of the noise using the voice activity detection flag, the noisy speech power spectrum, and a count value supplied from counter 13, and transmits the estimated power spectrum to frequency-dependent SNR calculator 6 as an estimated noise power spectrum. Frequency-dependent SNR calculator 6 calculates an SNR for each frequency by using the noisy speech power spectrum and the estimated noise power spectrum which have been supplied thereto, and supplies the calculated SNR as an a-posteriori SNR to a-priori SNR estimator 7 and spectral gain generator 8.
A-priori SNR estimator 7 estimates an a-priori SNR using the a-posteriori SNR supplied thereto and a spectral gain supplied from spectral gain generator 8, and supplies the estimated a-priori SNR as feedback to spectral gain generator 8.
Spectral gain generator 8 generates a spectral gain from the a-posteriori SNR and the estimated a-priori SNR supplied to it, feeds the spectral gain back to a-priori SNR estimator 7, and also transmits it to multiplexed multiplier 16.
Multiplexed multiplier 16 weights the spectral amplitude of noisy speech $|Y_n(k)|$ supplied from Fourier transform unit 3 with the spectral gain $G_n(k)$ supplied from spectral gain generator 8, thus determining the spectral amplitude of the enhanced speech $|\bar{X}_n(k)|$, and transmits it to inverse Fourier transform unit 9. The spectral amplitude of the enhanced speech is expressed by equation (5):

$$|\bar{X}_n(k)| = G_n(k)\,|Y_n(k)| \qquad (5)$$
Inverse Fourier transform unit 9 combines the spectral amplitude of the enhanced speech $|\bar{X}_n(k)|$ supplied from multiplexed multiplier 16 with the noisy speech phase spectrum $\arg Y_n(k)$ supplied from Fourier transform unit 3, thus determining the enhanced speech spectrum $\bar{X}_n(k)$. That is, inverse Fourier transform unit 9 carries out the calculation of equation (6):

$$\bar{X}_n(k) = |\bar{X}_n(k)|\,e^{\,j \arg Y_n(k)} \qquad (6)$$
Inverse Fourier transform unit 9 performs an inverse Fourier transform on the enhanced speech spectrum $\bar{X}_n(k)$, producing a time-domain sequence $\bar{x}_n(t)$ ($t = 0, 1, \ldots, K-1$) in which one frame consists of K samples, and transmits $\bar{x}_n(t)$ to frame synthesis unit 10. Frame synthesis unit 10 takes K/2 samples from each of two adjacent frames of $\bar{x}_n(t)$ and adds the overlapping halves, producing enhanced speech $\hat{x}_n(t)$ according to equation (7). The enhanced speech $\hat{x}_n(t)$ ($t = 0, 1, \ldots, K/2-1$) is transmitted from frame synthesis unit 10 to output terminal 12.

$$\hat{x}_n(t) = \bar{x}_{n-1}(t + K/2) + \bar{x}_n(t) \qquad (7)$$
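The synthesis side, equations (6) and (7), can be sketched in the same illustrative style. It assumes NumPy arrays holding one frame per row of enhanced amplitudes $|\bar{X}_n(k)|$ and noisy phases $\arg Y_n(k)$, and a zero-valued $\bar{x}_{-1}(t)$ before the first frame. The function name `synthesis` is hypothetical.

```python
import numpy as np

def synthesis(enhanced_amp, noisy_phase):
    """Rebuild time-domain output from per-frame amplitudes and phases (eqs. (6)-(7))."""
    n_frames, K = enhanced_amp.shape
    half = K // 2
    out = np.zeros(n_frames * half)
    prev_tail = np.zeros(half)                              # assumed xbar_{-1}(t + K/2) = 0
    for n in range(n_frames):
        X = enhanced_amp[n] * np.exp(1j * noisy_phase[n])   # eq. (6)
        # For a real input and a gain symmetric in k, the imaginary part is round-off.
        x_bar = np.real(np.fft.ifft(X))                     # xbar_n(t), t = 0..K-1
        out[n * half:(n + 1) * half] = prev_tail + x_bar[:half]   # eq. (7)
        prev_tail = x_bar[half:]
    return out
```

With every spectral gain equal to one, so that the enhanced amplitudes equal the noisy amplitudes, the output reproduces the input delayed by K/2 samples, because $w(t) + w(t + K/2) = 1$ for the window of equation (4).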
Reference 1 discloses no details about how to implement voice activity detector 4 of the noise suppressor shown in FIG. 1. One example of a voice activity detector that can be used in such a noise suppressor is described in Proceedings of National Convention of the Acoustical Society of Japan, March 2000, pp. 321-322 (Reference 2). The voice activity detector of Reference 2 is described below as a conventional implementation of voice activity detector 4. As shown in FIG. 2, voice activity detector 4 comprises threshold memory 401, comparator 402, multiplier 404, logarithmic calculator 405, power calculator 406, weighted adder 407, weight memory 408, and NOT circuit 409.
In voice activity detector 4, the spectral amplitude of noisy speech supplied from Fourier transform unit 3 (FIG. 1) is supplied to power calculator 406. Power calculator 406 calculates the sum of the powers $|Y_n(k)|^2$ of the noisy speech spectral amplitude for $k = 0$ to $K-1$ and transmits the sum to logarithmic calculator 405. Logarithmic calculator 405 takes the logarithm of the supplied noisy speech spectrum power and supplies it to multiplier 404. Multiplier 404 multiplies the logarithm by a constant to determine a noisy speech power $Q_n$, and supplies $Q_n$ to comparator 402 and weighted adder 407. Specifically, the noisy speech power $Q_n$ in the nth frame is expressed by the following equation:
$$Q_n = 10\log_{10}\left(\sum_{k=0}^{K-1} |Y_n(k)|^2\right) \qquad (8)$$
The voice activity detector disclosed in Reference 2 instead determines $Q_n$ according to equation (9), using the time-domain samples $\bar{y}_n(t)$:
$$Q_n = 10\log_{10}\left(\sum_{t=0}^{K-1} \bar{y}_n^{\,2}(t)\right) \qquad (9)$$
As described in “Digital Signal Processing”, 1985, Corona, pp. 75-76 (Reference 3), equations (8) and (9) are equivalent by Parseval's theorem, up to a constant offset that depends on the normalization of the discrete Fourier transform.
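The relation between equations (8) and (9) can be checked numerically. This is a minimal sketch assuming NumPy's unnormalized `numpy.fft.fft`, with a random frame standing in for a windowed frame $\bar{y}_n(t)$. Under that normalization, Parseval's theorem gives $\sum_k |Y_n(k)|^2 = K \sum_t \bar{y}_n^2(t)$, so the two power measures differ by the constant $10\log_{10} K$.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 256
frame = rng.standard_normal(K)          # stands in for a windowed frame ybar_n(t)
Y = np.fft.fft(frame)                   # unnormalized DFT

q_freq = 10 * np.log10(np.sum(np.abs(Y) ** 2))   # equation (8)
q_time = 10 * np.log10(np.sum(frame ** 2))       # equation (9)
# The difference is the constant 10*log10(K), independent of the frame content.
print(np.isclose(q_freq - q_time, 10 * np.log10(K)))  # True
```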
Comparator 402 is supplied with a threshold value $TH_n$ from threshold memory 401 and compares it with the output $Q_n$ of multiplier 404. If $TH_n > Q_n$, comparator 402 outputs “1”, representing a speech section; if $TH_n \le Q_n$, it outputs “0”, representing a silent section. This output is used as the voice activity detection flag and is also supplied to NOT circuit 409, whose output serves as weighted adder control signal 905 for weighted adder 407. Weighted adder 407 is also supplied with threshold value 902 from threshold memory 401 and weight 903 from weight memory 408.
Based on weighted adder control signal 905, weighted adder 407 selectively updates threshold value 902 supplied from threshold memory 401 and feeds updated threshold value 904 back to threshold memory 401. The updated threshold value $TH_n$ is obtained by weighted addition of the previous threshold value $TH_{n-1}$ and noisy speech power 901, using weight 903 from weight memory 408. This update is performed only when weighted adder control signal 905, the output of NOT circuit 409, is “1”, that is, only during a silent section.
As shown in FIG. 3, power calculator 406 has demultiplexer 4061, K multipliers $4062_0$ to $4062_{K-1}$, and adder 4063. The multiplexed spectral amplitude of noisy speech supplied from Fourier transform unit 3 (FIG. 1) is separated by demultiplexer 4061 into K frequency-dependent samples, which are supplied to multipliers $4062_0$ to $4062_{K-1}$, respectively. Each multiplier squares its input and transmits the result to adder 4063, which sums the squared values and outputs the sum.
As shown in FIG. 4, weighted adder 407 has multipliers 4071 and 4073, constant multiplier 4075, and adders 4072 and 4074. Its inputs are noisy speech power 901 from multiplier 404 (FIG. 2), threshold value 902 from threshold memory 401 (FIG. 2), weight 903 from weight memory 408 (FIG. 2), and weighted adder control signal 905 from NOT circuit 409 (FIG. 2). Weight 903, having value $\beta$, is transmitted to constant multiplier 4075 and multiplier 4073. Constant multiplier 4075 multiplies its input by −1 to produce $-\beta$ and transmits it to adder 4074, whose other input is 1. Adder 4074 thus outputs $1-\beta$, which is supplied to multiplier 4071. Multiplier 4071 multiplies $1-\beta$ by noisy speech power $Q_n$, its other input, and transmits the product $(1-\beta)Q_n$ to adder 4072. Multiplier 4073 multiplies $\beta$, supplied as weight 903, by threshold value 902 and transmits the product $\beta TH_{n-1}$ to adder 4072. Adder 4072 adds $\beta TH_{n-1}$ and $(1-\beta)Q_n$ and outputs the sum as updated threshold value 904. The updated threshold value $TH_n$ is calculated only when weighted adder control signal 905 is “1”. That is, during a silent section weighted adder 407 updates $TH_{n-1}$ to $TH_n$ according to the following equation, where $\beta$ is the value of weight 903:
$$TH_n = \begin{cases} TH_{n-1}, & TH_{n-1} > Q_n \\ \beta\,TH_{n-1} + (1-\beta)\,Q_n, & TH_{n-1} \le Q_n \end{cases} \qquad (10)$$
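The power measure of equation (8) and the gated update performed by weighted adder 407 can be sketched as follows. This is a minimal illustration using plain Python floats. The function names are hypothetical, and `control` stands for weighted adder control signal 905, i.e. the inverted voice activity detection flag produced by NOT circuit 409 (for example `control = 1 - vad_flag`).

```python
import math

def frame_power_db(amplitude):
    """Noisy speech power Q_n of equation (8): 10*log10 of sum |Y_n(k)|^2."""
    return 10.0 * math.log10(sum(a * a for a in amplitude))

def weighted_adder(th_prev, q, beta, control):
    """Weighted adder 407 (FIG. 4): beta*TH_{n-1} + (1 - beta)*Q_n when the
    control signal is 1; otherwise the stored threshold passes through unchanged."""
    if control == 1:
        return beta * th_prev + (1.0 - beta) * q
    return th_prev
```

Because the update is a first-order recursive average, $TH_n - Q = \beta\,(TH_{n-1} - Q)$ for a constant power $Q$. Repeated updates therefore pull the threshold toward $Q$ geometrically at rate $\beta$.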
FIG. 5 shows an example of the arrangement of multiplexed multiplier 17 of the noise suppressor shown in FIG. 1. Multiplexed multiplier 17 has K multipliers $1701_0$ to $1701_{K-1}$, demultiplexers 1702 and 1703, and multiplexer 1704. The multiplexed spectral amplitude of noisy speech supplied from Fourier transform unit 3 (FIG. 1) is supplied to both demultiplexers 1702 and 1703, each of which separates it into K frequency-dependent samples; the two copies of each sample are supplied to the corresponding multiplier among $1701_0$ to $1701_{K-1}$. Each multiplier thus squares its frequency component and transmits the result to multiplexer 1704, which multiplexes the K squared values and outputs them as a noisy speech power spectrum.
As shown in FIG. 6, noise estimation unit 51 included in the noise suppressor shown in FIG. 1 has demultiplexer 502, multiplexer 503, and K frequency-dependent noise estimation units $514_0$ to $514_{K-1}$. In noise estimation unit 51, the voice activity detection flag supplied from voice activity detector 4 (FIG. 1) and the count value supplied from counter 13 (FIG. 1) are transmitted to frequency-dependent noise estimation units $514_0$ to $514_{K-1}$. The noisy speech power spectrum supplied from multiplexed multiplier 17 (FIG. 1) is transmitted to demultiplexer 502. Demultiplexer 502 separates the supplied multiplexed noisy speech power spectrum into K frequency-dependent components, and transmits the K frequency-dependent components respectively to frequency-dependent noise estimation units $514_0$ to $514_{K-1}$. Frequency-dependent noise estimation units $514_0$ to $514_{K-1}$ calculate noise power spectrum components using the noisy speech power spectrum supplied from demultiplexer 502, and transmit the calculated noise power spectrum components to multiplexer 503. Calculation of the noise power spectrum is controlled by the count value and the value of the voice activity detection flag and is performed only when predetermined conditions are satisfied. Multiplexer 503 multiplexes the supplied K noise power spectrum components, and outputs the multiplexed noise power spectrum as an estimated noise power spectrum.
FIG. 7 shows the arrangement of each of frequency-dependent noise estimation units $514_0$ to $514_{K-1}$ of noise estimation unit 51 (FIG. 6). Since these units are identical in arrangement, they are indicated as frequency-dependent noise estimation unit 514 in FIG. 7. The noise estimation algorithm disclosed in Reference 2 updates the estimated noise value in a silent section and uses, as the estimated noise value, instantaneous noise estimates averaged by a recursive filter. Another noise estimation algorithm, disclosed in IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, Vol. 6, No. 3, pp. 287-292, MAY 1998 (Reference 4), likewise averages instantaneous noise estimates. Reference 4 suggests implementing the averaging with a transversal filter, i.e., a filter built from a shift register, rather than a recursive filter. Since both implementations serve the same function, the process disclosed in Reference 4 is described below.
Frequency-dependent noise estimation unit 514 has update decision unit 521, register length memory 5041, switch 5044, shift register 5045, adder 5046, minimum value selector 5047, divider 5048, and counter 5049. Switch 5044 is supplied with the frequency-dependent noisy speech power spectrum from demultiplexer 502 (FIG. 6). When switch 5044 is closed, the frequency-dependent noisy speech power spectrum is transmitted to shift register 5045. In response to a control signal from update decision unit 521, shift register 5045 shifts the values stored in its internal register elements to the adjacent register elements. The length of shift register 5045 equals the value stored in register length memory 5041. The outputs of all internal register elements of shift register 5045 are supplied to adder 5046, which adds them and transmits the sum to divider 5048.
On the other hand, update decision unit 521 is supplied with the count value from counter 13 and the voice activity detection flag from voice activity detector 4. Update decision unit 521 outputs “1” at all times until the count value reaches a preset value. After the count value reaches the preset value, update decision unit 521 outputs “1” when the voice activity detection flag is “0”, i.e., during a silent section, and outputs “0” otherwise. Update decision unit 521 transmits its output to counter 5049, switch 5044, and shift register 5045. Switch 5044 closes its circuit when the signal supplied from update decision unit 521 is “1”, and opens its circuit when the signal supplied from update decision unit 521 is “0”. Counter 5049 increments its count value when the signal supplied from update decision unit 521 is “1”, and does not change its count value when the signal supplied from update decision unit 521 is “0”. Shift register 5045 reads one signal sample supplied from switch 5044 and shifts the stored values in the internal register elements to the adjacent register elements, when the signal supplied from update decision unit 521 is “1”.
Minimum value selector 5047 is supplied with the output of counter 5049 and the output of register length memory 5041. It selects the smaller of the count value and the register length and transmits the selected value to divider 5048. Divider 5048 divides the sum of the frequency-dependent noisy speech power spectrum supplied from adder 5046 by this smaller value and outputs the quotient as a frequency-dependent estimated noise power spectrum $\lambda_n(k)$. If the frequency-dependent noisy speech power spectrum samples stored in shift register 5045 are denoted by $B_i(k)$ ($i = 0, 1, \ldots, N-1$), then $\lambda_n(k)$ is expressed by equation (11):

$$\lambda_n(k) = \frac{1}{N}\sum_{i=0}^{N-1} B_i(k) \qquad (11)$$

where N is the smaller of the count value and the register length. Since the count value increases monotonically from zero, the division initially uses the count value and later uses the register length; dividing by the register length means taking the average of the values stored in the shift register. Initially, because the shift register is not yet full, the sum of the frequency-dependent noisy speech power spectrum is divided by the number of register elements that actually hold values. This number equals the count value while the count value is smaller than the register length, and equals the register length once the count value exceeds the register length.
FIG. 8 shows the arrangement of update decision unit 521, which has NOT circuit 5202, comparator 5203, threshold memory 5204, and OR circuit 5211. The count value supplied from counter 13 (FIG. 1) is transmitted to comparator 5203, which is also supplied with a threshold value from threshold memory 5204. Comparator 5203 compares the count value with the threshold value: if the count value is smaller than the threshold value, comparator 5203 transmits “1” to OR circuit 5211; otherwise, it transmits “0”. The voice activity detection flag supplied to update decision unit 521 is transmitted to NOT circuit 5202, which inverts it and transmits the result to OR circuit 5211. Specifically, NOT circuit 5202 transmits “0” to OR circuit 5211 in a speech section, where the voice activity detection flag is “1”, and “1” in a silent section, where the flag is “0”. As a result, OR circuit 5211 outputs “1” during a silent section or while the count value is smaller than the threshold value, closing switch 5044 shown in FIG. 7 and incrementing counter 5049.
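One frequency-dependent noise estimation unit 514, together with the decision made by update decision unit 521, can be sketched as follows. This assumes a Python `collections.deque` with a fixed `maxlen` in place of shift register 5045, and an estimate of zero before any value has been stored. The class and function names are hypothetical.

```python
from collections import deque

def update_decision(frame_count, preset, vad_flag):
    """Update decision unit 521 (FIG. 8): OR of (count < preset) and NOT(vad_flag)."""
    return 1 if (frame_count < preset or vad_flag == 0) else 0

class FrequencyNoiseEstimator:
    """One frequency-dependent noise estimation unit 514 (FIG. 7), equation (11)."""

    def __init__(self, register_length):
        self.register_length = register_length          # register length memory 5041
        self.register = deque(maxlen=register_length)   # shift register 5045
        self.count = 0                                   # counter 5049
        self.estimate = 0.0                              # assumed value before any update

    def step(self, noisy_power, update_flag):
        """Process one frame's noisy power for this bin and return lambda_n(k)."""
        if update_flag == 1:
            # Switch 5044 closes: shift in the new sample; the oldest drops out when full.
            self.register.append(noisy_power)
            self.count += 1
        if self.count > 0:
            n = min(self.count, self.register_length)    # minimum value selector 5047
            self.estimate = sum(self.register) / n       # adder 5046 and divider 5048
        return self.estimate
```

Using a deque with `maxlen` keeps only the most recent `register_length` values, so the divisor switches from the count to the register length exactly when the register becomes full.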
FIG. 9 shows an example of the arrangement of frequency-dependent SNR calculator 6 of the noise suppressor shown in FIG. 1. Frequency-dependent SNR calculator 6 has K dividers $601_0$ to $601_{K-1}$, demultiplexers 602 and 603, and multiplexer 604. The noisy speech power spectrum supplied from multiplexed multiplier 17 (FIG. 1) is transmitted to demultiplexer 602, and the estimated noise power spectrum supplied from noise estimation unit 51 (FIG. 1) is transmitted to demultiplexer 603. Demultiplexer 602 separates the noisy speech power spectrum into K samples corresponding to the respective frequency components and supplies them to dividers $601_0$ to $601_{K-1}$, respectively; demultiplexer 603 does the same with the estimated noise power spectrum. Dividers $601_0$ to $601_{K-1}$ divide the noisy speech power spectrum by the estimated noise power spectrum, thus determining the frequency-dependent SNR $\gamma_n(k)$ according to equation (12), and transmit $\gamma_n(k)$ to multiplexer 604:
$$\gamma_n(k) = \frac{|Y_n(k)|^2}{\lambda_n(k)} \qquad (12)$$

where $\lambda_n(k)$ is the estimated noise power spectrum. Multiplexer 604 multiplexes the K frequency-dependent SNRs and outputs the result as the a-posteriori SNR.
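For all K bins at once, the squaring done by multiplexed multiplier 17 and the division of equation (12) reduce to two vectorized operations. This is a minimal sketch assuming NumPy arrays. The small floor on the noise power, which avoids division by zero, is an added assumption and not part of the described circuit; the function name is hypothetical.

```python
import numpy as np

def a_posteriori_snr(noisy_amplitude, noise_power, floor=1e-12):
    """gamma_n(k) = |Y_n(k)|^2 / lambda_n(k), equation (12), for all k at once."""
    noisy_power = noisy_amplitude ** 2                     # multiplexed multiplier 17
    return noisy_power / np.maximum(noise_power, floor)    # dividers 601_0 .. 601_{K-1}
```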
As shown in FIG. 10, a-priori SNR estimator 7 included in the noise suppressor shown in FIG. 1 has multiplexed range limitation processor 701, a-posteriori SNR memory 702, spectral gain memory 703, multiplexed multipliers 704, 705, weight memory 706, multiplexed weighted adder 707, and adder 708.
In a-priori SNR estimator 7, the a-posteriori SNRs $\gamma_n(k)$ ($k = 0, 1, \ldots, K-1$) supplied from frequency-dependent SNR calculator 6 (FIG. 1) are transmitted to a-posteriori SNR memory 702 and adder 708. A-posteriori SNR memory 702 stores the a-posteriori SNR $\gamma_n(k)$ of the nth frame and transmits the a-posteriori SNR $\gamma_{n-1}(k)$ of the (n−1)th frame to multiplexed multiplier 705. The spectral gains $G_n(k)$ ($k = 0, 1, \ldots, K-1$) supplied from spectral gain generator 8 are transmitted to spectral gain memory 703, which stores the spectral gain $G_n(k)$ of the nth frame and transmits the spectral gain $G_{n-1}(k)$ of the (n−1)th frame to multiplexed multiplier 704. Multiplexed multiplier 704 squares $G_{n-1}(k)$ and transmits $G_{n-1}^2(k)$ to multiplexed multiplier 705. Multiplexed multiplier 705 multiplies $G_{n-1}^2(k)$ by $\gamma_{n-1}(k)$ for $k = 0, 1, \ldots, K-1$ and transmits the product $G_{n-1}^2(k)\gamma_{n-1}(k)$ as past estimated SNR 922 to multiplexed weighted adder 707. Multiplexed multipliers 704 and 705 are identical in arrangement to multiplexed multiplier 17, already described with reference to FIG. 5, and will not be described here.
The other input of adder 708 is supplied with −1, so the sum $\gamma_n(k) - 1$ is transmitted to multiplexed range limitation processor 701. Multiplexed range limitation processor 701 applies a range limitation operator $P[\cdot]$ to $\gamma_n(k) - 1$ and transmits the result $P[\gamma_n(k) - 1]$ as instantaneous estimated SNR 921 to multiplexed weighted adder 707. $P[x]$ is defined by equation (13):
$$P[x] = \begin{cases} x, & x > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$
Multiplexed weighted adder 707 is also supplied with weight 923 from weight memory 706, and determines estimated a-priori SNR 924 from instantaneous estimated SNR 921, past estimated SNR 922, and weight 923. If weight 923 is denoted by $\alpha$ and estimated a-priori SNR 924 by $\hat{\xi}_n(k)$, then $\hat{\xi}_n(k)$ is calculated according to equation (14):

$$\hat{\xi}_n(k) = \alpha\,\gamma_{n-1}(k)\,G_{n-1}^2(k) + (1-\alpha)\,P[\gamma_n(k) - 1] \qquad (14)$$

where $G_{-1}^2(k)\,\gamma_{-1}(k) = 1$.
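The recursion of equations (13) and (14) can be sketched as a small stateful class operating on all K bins at once. The class name is hypothetical, and the default $\alpha = 0.98$ is an assumed value commonly used with this estimator rather than one given in the text. The initial state implements $G_{-1}^2(k)\gamma_{-1}(k) = 1$.

```python
import numpy as np

class APrioriSnrEstimator:
    """A-priori SNR estimate of equations (13) and (14) for K frequency bins."""

    def __init__(self, K, alpha=0.98):
        self.alpha = alpha          # weight 923; 0.98 is an assumed value
        self.past = np.ones(K)      # initial G_{-1}^2(k) * gamma_{-1}(k) = 1

    def estimate(self, gamma):
        """Return xi_hat_n(k) for the a-posteriori SNRs gamma_n(k) of the current frame."""
        instantaneous = np.maximum(gamma - 1.0, 0.0)     # P[gamma_n(k) - 1], eq. (13)
        return self.alpha * self.past + (1.0 - self.alpha) * instantaneous

    def store(self, gain, gamma):
        """Memories 702/703 and multipliers 704/705: keep G_n^2(k) * gamma_n(k)
        so that it becomes the past estimated SNR for the next frame."""
        self.past = gain ** 2 * gamma
```

In each frame, `estimate` would be called first, the spectral gain computed from its result, and `store` called with that gain, mirroring the feedback between a-priori SNR estimator 7 and spectral gain generator 8.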
As shown in FIG. 11, multiplexed range limitation processor 701 has constant memory 7011, K maximum value selectors $7012_0$ to $7012_{K-1}$, demultiplexer 7013, and multiplexer 7014. Demultiplexer 7013 is supplied with $\gamma_n(k) - 1$ from adder 708 (FIG. 10), splits it into K frequency-dependent components, and supplies them to maximum value selectors $7012_0$ to $7012_{K-1}$, respectively, whose other inputs are supplied with zero from constant memory 7011. Each maximum value selector compares $\gamma_n(k) - 1$ with zero and transmits the larger value to multiplexer 7014; this selection implements equation (13). Multiplexer 7014 multiplexes the selected values and outputs the result.
As shown in FIG. 12, multiplexed weighted adder 707 has K weighted adders $7071_0$ to $7071_{K-1}$, demultiplexers 7072 and 7074, and multiplexer 7075. Demultiplexer 7072 is supplied with $P[\gamma_n(k) - 1]$ as instantaneous estimated SNR 921 from multiplexed range limitation processor 701 (FIG. 10), separates it into K frequency-dependent components, and transmits them as frequency-dependent instantaneous estimated SNRs $921_0$ to $921_{K-1}$ to weighted adders $7071_0$ to $7071_{K-1}$, respectively. Demultiplexer 7074 is supplied with $G_{n-1}^2(k)\gamma_{n-1}(k)$ as past estimated SNR 922 from multiplexed multiplier 705 (FIG. 10), separates it into K frequency-dependent components, and transmits them as past frequency-dependent estimated SNRs $922_0$ to $922_{K-1}$ to weighted adders $7071_0$ to $7071_{K-1}$, respectively. Weighted adders $7071_0$ to $7071_{K-1}$ are also supplied with weight 923. They carry out the weighted addition of equation (14) and transmit the results as frequency-dependent estimated a-priori SNRs $924_0$ to $924_{K-1}$ to multiplexer 7075, which multiplexes them and outputs the result as estimated a-priori SNR 924. Each of weighted adders $7071_0$ to $7071_{K-1}$ has the same arrangement and operation as weighted adder 407, already described with reference to FIG. 4, and will not be described in detail, except that the weighted addition is performed in every frame rather than only when a control signal is “1”.
FIG. 13 shows an example of an arrangement of spectral gain generator 8 included in the noise suppressor shown in FIG. 1. Spectral gain generator 8 has K spectral gain search units 801_0 to 801_{K−1}, demultiplexers 802, 803, and multiplexer 804. In spectral gain generator 8, demultiplexer 802 is supplied with the a-posteriori SNR from frequency-dependent SNR calculator 6 (FIG. 1). Demultiplexer 802 separates the supplied a-posteriori SNR into K frequency-dependent components and transmits them respectively to spectral gain search units 801_0 to 801_{K−1}. Demultiplexer 803 is supplied with the estimated a-priori SNR from a-priori SNR estimator 7 (FIG. 1). Demultiplexer 803 separates the supplied estimated a-priori SNR into K frequency-dependent components and transmits them respectively to spectral gain search units 801_0 to 801_{K−1}. Spectral gain search units 801_0 to 801_{K−1} search for the spectral gains corresponding to the supplied frequency-dependent a-posteriori SNRs and estimated a-priori SNRs, and transmit the results to multiplexer 804. Multiplexer 804 multiplexes the supplied spectral gains and outputs the multiplexed result.
FIG. 14 shows an example of an arrangement of spectral gain search units 801_0 to 801_{K−1}. Since spectral gain search units 801_0 to 801_{K−1} are identical in arrangement to each other, they are represented as spectral gain search unit 801 in FIG. 14. Spectral gain search unit 801 has spectral gain table 8011 and address converters 8012, 8013. In spectral gain search unit 801, address converter 8012 is supplied with the frequency-dependent a-posteriori SNR from demultiplexer 802 (FIG. 13). Address converter 8012 converts the supplied frequency-dependent a-posteriori SNR into a corresponding address and transmits the address to spectral gain table 8011. Address converter 8013 is supplied with the frequency-dependent estimated a-priori SNR from demultiplexer 803 (FIG. 13). Address converter 8013 converts the supplied frequency-dependent estimated a-priori SNR into a corresponding address and transmits the address to spectral gain table 8011. Spectral gain table 8011 outputs, as the frequency-dependent spectral gain, the spectral gain stored at the location specified by the pair of addresses supplied from address converter 8012 and address converter 8013.
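The table-driven gain search of FIGS. 13 and 14 can be sketched as follows. The source does not specify how the address converters map an SNR to an address or what the table contains, so everything below is hypothetical: both converters are assumed to quantize the SNR in dB on a clamped grid from −20 dB to 40 dB in 1 dB steps, and the table is filled by a caller-supplied gain rule. The Wiener rule used for the demonstration is only a placeholder; it ignores the a-posteriori SNR and is not the gain rule of Reference 1.

```python
import math

# Hypothetical quantization grid shared by address converters 8012 and 8013.
SNR_DB_MIN = -20.0
SNR_DB_MAX = 40.0
SNR_DB_STEP = 1.0
N_ADDR = int(round((SNR_DB_MAX - SNR_DB_MIN) / SNR_DB_STEP)) + 1  # 61 addresses


def snr_to_address(snr):
    """Address converter: linear SNR -> clamped table index via dB quantization."""
    snr_db = 10.0 * math.log10(max(snr, 1e-12))  # guard against log10(0)
    idx = int(round((snr_db - SNR_DB_MIN) / SNR_DB_STEP))
    return min(max(idx, 0), N_ADDR - 1)


def address_to_snr(idx):
    """Inverse mapping: table index -> linear SNR at the grid point."""
    return 10.0 ** ((SNR_DB_MIN + idx * SNR_DB_STEP) / 10.0)


def build_gain_table(gain_fn):
    """Precompute table[i][j] = gain_fn(a-posteriori SNR_i, a-priori SNR_j)."""
    return [[gain_fn(address_to_snr(i), address_to_snr(j)) for j in range(N_ADDR)]
            for i in range(N_ADDR)]


def wiener_gain(post_snr, prior_snr):
    # Placeholder rule for the demo only; it does not use post_snr.
    return prior_snr / (1.0 + prior_snr)


def search_gain(table, post_snr, prior_snr):
    """Spectral gain search unit 801: two address conversions and one table read."""
    return table[snr_to_address(post_snr)][snr_to_address(prior_snr)]


def spectral_gains(table, post_snrs, prior_snrs):
    """Spectral gain generator 8: demultiplex K bins, search each, multiplex the gains."""
    return [search_gain(table, g, x) for g, x in zip(post_snrs, prior_snrs)]


table = build_gain_table(wiener_gain)
print(spectral_gains(table, [1.0, 1.0], [1.0, 100.0]))
```

Precomputing the table trades memory for computation: a gain rule that involves costly special functions is evaluated once per grid point instead of once per bin and frame.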
The conventional noise suppressor has been described above. In this conventional noise suppressor, the power spectrum of noise is updated only in silent sections, as determined by the output of the voice activity detector. Therefore, if the voice activity detector makes an incorrect decision, the power spectrum of noise cannot be estimated accurately. Moreover, when a speech section continues for a long time, no silent section occurs and the power spectrum of noise cannot be updated, so the accuracy of the estimated power spectrum of nonstationary noise inevitably deteriorates. As a result, the conventional noise suppressor leaves residual noise and introduces distortion in the enhanced speech.
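The dependence on the voice activity detector can be illustrated with a small sketch. The update rule of noise estimation unit 51 is not given in this section, so first-order recursive averaging with a hypothetical smoothing constant β is assumed here; the function name is also hypothetical. The point of the sketch is only the gating: during frames flagged as speech the estimate is frozen, so a change in the noise level during a long speech section is not tracked.

```python
def update_noise_psd(noise_psd, frame_power, is_speech, beta=0.9):
    """VAD-gated noise power spectrum update (assumed recursive averaging).

    noise_psd   : current noise power estimate per frequency bin
    frame_power : |Y_n(k)|^2 of the current frame per bin
    is_speech   : voice activity detector decision for this frame
    beta        : smoothing constant (hypothetical value)
    """
    if is_speech:
        # Speech frame: no update, so nonstationary noise changes are missed.
        return list(noise_psd)
    # Silent frame: blend the old estimate with the current frame power.
    return [beta * n + (1.0 - beta) * p for n, p in zip(noise_psd, frame_power)]


# Noise power rises from 1.0 to 4.0 during a long speech section.
est = [1.0, 1.0]
for _ in range(100):
    est = update_noise_psd(est, [4.0, 4.0], is_speech=True)
print(est)  # the estimate has not moved from the old noise level
```

Conversely, a speech frame misclassified as silent would feed speech power into the estimate and inflate it, which corresponds to the detector-error case described above.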
According to the conventional noise suppression algorithm, the power spectrum of noise is estimated from the power spectrum of the noisy speech. Because the power spectrum of the speech contained in the noisy speech influences this estimate, the power spectrum of noise cannot be estimated accurately, so noise tends to remain and distortion tends to be introduced in the enhanced speech. Furthermore, because the conventional algorithm suppresses noise using spectral gains determined by the same calculation method regardless of the SNR, the enhanced speech cannot achieve sufficiently high quality.
It is an object of the present invention to provide a method of noise suppression to produce enhanced speech with reduced distortion and noise by accurately estimating the power spectrum of noise independent of the performance of a voice activity detector.
Another object of the present invention is to provide an apparatus for noise suppression to produce enhanced speech with reduced distortion and noise by accurately estimating the power spectrum of noise without being governed by the performance of a voice activity detector.
Still another object of the present invention is to provide a method of noise suppression to produce enhanced speech with reduced distortion and noise by accurately estimating the power spectrum of noise even in a speech section when the noise is nonstationary.
Yet still another object of the present invention is to provide an apparatus for noise suppression to produce enhanced speech with reduced distortion and noise by accurately estimating the power spectrum of noise even in a speech section when the noise is nonstationary.
A further object of the present invention is to provide a method of noise suppression to produce enhanced speech with reduced distortion and noise by using optimum spectral gains with respect to all SNR values.
A still further object of the present invention is to provide an apparatus for noise suppression to produce enhanced speech with reduced distortion and noise by using optimum spectral gains with respect to all SNR values.