The following is based on Korean Patent Application No. 99-51065 filed Nov. 17, 1999, herein incorporated by reference.
1. Field of the Invention
The present invention relates to speech signal processing, and more particularly, to an apparatus and method for detecting and synthesizing transitional parts of speech.
2. Description of the Related Art
Human speech includes stationary parts and transitional parts. For example, the stationary part includes silence, voiced/unvoiced sounds based on existence or non-existence of resonance, or the like, and the transitional part includes plosive sounds, abrupt onset sounds, irregular offset sounds, or the like. Conventional speech coders, particularly, harmonic speech coders, code speech using the harmonic component of pitch in the frequency domain, and use the magnitude information of speech and the probability of speech in each band as essential parameters.
In speech coding, it is ideal to use the magnitude information of speech for the stationary part and the phase information of speech for the transitional part. However, harmonic speech coders use only the magnitude information, so they accurately estimate the spectral magnitude of the stationary part alone, and the sound quality of transitional parts deteriorates because phase information is not used. Therefore, speech coders require a detection and synthesis algorithm for transitional parts in order to obtain high quality speech at low bit rates, preferably at 4 kbit/s.
In the prior art, an absolute peak value computed with a sliding window is used to detect transitional parts of speech. The absolute peak value (P) is calculated by the following Equation 1:

$$P = \max_{-T_s \le i \le T_s - 1} P_i, \qquad P_i = \frac{\sqrt{\dfrac{1}{N}\sum_{n=0}^{N-1} \left| r(n+i) \right|^{2}}}{\dfrac{1}{N}\sum_{n=0}^{N-1} \left| r(n+i) \right|} \qquad (1)$$
wherein $P_i$ denotes a peak value at an i-th sample according to a sliding window, $r(n)$ denotes a linear predictive coding (LPC) residual signal, $N$ denotes the size of a subframe, and $T_s$ denotes the maximum sliding range. A transitional part flag is set when the absolute peak value (P) is greater than a threshold value.
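The calculation described above can be sketched as follows. This is only an illustrative reading of Equation 1, assuming the peak value of each window is the root-mean-square of the residual divided by its mean absolute value. The function names, the `frame_start` argument, the handling of an all-zero window, and the threshold of 2.0 are hypothetical and are not taken from the text.

```python
import math

def peak_value(r, start, n):
    """P_i for the window r[start:start+n]: RMS divided by mean absolute value."""
    window = r[start:start + n]
    mean_abs = sum(abs(x) for x in window) / n
    if mean_abs == 0.0:
        return 0.0  # assumption: an all-zero window is treated as having no peak
    rms = math.sqrt(sum(x * x for x in window) / n)
    return rms / mean_abs

def absolute_peak_value(r, frame_start, n, ts):
    """P = max of P_i for i = -ts .. ts-1, with windows starting at frame_start + i."""
    if frame_start - ts < 0 or frame_start + ts - 1 + n > len(r):
        raise ValueError("sliding range runs outside the residual signal")
    return max(peak_value(r, frame_start + i, n) for i in range(-ts, ts))

def is_transitional(r, frame_start, n, ts, threshold):
    """Set the transitional part flag when P exceeds the threshold."""
    return absolute_peak_value(r, frame_start, n, ts) > threshold

# A residual with one isolated pulse versus a residual of constant magnitude.
impulse = [0.0] * 40
impulse[20] = 1.0
flat = [0.5, -0.5] * 20

THRESHOLD = 2.0  # hypothetical value, chosen only for this example
print(is_transitional(impulse, 8, 16, 4, THRESHOLD))  # pulse-like residual
print(is_transitional(flat, 8, 16, 4, THRESHOLD))     # constant-magnitude residual
```

Under this reading, a single pulse inside an N-sample window gives a peak value of sqrt(N), while a residual of constant magnitude gives exactly 1, which is why a threshold between the two separates them.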
FIGS. 1 and 2 show examples of detection of transitional parts of speech according to a conventional method. FIG. 1(a) shows a speech signal in a clean environment, and FIG. 2(a) shows a speech signal in a noisy environment. FIGS. 1(b) and 2(b) show an absolute peak value in a clean environment and in a noisy environment, respectively. FIGS. 1(c) and 2(c) show results of detection of transitional parts in a clean environment and in a noisy environment, respectively. In FIG. 1, the transitional parts were detected using the absolute peak value, but in FIG. 2 they were not. That is, the prior-art method does not detect transitional parts reliably in a noisy environment.
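The effect of noise on the conventional measure can be illustrated with a toy example. The sketch below assumes the absolute peak value is the maximum, over the sliding range, of the RMS-to-mean-absolute-value ratio of the residual. The window size, sliding range, threshold, and the alternating ±0.5 sequence used as a crude deterministic stand-in for background noise are all hypothetical choices, not data from the figures.

```python
import math

def absolute_peak_value(r, frame_start, n, ts):
    """Max over i in [-ts, ts) of RMS / mean absolute value of the shifted window."""
    best = 0.0
    for i in range(-ts, ts):
        w = r[frame_start + i: frame_start + i + n]
        mean_abs = sum(abs(x) for x in w) / n
        if mean_abs > 0.0:  # skip all-zero windows (assumption)
            best = max(best, math.sqrt(sum(x * x for x in w) / n) / mean_abs)
    return best

N, TS, START, THRESHOLD = 16, 4, 8, 2.0  # all hypothetical values

clean = [0.0] * 40
clean[20] = 1.0  # a single onset pulse in an otherwise silent residual

# Crude deterministic stand-in for background noise: +0.5 / -0.5 alternating.
noisy = [x + (0.5 if k % 2 == 0 else -0.5) for k, x in enumerate(clean)]

p_clean = absolute_peak_value(clean, START, N, TS)
p_noisy = absolute_peak_value(noisy, START, N, TS)
print(p_clean > THRESHOLD, p_noisy > THRESHOLD)  # prints: True False
```

The same pulse is flagged in the clean signal but missed once the noise floor raises the mean absolute value of the window, which mirrors the behaviour the figures are described as showing.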
When the absolute peak value increases, the detection rate increases, but the false alarm rate also increases. Conversely, when the absolute peak value decreases, the false alarm rate decreases, but so does the detection rate. The conventional method is therefore limited in that both the detection rate and the false alarm rate depend on the absolute peak value.
An objective of the present invention is to provide an apparatus for detecting transitional parts of speech, by which the detection rate of transitional parts of speech in a noisy environment can be improved, and high quality speech at low bit rates can ultimately be obtained.
Another objective of the present invention is to provide a transitional speech detecting method which is performed by the apparatus.
Still another objective of the present invention is to provide a method of effectively synthesizing detected transitional parts of speech.
To achieve the first objective of the invention, there is provided an apparatus for detecting transitional parts of speech, including: a residual signal preprocessor for emphasizing a period of a speech residual signal which includes a peak value; a relative peak value calculation unit for obtaining a peak value of a preprocessed residual signal and a relative peak value using a predetermined reference peak value; and a transitional part detector for detecting transitional parts of speech on the basis of the relative peak value.
To achieve the second objective of the invention, there is provided a method of detecting transitional parts of speech, comprising: (a) preprocessing a residual signal by emphasizing a period of a speech residual signal which includes a peak value; (b) obtaining the peak value of the preprocessed residual signal; (c) obtaining a relative peak value with respect to the peak value of the preprocessed residual signal using a predetermined reference peak value; and (d) determining whether transitional parts exist, on the basis of the relative peak value.
To achieve the third objective of the invention, there is provided a method of synthesizing transitional parts of speech, including: (a) determining to which harmonics, among the harmonic components of a pitch, phase information is to be allocated when speech is expressed in the frequency domain; (b) allocating the start position of a transitional part, and phase information obtained from a phase at the start position, to each harmonic for which phase information is important; and (c) synthesizing the corresponding transitional parts using the allocated phase information.
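Why phase tied to a start position matters for an onset can be pictured with a simple sum-of-harmonics model. The sketch below is not the synthesis procedure of the invention, whose details are not given in this section. It assumes a linear phase of the form phi_k = -k * w0 * onset for the selected harmonics and zero phase for the rest. The function name, the parameters, and the way harmonics are selected are all hypothetical.

```python
import math

def synthesize_frame(amplitudes, pitch_period, frame_len, onset, phase_harmonics):
    """Sum-of-harmonics synthesis of one frame.

    amplitudes[k-1] is the magnitude of the k-th pitch harmonic.
    Harmonics listed in phase_harmonics get the assumed linear phase
    phi_k = -k * w0 * onset; all other harmonics get zero phase.
    """
    w0 = 2.0 * math.pi / pitch_period  # fundamental frequency in radians/sample
    frame = []
    for n in range(frame_len):
        sample = 0.0
        for k, a in enumerate(amplitudes, start=1):
            phi = -k * w0 * onset if k in phase_harmonics else 0.0
            sample += a * math.cos(k * w0 * n + phi)
        frame.append(sample)
    return frame

amps = [1.0] * 10  # ten harmonics of equal magnitude (illustrative only)
frame = synthesize_frame(amps, pitch_period=40, frame_len=40, onset=13,
                         phase_harmonics=set(range(1, 11)))
peak_index = max(range(len(frame)), key=lambda n: frame[n])
print(peak_index)  # prints: 13
```

With magnitude information alone (an empty `phase_harmonics` set in this sketch), the same harmonics add up to a pulse at sample 0 regardless of where the onset actually occurred. Allocating phase derived from the start position moves the pulse to the onset.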