1. Field of the Invention
The present invention relates to a pitch period extracting apparatus for a speech signal. More specifically, the present invention relates to a pitch period extracting apparatus which extracts a pitch period of an inputted speech signal by evaluating a delay time at which a maximum autocorrelative value is obtained.
2. Description of the Prior Art
As methods for extracting a pitch period of a speech signal by utilizing autocorrelative values, two methods are known. A first method utilizes a short-time autocorrelation, and a second method utilizes a modified short-time autocorrelation.
In the first method, it is assumed that the speech signal is restricted in time, and autocorrelative values are evaluated by regarding the speech signal as existing only within a period of a time length Ts and as being always zero outside the period. In the second method, it is assumed that the speech signal is not restricted in time, and autocorrelative values are evaluated between a period of a time length Tt and a period obtained by delaying the period of the time length Tt within a range in which a presence of a pitch period is assumed.
Now, if a waveform of an inputted speech signal is represented by digital speech data x(n), the short-time autocorrelative value Rn(k) in the first method is given by the following equation (1).

$$R_n(k) = \sum_{m=0}^{T_s-1-k} x(n+m) \cdot x(n+m+k), \qquad m = 0, 1, 2, \ldots, T_s-1-k \tag{1}$$
In the equation (1), “Ts” indicates a time period in which a presence of the speech signal is assumed, and “k” is a delay time for delaying the speech signal waveform in calculating the short-time autocorrelative value Rn(k).
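As an illustration only, equation (1) can be sketched in Python as follows. The function name `short_time_autocorr` and the use of a plain list for x(n) are assumptions made for this sketch and do not appear in the original description.

```python
def short_time_autocorr(x, n, k, ts):
    """Short-time autocorrelative value Rn(k) per equation (1).

    x  : sequence of speech samples x(n)
    n  : start index of the analysis period
    k  : delay time in samples (0 <= k < ts)
    ts : time length Ts of the period in which the signal is assumed to exist
    """
    # m runs from 0 to Ts-1-k, so the product sum range shrinks as k grows.
    return sum(x[n + m] * x[n + m + k] for m in range(ts - k))
```

For example, with x = [1, 2, 3, 4], n = 0 and Ts = 4, the product sum for k = 1 has three terms, whereas for k = 3 it has only one.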
Furthermore, the modified short-time autocorrelative value R′n(k) in the second method is given by the following equation (2).

$$R'_n(k) = \sum_{m=0}^{T_t-1} x(n+m) \cdot x(n+m+k), \qquad m = 0, 1, 2, \ldots, T_t-1 \tag{2}$$

In addition, in the equation (2), “k” is a delay time for delaying the speech signal waveform in calculating the modified short-time autocorrelative value R′n(k), and there is a relationship of Ts>Tt>>k.
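As an illustration only, equation (2) can be sketched in Python as follows. The function name `modified_short_time_autocorr` and the list representation of x(n) are assumptions made for this sketch.

```python
def modified_short_time_autocorr(x, n, k, tt):
    """Modified short-time autocorrelative value R'n(k) per equation (2).

    x  : sequence of speech samples x(n); must hold at least n + tt + k samples
    n  : start index of the analysis period
    k  : delay time in samples
    tt : time length Tt of the period used for the product sum
    """
    # m always runs from 0 to Tt-1, so the product sum range does not depend on k.
    return sum(x[n + m] * x[n + m + k] for m in range(tt))
```

Unlike the short-time case, the delayed period reaches beyond the first Tt samples, so the caller has to supply samples up to index n + Tt - 1 + k.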
As is clearly seen from the equations (1) and (2), in the first method, a range in which a product sum is calculated in evaluating the autocorrelative value (hereinafter, may be called “product sum range”) decreases as the delay time k increases, and in contrast, in the second method, the product sum range is constant irrespective of the delay time k.
FIG. 6 is a graph showing a relationship of weights in the first method and the second method, in which the axis of abscissa indicates the delay time k (samples) and the axis of ordinate indicates a rate of the weights with respect to the autocorrelative values. In the first method, the time length Ts is set as Ts=200 samples, for example. As seen from FIG. 6, in the first method, the longer the period, the smaller the weight applied to the autocorrelative value, whereas in the second method, the autocorrelative values are evenly weighted irrespective of the period.
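The weighting described for FIG. 6 can be approximated by counting the terms in each product sum. The sketch below assumes that the weight is the number of products at delay time k divided by the number at k=0, that is, (Ts-k)/Ts for the first method and 1 for the second method; this normalization and the function names are assumptions, not values read from the figure.

```python
def weight_short_time(k, ts):
    """Assumed relative weight of the first method: (Ts - k) products out of Ts."""
    return (ts - k) / ts


def weight_modified(k, tt):
    """Assumed relative weight of the second method: always Tt products out of Tt."""
    return 1.0


TS = 200  # example value of Ts given in the description
weights = {k: weight_short_time(k, TS) for k in (0, 50, 100)}
```

Under this assumption, with Ts=200 the weight at k=100 is half the weight at k=0, while the weight of the second method stays at 1 for every delay time.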
Therefore, in the first method, there is no possibility that double a true pitch period is erroneously evaluated as a pitch period; however, in the second method, there is such a possibility. That is, in comparison with the second method, the first method is advantageous in terms of the accuracy of the pitch period.
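The difference can be illustrated with a small sketch on a synthetic, exactly periodic integer signal. The signal, the search range of delay times, and all names below are hypothetical choices made for this illustration only.

```python
def short_time_autocorr(x, n, k, ts):
    # Equation (1): product sum range shrinks with k.
    return sum(x[n + m] * x[n + m + k] for m in range(ts - k))


def modified_short_time_autocorr(x, n, k, tt):
    # Equation (2): product sum range is constant.
    return sum(x[n + m] * x[n + m + k] for m in range(tt))


PERIOD = 4                                # true pitch period (hypothetical)
x = [2, 1, -1, -2] * 10                   # exactly periodic test signal
lags = range(2, 2 * PERIOD + 1)           # assumed search range of delay times

# First method with Ts = 16: pick the delay time with the maximum value.
pitch_first = max(lags, key=lambda k: short_time_autocorr(x, 0, k, 16))

# Second method with Tt = 4: values at the true period and at double the period.
r_mod_p = modified_short_time_autocorr(x, 0, PERIOD, 4)
r_mod_2p = modified_short_time_autocorr(x, 0, 2 * PERIOD, 4)
```

In this example, the first method yields a strictly larger value at the true period than at double the period, so the maximum falls at k=4. In the second method, the values at k=4 and k=8 are identical, so a small disturbance in a real signal could make the doubled period the maximum.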
However, in comparison with the second method, the first method is disadvantageous in terms of processing time. More specifically, in the first method, the autocorrelative values are weighted with large weights when a pitch period is short, while the autocorrelative values are weighted with small weights when a pitch period is long. Therefore, in the case of a long pitch period, it is necessary to prevent the autocorrelative value at the pitch period from becoming smaller than an autocorrelative value at a shorter delay time which is not a pitch period. Accordingly, in the first method, in order to calculate a pitch period with precision, it is necessary to set the time length Ts to at least about double the longest possible pitch period (k=100 in FIG. 6). Therefore, the first method has a disadvantage that the processing time becomes long. In contrast, in the second method, since the weights are constant irrespective of the pitch period, the time length Tt may be set to a time length approximately equal to a pitch period, and therefore, the processing time is short.
In other words, the first method has an advantage that a pitch period can be extracted with precision but a disadvantage that the processing time is long, and the second method has an advantage that the processing time is short but a disadvantage that an erroneous pitch period may be extracted.
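As a rough illustration of the difference in processing amount, the number of multiplications needed to evaluate all delay times can be counted directly from equations (1) and (2). The sketch below assumes a search range of k=1 to 100 samples, Ts set to double the longest pitch period, and Tt set equal to the longest pitch period; the search range and the choice of Tt are assumptions for this illustration.

```python
def mults_short_time(ts, k_min, k_max):
    """Multiplications for equation (1): Ts - k products at each delay time k."""
    return sum(ts - k for k in range(k_min, k_max + 1))


def mults_modified(tt, k_min, k_max):
    """Multiplications for equation (2): Tt products at each delay time k."""
    return tt * (k_max - k_min + 1)


K_MIN, K_MAX = 1, 100                     # assumed search range of delay times
short_time_cost = mults_short_time(2 * K_MAX, K_MIN, K_MAX)   # Ts = 200
modified_cost = mults_modified(K_MAX, K_MIN, K_MAX)           # Tt = 100
```

Under these assumptions, the first method needs 14950 multiplications per analysis position and the second method needs 10000. The gap widens if Tt can be chosen shorter than the longest pitch period.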