In such fields as packet communication typified by digital communication and Internet communication, or speech storage, speech signal encoding/decoding technology is essential for making efficient use of radio wave transmission path capacity and storage media, and many speech encoding/decoding methods have been developed to date.
Among these, a CELP (Code Excited Linear Prediction) type speech encoding/decoding method is widely used as a mainstream method when encoding/decoding speech signals at a medium or low bit rate. A CELP type speech encoding/decoding method is disclosed in Document 1 (Proc. ICASSP '85, pp. 937-940, 1985).
In a CELP type speech encoding/decoding method, a digitized speech signal is divided into frames of approximately 20 ms, linear predictive analysis of the speech signal is performed every frame to find the linear predictive coefficients and the linear predictive residual vector, and these linear predictive coefficients and this linear predictive residual vector are encoded/decoded individually. The linear predictive residual vector is also called an excitation signal vector.
A linear predictive residual vector is encoded/decoded using an adaptive code book that holds drive sound source signals generated in the past and a fixed code book that stores a specific number of fixed-form vectors (fixed code vectors).
This adaptive code book is used to represent a cyclic component possessed by a linear predictive residual vector. On the other hand, the fixed code book is used to represent a non-cyclic component in a linear predictive residual vector that cannot be represented with the adaptive code book. In general, linear predictive residual vector encoding/decoding processing is performed in subframe units resulting from dividing frames into shorter time units (of approximately 5 ms to 10 ms).
With CELP, the pitch cycle is sought from a linear predictive residual vector, and coding is performed. A conventional linear predictive residual pitch cycle search apparatus is described below. FIG. 1 is a block diagram showing the configuration of a conventional pitch cycle search apparatus.
The pitch cycle search apparatus 10 in FIG. 1 is mainly composed of a Pitch Cycle Indicator (PCI) 11, Adaptive Code Book 12 (ACB), Adaptive Sound Source Vector Generator (ASSVG) 13, Integral Pitch Cycle Searcher (IPCS) 14, Fractional Pitch Cycle Adaptive Sound Source Vector Generator (FPCASSVG) 15, Fractional Pitch Cycle Searcher (FPCS) 16, and Distortion Comparator (DC) 17.
The Pitch Cycle Indicator (PCI) 11 sequentially indicates to the Adaptive Sound Source Vector Generator (ASSVG) 13 desired pitch cycles T-int within a preset pitch cycle search range. For example, when the CELP speech encoding/decoding apparatus performs encoding and decoding of a 16 kHz speech signal, and the target vector pitch cycle search range is preset from 32 to 267 at integral accuracy, and from 32+½, 33+½, . . . , to 51+½ at ½ fractional accuracy, the Pitch Cycle Indicator (PCI) 11 outputs 236 kinds of pitch cycle T-int (T-int=32, 33, . . . , 267) to the Adaptive Sound Source Vector Generator (ASSVG) 13. The Adaptive Code Book 12 (ACB) stores drive sound source signals generated in the past.
Next, the Adaptive Sound Source Vector Generator (ASSVG) 13 extracts from the Adaptive Code Book 12 (ACB) the adaptive sound source vector p (T-int) that has integral-accuracy pitch cycle T-int received from the Pitch Cycle Indicator (PCI) 11, and outputs it to the Integral Pitch Cycle Searcher (IPCS) 14.
The processing for extracting adaptive sound source vector p (T-int) that has integral-accuracy pitch cycle T-int from the Adaptive Code Book 12 (ACB) is described below. FIG. 2 is a drawing showing an example of frame configuration.
In FIG. 2, frame 21 and frame 31 are past drive sound source signal sequences stored in the adaptive code book. The Adaptive Sound Source Vector Generator (ASSVG) 13 searches for the frame pitch cycle between lower limit 32 and upper limit 267 of the pitch cycle search range.
Because pitch cycle 22 retrieved from frame 21 is longer than the length of subframe 23, the Adaptive Sound Source Vector Generator (ASSVG) 13 takes section 23, extracted from frame 21 over the length of the subframe, as the adaptive sound source vector.
In contrast, because pitch cycle 32 retrieved from frame 31 is shorter than the length of subframe 33, the Adaptive Sound Source Vector Generator (ASSVG) 13 extracts the adaptive sound source vector only up to pitch cycle 32, and takes vector section 34, obtained by repeating extracted vector section 33 until the subframe length is reached, as the adaptive sound source vector.
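The two cases above (pitch cycle longer or shorter than the subframe) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the apparatus's actual implementation: the function name `extract_adaptive_vector`, the list-based buffer, and the convention that the pitch cycle counts samples back from the end of the stored history are all hypothetical choices made for the example.

```python
def extract_adaptive_vector(past_excitation, pitch_cycle, subframe_len):
    """Build an adaptive vector of subframe_len samples from the stored history.

    Assumption: past_excitation is ordered oldest to newest, and pitch_cycle
    is an integer number of samples counted back from the end of the buffer.
    """
    if pitch_cycle <= 0 or pitch_cycle > len(past_excitation):
        raise ValueError("pitch cycle must lie within the stored history")
    start = len(past_excitation) - pitch_cycle
    if pitch_cycle >= subframe_len:
        # Long pitch cycle: one contiguous section covers the whole subframe.
        return list(past_excitation[start:start + subframe_len])
    # Short pitch cycle: take one cycle and repeat it up to the subframe length.
    one_cycle = past_excitation[start:]
    return [one_cycle[n % pitch_cycle] for n in range(subframe_len)]


history = list(range(10))                              # toy buffer 0, 1, ..., 9
long_case = extract_adaptive_vector(history, 6, 4)     # pitch cycle >= subframe
short_case = extract_adaptive_vector(history, 3, 8)    # pitch cycle < subframe
```

In the toy call with pitch cycle 3 and subframe length 8, the last three stored samples are repeated until eight samples have been produced.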
Moreover, the Adaptive Sound Source Vector Generator (ASSVG) 13 extracts from the Adaptive Code Book 12 (ACB) the adaptive sound source vector necessary when finding the adaptive sound source vector corresponding to a fractional-accuracy pitch cycle, and outputs this to the Fractional Pitch Cycle Adaptive Sound Source Vector Generator (FPCASSVG) 15.
Next, the Integral Pitch Cycle Searcher (IPCS) 14 calculates integral pitch cycle selection measure DIST (T-int) from adaptive sound source vector p (T-int) that has integral pitch cycle T-int, combining filter impulse response matrix H, and target vector X.
Equation (1) is the equation for calculating integral pitch cycle selection measure DIST (T-int).
$$\mathrm{DIST}(T\text{-}int) = \frac{\left[\,x H p(T\text{-}int)\,\right]^{2}}{\left\lVert H p(T\text{-}int) \right\rVert^{2}} \quad (T\text{-}int = 32, 33, \ldots, 267) \qquad \text{Equation (1)}$$
When calculating integral pitch cycle selection measure DIST (T-int), matrix H′, obtained by multiplying combining filter impulse response matrix H by auditory weighting filter impulse response matrix W, may be used in Equation (1) instead of combining filter impulse response matrix H.
Here, the Integral Pitch Cycle Searcher (IPCS) 14 repeatedly executes integral pitch cycle selection measure DIST (T-int) calculation processing using Equation (1) for 236 variations of pitch cycle T-int from pitch cycle 32 to 267 indicated by the Pitch Cycle Indicator (PCI) 11.
The Integral Pitch Cycle Searcher (IPCS) 14 also selects the DIST (T-int) with the largest value from the 236 calculated integral pitch cycle selection measures DIST (T-int), and outputs the selected value to the Distortion Comparator (DC) 17 as DIST (INT). In addition, the Integral Pitch Cycle Searcher (IPCS) 14 outputs an index corresponding to the adaptive sound source vector pitch cycle T-int that was referenced when calculating the selected DIST (T-int) to the Distortion Comparator (DC) 17 as IDX (INT).
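The integral-accuracy search loop described above can be sketched as follows. This is a hedged Python illustration rather than the apparatus's implementation: all function names are hypothetical, the combining filter impulse response matrix H is assumed to be the lower-triangular convolution matrix built from an impulse response `h`, and the toy lag range stands in for the 32 to 267 range of the text.

```python
def extract_vector(history, lag, length):
    """Read `length` samples starting `lag` samples back, repeating if lag < length."""
    start = len(history) - lag
    return [history[start + (n % lag)] for n in range(length)]


def apply_filter(h, p):
    """Compute H p, with H assumed lower-triangular Toeplitz built from h."""
    return [sum(h[k] * p[i - k] for k in range(min(i, len(h) - 1) + 1))
            for i in range(len(p))]


def selection_measure(x, h, p):
    """DIST = (x . Hp)^2 / |Hp|^2, following the form of Equation (1)."""
    y = apply_filter(h, p)
    energy = sum(v * v for v in y)
    if energy == 0.0:
        return 0.0  # assumption: an all-zero candidate is treated as the worst case
    corr = sum(a * b for a, b in zip(x, y))
    return corr * corr / energy


def search_integer_pitch(x, h, history, lags):
    """Return (best lag, its DIST); the largest selection measure wins."""
    best_lag, best_dist = None, -1.0
    for lag in lags:
        dist = selection_measure(x, h, extract_vector(history, lag, len(x)))
        if dist > best_dist:
            best_lag, best_dist = lag, dist
    return best_lag, best_dist


# Toy data: a history with period 5 and a target synthesized from lag 5.
pattern = [1.0, -2.0, 0.5, 3.0, -1.0]
history = [pattern[n % 5] for n in range(40)]
h = [1.0, 0.5, 0.25]
x = apply_filter(h, extract_vector(history, 5, 8))
best_lag, best_dist = search_integer_pitch(x, h, history, range(3, 10))
```

Because the target in this toy case is exactly the filtered lag-5 candidate, the measure for lag 5 reaches its upper bound, the energy of the target, and that lag is selected.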
Next, the Fractional Pitch Cycle Adaptive Sound Source Vector Generator (FPCASSVG) 15 finds adaptive sound source vector p (T-frac) that has fractional-accuracy pitch cycle T-frac (32+½, 33+½, . . . , 51+½) by a product-sum operation on the adaptive sound source vector received from the Adaptive Sound Source Vector Generator (ASSVG) 13 and a sinc function, and outputs this p (T-frac) to the Fractional Pitch Cycle Searcher (FPCS) 16.
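The product-sum with a sinc function can be illustrated for a single sample position as follows. This is only a sketch under stated assumptions: the names `sinc` and `interpolate` and the `half_width` parameter are hypothetical, the sinc is truncated without a window (practical codecs typically use a windowed, tabulated filter), and building a complete vector p (T-frac), including repetition for pitch cycles shorter than the subframe, is not shown.

```python
import math


def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)


def interpolate(signal, position, half_width=8):
    """Estimate the signal at a non-integer position by a truncated sinc sum.

    Assumption: 2 * half_width neighbouring samples are used and samples
    outside the buffer are treated as zero.
    """
    center = math.floor(position)
    total = 0.0
    for k in range(center - half_width + 1, center + half_width + 1):
        if 0 <= k < len(signal):
            total += signal[k] * sinc(position - k)
    return total


signal = [float(n % 7) for n in range(40)]
on_grid = interpolate(signal, 20.0)        # integer position: stored sample
between = interpolate([1.0] * 40, 20.5)    # half-sample position on a flat signal
```

At an integer position the sum reduces to the stored sample up to rounding error; at a half-sample position on a constant signal the truncated sum comes out slightly below the true value, which is the kind of error that windowing the sinc is meant to reduce.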
The Fractional Pitch Cycle Searcher (FPCS) 16 then calculates fractional pitch cycle selection measure DIST (T-frac) from the adaptive sound source vector p (T-frac) that has fractional pitch cycle T-frac, combining filter impulse response matrix H, and target vector X. Equation (2) is the equation for calculating fractional pitch cycle selection measure DIST (T-frac).
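$$\mathrm{DIST}(T\text{-}frac) = \frac{\left[\,x H p(T\text{-}frac)\,\right]^{2}}{\left\lVert H p(T\text{-}frac) \right\rVert^{2}} \quad \left(T\text{-}frac = 32+\tfrac{1}{2},\ 33+\tfrac{1}{2},\ \ldots,\ 51+\tfrac{1}{2}\right) \qquad \text{Equation (2)}$$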
When calculating fractional pitch cycle selection measure DIST (T-frac), matrix H′, obtained by multiplying combining filter impulse response matrix H by auditory weighting filter impulse response matrix W, may be used in Equation (2) instead of combining filter impulse response matrix H.
Here, the Fractional Pitch Cycle Searcher (FPCS) 16 repeatedly executes fractional pitch cycle selection measure DIST (T-frac) calculation processing using Equation (2) for 20 variations of fractional pitch cycle T-frac from pitch cycle 32+½ to 51+½.
The Fractional Pitch Cycle Searcher (FPCS) 16 also selects the DIST (T-frac) with the largest value from the 20 calculated fractional pitch cycle selection measures DIST (T-frac), and outputs the selected value to the Distortion Comparator (DC) 17 as DIST (FRAC).
In addition, the Fractional Pitch Cycle Searcher (FPCS) 16 outputs an index corresponding to adaptive sound source vector pitch cycle T-frac, referenced when calculating DIST (T-frac), to the Distortion Comparator (DC) 17 as IDX (FRAC).
Next, the Distortion Comparator (DC) 17 compares the values of DIST (INT) received from the Integral Pitch Cycle Searcher (IPCS) 14 and DIST (FRAC) received from the Fractional Pitch Cycle Searcher (FPCS) 16. The Distortion Comparator (DC) 17 then takes as the optimal pitch cycle the pitch cycle for which the larger of DIST (INT) and DIST (FRAC) was calculated, and outputs the index corresponding to the optimal pitch cycle as optimal index IDX.
When, as in the above example, an integral-accuracy pitch cycle search range from 32 to 267, and a fractional-accuracy pitch cycle search range from 32+½ to 51+½, are selected as the pitch cycle search ranges, a total of 256 (256=236+20) integral-accuracy and fractional-accuracy pitch cycle search candidates are provided, and optimal index IDX is coded as 8-bit binary data.
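The candidate count and the 8-bit index can be checked with a short sketch. The text does not state how indices are assigned to candidates, so the ordering used here (integral candidates first, then fractional ones) is purely an assumption for illustration, as are all function names and the tie-breaking rule in the comparison.

```python
from fractions import Fraction


def build_candidates():
    """List all pitch cycle candidates of the example: 236 integral, 20 fractional."""
    integral = [Fraction(t) for t in range(32, 268)]               # 32 ... 267
    fractional = [Fraction(2 * t + 1, 2) for t in range(32, 52)]   # 32+1/2 ... 51+1/2
    return integral + fractional  # hypothetical ordering, not given in the text


CANDIDATES = build_candidates()
INDEX_OF = {pitch: idx for idx, pitch in enumerate(CANDIDATES)}


def encode_pitch(pitch):
    """Map a pitch cycle to its index under the assumed ordering."""
    return INDEX_OF[Fraction(pitch)]


def decode_index(idx):
    """Map an index back to its pitch cycle."""
    return CANDIDATES[idx]


def select_optimal(dist_int, idx_int, dist_frac, idx_frac):
    """Pick the index whose selection measure is larger (ties go to the
    integral candidate, which is an assumption)."""
    return idx_int if dist_int >= dist_frac else idx_frac


bits_needed = (len(CANDIDATES) - 1).bit_length()
code_word = format(encode_pitch(Fraction(103, 2)), "08b")  # 51+1/2 as 8-bit binary
```

With 236 + 20 = 256 candidates, every index fits in exactly 8 bits, which is why the two search ranges in the example combine without wasting any code words.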
The above-described “linear predictive residual pitch cycle search apparatus using an adaptive code book” is characterized in that it performs a pitch cycle search at integral accuracy, also performs a ½ fractional-accuracy pitch cycle search in the section of the search range corresponding to shorter pitch cycles, and selects the final pitch cycle from the optimal pitch cycle retrieved at integral accuracy and the optimal pitch cycle retrieved at fractional accuracy.
Thus, with a conventional pitch search apparatus, linear predictive residual pitch cycles can be encoded/decoded efficiently for a female voice, which contains many comparatively short pitch cycles. The above characteristic and effect are disclosed in Document 2 (IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 13, No. 1, pp. 31-41, JANUARY 1995), etc.
However, with a conventional pitch search apparatus, the range over which pitch cycles are searched for at fractional accuracy is limited to short pitch cycles. For a male voice, which contains many comparatively long pitch cycles, the pitch cycle therefore falls outside the fractional-accuracy search range and is searched for at integral accuracy only. As a result, pitch cycle resolution falls, and it is difficult to perform encoding/decoding efficiently.