In mobile communication, compression coding of digital speech and image information is indispensable for making effective use of the transmission band. Among these technologies, expectations are high for speech CODEC technologies, which are widely used in mobile phones. There is also a growing demand for better sound quality than conventional high-efficiency coding with high compression ratios provides. Furthermore, since these technologies are used publicly, standardization is indispensable, and because strong intellectual property rights are involved, research and development are being actively conducted by many companies worldwide.
In recent years, the standardization of CODECs capable of coding both speech and music has been under study in ITU-T and MPEG, and there is a demand for more efficient, higher quality speech CODECs.
The performance of speech coding technologies has been greatly improved by CELP (Code Excited Linear Prediction), a basic scheme established 20 years ago that models the speech production mechanism and skillfully applies vector quantization. CELP has been adopted in many international standards, such as ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) standard G.729, ITU-T standard G.722.2, ETSI (European Telecommunications Standards Institute) standard AMR (Adaptive Multi-Rate), ETSI standard AMR-WB (Adaptive Multi-Rate Wideband) and 3GPP2 (3rd Generation Partnership Project 2) standard VMR-WB (Variable-Rate Multimode Wideband).
FIG. 1 is a block diagram illustrating a configuration of a CELP coding apparatus. In FIG. 1, spectral parameters (LSP, ISP or the like) in CELP are quantized.
In FIG. 1, LPC analyzing section 101 applies linear predictive analysis (LPC analysis) to a speech signal, acquires an LPC parameter which is spectral envelope information and outputs the acquired LPC parameter to LPC quantization section 102 and perceptual weighting section 111.
LPC quantization section 102 quantizes the LPC parameter outputted from LPC analyzing section 101. LPC quantization section 102 then outputs the acquired quantized LPC parameter to LPC synthesis filter 109 and outputs an index (code) of the quantized LPC parameter to the outside of CELP coding apparatus 100.
Adaptive codebook 103 stores the past excitation used in LPC synthesis filter 109 and, from the stored excitation, generates an excitation vector for one subframe according to the adaptive codebook lag corresponding to the index indicated by distortion minimizing section 112, which will be described later. This excitation vector is outputted to multiplier 106 as an adaptive codebook vector.
Fixed codebook 104 is a codebook for excitation coding (also referred to as “excitation quantization” or “excitation vector coding”). Fixed codebook 104 stores in advance a plurality of excitation vectors having predetermined shapes and outputs the excitation vector corresponding to the index indicated by distortion minimizing section 112 to multiplier 107 as a fixed codebook vector. Here, a case will be described in which fixed codebook 104 is an algebraic codebook generating algebraic excitation, an excitation adopted in many standard CODECs.
The above-described adaptive codebook 103 is used to express components with strong periodicity, such as voiced sound, while fixed codebook 104 is used to express components with weak periodicity, such as white noise.
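The lag-based extraction performed by adaptive codebook 103 can be sketched as follows. This is a minimal illustration that assumes an integer lag only (the fractional lags and interpolation used in real CODECs are omitted), and the function name `adaptive_codebook_vector` is hypothetical. When the lag is shorter than the subframe, the samples just generated are reused, which repeats the last pitch cycle.

```python
def adaptive_codebook_vector(past_exc, lag, subframe_len):
    """Return one subframe of past excitation delayed by an integer lag.

    past_exc holds previously used excitation samples, oldest first.
    If lag < subframe_len, newly generated samples are reused, so the
    last pitch cycle is repeated periodically.
    """
    if lag <= 0 or lag > len(past_exc):
        raise ValueError("lag must be in 1..len(past_exc)")
    buf = list(past_exc)
    start = len(buf) - lag          # index of the sample 'lag' steps back
    for n in range(subframe_len):
        buf.append(buf[start + n])  # may read samples appended in this loop
    return buf[-subframe_len:]
```

For example, with past excitation `[1, 2, 3, 4]`, a lag of 2 and a subframe length of 5, the sketch returns `[3, 4, 3, 4, 3]`.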
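A minimal sketch of how an algebraic fixed codebook vector can be formed: a few signed unit pulses placed in an otherwise zero subframe. The interleaved track layout (positions `track`, `track + step`, `track + 2*step`, ...) is an assumption consistent with the track number and `step` parameters used later in this description; the function names are hypothetical.

```python
def track_positions(track, step, subframe_len):
    """Candidate pulse positions of one interleaved track."""
    return list(range(track, subframe_len, step))


def algebraic_codevector(positions, signs, subframe_len):
    """Build a sparse fixed codebook vector from pulse positions and signs (+1/-1)."""
    s = [0.0] * subframe_len
    for pos, sign in zip(positions, signs):
        if not 0 <= pos < subframe_len:
            raise ValueError("pulse position out of range")
        s[pos] += sign  # pulses placed at the same position accumulate
    return s
```

Because only the positions and signs need to be transmitted, such a codebook needs no stored table of vectors.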
Gain codebook 105 generates a gain for the adaptive codebook vector (adaptive codebook gain) outputted from adaptive codebook 103 and a gain for the fixed codebook vector (fixed codebook gain) outputted from fixed codebook 104, according to an instruction from distortion minimizing section 112 and outputs the respective gains to multipliers 106 and 107.
Multiplier 106 multiplies the adaptive codebook vector outputted from adaptive codebook 103 by the adaptive codebook gain outputted from gain codebook 105 and outputs the adaptive codebook vector after the multiplication to adder 108.
Multiplier 107 multiplies the fixed codebook vector outputted from fixed codebook 104 by the fixed codebook gain outputted from gain codebook 105 and outputs the fixed codebook vector after the multiplication to adder 108.
Adder 108 adds up the adaptive codebook vector outputted from multiplier 106 and the fixed codebook vector outputted from multiplier 107 and outputs the excitation vector after the addition to LPC synthesis filter 109 as an excitation.
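The gain scaling and addition performed by multipliers 106 and 107 and adder 108 amount to e[n] = p·a[n] + q·s[n], where p and q are the adaptive and fixed codebook gains. A minimal sketch (the function name is hypothetical):

```python
def build_excitation(adaptive_vec, fixed_vec, gain_p, gain_q):
    """Multipliers 106/107 and adder 108: e[n] = p*a[n] + q*s[n]."""
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("vectors must have the same length")
    return [gain_p * a + gain_q * s for a, s in zip(adaptive_vec, fixed_vec)]
```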
LPC synthesis filter 109 generates a synthesized signal by applying a filter function, that is, an LPC synthesis filter whose filter coefficients are the quantized LPC parameters outputted from LPC quantization section 102, to the excitation vector generated from adaptive codebook 103 and fixed codebook 104. This synthesized signal is outputted to adder 110.
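LPC synthesis filter 109 is an all-pole filter 1/A(z). The sketch below assumes the sign convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p, as used in G.729; other CODECs may use the opposite sign. Filter memory is carried between subframes through the returned state. The function name is hypothetical.

```python
def lpc_synthesis(excitation, lpc, memory=None):
    """All-pole synthesis filter 1/A(z) with A(z) = 1 + sum_i a_i z^-i.

    lpc    : [a1, ..., ap]
    memory : last p output samples from the previous subframe, most recent last
    Returns (synthesized samples, new memory).
    """
    p = len(lpc)
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for e in excitation:
        # y[n] = e[n] - a1*y[n-1] - ... - ap*y[n-p]
        y = e - sum(lpc[i] * hist[-1 - i] for i in range(p))
        out.append(y)
        hist.append(y)
    return out, (hist[-p:] if p else [])
```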
Adder 110 calculates an error signal by subtracting the synthesized signal generated in LPC synthesis filter 109 from the speech signal and outputs this error signal to perceptual weighting section 111. This error signal corresponds to coding distortion.
Perceptual weighting section 111 applies perceptual weighting to the coding distortion outputted from adder 110 using the LPC parameter inputted from LPC analyzing section 101 and outputs the weighted distortion to distortion minimizing section 112.
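The text does not specify the form of the perceptual weighting filter. A form used in some CELP coders, such as G.729, is W(z) = A(z/γ1)/A(z/γ2), which de-emphasizes errors near spectral peaks. The sketch below assumes that form, a zero initial state, and illustrative values γ1 = 0.9 and γ2 = 0.6; none of these are taken from this description, and the function names are hypothetical.

```python
def weight_coeffs(lpc, gamma):
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i (bandwidth expansion)."""
    return [a * gamma ** (i + 1) for i, a in enumerate(lpc)]


def perceptual_weighting(signal, lpc, gamma1=0.9, gamma2=0.6):
    """Filter signal with W(z) = A(z/gamma1) / A(z/gamma2), zero initial state."""
    num = weight_coeffs(lpc, gamma1)
    den = weight_coeffs(lpc, gamma2)
    p = len(lpc)
    x_hist = [0.0] * p
    y_hist = [0.0] * p
    out = []
    for x in signal:
        # Direct form I: FIR part A(z/gamma1), then IIR part 1/A(z/gamma2)
        y = (x
             + sum(num[i] * x_hist[-1 - i] for i in range(p))
             - sum(den[i] * y_hist[-1 - i] for i in range(p)))
        out.append(y)
        x_hist.append(x)
        y_hist.append(y)
    return out
```

When γ1 = γ2 the numerator and denominator cancel and the filter passes the signal unchanged, which is a quick sanity check of the implementation.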
Distortion minimizing section 112 obtains the indices (codes) of adaptive codebook 103, fixed codebook 104 and gain codebook 105 that minimize the coding distortion outputted from perceptual weighting section 111 for each subframe, and outputs these indices to the outside of CELP coding apparatus 100 as coded information. To be more specific, the series of processes of generating a synthesized signal based on adaptive codebook 103 and fixed codebook 104 and obtaining the coding distortion of this signal constitutes closed-loop control (feedback control). Distortion minimizing section 112 searches each codebook by varying the index indicated to that codebook within one subframe, and outputs, for each codebook, the finally obtained index that minimizes the coding distortion.
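The closed-loop (analysis-by-synthesis) principle used by distortion minimizing section 112 can be illustrated with a toy exhaustive search: each candidate vector is filtered, the gain that minimizes the error for that candidate is computed, and the index with the smallest error is kept. This is a simplified sketch, assuming the target is already perceptually weighted and the weighted synthesis filter is represented by a truncated impulse response `h`; real coders search the codebooks sequentially and far more efficiently. Function names are hypothetical.

```python
def filtered(h, s):
    """Zero-state convolution of s with impulse response h, truncated to len(s)."""
    return [sum(h[k] * s[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(s))]


def closed_loop_search(target, h, codebook):
    """Return (best_index, best_gain, best_error) over all candidate vectors."""
    best = (None, 0.0, float("inf"))
    for idx, s in enumerate(codebook):
        y = filtered(h, s)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero contribution cannot reduce the error
        gain = sum(t * v for t, v in zip(target, y)) / energy  # ideal gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (idx, gain, err)
    return best
```

If the target was produced by filtering one of the candidates scaled by some gain, the search returns that candidate's index, that gain and an error of zero.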
With an algebraic codebook used as a fixed codebook (also referred to as “stochastic codebook”), better coding performance can be obtained with a limited amount of calculation. The algebraic codebook is widely used in above-described ITU-T standard G.729, ITU-T standard G.722.2, ETSI standard AMR or ETSI standard AMR-WB or the like.
Here, the principles of the basic algorithm of fixed codebook search will be described.
First, a search for an excitation vector (also called “codebook vector” or “code vector”) and derivation of its code are performed by searching for the excitation vector that minimizes coding distortion E in equation 1 below.

$$E = \lVert u - (pHa + qHs) \rVert^{2} \qquad \text{(Equation 1)}$$
where E is coding distortion,
u is a coding target,
p is a gain of the adaptive codebook vector,
H is a perceptual weighting synthesis filter,
a is an adaptive codebook vector,
q is a gain of the fixed codebook vector, and
s is a fixed codebook vector.
Since the adaptive codebook vector and the fixed codebook vector are generally searched through sequential optimization (in separate loops), the code of fixed codebook 104 is derived by searching for the fixed codebook vector that minimizes coding distortion E in equation 2 below.

$$v = u - pHa, \qquad E = \lVert v - qHs \rVert^{2} \qquad \text{(Equation 2)}$$
where E is coding distortion,
u is a coding target (perceptually weighted speech signal),
p is an optimum gain of the adaptive codebook vector,
H is a perceptual weighting synthesis filter,
a is an adaptive codebook vector (also called “adaptive excitation”),
q is a gain of the fixed codebook vector,
s is a fixed codebook vector (also called “fixed excitation”), and
v is a target vector of a fixed codebook search (also called “fixed excitation search”).
Here, since gains p and q are determined after the excitation code has been searched, the search is carried out assuming ideal gains (also called “optimum gains”). An ideal gain is the gain that minimizes coding distortion. Substituting the ideal gains into equation 2 therefore gives equation 3 below.
$$v = u - \frac{u \cdot Ha}{\lVert Ha \rVert^{2}}\,Ha, \qquad E = \left\lVert v - \frac{v \cdot Hs}{\lVert Hs \rVert^{2}}\,Hs \right\rVert^{2} \qquad \text{(Equation 3)}$$
Minimizing coding distortion E in equation 3 is equivalent to maximizing cost function C in equation 4 below.
$$C = \frac{(v^{t} H s)^{2}}{s^{t} H^{t} H s} \qquad \text{(Equation 4)}$$
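The step from equation 3 to equation 4 can be made explicit by expanding the distortion for a fixed s and optimizing the gain q. This is the standard least-squares argument, written here in the document's notation:

```latex
\begin{aligned}
E(q) &= \lVert v - qHs \rVert^{2}
      = \lVert v \rVert^{2} - 2q\,v^{t}Hs + q^{2}\,s^{t}H^{t}Hs \\
\frac{\partial E}{\partial q} = 0
  \;&\Rightarrow\; q_{\mathrm{opt}} = \frac{v^{t}Hs}{s^{t}H^{t}Hs}
   = \frac{v \cdot Hs}{\lVert Hs \rVert^{2}} \\
E(q_{\mathrm{opt}}) &= \lVert v \rVert^{2} - \frac{(v^{t}Hs)^{2}}{s^{t}H^{t}Hs}
                     = \lVert v \rVert^{2} - C
\end{aligned}
```

Because \(\lVert v \rVert^{2}\) does not depend on s, the fixed codebook vector that minimizes E is exactly the one that maximizes C.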
Therefore, when searching for a fixed codebook vector s composed of a small number of pulses, as in an algebraic codebook, cost function C can be calculated with a small amount of calculation if v^tH and H^tH are calculated beforehand. That is, the algebraic codebook is coded using an algorithm that searches for the pulse positions and polarities maximizing cost function C through a multiplexed (nested) loop over as many channels (also called “tracks”) as there are pulses.
Furthermore, for a pulse excitation, the polarity of each position can be preliminarily selected from the sign (positive or negative) of the corresponding element of v^tH, and that polarity can be multiplied into v^tH and H^tH in advance; a polarity search can then be omitted when searching pulse positions. Since each pulse would otherwise require both polarities to be tried, this preliminary selection of polarity reduces the amount of calculation exponentially with respect to the number of pulses.
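Precomputing v^tH and H^tH makes the cost of a sparse pulse vector cheap to evaluate, because only the entries at the pulse positions are read. The sketch below is plain Python and assumes H is the lower-triangular Toeplitz (zero-state convolution) matrix built from a truncated impulse response `h`; the function names are hypothetical. Here `d[n]` holds the elements of v^tH and `phi` holds H^tH without any doubling of off-diagonal terms.

```python
def lower_toeplitz(h, L):
    """H[n][k] = h[n-k] for 0 <= n-k < len(h): zero-state convolution matrix."""
    return [[h[n - k] if 0 <= n - k < len(h) else 0.0 for k in range(L)]
            for n in range(L)]


def backward_filtered_target(v, H):
    """d = H^t v, i.e. the elements of the row vector v^t H."""
    L = len(v)
    return [sum(v[n] * H[n][k] for n in range(L)) for k in range(L)]


def correlation_matrix(H):
    """phi = H^t H."""
    L = len(H)
    return [[sum(H[n][i] * H[n][j] for n in range(L)) for j in range(L)]
            for i in range(L)]


def pulse_cost(d, phi, positions, signs):
    """C = (v^t H s)^2 / (s^t H^t H s) for a vector of signed unit pulses."""
    num = sum(sg * d[p] for p, sg in zip(positions, signs))
    den = sum(si * sj * phi[pi][pj]
              for pi, si in zip(positions, signs)
              for pj, sj in zip(positions, signs))
    return num * num / den
```

Once `d` and `phi` are available, evaluating C for a K-pulse candidate touches only K entries of `d` and K×K entries of `phi`, independent of the subframe length.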
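The preliminary polarity selection can be sketched as follows: the sign of each position is fixed to the sign of d[n] (the n-th element of v^tH) and folded into d and H^tH, after which every pulse can be treated as positive and only positions need to be searched. Treating d[n] = 0 as positive is an arbitrary choice in this sketch, and the function name is hypothetical.

```python
def preselect_polarity(d, phi):
    """Fix each position's sign to sign(d[n]) and fold it into d and phi.

    Returns (sign, d_abs, phi_signed): d_abs[n] = sign[n] * d[n] >= 0 and
    phi_signed[i][j] = sign[i] * sign[j] * phi[i][j].
    """
    L = len(d)
    sign = [1.0 if x >= 0.0 else -1.0 for x in d]
    d_abs = [sign[n] * d[n] for n in range(L)]
    phi_signed = [[sign[i] * sign[j] * phi[i][j] for j in range(L)]
                  for i in range(L)]
    return sign, d_abs, phi_signed
```

With the signs folded in, a candidate's numerator becomes a plain sum of `d_abs` entries and its denominator a plain sum of `phi_signed` entries.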
In recent years, CODECs for encoding wideband signals (16 kHz sampling) and ultra-wideband signals (32 kHz sampling) have been required to meet the need for higher quality, and their standardization is being carried forward in ITU-T, MPEG (Moving Picture Experts Group), 3GPP and the like. As the bit rate for encoding wideband and ultra-wideband digital signals increases, the number of information bits of the fixed codebook for encoding the excitation also increases. Because the algebraic codebook obtains high performance through a simultaneous optimization search (multiplexed loop), the amount of calculation increases exponentially as the number of bits (that is, the number of pulses) increases.
Thus, in high quality CODECs, a method is widely adopted whereby the pulses (channels or tracks) to be searched are divided into several groups, pulses within each group are searched through simultaneous optimization, and the groups are searched one after another through sequential optimization. For example, when each of four pulses has 32 candidate positions, a joint search requires 1048576 (32 to the fourth power) matching operations. If the four pulses are instead divided into two groups of two pulses each, two searches of 32 squared (1024) combinations require only 2048 matching operations. Since this is a sequential optimization search, performance drops to some extent compared with a full simultaneous optimization search, but because the pulses within each group are still optimized jointly in a closed loop, the deterioration is not significant. For this reason, grouped searches have been used in recent years.
As an example of this pulse search, a case in which the group size is two pulses will be described using FIG. 2. FIG. 2 is a conceptual diagram illustrating, in simplified form, a flow of conventional fixed codebook search processing. As shown in FIG. 2, two-pulse searches are performed repeatedly until the required number of pulses has been searched. The results (pulses) obtained through the respective two-pulse searches are organized into a pulse sequence (not shown). There are various methods for organizing the results of two-pulse searches; for example, ITU-T standard G.718 uses an algorithm that increases the number of pulses while searching two pulses at a time in a four-track configuration (see NPL 1). A description of these methods is omitted here.
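The operation counts in the example above follow directly from the loop structure: a joint search nests one loop per pulse, while a grouped search nests one loop per pulse within a group and runs the groups one after another. A small sketch (function names hypothetical) reproduces the figures:

```python
def joint_search_count(candidates, pulses):
    """Combinations tried when all pulses are searched jointly (fully nested loops)."""
    return candidates ** pulses


def grouped_search_count(candidates, pulses, group_size):
    """Combinations tried when pulses are searched group by group."""
    if pulses % group_size:
        raise ValueError("pulses must be a multiple of group_size")
    return (pulses // group_size) * candidates ** group_size
```

With 32 candidates and 4 pulses, these give 1048576 and 2048 (for groups of two) respectively.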
FIG. 3 is a diagram schematically illustrating an algorithm of searching pulse positions by conventional grouping. FIG. 3 corresponds to one of the two-pulse searches in FIG. 2.
Before starting a search, the parameters necessary for the search are determined as preprocessing from the inputted target and the like. Here, the target can be expressed as a vector and corresponds to target vector v of the aforementioned fixed codebook search. The parameters include the target time inverse order synthetic vector v^tH (polarity preliminarily selected), the correlation matrix between pulse synthetic vectors H^tH (polarity preliminarily selected), the track numbers, and the interval of pulse candidate positions in each track.
The search loop of track 1 is executed inside the search loop of track 0 using these parameters. That is, the search loop of track 0 and the search loop of track 1 constitute a multiplexed loop. By conducting a search using this multiplexed loop, the searched position of each track is obtained, together with the correlation value forming the base of the synthetic numerator term up to this search and the synthetic denominator term up to this search.
FIG. 4 is a flowchart indicating an algorithm of searching pulse positions by conventional grouping. FIG. 4 is a more specific illustration of FIG. 3.
In FIG. 4, symbol “d[n]” denotes the target time inverse order synthetic vector (polarity preliminarily selected). Symbol “c[n][m]” denotes the correlation matrix between pulse synthetic vectors (polarity preliminarily selected), in which the elements with n≠m (the off-diagonal terms) are doubled. Symbols “x” and “y” denote pulse candidate positions. Symbols “xx” and “yy” denote the finally searched pulse positions. “Track 0” and “track 1” denote track numbers (each one of 0, 1, 2 and 3 in FIG. 2). Symbol “ps_t” denotes the accumulated value forming the base of the numerator term of cost function C (the value before squaring) before the search is performed. Symbol “alp_t” denotes the synthetic value of the denominator term of cost function C before the search is performed. Symbol “L” denotes the subframe length. Symbol “step” denotes the interval of pulse candidate positions in each track (“4” in FIG. 4).
In FIG. 4, the flow starts when the aforementioned necessary parameters are inputted. First, before pulse candidate position x is searched, numerator term sqk of cost function C is initialized to “−1.0” and denominator term alpk to “1.0” (step ST11).
Next, it is determined whether or not pulse candidate position x is smaller than subframe length L (step ST12).
When pulse candidate position x is smaller than subframe length L (step ST12: yes), calculations ps0=ps_t+d[x] and alp0=alp_t+c[x][x] are performed (step ST13).
Next, a search of pulse candidate position y is started (step ST14), and it is determined whether or not pulse candidate position y is smaller than subframe length L (step ST15).
When pulse candidate position y is equal to or greater than subframe length L (step ST15: no), the next candidate position is selected (x=x+step) (step ST16) and the process is returned to step ST12.
On the other hand, when pulse candidate position y is smaller than subframe length L (step ST15: yes), calculations ps1=ps0+d[y], alp1=alp0+c[y][y]+c[x][y] and sq=ps1*ps1 are performed (step ST17).
Next, it is determined whether or not the value of (alpk*sq) is greater than the value of (sqk*alp1) (step ST18).
When the value of (alpk*sq) is equal to or smaller than the value of (sqk*alp1) (step ST18: no), the next candidate position is selected (y=y+step) (step ST19), and the process is returned to step ST15.
On the other hand, when the value of (alpk*sq) is greater than the value of (sqk*alp1) (step ST18: yes), the numerator term and the denominator term of cost function C are updated (sqk=sq, alpk=alp1), the searched pulse positions are updated (xx=x, yy=y) (step ST20), and the process proceeds to step ST19.
When pulse candidate position x is equal to or greater than subframe length L in step ST12 (step ST12: no), ps_t=ps_t+d[xx]+d[yy] is calculated, and synthetic value alp_t of the denominator term is set to the final denominator term alpk of cost function C (step ST21).
Next, final pulse positions xx and yy, and synthetic value alp_t of the denominator term of cost function C and the value of numerator term ps_t of cost function C in that case are outputted (step ST22).
In FIG. 4, the numerator term of cost function C in equation 4 is sq and the denominator term is alp1. Cost function C is obtained by dividing the numerator term by the denominator term, but since division is computationally expensive, cross-multiplication is used to compare values of cost function C: because the denominator terms are positive, sq/alp1 > sqk/alpk is equivalent to alpk*sq > sqk*alp1.
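The FIG. 4 flowchart maps onto two nested loops. The sketch below follows the steps as described above, with hypothetical argument names; `track0` and `track1` give the first candidate position of each track. As in the flowchart, alp0 adds only c[x][x] to alp_t, so cross-correlations with pulses fixed by earlier searches are not included; the result is exact for the first two-pulse search, where ps_t = alp_t = 0. The comparison in step ST18 is the cross-multiplication just described.

```python
def two_pulse_search(d, c, track0, track1, step, L, ps_t=0.0, alp_t=0.0):
    """Nested two-pulse position search following the FIG. 4 flowchart.

    d : polarity-folded target time inverse order synthetic vector d[n]
    c : polarity-folded correlation matrix, off-diagonal terms doubled
    Returns (xx, yy, ps_t, alp_t) with the accumulated terms updated.
    """
    sqk, alpk = -1.0, 1.0                     # ST11: best numerator/denominator
    xx = yy = None
    for x in range(track0, L, step):          # ST12/ST16: outer loop over track 0
        ps0 = ps_t + d[x]                     # ST13
        alp0 = alp_t + c[x][x]
        for y in range(track1, L, step):      # ST14/ST15/ST19: inner loop, track 1
            ps1 = ps0 + d[y]                  # ST17
            alp1 = alp0 + c[y][y] + c[x][y]
            sq = ps1 * ps1
            if alpk * sq > sqk * alp1:        # ST18: sq/alp1 > sqk/alpk, no division
                sqk, alpk = sq, alp1          # ST20: keep the better candidate
                xx, yy = x, y
    if xx is None:
        raise ValueError("no candidate positions to search")
    ps_t = ps_t + d[xx] + d[yy]               # ST21
    alp_t = alpk
    return xx, yy, ps_t, alp_t                # ST22
```

Initializing sqk to −1.0 guarantees that the first candidate is always accepted, since alpk*sq ≥ 0 exceeds sqk*alp1 < 0 whenever alp1 is positive.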
However, even when the above-described grouping is used, if the number of bits further increases, the amount of calculation becomes enormous.
Thus, ITU-T standard G.718 adopts a preliminary selection of pulse positions when searching pulses (see NPL 1). In a preliminary selection of pulse positions, the positions at which pulses are likely or assumed to rise are selected beforehand from among the pulse candidate positions, thereby reducing the number of pulse candidate positions for which the next loop is entered.
An overview of a flow of search processing of the fixed codebook using a preliminary selection of pulse positions adopted in G.718 will be described using FIG. 5. FIG. 5 is a conceptual diagram illustrating a flow of conventional fixed codebook search processing. As described above, there are various methods of organizing the two-pulse search results, and G.718 uses an algorithm of increasing the number of pulses while searching two pulses at a time in a four-track configuration. Here, the description thereof will be omitted.
As shown in FIG. 5, as in the case of FIG. 2, this fixed codebook search processing performs two-pulse searches the required number of times, but performs a preliminary selection within each two-pulse search. Here, the preliminary selection numbers of the successive two-pulse searches are set in ascending order of magnitude.
FIG. 6 is a diagram schematically illustrating an algorithm of searching pulse positions when performing a preliminary selection of pulse positions adopted in G.718. FIG. 6 corresponds to one of the two-pulse searches shown in FIG. 5.
As in the case of FIG. 3, before starting a search, the parameters necessary for the search are obtained as preprocessing from the inputted target and the like. Here, the target can be expressed as a vector and corresponds to target vector v of the aforementioned fixed codebook search. The parameters include the target time inverse order synthetic vector v^tH (polarity preliminarily selected), the correlation matrix between pulse synthetic vectors H^tH (polarity preliminarily selected), the track numbers, and the interval of pulse candidate positions in each track.
By performing a preliminary selection of track 0 positions in the search loop of track 0 using these parameters, the number of times the search loop of track 1 is entered from the search loop of track 0 is limited. In FIG. 6, as in FIG. 3, the search loop of track 0 and the search loop of track 1 constitute a multiplexed loop. By conducting a search using this multiplexed loop, the searched position of each track is obtained, together with the correlation value forming the base of the synthetic numerator term up to this search and the synthetic denominator term up to this search.
FIG. 7 is a flowchart illustrating an algorithm of searching pulse positions when performing a preliminary selection of pulse positions adopted in G.718. FIG. 7 is a more specific illustration of FIG. 6. In FIG. 7, parts having the same processing as that in FIG. 4 are assigned the same reference numerals and descriptions thereof will be omitted.
In FIG. 7, symbol “pick[n]” denotes a sequence that gives, for each pulse position n, the order in which that position is adopted. Symbol “thres” denotes a threshold derived from the specified number of candidates for candidate position x. The search from position x is performed only when pick[x] is smaller than thres, so that only the specified number of candidates are searched. The other symbols have the same meanings as in FIG. 4, and their descriptions are omitted.
In FIG. 7, when pulse candidate position x is smaller than subframe length L (step ST12: yes), it is determined whether or not pick[x] is smaller than the value of thres (step ST50).
When pick[x] is smaller than the value of thres (step ST50: yes), calculations ps0=ps_t+d[x] and alp0=alp_t+c[x][x] are performed (step ST13).
On the other hand, when pick[x] is equal to or greater than the value of thres (step ST50: no), the next candidate position is selected (step ST16), and the process is returned to step ST12.
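The FIG. 7 variant differs from FIG. 4 only in the ST50 check on the outer loop. How G.718 builds pick[] is not described here, so the sketch below uses a hypothetical stand-in: the positions of track 0 are ranked by decreasing d[x], and `thres` equals the number of positions kept. Function names are hypothetical, and, as in the FIG. 4 sketch, cross terms with previously fixed pulses are not added.

```python
def rank_within_track(d, track, step, L):
    """Hypothetical pick[]: rank of each track position by decreasing d[n] (0 = best)."""
    ordered = sorted(range(track, L, step), key=lambda n: -d[n])
    return {n: rank for rank, n in enumerate(ordered)}


def two_pulse_search_preselect(d, c, track0, track1, step, L, thres,
                               ps_t=0.0, alp_t=0.0):
    """FIG. 7: the FIG. 4 search with a preliminary selection on track 0."""
    pick = rank_within_track(d, track0, step, L)
    sqk, alpk = -1.0, 1.0
    xx = yy = None
    for x in range(track0, L, step):
        if pick[x] >= thres:                  # ST50: no -> next x (ST16)
            continue
        ps0 = ps_t + d[x]                     # ST13
        alp0 = alp_t + c[x][x]
        for y in range(track1, L, step):
            ps1 = ps0 + d[y]                  # ST17
            alp1 = alp0 + c[y][y] + c[x][y]
            sq = ps1 * ps1
            if alpk * sq > sqk * alp1:        # ST18
                sqk, alpk, xx, yy = sq, alp1, x, y
    if xx is None:
        raise ValueError("no preselected candidate positions")
    return xx, yy, ps_t + d[xx] + d[yy], alpk
```

With `thres` equal to 1, the inner loop over track 1 is entered only once instead of once per track-0 position.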
FIG. 8 is a block diagram illustrating a configuration of fixed codebook searching apparatus 300 that can conduct a pulse search of the fixed codebook using the above-described conventional pulse searching method.
Preprocessing section 301 receives a target signal as input and obtains the parameters necessary for a pulse search. Parameters generated by calculation include the “target time inverse order synthetic vector” (corresponding to v^tH in equation 4), in which the result of a preliminary polarity selection at each pulse position is reflected, and the “correlation matrix between pulse synthetic vectors” (corresponding to H^tH in equation 4), in which the result of the preliminary polarity selection is reflected and the off-diagonal terms are doubled. Parameters that are set include the track number to be searched, the interval of pulse position candidates in that track, the subframe length and the preliminary selection number. Preprocessing section 301 sends these parameters to control section 302.
Control section 302 receives the total number of bits as input, sends the parameters necessary for a pulse search to multiplexed loop searching section 303 according to a timing signal from pulse sequence coding section 304, which will be described later, and controls multiplexed loop searching section 303 so as to conduct a pulse search. In addition to the parameters sent from preprocessing section 301, the parameters sent to multiplexed loop searching section 303 include the sequence giving, for each pulse position, the order in which that position is adopted, and the synthetic values of the numerator term and the denominator term before the search is conducted. Control section 302 initializes these two synthetic values when driving multiplexed loop searching section 303 for the first time, and for the two-pulse searches in subsequent stages, forwards the synthetic values received from pulse sequence coding section 304 to multiplexed loop searching section 303.
Multiplexed loop searching section 303 searches pulse positions using the multiplexed loop. In doing so, multiplexed loop searching section 303 performs a preliminary selection in the outermost loop using the preliminary selection number and the sequence giving the adoption order of each pulse position, and outputs the searched pulse positions, together with the synthetic values of the numerator term and the denominator term calculated at those positions, to pulse sequence coding section 304.
Pulse sequence coding section 304 performs pulse coding using the pulse positions searched by multiplexed loop searching section 303 and the synthetic values of the numerator and denominator terms. This pulse coding of the fixed codebook uses the results of multiple operations of multiplexed loop searching section 303. Pulse sequence coding section 304 sends the synthetic values of the numerator and denominator terms to control section 302, together with a timing signal prompting multiplexed loop searching section 303 to perform its next operation. Pulse sequence coding section 304 finally outputs the resulting code as the fixed codebook code.
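The interaction of control section 302, multiplexed loop searching section 303 and pulse sequence coding section 304 can be sketched as a driver that runs two-pulse searches in sequence and carries the synthetic numerator and denominator values from one search to the next. This is a sketch under stated assumptions: the track pairs and per-search preselection numbers are passed in, preselection keeps the `thres` track-0 positions with the largest d[x] (a hypothetical stand-in for pick[]), and, unlike the simplified flowchart, the denominator also adds the (already doubled) cross-correlations with pulses fixed by earlier searches so that alp_t remains the full denominator term. Bit allocation and the actual pulse coding are not modeled, and all names are hypothetical.

```python
def multiplexed_loop_search(d, c, fixed, track0, track1, step, L,
                            ps_t, alp_t, thres=None):
    """One two-pulse nested search (section 303), optionally preselecting x."""
    xs = list(range(track0, L, step))
    if thres is not None:
        # Hypothetical preselection: keep the thres positions with the largest d[x]
        xs = sorted(sorted(xs, key=lambda n: -d[n])[:thres])
    sqk, alpk, best = -1.0, 1.0, None
    for x in xs:
        ps0 = ps_t + d[x]
        # Include cross terms with pulses fixed by earlier searches
        alp0 = alp_t + c[x][x] + sum(c[p][x] for p in fixed)
        for y in range(track1, L, step):
            ps1 = ps0 + d[y]
            alp1 = alp0 + c[y][y] + c[x][y] + sum(c[p][y] for p in fixed)
            sq = ps1 * ps1
            if alpk * sq > sqk * alp1:   # cross-multiplied comparison of C
                sqk, alpk, best = sq, alp1, (x, y)
    if best is None:
        raise ValueError("no candidate positions to search")
    x, y = best
    return x, y, ps_t + d[x] + d[y], alpk


def fixed_codebook_search(d, c, L, step, track_pairs, thresholds=None):
    """Run two-pulse searches in sequence (sections 302/304), carrying ps_t/alp_t."""
    ps_t, alp_t = 0.0, 0.0       # initialized before the first search
    pulses = []
    for i, (t0, t1) in enumerate(track_pairs):
        thres = thresholds[i] if thresholds is not None else None
        x, y, ps_t, alp_t = multiplexed_loop_search(
            d, c, pulses, t0, t1, step, L, ps_t, alp_t, thres)
        pulses += [x, y]         # collected into the pulse sequence
    return pulses, ps_t, alp_t
```

For four tracks searched two at a time, `track_pairs` would be `[(0, 1), (2, 3)]`; passing ascending `thresholds` mirrors the ascending preliminary selection numbers of FIG. 5.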
Here, when using the search processing in FIG. 2 as the pulse searching method, multiplexed loop searching section 303 does not include any configuration to perform a preliminary selection. As the pulse searching method, when the search processing in FIG. 5 is used, multiplexed loop searching section 303 is configured to perform a preliminary selection.
Thus, to accommodate high bit rate fixed codebook searches, conventional methods have been devised, such as grouping the channels (tracks) of an algebraic codebook so that the search is conducted in units of a small number of pulses, each group being searched in a closed loop, and further performing a preliminary selection of positions in an outer loop to reduce the amount of calculation even more. These methods allow high quality speech or music to be encoded while suppressing the amount of calculation.