The present invention relates to a speech coding apparatus and, more particularly, to a speech coding apparatus that codes an input speech signal using MPEG-4/CELP, a code excited linear prediction coding scheme that models the sound source with a multipulse.
MPEG-4/CELP (Moving Picture Experts Group phase 4) is a CELP (Code Excited Linear Prediction) scheme, a general-purpose speech coding scheme standardized by ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) in February 1999. It has two coding modes, MPE (MultiPulse Excitation) and RPE (Regular Pulse Excitation), which differ in the type of sound source code book. In both modes the sound source is modeled by a multipulse made up of a plurality of impulses, but the modes differ in how freely the pulses may be positioned: the RPE mode uses a constant pulse interval, whereas the MPE mode allows a high degree of freedom in pulse position. Because of this difference, the MPE mode achieves higher speech quality than the RPE mode but requires a larger amount of calculation.
The basic operation of an MPE-mode speech coding apparatus using the MPEG-4/CELP scheme will be described with reference to FIG. 5.
As shown in FIG. 5, this speech coding apparatus comprises an LPC (Linear Predictive Coding) analysis unit 401, a quantization unit 402, an LPC filter 403, a speech synthesis unit 404, and a subtracter 412.
Speech coding is performed by segmenting the input speech into frames of a predetermined duration and using each frame as the compression unit.
The input speech signal (the original speech) is subjected to LPC analysis by the LPC analysis unit 401 and quantized by the quantization unit 402. The output of the speech synthesis unit 404 and the code quantized by the quantization unit 402 are supplied to the LPC filter 403, which generates reproduced speech. The subtracter 412 calculates the difference between the original speech and the reproduced speech and outputs it as an error signal 405. The error signal 405 is fed back to the speech synthesis unit 404, whose parameters are selected so as to minimize the error signal 405. When the error signal 405 is minimized, the output of the speech synthesis model most closely approximates the input speech. The parameters of the speech synthesis unit 404 that minimize the error signal 405 form the MPEG-4/CELP code.
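The parameter selection just described is an analysis-by-synthesis loop. A minimal C sketch of the selection step follows. It assumes plain squared error as the error measure (the text does not specify one) and assumes each candidate parameter set has already been synthesized into a reproduced frame; `error_energy` and `select_best` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Energy of the error signal e[i] = x[i] - y[i] over one frame of n samples.
   Plain squared error is an assumption made for this sketch. */
double error_energy(const double *x, const double *y, size_t n)
{
    double e = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - y[i];
        e += d * d;
    }
    return e;
}

/* cands holds num_cands reproduced frames back to back (n samples each);
   returns the index of the frame whose error energy is smallest. */
size_t select_best(const double *x, const double *cands, size_t num_cands,
                   size_t n)
{
    assert(num_cands > 0);
    size_t best = 0;
    double best_e = error_energy(x, cands, n);
    for (size_t c = 1; c < num_cands; c++) {
        double e = error_energy(x, cands + c * n, n);
        if (e < best_e) {
            best_e = e;
            best = c;
        }
    }
    return best;
}
```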
The speech synthesis unit 404 comprises multipliers 409 and 410, an adder 411, and three code books serving as its parameters: an ACB (Adaptive Code Book) 406, an MP (MultiPulse) code book 407, and a GCB (Gain Code Book) 408.
The ACB 406 is generated from many basic speech models of the corresponding speaker on the basis of the fundamental period of the sound source, and generates a pitch period component. The MP code book 407 expresses the noise or error of the sound source by the positions and amplitudes of a plurality of pulses (a multipulse), and generates the random component other than the pitch period component. The GCB 408 represents the mixing ratio of the ACB 406 and the MP code book 407. That is, the multiplier 409 multiplies the pitch period component generated by the ACB 406 by the ACB mixing ratio given by the GCB 408, while the multiplier 410 multiplies the random component generated by the MP code book 407 by the MP code book mixing ratio given by the GCB 408. The adder 411 sums the outputs of the multipliers 409 and 410 to perform speech synthesis.
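The work of the MP code book and of the multipliers 409 and 410 and the adder 411 can be sketched in C as follows. The sketch assumes that the MP code book output is zero except at the pulse positions and that a GCB entry supplies one gain per component; `build_multipulse` and `mix_excitation` are hypothetical names, not part of any standard API.

```c
#include <assert.h>
#include <stddef.h>

/* MP code book output: zero everywhere except at the chosen pulse
   positions, where the pulse amplitudes (e.g. +1 or -1) are placed. */
void build_multipulse(const int *pos, const double *amp, size_t num_pulses,
                      double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = 0.0;
    for (size_t k = 0; k < num_pulses; k++) {
        assert(pos[k] >= 0 && (size_t)pos[k] < n);
        out[pos[k]] += amp[k];
    }
}

/* Multipliers 409 and 410 scale the pitch (ACB) and random (MP)
   components by their gains; adder 411 sums the two products. */
void mix_excitation(const double *acb, const double *mp, double g_acb,
                    double g_mp, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = g_acb * acb[i] + g_mp * mp[i];
}
```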
The processing that selects from the MP code book 407 the multipulse minimizing the error signal 405 is called multipulse search processing. A multipulse search processing method, which characterizes the MPE mode, is disclosed in Japanese Patent Laid-Open No. 7-160298.
In multipulse search processing, the positions at which each pulse can be placed are uniquely determined for that pulse. Multipulse search processing therefore calculates and accumulates distortions for the candidate positions of each pulse in ascending order of pulse number, and obtains the combination exhibiting the smallest distortion. Here, the "distortion" is a correlation coefficient between adjacent pulses. Multipulse search processing creates a multipulse search table, which stores a distortion for each pulse position candidate of each pulse number, and determines the position and amplitude of each pulse from this table. Because the frame is the speech compression unit, the multipulse search table must be created for every frame.
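To make the combination search concrete, here is a minimal exhaustive depth-first sketch in C. It visits pulses in ascending order of pulse number, adds the distortion between each pulse and the preceding one, and keeps the combination with the smallest total. The flattened `dist` layout, the equal number of candidates per pulse, and all names are assumptions for illustration; a practical search would prune or restrict the tree rather than enumerate every combination.

```c
#include <assert.h>
#include <float.h>

#define MAX_PULSES 8  /* illustrative upper bound on pulses per frame */

/* Search state. dist[(k * nc + i) * nc + j] is the (hypothetical) distortion
   between candidate i of pulse k and candidate j of pulse k + 1. */
typedef struct {
    const double *dist;
    int np, nc;            /* number of pulses, candidates per pulse */
    int cur[MAX_PULSES];   /* candidate chosen so far for each pulse */
    int best[MAX_PULSES];  /* best combination found so far */
    double best_cost;      /* its accumulated distortion */
} Search;

/* Depth-first visit of the tree: pulse k is placed after pulses 0..k-1. */
static void visit(Search *s, int k, double acc)
{
    if (k == s->np) {  /* a full combination has been formed */
        if (acc < s->best_cost) {
            s->best_cost = acc;
            for (int i = 0; i < s->np; i++)
                s->best[i] = s->cur[i];
        }
        return;
    }
    for (int c = 0; c < s->nc; c++) {
        double add = 0.0;  /* the first pulse has no preceding pulse */
        if (k > 0)
            add = s->dist[((k - 1) * s->nc + s->cur[k - 1]) * s->nc + c];
        s->cur[k] = c;
        visit(s, k + 1, acc + add);
    }
}

/* Returns the smallest total distortion and writes the chosen candidate
   index of each pulse to best[0..np-1]. */
double multipulse_search(const double *dist, int np, int nc, int *best)
{
    assert(np >= 1 && np <= MAX_PULSES && nc >= 1);
    Search s = { dist, np, nc, {0}, {0}, DBL_MAX };
    visit(&s, 0, 0.0);
    for (int i = 0; i < np; i++)
        best[i] = s.best[i];
    return s.best_cost;
}
```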
FIG. 6 shows the structure of the MP code book 407 for performing multipulse search processing in a conventional speech coding apparatus.
A search table creation unit 508 creates a multipulse search table 307 on the basis of an inter-pulse distortion table 301 and pulse position candidate table 302.
The contents of the pulse position candidate table 302 are shown in Table 1.
A pulse position candidate table exists for each compression bit rate. Table 1 is the pulse position candidate table for the MPEG-4/CELP compression bit rate of 8,300 bps. The number of pulses is five, numbered 1, 2, ..., 5 from the top. At 8,300 bps, one frame (the compression unit) contains 40 samples, and these 40 samples, regarded as pulses with an amplitude of ±1, are modeled by five pulses. The pulse position candidate table in Table 1 holds the pulse position candidates for each pulse number, and the interval between candidates is uniquely determined for each pulse number.
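Since Table 1 is not reproduced here, the following C sketch uses a hypothetical candidate layout only to illustrate the data structure: a one-dimensional array in ascending order of pulse number, with zero-based pulse p allowed at the zero-based positions p, p + 5, p + 10, and so on. With 40 samples and five pulses this gives eight candidates per pulse and covers every sample exactly once; the actual MPEG-4/CELP table may differ.

```c
#include <assert.h>

#define FRAME_SAMPLES   40  /* samples per frame at 8,300 bps (from the text) */
#define NUM_PULSES       5  /* pulses per frame at 8,300 bps (from the text) */
/* Hypothetical: candidates spaced NUM_PULSES apart, giving 8 per pulse. */
#define CANDS_PER_PULSE (FRAME_SAMPLES / NUM_PULSES)

/* Fill a one-dimensional candidate table in ascending order of pulse number:
   entry [p * CANDS_PER_PULSE + c] is the c-th candidate position of pulse p. */
void build_candidate_table(int table[NUM_PULSES * CANDS_PER_PULSE])
{
    assert(FRAME_SAMPLES % NUM_PULSES == 0);  /* layout assumes an even split */
    for (int p = 0; p < NUM_PULSES; p++)
        for (int c = 0; c < CANDS_PER_PULSE; c++)
            table[p * CANDS_PER_PULSE + c] = p + c * NUM_PULSES;
}
```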
As the modeling method, the pulse position candidate table is arranged at the nodes of a tree structure as shown in FIG. 7.
FIG. 8 shows the structure of the multipulse search table 307, which stores, for each pulse number 702, a distortion 704 between adjacent pulses for each pulse position candidate 703. The pulse interval used to obtain the distortion between adjacent pulses for each pulse position candidate varies from 1 up to the maximum number of samples of one frame, in steps of the pulse position candidate interval. Distortions are calculated for every pulse interval and stored in the inter-pulse distortion table 301 shown in FIG. 6.
Multipulse search processing in the conventional speech coding apparatus will be explained with reference to the flow charts of FIGS. 9 and 10.
The multipulse search processing sequence has a quadruple loop structure. From the outermost loop inward, the loops are: a loop over the inter-pulse distance used as the index for obtaining a distortion, which runs from an initial value of 1 in increments of 1 and ends when the maximum pulse position candidate interval has been processed (step S901); a loop over the samples of one frame, which runs from an initial value of 1 in increments of 1 and ends when the maximum number of samples of one frame has been processed (step S902); a loop over the pulses to be modeled, i.e., over the pulse numbers (step S903); and a loop over the pulse position candidates of each pulse number (step S904). Within the loop of step S902, the distortion between pulses separated by the distance set by the outermost loop is obtained, and the distortions for one frame are stored in the inter-pulse distortion table 301 (step S905). Within these loops, the multipulse search table 307 is created (step S906).
FIG. 10 shows a sequence of creating the multipulse search table 307 in step S906 of FIG. 9.
The start addresses of the pulse position candidate table 302 and the inter-pulse distortion table 301 are set as their respective current pointers (step S1001). In practice, the pulse position candidate table 302 is a one-dimensional array in ascending order of pulse number. Whether processing for all pulse numbers has ended is checked (step S1002). If YES in step S1002, multipulse search table creation processing ends. If NO in step S1002, the start address of the multipulse search table 307 is set as its current pointer (step S1003).
Whether processing for all pulse position candidates has ended is checked (step S1004). If YES in step S1004, the pulse number is incremented by one (step S1005), and the flow returns to step S1002. If NO in step S1004, a pulse position is read out from the current pointer of the pulse position candidate table 302 (step S1006), and the difference obtained by subtracting the inter-pulse distance used for the distortion from the readout pulse position is calculated (step S1007). If the difference is 0 or more (step S1008), the difference is added to the current pointer of the multipulse search table 307 (step S1009) and to the current pointer of the inter-pulse distortion table 301 (step S1010). A distortion value is read out from the address obtained in step S1010 and stored at the address obtained in step S1009 (step S1011). If the difference is smaller than 0 in step S1008, steps S1009 to S1011 are skipped. Next, the sum of the pulse position and the inter-pulse distance is calculated (step S1012). If the sum is smaller than the number of samples of one frame (YES in step S1013), the sum is added to the current pointer of the multipulse search table 307 (step S1014) and to the current pointer of the inter-pulse distortion table 301 (step S1015). A distortion value is read out from the address obtained in step S1015 and stored at the address obtained in step S1014 (step S1016). If the sum is equal to or greater than the number of samples of one frame (NO in step S1013), steps S1014 to S1016 are skipped. Finally, the number of samples of one frame is added to the current pointer of the multipulse search table 307 (step S1017), and the flow returns to step S1004.
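The creation sequence of FIG. 10 can be sketched in C as below, with comments keyed to the step numbers. The frame and pulse counts come from the 8,300 bps case in the text; the candidate count, the integer distortion type, and the search-table layout (one row of frame samples per candidate, grouped by pulse number) are assumptions made for illustration, and all names are hypothetical. Note that both addresses are recomputed from `pos - d` and `pos + d`, with a boundary test each, for every candidate at every distance.

```c
#include <assert.h>

#define FRAME_SAMPLES   40  /* samples per frame at 8,300 bps (from the text) */
#define NUM_PULSES       5  /* pulses per frame at 8,300 bps (from the text) */
#define CANDS_PER_PULSE  8  /* hypothetical; Table 1 is not reproduced here */

/* Candidate table: one-dimensional, ascending pulse number. */
int pos_cand[NUM_PULSES * CANDS_PER_PULSE];
/* Inter-pulse distortion table for the current inter-pulse distance. */
int ipd_table[FRAME_SAMPLES];
/* Multipulse search table: per pulse number, one FRAME_SAMPLES-long row
   per candidate (layout assumed for illustration). */
int search_table[NUM_PULSES][CANDS_PER_PULSE * FRAME_SAMPLES];

/* Conventional creation for one inter-pulse distance d; called once per d
   inside the outermost loop of FIG. 9 after ipd_table has been filled. */
void create_search_table_conventional(int d)
{
    assert(d >= 1 && d < FRAME_SAMPLES);
    const int *cand = pos_cand;                      /* S1001 */
    for (int p = 0; p < NUM_PULSES; p++) {           /* S1002, S1005 */
        int *row = search_table[p];                  /* S1003 */
        for (int c = 0; c < CANDS_PER_PULSE; c++) {  /* S1004 */
            int pos = *cand++;                       /* S1006 */
            int diff = pos - d;                      /* S1007 */
            if (diff >= 0)                           /* S1008 */
                row[diff] = ipd_table[diff];         /* S1009-S1011 */
            int sum = pos + d;                       /* S1012 */
            if (sum < FRAME_SAMPLES)                 /* S1013 */
                row[sum] = ipd_table[sum];           /* S1014-S1016 */
            row += FRAME_SAMPLES;                    /* S1017 */
        }
    }
}
```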
Implementing multipulse search table creation processing in an actual program requires 12 instruction steps, corresponding to steps S1006 to S1017 in FIG. 10.
MPEG-4/CELP is used as the speech codec of portable terminals, for example for the speech of video phones, and must therefore run in real time. In the prior art, multipulse search processing occupies 50% or more of the time required for speech coding. When the speech coding apparatus is implemented as software on a digital signal processor (hereinafter referred to as a DSP), multipulse search processing requires 17.682 MIPS (Million Instructions Per Second), while the entire coding processing requires 30.64 MIPS, which makes multipulse search processing a bottleneck.
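As an arithmetic check using only the two MIPS figures quoted above, multipulse search alone accounts for well over half of the processing load:

```latex
\frac{17.682\ \text{MIPS}}{30.64\ \text{MIPS}} \approx 0.577 = 57.7\,\%,
\qquad
30.64 - 17.682 = 12.958\ \text{MIPS for all other processing.}
```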
This is because, in creating the multipulse search table referred to in multipulse search processing, the addresses of the reference table and of the copy destination table for each distortion value are calculated inside the four nested loops, which takes 12 instruction steps.
FIG. 11 shows an actual program source list for performing conventional multipulse search table creation processing. Twelve instruction processes in this source list correspond to 12 steps in the flow chart of FIG. 10.
Because the conventional speech coding apparatus calculates the addresses of the reference table and the copy destination table whenever it copies a distortion value, multipulse search table creation requires many instructions, and multipulse search processing takes a long time.
It is an object of the present invention to provide a speech coding apparatus capable of increasing the speed of multipulse search processing in MPEG-4/CELP by decreasing the number of instructions necessary for multipulse search table creation processing.
To achieve the above object, according to the present invention, there is provided a speech coding apparatus that, in coding input speech using a multipulse made up of a plurality of pulses, creates a multipulse search table storing a distortion, serving as a correlation coefficient between adjacent pulses of the multipulse, for each pulse position candidate of each pulse, and uses the multipulse search table to perform multipulse search processing that determines the position and amplitude of each pulse of the multipulse so as to minimize the error between the input speech and reproduced speech, the apparatus comprising: a pulse position candidate table for storing the pulse position candidates of each pulse by pulse number; an inter-pulse distortion table for storing a distortion calculated for every pulse interval corresponding to a pulse position in the pulse position candidate table; a first reference address table; a second reference address table; first reference address table creation means for regarding a pulse position of the inter-pulse distortion table represented by the pulse position candidate table as a relative distance from the start of the inter-pulse distortion table, calculating in advance, for every pulse interval corresponding to the pulse position in the pulse position candidate table, the absolute address in the inter-pulse distortion table at which the distortion is stored, and storing that absolute address in the first reference address table; second reference address table creation means for regarding a pulse position of the inter-pulse distortion table represented by the pulse position candidate table as a relative distance from the start of the multipulse search table, calculating in advance, for every pulse interval corresponding to the pulse position in the pulse position candidate table, the corresponding absolute address in the multipulse search table, and storing that absolute address in the second reference address table; and search table creation means for, in creating the multipulse search table, reading out from the first reference address table an absolute address of the inter-pulse distortion table indexed by a pulse position candidate, reading out from the second reference address table an absolute address of the multipulse search table indexed by a pulse position candidate, and creating the multipulse search table using the readout absolute address of the multipulse search table and the readout absolute address of the inter-pulse distortion table.
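The idea of the two reference address tables can be sketched in C as follows. The sketch makes assumptions not stated in the text: one pair of tables per inter-pulse distance `d`, a hypothetical candidate count and search-table layout, and hypothetical names (`RefTables`, `build_reference_tables`, `create_search_table_fast`). Because the candidate positions are fixed, the absolute addresses and the boundary tests of steps S1008 and S1013 can be resolved once in advance; per frame, only the distortion values change, so table creation reduces to a copy through two pointer arrays.

```c
#include <assert.h>

#define FRAME_SAMPLES   40  /* samples per frame at 8,300 bps (from the text) */
#define NUM_PULSES       5  /* pulses per frame at 8,300 bps (from the text) */
#define CANDS_PER_PULSE  8  /* hypothetical; Table 1 is not reproduced here */
#define MAX_REFS (2 * NUM_PULSES * CANDS_PER_PULSE)  /* pos - d and pos + d */

int pos_cand[NUM_PULSES * CANDS_PER_PULSE];  /* candidate table */
int ipd_table[FRAME_SAMPLES];                /* inter-pulse distortion table */
int search_table[NUM_PULSES][CANDS_PER_PULSE * FRAME_SAMPLES];

/* First (src) and second (dst) reference address tables for one distance d:
   absolute addresses in the distortion table and in the search table. */
typedef struct {
    const int *src[MAX_REFS];
    int *dst[MAX_REFS];
    int count;
} RefTables;

/* Done once in advance: resolve every address and boundary test. */
void build_reference_tables(int d, RefTables *rt)
{
    assert(d >= 1 && d < FRAME_SAMPLES);
    rt->count = 0;
    for (int p = 0; p < NUM_PULSES; p++) {
        for (int c = 0; c < CANDS_PER_PULSE; c++) {
            int pos = pos_cand[p * CANDS_PER_PULSE + c];
            int *row = &search_table[p][c * FRAME_SAMPLES];
            int offs[2] = { pos - d, pos + d };
            for (int k = 0; k < 2; k++) {
                if (offs[k] < 0 || offs[k] >= FRAME_SAMPLES)
                    continue;  /* replaces the per-frame S1008/S1013 tests */
                assert(rt->count < MAX_REFS);
                rt->src[rt->count] = &ipd_table[offs[k]];
                rt->dst[rt->count] = &row[offs[k]];
                rt->count++;
            }
        }
    }
}

/* Done every frame: no address arithmetic or boundary tests, only copies. */
void create_search_table_fast(const RefTables *rt)
{
    for (int i = 0; i < rt->count; i++)
        *rt->dst[i] = *rt->src[i];
}
```

In use, the reference tables would be built once at initialization, and `create_search_table_fast` would be called each frame after the inter-pulse distortion table has been refilled for the same distance.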