Pitch shifting a digital audio signal involves raising the output frequency (compressing the pitch period) or lowering it (expanding the pitch period). This is analogous to increasing or decreasing the rotational speed of a turntable platter. However, changing the speed in that way also changes the time period of the digital audio signal. How to pitch shift a digital audio signal while keeping its time period constant has therefore become an important issue.
To resolve this problem, a non-uniform audio frame segmentation method was proposed in the thesis "On Audio Processing for MPEG Decoding, Pitch-shifting and Subband Coding", submitted to the Institute of Electronics, College of Engineering and Computer Science, National Chiao Tung University, in partial fulfillment of the requirements for the degree of Master of Science in Electronics Engineering in June 1996. The operations are described as follows.
Step 1: select an audio frame of a time period N from the original digital audio signal;
Step 2: pitch shift the audio frame to obtain a pitch-shifted audio frame of a time period mN (compression of the pitch period when m < 1; expansion of the pitch period when m > 1);
Step 3: select another audio frame of a time period N from the original digital audio signal, starting at time mN, which corresponds to the end of the previous pitch-shifted audio frame;
Step 4: repeat step 2 to pitch shift the audio frame selected in step 3;
Step 5: find an optimum connecting point of these two audio frames and connect them to obtain a pitch-shifted audio signal of a time period 2mN-X (X is the deviation caused by the connecting operation);
Step 6: select a further audio frame from the original digital audio signal at time 2mN-X; and
Step 7: repeat step 4 through step 6 to update the pitch-shifted signal.
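The frame bookkeeping in the steps above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the helper names `shift_frame` and `frame_starts` are hypothetical, linear-interpolation resampling is only an assumed stand-in for the unspecified pitch-shift operation, and the deviation X is treated as a fixed value `x`, although in practice it would depend on each connecting point.

```python
import numpy as np

def shift_frame(frame, m):
    """Resample a frame of N samples to round(m*N) samples.

    Assumption: plain linear interpolation stands in for the pitch-shift
    operation, which the text does not specify.
    """
    frame = np.asarray(frame, dtype=float)
    n_in = len(frame)
    n_out = max(1, int(round(m * n_in)))
    # Fractional read positions in the input frame for each output sample.
    positions = np.linspace(0.0, n_in - 1, n_out)
    return np.interp(positions, np.arange(n_in), frame)

def frame_starts(signal_len, n, m, x=0):
    """Start times of successive input frames: 0, mN, 2mN - X, ...

    Each new frame starts where the pitch-shifted output currently ends.
    Assumption: every connection shortens the output by the same x samples.
    """
    shifted = int(round(m * n))
    if shifted - x <= 0:
        raise ValueError("x must be smaller than the shifted frame length")
    starts = []
    out_len = 0
    while out_len + n <= signal_len:
        starts.append(out_len)
        is_first = len(starts) == 1
        # The first frame adds mN samples; later frames add mN - X.
        out_len += shifted if is_first else shifted - x
    return starts
```

With N = 160, m = 0.5 and an illustrative X = 10, the first three frames start at 0, 80 (= mN) and 150 (= 2mN-X), matching steps 1, 3 and 6.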
For this non-uniform audio frame segmentation method, the optimum connecting point is found by evaluating and comparing the mean absolute error (MAE) between the rear samples of the first audio frame (hereinafter called the search region) and the front samples of the second audio frame (hereinafter called the cross region). The mean absolute error (MAE) is calculated by: ##EQU1## where C is the cross region having M samples; and S is the search region having N (> M) samples.
The optimum connecting point is then the sample corresponding to the minimum mean absolute error (MAE). The two audio frames are connected by: ##EQU2## where i is the position of the optimum connecting point, and P is the connecting region, which is followed by another audio frame.
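The search for the optimum connecting point can be sketched in Python as follows. The exact equations are not reproduced in this text, so the sketch assumes the conventional form MAE(j) = (1/M) * sum over k = 0..M-1 of |S(j+k) - C(k)|, evaluated at N-M candidate offsets j (a count chosen to agree with the operation count given later for N=160 and M=80). The function name `best_connecting_point` is hypothetical, and the connecting formula itself is not implemented here.

```python
import numpy as np

def best_connecting_point(search, cross):
    """Return (i, mae) for the offset i in the search region S at which
    the cross region C matches best in the mean-absolute-error sense.

    Assumption: MAE(j) = mean(|S[j:j+M] - C|) for j = 0 .. N-M-1.
    """
    search = np.asarray(search, dtype=float)
    cross = np.asarray(cross, dtype=float)
    n, m = len(search), len(cross)
    if m == 0 or n <= m:
        raise ValueError("search region must be longer than cross region")
    # One MAE value per candidate offset.
    maes = np.array([
        np.mean(np.abs(search[j:j + m] - cross))
        for j in range(n - m)
    ])
    i = int(np.argmin(maes))  # first offset with the minimum MAE
    return i, float(maes[i])
```

For example, if the cross region is an exact copy of 80 samples taken from offset 37 of a 160-sample search region, the function returns offset 37 with an MAE of 0.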
FIG. 1 (Prior Art) is a diagram showing a digital audio signal undergoing expansion pitch shifting in the non-uniform audio frame segmentation method.
Suppose the original digital audio signal S0 consists of a plurality of contiguous samples. First, select an audio frame D1 of a time period L1 from the digital audio signal S0, such as samples 0 through L1-1 shown in FIG. 1, and expand its pitch period to obtain a pitch-shifted audio frame D1' of a time period L2.
Then, select another audio frame D2 of a time period L1 from the original digital audio signal S0 at time L2 (the time L2 corresponds to the end of the pitch-shifted audio frame D1'), such as samples L2 through L1+L2-1 shown in FIG. 1, and expand its pitch period to obtain another pitch-shifted audio frame D2' of a time period L2.
Next, connect the audio frames D1' and D2'.
First, select a search region Sa from the rear samples of the pitch-shifted audio frame D1' and the samples of the original digital audio signal S0 immediately following the pitch-shifted audio frame D1', and select a cross region Ca from the front samples of the pitch-shifted audio frame D2'. Then, evaluate and compare the samples in the search region Sa and the cross region Ca as described above to obtain an optimum connecting point K1, and connect the two pitch-shifted audio frames D1' and D2' at that point. Repeating this procedure until the end of the signal yields an expansion pitch-shifted signal S0'.
FIG. 2 (Prior Art) is a diagram showing a digital audio signal undergoing compression pitch shifting in the non-uniform audio frame segmentation method.
Suppose the original digital audio signal S1 consists of a plurality of contiguous samples. First, select an audio frame D3 of a time period L3 from the digital audio signal S1, such as samples 0 through L3-1 shown in FIG. 2, and compress its pitch period to obtain a pitch-shifted audio frame D3' of a time period L4.
Then, select another audio frame D4 of a time period L3 from the original digital audio signal S1 at time L4 (the time L4 corresponds to the end of the pitch-shifted audio frame D3'), such as samples L4 through L3+L4-1 shown in FIG. 2, and compress its pitch period to obtain another pitch-shifted audio frame D4' of a time period L4.
Next, connect the audio frames D3' and D4'.
First, select a search region Sb from the rear samples of the pitch-shifted audio frame D3' and the samples of the original digital audio signal S1 immediately following the pitch-shifted audio frame D3', and select a cross region Cb from the front samples of the pitch-shifted audio frame D4'. Next, evaluate and compare the samples in the search region Sb and the cross region Cb as described above to obtain an optimum connecting point K2, and connect the two pitch-shifted audio frames D3' and D4' at that point. Repeating this procedure until the end of the signal yields a compression pitch-shifted signal S1'.
However, when this non-uniform audio frame segmentation method is used with N=160 and M=80, it is necessary to perform (80+79)*80=12720 add/subtract operations every 10 ms, which incurs a large cost in hardware implementation. It is therefore necessary and useful to provide a simple and effective apparatus and method for finding the optimum connecting point, so that the pitch shift apparatus can be designed economically and applied in commercial electronic products.
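The operation count quoted above can be reproduced with a short Python sketch. It assumes that each of the N-M candidate positions costs M subtractions plus M-1 additions, and that absolute-value and division operations are not counted. That breakdown is one reading of the figure (80+79)*80 and is not stated explicitly in the text. The function name `mae_search_ops` is hypothetical.

```python
def mae_search_ops(n_search, m_cross):
    """Add/subtract operations for one exhaustive MAE search.

    Assumed breakdown: N - M candidate positions, each needing
    M subtractions and M - 1 additions to accumulate the sum.
    """
    positions = n_search - m_cross
    subtractions = m_cross
    additions = m_cross - 1
    return positions * (subtractions + additions)

ops_per_search = mae_search_ops(160, 80)   # (80 + 79) * 80
ops_per_second = ops_per_search * 100      # one search every 10 ms
```

Under these assumptions, one search costs 12720 operations, which at one search per 10 ms amounts to 1,272,000 add/subtract operations per second.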