1. Field of the Invention
The present invention relates generally to a method and system for aligning windows for voice signals, and in particular, to a method and system for aligning windows to extract a peak feature from voice signals in such a manner that the windows can be easily updated while minimizing variations even if the voice signals are discontinuous and transient.
2. Description of the Related Art
Recently, various systems that process voice signals in aligned windows have been developed. These systems use voice signals in applications such as coding, synthesis, recognition, and enhancement. To this end, each system extracts peak feature information from voice signals according to its application field. Because the extracted peak feature information feeds different applications, it must be extracted accurately if it is to be applied efficiently.
Generally, such a voice signal processing system processes voice signals block by block, using windows of a fixed length and a fixed update rate that have been established in advance for extracting and calculating a peak feature. That is, the voice signal processing system uses fixed-length data windows. However, because the peak features of interest differ from one application field to another, reliable calculation is better served by processing voice signals in blocks suited to each application field. For example, locating a peak requires only three data points, whereas computing linear predictive coding (LPC) or cepstral coefficients requires a window length chosen by weighing a complicated trade-off between variability and repeatability. Thus, when peak feature information is extracted from a voice signal, the window length need not always be fixed.
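The contrast between fixed-length block processing and the three-point peak test can be sketched as follows. This is a minimal Python/NumPy illustration, not the method of this document: `frame_signal` and `three_point_peaks` are hypothetical names, the window length and hop (update step) are given in samples, and counting a tie with the right-hand neighbor as a peak is an assumption.

```python
import numpy as np


def frame_signal(x: np.ndarray, win_len: int, hop: int) -> np.ndarray:
    """Split x into fixed-length blocks of win_len samples, advancing hop samples each time."""
    if win_len <= 0 or hop <= 0:
        raise ValueError("win_len and hop must be positive")
    if len(x) < win_len:
        # Not enough samples for even one full window.
        return np.empty((0, win_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - win_len) // hop
    return np.stack([x[i * hop : i * hop + win_len] for i in range(n_frames)])


def three_point_peaks(frame: np.ndarray) -> np.ndarray:
    """Return indices n with frame[n-1] < frame[n] >= frame[n+1] (three samples per test)."""
    left, mid, right = frame[:-2], frame[1:-1], frame[2:]
    return np.flatnonzero((mid > left) & (mid >= right)) + 1


x = np.array([0, 1, 3, 2, 0, 2, 5, 4, 1, 0], dtype=float)
for frame in frame_signal(x, win_len=6, hop=2):
    print(frame, three_point_peaks(frame))
```

However long each block is, the peak test itself only ever inspects three consecutive samples, which is the point made above about peak calculation versus LPC or cepstral analysis.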
Nevertheless, a fixed-length data window and a fixed update rate have generally been used to extract peak information, for the following reasons:
First, a fixed-length data window and a fixed update rate are easy to use in a voice signal processing system because the same values are applied at all times. However, until optimum values are determined, the system must be tested with various window lengths and update rates, and the single parameter setting that gives the best result must be found through such testing before it can be used as a fixed value. One might assume that the window length and update rate must be fixed for optimum processing, but this assumption does not hold, because background noise cannot be controlled in typical applications. That is, in a noisy environment it is difficult to obtain an optimum processing result with a fixed window length and a fixed update rate.
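The trial-and-error tuning described above amounts to a parameter sweep. The sketch below is a hypothetical illustration: `sweep_window_params` and the caller-supplied `score` function (for example, accuracy measured on a development set) are assumptions, and skipping pairs whose update interval exceeds the window length is an added constraint rather than part of the described test.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def sweep_window_params(
    win_lens_ms: Iterable[float],
    intervals_ms: Iterable[float],
    score: Callable[[float, float], float],
) -> Tuple[float, float, float]:
    """Try every (window length, update interval) pair; return the best pair and its score."""
    best = None
    for win_ms, hop_ms in product(win_lens_ms, intervals_ms):
        # Assumed constraint: the update step should not exceed the window length.
        if hop_ms > win_ms:
            continue
        s = score(win_ms, hop_ms)
        if best is None or s > best[2]:
            best = (win_ms, hop_ms, s)
    if best is None:
        raise ValueError("no admissible (window length, update interval) pair")
    return best


# Toy score (hypothetical) that favours a 25 ms window and a 10 ms update interval.
print(sweep_window_params([20, 25, 30], [8, 10, 16],
                          lambda w, h: -abs(w - 25) - abs(h - 10)))  # prints (25, 10, 0)
```

Because `score` is measured under one set of recording conditions, the pair it selects is optimal only for those conditions, which is the weakness noted above for noisy environments.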
Second, although a variable window length and update rate are desirable, there is no standard approach to, and no theoretical basis for, determining the window length and update rate each time. That is, there is no simple way to use a variable window length and update rate.
Third, a fixed window length and a fixed update rate have been used to reduce processing requirements. Conventional voice signal processing systems have aimed to minimize the amount of calculation; at present, however, given the tremendous improvement in processor capabilities, the amount of calculation is no longer a significant constraint.
The window update rate is a parameter distinct from the window length. If a window is too long, it contains too much information, which makes it difficult to extract peak feature information. Therefore, the window update interval is chosen within the boundary of the window length, or within a limited range of the window length in which peak feature information can still be extracted. For example, the maximum update interval in voice processing is on the order of 40 ms, which corresponds to about half of the shortest voice energy pulse; an update interval longer than 40 ms may skip over an energy pulse. In contrast, the minimum update interval is 0 ms. In most cases, a fixed update interval is set to a value between 8 and 16 ms.
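The millisecond figures above translate into sample counts once a sampling rate is fixed. The sketch below assumes an 8 kHz sampling rate (the text does not state one) and uses a hypothetical helper `interval_to_hop`. It treats 40 ms as a hard cap and requires the hop not to exceed the window length, which is one reading of "within the boundary of the window length". It also rejects a 0 ms interval, since a zero hop would never advance the window.

```python
def interval_to_hop(interval_ms: float, win_len: int, sample_rate_hz: int,
                    max_interval_ms: float = 40.0) -> int:
    """Convert an update interval in ms to a hop size in samples, checking assumed limits."""
    if not 0 < interval_ms <= max_interval_ms:
        raise ValueError(f"update interval {interval_ms} ms outside (0, {max_interval_ms}] ms")
    # Samples per update = interval in seconds * sampling rate in samples per second.
    hop = max(1, round(interval_ms * sample_rate_hz / 1000))
    if hop > win_len:
        raise ValueError(f"hop of {hop} samples exceeds window length of {win_len} samples")
    return hop


# Assumed 8 kHz rate with a 256-sample (32 ms) window.
for ms in (8, 10, 16):
    print(ms, "ms ->", interval_to_hop(ms, 256, 8000), "samples")
```

At 8 kHz, the typical 8 to 16 ms range corresponds to hops of 64 to 128 samples, while a 40 ms interval (320 samples) would already step past a 256-sample window.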
As described above, conventional voice signal processing systems have used fixed values to determine the window length or the start and end points of a data window. Therefore, there is a need for a window alignment method that has a theoretical basis and is suited to the types or characteristics of the voice signals being processed. In particular, there is a need for a method of aligning windows that can update the windows adaptively even when the peak feature information has the same characteristics as Discrete Fourier Transform (DFT) coefficients and the data consist of discrete points.