Apparatus and methods consistent with the present invention relate generally to detecting onset of a signal event, and in particular to apparatus and methods for detecting onset of a voicing event.
To analyze speech accurately, the point in time at which speech starts must be determined. Previous methods use a set time interval during which data is sampled and averaged over hundreds of data points. This can blur and distort time-critical factors.
Raw voice data is highly random, and only some of the information is valuable for recognizing parts of speech. Several prior art techniques attempt to reduce the randomness by processing the data into a more stable form. Typically, this has involved smoothing algorithms, which average the data. For example, a data point is revalued by averaging it with the two data points on either side of it, so that the average of five data points is used to create the new value. This averaging, however, blurs the data in both amplitude and time. In many cases, data exists for only a fraction of a millisecond. At an 8 kHz sampling rate, which is typical for many speech applications, each sample lasts 0.125 millisecond, so a five-point average blurs the data over a 0.625 millisecond span. Thus, vital data is destroyed by the very process intended to make it more usable for the algorithmic methods that evaluate it.
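The blurring described above can be illustrated with a minimal sketch. It assumes a synthetic signal containing a single-sample event; the function name `smooth5` and the edge handling (shorter windows at the ends) are illustrative choices, not taken from any particular prior art implementation.

```python
def smooth5(x):
    """Five-point moving average: each point is averaged with the two
    points on either side. Near the edges a shorter window is used
    (an assumption made only to keep the sketch self-contained)."""
    n = len(x)
    out = []
    for i in range(n):
        lo = max(0, i - 2)
        hi = min(n, i + 3)
        window = x[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Synthetic test signal: silence with a single-sample event at index 10.
signal = [0.0] * 10 + [1.0] + [0.0] * 10
smoothed = smooth5(signal)

# Indices where the smoothed signal is non-zero, and the new peak value.
blurred_indices = [i for i, v in enumerate(smoothed) if v != 0.0]
peak = max(smoothed)
```

In this sketch the one-sample event is smeared across five samples (blurring in time) and its peak falls to one fifth of the original value (blurring in amplitude).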
Windowing is another very common method of analyzing the data. Long windows are often used, on the order of 25 milliseconds. The data within each window is evaluated and averaged, with a new average calculated every 5 milliseconds. This creates a problem, for example, when analyzing information that has a just noticeable difference of one to two milliseconds. A just noticeable difference is the threshold at which a human is able to detect that a stimulus has changed, which in this case lies in a range of one to two milliseconds. Typically, windowing methods start sampling data at an arbitrary point in time that has no relationship to relevant portions of the data. Because of the arbitrary and random nature of the windowing, there is no way to determine where events of interest occur. An event could be bisected by a window boundary, distorting it even further. Even with smoothing, the data is still too random in its motion to allow detection of the sudden onset of a signal in the midst of the randomness of noise.
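The loss of timing information under windowing can be sketched as follows. The sketch assumes an 8 kHz sampling rate, a 25 millisecond window advanced every 5 milliseconds, and an idealized noise-free step signal; the helper names and the zero threshold are hypothetical and chosen only for illustration.

```python
SAMPLE_RATE_HZ = 8000
WIN = 25 * SAMPLE_RATE_HZ // 1000   # 25 ms window -> 200 samples
HOP = 5 * SAMPLE_RATE_HZ // 1000    # new average every 5 ms -> 40 samples


def frame_means(x, win, hop):
    """Mean absolute amplitude of each window. Windows start at sample 0,
    an arbitrary point unrelated to the content of the signal."""
    return [sum(abs(v) for v in x[s:s + win]) / win
            for s in range(0, len(x) - win + 1, hop)]


def first_active_frame_start(onset, threshold=0.0):
    """Start sample of the first window whose average exceeds the threshold,
    for a synthetic step signal that switches on at sample `onset`."""
    x = [0.0] * onset + [1.0] * (800 - onset)
    means = frame_means(x, WIN, HOP)
    k = next(i for i, m in enumerate(means) if m > threshold)
    return k * HOP


# Onsets almost 5 ms apart are reported by the same window.
early = first_active_frame_start(400)
late = first_active_frame_start(439)
```

Under these assumptions, onsets at samples 400 through 439 are all first reported by the window starting at sample 240, so a difference of nearly 5 milliseconds in onset time is invisible, well above a just noticeable difference of one to two milliseconds.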
The very act of arbitrary segmentation also imposes a granularity on the data. For example, if a segment is 128 samples in duration at a 44,100 Hz sampling rate, then each segment spans about 2.9 milliseconds, and the smallest unit of measure possible is 5.8 milliseconds, or twice the segment duration (based on the Nyquist rule of two times oversampling).
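The arithmetic behind this granularity can be checked with a short sketch; the variable names are illustrative only.

```python
SAMPLE_RATE_HZ = 44100
SEGMENT_SAMPLES = 128

# Duration of one 128-sample segment, in milliseconds.
segment_ms = SEGMENT_SAMPLES / SAMPLE_RATE_HZ * 1000.0

# Smallest resolvable unit under the two-times rule: two segment durations.
smallest_unit_ms = 2 * segment_ms
```

Here `segment_ms` evaluates to about 2.9 and `smallest_unit_ms` to about 5.8, matching the figures given above.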
Therefore, prior art smoothing techniques blur the data in both amplitude and time. Even with smoothing, the raw data in the prior art is too random to distinguish any significant features against the background of noise.
What is needed is a way to accurately determine event onset time so that signal details surrounding the event can be properly analyzed.