(1) Field of the Invention
The present invention generally relates to a signal analysis method that uses class specific features and different size analysis windows for recognizing phenomena in the signal.
(2) Description of the Prior Art
Characterizing an input signal is a common problem in many fields. In sonar and radar, it is often desirable to separate natural sources from man-made sources. The present method also has application to geological survey signals and to non-time-series signals such as images. It can be applied to any one-dimensional signal.
In speech recognition, it is desirable to recognize phonemes so that speech can be converted into text. Rabiner, in “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, Vol. 77, No. 2 (February 1989), provides background teaching a method for voice recognition using hidden Markov models. Three common hidden Markov model problems are given. The first is computing the probability of a given observation sequence from the model. This measures how well the observed sequence matches the model. The second problem is choosing the best state sequence for a given model and observation sequence. The solution of this problem uncovers the sequence of symbols. The third problem is refining the model parameters to maximize the probability that an observation sequence is characterized by a model. Resolution of this problem optimizes the model parameters to best describe an observed sequence.
Because the naïve solution to computing the observation sequence probability is numerically intensive, a well-known forward procedure has been developed for efficiently calculating the probability of a given observation sequence from the model. In this method, a forward probability vector is calculated for each time in the observational period. This vector is calculated from the initial state probabilities, the observational symbol probabilities and the state transition probabilities. The procedure greatly reduces the number of computations by exploiting the lattice structure of the hidden Markov model.
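The forward procedure described above can be illustrated with a minimal sketch, assuming a discrete-symbol hidden Markov model. The function names, the two-state model and its probability values are hypothetical and chosen only for illustration; they do not come from Rabiner or from the present disclosure. A brute-force enumeration is included so that the two computations can be compared.

```python
from itertools import product

import numpy as np


def forward_probability(pi, A, B, observations):
    """P(observations | model) via the forward procedure.

    pi: (N,) initial state probabilities
    A:  (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B:  (N, M) symbol probabilities, B[j, k] = P(symbol k | state j)
    observations: sequence of symbol indices
    """
    # Forward vector at the first time step.
    alpha = pi * B[:, observations[0]]
    # One lattice step per remaining observation: O(N^2) work each.
    for symbol in observations[1:]:
        alpha = (alpha @ A) * B[:, symbol]
    return float(alpha.sum())


def naive_probability(pi, A, B, observations):
    """Same quantity by summing over all N**T state sequences."""
    total = 0.0
    for states in product(range(len(pi)), repeat=len(observations)):
        p = pi[states[0]] * B[states[0], observations[0]]
        for t in range(1, len(observations)):
            p *= A[states[t - 1], states[t]] * B[states[t], observations[t]]
        total += p
    return total


# Hypothetical two-state, three-symbol model (illustrative values only).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
obs = [0, 1, 2]

print(forward_probability(pi, A, B, obs))
print(naive_probability(pi, A, B, obs))
```

The forward version needs on the order of N²T multiplications for N states and T observations, whereas the enumeration visits N^T state sequences. For long sequences the forward vector underflows, so a practical implementation would rescale it at each step or work with logarithms.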
In a typical automatic speech recognition (ASR) system, such as that taught by Rabiner, the speech signal is divided into equal-sized segments, from which features (usually cepstral coefficients) are extracted. The probabilistic model is usually a hidden Markov model (HMM). The process of characterizing the data is thus divided into two separate stages. In the first stage, useful information is extracted from the raw input data in a compact form that is usable by automatic recognition algorithms. This usually means that the data is divided into segments, and information in the form of features is extracted from each segment. In the second stage, an algorithm, usually a probabilistic model such as a hidden Markov model, decides which type of signal is present in each segment, or combines segments to recognize signals spanning multiple segments.
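The first (feature extraction) stage can be sketched as follows, assuming a 16 kHz sampling rate, 30 millisecond segments with 50% overlap, a Hann window, and the plain real cepstrum truncated to 13 coefficients. All of these values and the function names are assumptions made for illustration; practical systems often use mel-frequency cepstral coefficients instead.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into equal-sized, possibly overlapping segments."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row r holds the sample indices of segment r.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def real_cepstrum(frames, n_coeffs):
    """Low-order real cepstral coefficients of each segment."""
    window = np.hanning(frames.shape[1])
    spectrum = np.abs(np.fft.rfft(frames * window, axis=1))
    # Small offset avoids log(0) for empty frequency bins.
    log_spectrum = np.log(spectrum + 1e-12)
    cepstrum = np.fft.irfft(log_spectrum, axis=1)
    return cepstrum[:, :n_coeffs]


fs = 16000                 # assumed sampling rate in Hz
frame_len = 480            # 30 ms at 16 kHz
hop = 240                  # 15 ms step, i.e. 50% overlap (assumed)

# One second of a synthetic tone stands in for a speech recording.
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 220 * t)

frames = frame_signal(signal, frame_len, hop)
features = real_cepstrum(frames, n_coeffs=13)
print(frames.shape, features.shape)
```

Every segment has the same length and passes through the same extractor, which is the constraint the two-stage approach imposes. The resulting feature matrix, one row per segment, is what the second stage would consume.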
One problem with such a two-stage approach is the necessity of making difficult compromises in the first (feature extraction) stage. The size of the segments and the type of features must be consistent, even if the potential signal types are not. In view of this, existing systems do not perform well on complex signals in which the appropriate segment size varies.
A good example of a complex signal is human speech, in which a typical vowel is consistent over durations as long as 60 milliseconds and is rich in spectral detail, whereas a “T-burst”, a component of the sound of the consonant “T”, has a duration as short as a few milliseconds and has little spectral detail. In spite of this, a single segment size and a single feature extractor are used for both. The typical analysis window for human speech is about 30 milliseconds (effectively 16 milliseconds after applying a window function), which is a compromise. It can be regarded as a poor compromise because it is too long to observe the occurrence of the “T-burst” and too short to fully characterize the spectral detail of the vowel.
The compromise is needed because of the way decisions are made between competing signal hypotheses. These decisions are primarily made using a common feature set. For example, in order to decide between signal types A and B, the system first needs to train itself on the patterns observed for signal types A and B at the output of a feature extractor. Then, in the testing phase, the pattern observed using exactly the same feature extractor is compared with the learned patterns for signal types A and B, and a decision is made. Prior to the introduction of the class-specific method (CSM), classical decision theory did not consider the problem of deciding between signal types when patterns were learned using different segment sizes and different feature extractors for each signal type. This is unfortunate because the segment size and feature type that are best for characterizing each signal type may differ. Accordingly, there is a need for a method that can use different features and segment sizes, yet at the same time make optimal statistical decisions.
Several techniques related to the use of signal-dependent or class-dependent features for classification are taught in the prior art, yet only those methods related to the class-specific method, or CSM, are general in nature and derived from a theory rooted in the classical optimum Bayesian classifier. CSM is covered in U.S. Pat. No. 6,535,641, “Class-Specific Classifier”, and augmented by the probability density function (PDF) projection theorem (PPT), which is disclosed in Baggenstoss, “The PDF Projection Theorem and the Class-Specific Method”, IEEE Transactions on Signal Processing, Vol. 51, No. 3 (March 2003). The probability density function projection theorem eliminates the need for sufficient statistics and allows the use of class-dependent reference hypotheses, improving the performance of any classification system using class-dependent features. U.S. Pat. No. 6,466,908, entitled “System and Method for Training a Class-specific Hidden Markov Model Using a Modified Baum-Welch Algorithm”, alleviates the need for a common feature set in an HMM.
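For orientation, the projection underlying the PPT can be written in the following general form. The symbols are notation chosen here for illustration and are not defined elsewhere in this description: x is the raw data, z_j = T_j(x) is the feature set extracted for class j, H_j is the hypothesis that class j is present, H_{0,j} is the class-dependent reference hypothesis, \hat{p} is the feature PDF estimated from training data, and P(H_j) is the prior probability of class j.

```latex
\[
  p_p(x \mid H_j) \;=\;
  \frac{p(x \mid H_{0,j})}{p(z_j \mid H_{0,j})}\;
  \hat{p}(z_j \mid H_j),
  \qquad z_j = T_j(x)
\]
\[
  j^{*} \;=\; \arg\max_{j}\; p_p(x \mid H_j)\, P(H_j)
\]
```

Each projected PDF is a density on the common raw data x, so the class hypotheses can be compared in a Bayesian classifier even though each class uses its own features and, potentially, its own segment size.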