(1) Field of the Invention
The present invention generally relates to a class-specific signal analysis method using a subspace that maximizes class-specific J-functions.
(2) Description of the Prior Art
Characterizing an input signal using automated data processing systems is a common problem in many fields. In sonar, it is often desirable to separate natural sources from manmade sources. This is also true in radar. In speech recognition, it is desirable to recognize phonemes so that speech can be turned into text. In virtually all state-of-the-art methods, the process of characterizing the data is divided into two separate stages. In the first stage, it is necessary to extract features (useful information in the form of a compact set of parameters) from the input data that is useable by automatic recognition algorithms. In the second stage, an algorithm, usually a probabilistic model, decides which type of signal is present based on the features.
An example of such a system is an automatic speech recognition (ASR) system as implemented on a computer. In the first stage of a state-of-the-art ASR system, the speech signal is divided into equal-sized segments, from which features are extracted. These features are usually extracted in mel-scale cepstral format because this format focuses on the frequency response of human hearing.
The mel-scale cepstrum is calculated by taking the Fourier transform of a time domain signal to produce a spectrum. Powers of the spectrum are mapped onto the mel scale. Logarithms are taken of the powers at each of the mel frequencies. A discrete cosine transform is calculated for the logarithms of the mel powers. The mel-scale cepstral coefficients are the calculated discrete cosine transform coefficients. In ASR systems, the mel-scale cepstral coefficients are used as the feature set for recognizing phonemes.
In mathematical terms, one may write the MEL cepstrum features as
z = DCT(log(A′y))  (1)
where vector y is the length-(N/2+1) spectral vector (the magnitude-squared DFT output), the columns of A are the MEL band functions, and the “prime” notation indicates the transpose of the matrix A. The logarithm and the discrete cosine transform (DCT) are invertible functions. They involve no dimension reduction or information loss, so they may be considered a feature conditioning step, which results in more Gaussian-like and independent features.
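The steps above and equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of any particular ASR system: the triangular band shape, the Hz-to-mel formula, the orthonormal DCT-II scaling, the small logarithm floor, and all function names (`mel_band_matrix`, `dct_ii_matrix`, `mel_cepstrum`) are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_matrix(n_fft, sample_rate, n_bands):
    """Return A of shape (n_fft//2 + 1, n_bands); each column is one
    triangular band function (assumed shape) spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bands + 2))
    A = np.zeros((n_bins, n_bands))
    for k in range(n_bands):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        A[:, k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return A

def dct_ii_matrix(n):
    """Orthonormal DCT-II matrix, so the transform is invertible."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def mel_cepstrum(segment, sample_rate, n_bands=13, floor=1e-12):
    """Compute z = DCT(log(A'y)) for one even-length time-domain segment."""
    y = np.abs(np.fft.rfft(segment)) ** 2        # magnitude-squared DFT, length N/2+1
    A = mel_band_matrix(len(segment), sample_rate, n_bands)
    band_power = A.T @ y                         # linear projection A'y
    # The floor only guards against log(0); it is not part of equation (1).
    return dct_ii_matrix(n_bands) @ np.log(band_power + floor)

# Usage: a 256-sample segment of a 440 Hz tone sampled at 8 kHz.
segment = np.sin(2.0 * np.pi * 440.0 * np.arange(256) / 8000.0)
z = mel_cepstrum(segment, 8000.0)
```

Only the product A′y reduces dimension (here from 129 spectral bins to 13 band powers); the logarithm and the DCT act on the 13 values without discarding any of them.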
Other approaches to feature set development are taught in the prior art. The use of signal-dependent or class-dependent features for classification, known as the class-specific method or CSM, is covered in U.S. Pat. No. 6,535,641, “Class-Specific Classifier”. The probability density function (PDF) projection theorem (PPT) is disclosed in Baggenstoss, “The PDF Projection Theorem and the Class-Specific Method”, IEEE Transactions on Signal Processing, Vol. 51, No. 3 (March 2003), which is incorporated by reference. The probability density function projection theorem eliminates the need for sufficient statistics and allows the use of class-dependent reference hypotheses, improving the performance of any classification system using class-dependent features. U.S. Pat. No. 6,466,908, entitled “System and Method for Training a Class-specific Hidden Markov Model Using a Modified Baum-Welch Algorithm”, alleviates the need for a common feature set in an HMM.
The key operation in equation (1) is dimension reduction by linear projection onto a lower-dimensional space, namely the multiplication by A′. Now, with the introduction of the class-specific method (CSM) and the PDF projection theorem (PPT), one is free to explore class-dependent features within the rigid framework of Bayesian classification. Some work has been done on class-dependent features; however, existing approaches are only able to use different features by applying compensation factors to make likelihood comparisons fair. Such approaches work only if the class-dependent feature transformations are restricted to certain limited sets. They fall short of the potential of the PPT, which places no restriction on the type of feature transformation available to each phoneme. Under CSM, the “common feature space” is the time-series (raw data) itself. Feature PDFs, evaluated on different feature spaces, are projected back to the raw data space, where the likelihood comparison is done. Besides its generality, the CSM paradigm has additional advantages. For example, there is a quantitative class-dependent measure to optimize, which allows the class-dependent features to be designed in isolation, without regard to the other classes.
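For reference, the projection of a feature PDF back to the raw data space is usually written as shown here. The notation is an assumption made for this illustration and is not taken from the text above: x is the raw data, z_j = T_j(x) is the feature set for class j, H_j is the class hypothesis, and H_{0,j} is the class-dependent reference hypothesis.

```latex
\hat{p}(x \mid H_j) \;=\; J_j(x)\,\hat{p}(z_j \mid H_j),
\qquad
J_j(x) \;=\; \frac{p(x \mid H_{0,j})}{p(z_j \mid H_{0,j})},
\qquad
z_j = T_j(x)
```

The ratio J_j(x) depends only on the reference hypothesis and the feature transformation, so it can be evaluated for each class without knowledge of the other classes.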
A prior art classifier is shown in FIG. 1. The classifier 2 receives data from a data source 4. Data source 4 is joined to a feature transformation module 6 for developing a feature set. The feature set is provided to pattern match processors 8, which correspond to each data class. Pattern match processors 8 provide an output measuring the developed feature set against trained data. The pattern match processor 8 outputs are compared in a comparison 9 and the highest output is selected.
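The structure of FIG. 1 can be sketched as follows. This is an illustrative sketch only: the diagonal-Gaussian pattern matcher, the dictionary layout of `class_models`, and the toy feature transformation are assumptions, not details given in the text.

```python
import numpy as np

def gaussian_log_pdf(z, mean, var):
    """Log density of an independent (diagonal-covariance) Gaussian at z."""
    z, mean, var = (np.asarray(a, dtype=float) for a in (z, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (z - mean) ** 2 / var))

def classify_common_features(x, feature_transform, class_models):
    """One shared feature set, one pattern matcher per class, pick the highest."""
    z = feature_transform(x)                       # feature transformation module 6
    scores = {label: gaussian_log_pdf(z, m["mean"], m["var"])
              for label, m in class_models.items()}  # pattern match processors 8
    return max(scores, key=scores.get), scores       # comparison 9

# Usage with hypothetical trained models and a toy two-element feature set.
toy_features = lambda x: np.array([np.mean(x), np.std(x)])
models = {
    "A": {"mean": [0.0, 1.0], "var": [1.0, 1.0]},
    "B": {"mean": [5.0, 1.0], "var": [1.0, 1.0]},
}
label, scores = classify_common_features(np.full(8, 5.0), toy_features, models)
```

Because every class is scored on the same feature vector z, the scores can be compared directly; the cost is that z must carry enough information to separate all classes at once.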
FIG. 2 shows a class-specific classifier as disclosed in U.S. Pat. No. 6,535,641, which is incorporated by reference herein. In this classifier, a data source 10 supplies a raw data sample X to the processor 12 at a processor input 14. It is assumed that the data source can be of type A, B, or C, but the identity is not known. Processor output 16 is a decision concerning the identity of the data source, i.e. A, B, or C. The processor 12 contains one feature transformation section 18 for each possible data class. These sections 18 are joined to receive the raw data X at processor input 14. Each feature transformation section 18 produces a feature set for its respective class. The processor 12 further contains pattern match processors 20, with each pattern match processor joined to a transformation section 18 for receiving a feature set associated with one class. The pattern match processors 20 approximate the probability density functions (PDFs) of the feature sets for data sampled from the corresponding data class. The outputs of the pattern match processors 20 are highest when the input feature set is similar to or “matches” the typical values of the training set. Because the pattern match processors 20 operate on different feature sets, their outputs cannot be directly compared to arrive at a decision without compensation. Compensation processors 22 process the raw data X together with the feature set, Zj, and provide a correction term in accordance with the PPT, which, when multiplied by the output of the corresponding pattern match processor 20, converts the PDF of the feature set Zj into a PDF of the raw data X. The outputs of the compensation processors 22, called the “J function” in the terminology of the class-specific classifier, are passed to a multiplier 24, which multiplies each of them by the output of the corresponding pattern match processor 20.
The result of the multiplication 24, which is an estimate of the PDF of the raw data X for the given class, is processed by a comparison 26 joined to the processor 12 output 16. The output 16 is the identity of the data class that has the highest output from the multiplier 24.
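The data flow of FIG. 2 can be sketched for a two-class toy problem. Everything specific here is an assumption made for illustration: the reference hypothesis (independent N(0, 1) samples, shared by both classes), the two features (sum and energy), the Gaussian feature PDFs, and the "trained" means and variances. The computation is done in the log domain, so the multiplier 24 becomes an addition.

```python
import math
import numpy as np

def log_normal(v, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)

def log_p_x_h0(x):
    # Raw-data log PDF under the assumed reference hypothesis H0: i.i.d. N(0, 1).
    return float(np.sum(-0.5 * (math.log(2.0 * math.pi) + x ** 2)))

def feature_sum(x):
    return float(np.sum(x))

def log_p_sum_h0(z, n):
    # Under H0 the sum of n samples is N(0, n).
    return log_normal(z, 0.0, float(n))

def feature_energy(x):
    return float(np.sum(x ** 2))

def log_p_energy_h0(z, n):
    # Under H0 the energy is chi-square with n degrees of freedom (requires z > 0).
    k = n / 2.0
    return (k - 1.0) * math.log(z) - z / 2.0 - k * math.log(2.0) - math.lgamma(k)

# Hypothetical trained feature PDFs for segments of n = 16 samples:
# class A ~ N(1, 1) samples (sum near 16), class B ~ N(0, 4) samples (energy near 64).
CLASSES = {
    "A": {"feature": feature_sum, "log_p_z_h0": log_p_sum_h0, "mean": 16.0, "var": 16.0},
    "B": {"feature": feature_energy, "log_p_z_h0": log_p_energy_h0, "mean": 64.0, "var": 512.0},
}

def classify_class_specific(x, classes):
    x = np.asarray(x, dtype=float)
    scores = {}
    for label, c in classes.items():
        z = c["feature"](x)                                    # transformation section 18
        log_feature_pdf = log_normal(z, c["mean"], c["var"])   # pattern match processor 20
        log_j = log_p_x_h0(x) - c["log_p_z_h0"](z, len(x))     # compensation processor 22
        scores[label] = log_j + log_feature_pdf                # multiplier 24 (log domain)
    return max(scores, key=scores.get), scores                 # comparison 26

label, scores = classify_class_specific(np.ones(16), CLASSES)
```

For the all-ones input, the class A score equals the exact log PDF of the raw data under independent N(1, 1) samples, because the sum is a sufficient statistic for that class against the chosen reference hypothesis. The two classes use different features, yet their scores are comparable because both are estimates of the raw-data PDF.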