In pattern classification, which is generally used in pattern recognition and speech recognition systems, one must determine the pertinent features or attributes of an object in a scene or of a sequence of sounds and extract information about these features. Commonly used features include shape properties, level of pitch, texture, motion, depth, and color. Within a processor, these features are associated or correlated with classes or identification labels and are stored as sequences of bits. Each sequence of bits is referred to as a feature vector and represents information such as the level of pitch of a particular musical note.
In pattern classification, the class probabilities for a particular feature vector must be obtained in order to determine information such as the number of occurrences of a particular feature in the scene and the time and place of each occurrence. For applications such as speech recognition, this is often done by modelling the marginal density of the feature space characterizing each class (typically a phone, i.e., a single speech sound) with a model (typically a mixture of Gaussians). The class probabilities of the particular feature vector are then computed using the model for each class. However, in a typical large vocabulary speech recognition system, which uses tens of thousands of Gaussians, modelling the marginal densities of all the classes to obtain all the class probabilities requires several million operations for each feature vector. This computational complexity in obtaining the class probabilities, or the probability distribution over all the classes, for all feature vectors causes a noticeable delay, thereby thwarting the real-time implementation of speech recognition systems.
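By way of illustration only, the computation of class probabilities from per-class Gaussian mixture models may be sketched as follows. The sketch assumes diagonal covariances, known class priors, and Bayes' rule for normalization, none of which is specified above; the names `log_gaussian`, `log_sum_exp`, and `class_posteriors`, as well as the two toy phone models, are hypothetical.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def class_posteriors(x, models, priors):
    """Return P(class | x) for every class.

    models: {label: [(weight, mean, var), ...]}  (assumed layout)
    priors: {label: P(class)}                    (assumed to be known)
    """
    log_joint = {}
    for label, mixture in models.items():
        # Every Gaussian of every class is evaluated: this is the costly step.
        log_lik = log_sum_exp(
            [math.log(w) + log_gaussian(x, mean, var) for w, mean, var in mixture]
        )
        log_joint[label] = math.log(priors[label]) + log_lik
    norm = log_sum_exp(list(log_joint.values()))
    return {label: math.exp(lj - norm) for label, lj in log_joint.items()}

# Toy example: two hypothetical phone classes in a 2-dimensional feature space.
models = {
    "aa": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "iy": [(1.0, [4.0, 4.0], [1.0, 1.0])],
}
priors = {"aa": 0.5, "iy": 0.5}
posteriors = class_posteriors([0.5, 0.5], models, priors)
```

The cost of this direct approach grows with the total number of Gaussians multiplied by the feature dimension, which is consistent with the operation counts noted above for systems using tens of thousands of Gaussians.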
One procedure for reducing the computational complexity of using Gaussian models is to construct a hierarchical data structure based on the acoustic similarity of the Gaussian models. A number of the original Gaussians are thereby approximated by a single Gaussian at the upper levels of the hierarchy, and the approximate models, which are fewer in number than the original Gaussians, are evaluated first. Subsequently, the original Gaussians that correspond to the one or more approximate models giving the highest probability to the feature vector are evaluated. This technique, however, causes a degradation in accuracy due to the crudeness of the approximate models, and it further incurs the additional cost of evaluating the approximate models.
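The two-stage evaluation described above may be sketched, under stated assumptions, as follows. The grouping of Gaussians into clusters is taken as given, and each cluster is approximated by moment matching its members; both choices are assumptions made for illustration, as are the names `merge_gaussians` and `shortlist_scores` and the parameter `top_k`.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def merge_gaussians(members):
    """Moment-match weighted diagonal Gaussians [(weight, mean, var), ...]
    to a single Gaussian (one possible way to form the approximate model)."""
    total = sum(w for w, _, _ in members)
    dim = len(members[0][1])
    mean = [sum(w * m[d] for w, m, _ in members) / total for d in range(dim)]
    var = [
        sum(w * (v[d] + m[d] ** 2) for w, m, v in members) / total - mean[d] ** 2
        for d in range(dim)
    ]
    return total, mean, var

def shortlist_scores(x, clusters, top_k=1):
    """Evaluate the approximate models first, then only the original
    Gaussians belonging to the top_k best-scoring clusters."""
    # Recomputed on every call for brevity; the approximations do not depend on x.
    approx = [merge_gaussians(members) for members in clusters]
    ranked = sorted(
        range(len(clusters)),
        key=lambda c: log_gaussian(x, approx[c][1], approx[c][2]),
        reverse=True,
    )
    scores = {}
    for c in ranked[:top_k]:
        for i, (w, mean, var) in enumerate(clusters[c]):
            scores[(c, i)] = math.log(w) + log_gaussian(x, mean, var)
    return scores

# Toy example: two hypothetical clusters of two Gaussians each.
clusters = [
    [(0.25, [0.0, 0.0], [1.0, 1.0]), (0.25, [1.0, 1.0], [1.0, 1.0])],
    [(0.25, [8.0, 8.0], [1.0, 1.0]), (0.25, [9.0, 9.0], [1.0, 1.0])],
]
scores = shortlist_scores([0.5, 0.5], clusters, top_k=1)
```

In this sketch, Gaussians in clusters that are not shortlisted receive no score at all, which illustrates the source of the accuracy degradation noted above; the evaluation of the approximate models is likewise an added cost that is repaid only when the clusters are large.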
Accordingly, a need exists for a system and method which construct a hierarchical data structure and obtain the class probabilities using a minimal number of computations, so as to enable the implementation of pattern classification systems which operate in real time.