Statistical modeling techniques generally attempt to model system behavior by incorporating various informative features into a common framework of models. For example, in language modeling and natural language modeling, statistical modeling methods such as Maximum Entropy (ME) modeling utilize features that encode linguistic statistical events from a corpus of data into a common framework of conditional models to predict linguistic behavior.
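For reference, conditional maximum entropy models are commonly written in the log-linear form sketched below. The notation is an assumption introduced here for illustration and does not come from the text above: x is an observed context, y is a predicted outcome, f_j is the j-th feature function, lambda_j is its weight, and Z(x) is a normalizing term.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{j} \lambda_j f_j(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{j} \lambda_j f_j(x, y') \Big)
```

In this form, each feature contributes to the prediction only through its weight, which is why the choice of features and the estimation of their weights can be treated as separate tasks.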
In general, statistical modeling may be separated into two main tasks: a feature selection process that selects a subset of desired features to be included in the model from a feature space; and a parameter estimation process that estimates the weighting factors for each selected feature. Thus, this process involves the selection of a useful subset of features with proper weights from a feature space. The preliminary step in such a process is the definition of the feature space from which the subset of features is selected. Recent developments in statistical modeling of various linguistic phenomena have shown that increasing the size of feature spaces generally gives consistent performance improvements, since larger feature spaces help ensure that important information is not missed.
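The two tasks described above can be illustrated with a minimal sketch, assuming a toy conditional log-linear model over indicator features. The data set, the names `estimate` and `select`, the greedy likelihood-gain criterion, and the plain gradient-ascent estimator are all hypothetical choices made for illustration; they are not the method of any particular system discussed here.

```python
import math

# Toy training data: (context word, label). Hypothetical, for illustration only.
DATA = [("the", "DET"), ("a", "DET"), ("dog", "NOUN"), ("cat", "NOUN"),
        ("the", "DET"), ("runs", "VERB"), ("dog", "NOUN"), ("runs", "VERB")]
LABELS = ["DET", "NOUN", "VERB"]

# Feature space: one indicator feature per (word, label) pair.
WORDS = sorted({x for x, _ in DATA})
FEATURE_SPACE = [(w, y) for w in WORDS for y in LABELS]


def probs(x, feats, weights):
    """Return p(y | x) for every label, proportional to exp(sum of active weights)."""
    scores = [sum(wt for (w, lab), wt in zip(feats, weights) if w == x and lab == y)
              for y in LABELS]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]


def log_likelihood(feats, weights):
    """Conditional log-likelihood of DATA under the given features and weights."""
    return sum(math.log(probs(x, feats, weights)[LABELS.index(y)]) for x, y in DATA)


def estimate(feats, steps=200, lr=0.5):
    """Parameter estimation: gradient ascent on the conditional log-likelihood."""
    weights = [0.0] * len(feats)
    for _ in range(steps):
        grad = [0.0] * len(feats)
        for x, y in DATA:
            p = probs(x, feats, weights)
            for j, (w, lab) in enumerate(feats):
                if w == x:
                    observed = 1.0 if lab == y else 0.0
                    # Gradient term: observed feature value minus expected value.
                    grad[j] += observed - p[LABELS.index(lab)]
        weights = [wt + lr * g / len(DATA) for wt, g in zip(weights, grad)]
    return weights


def select(k):
    """Feature selection: greedily add the feature with the largest likelihood gain."""
    chosen = []
    for _ in range(k):
        candidates = [f for f in FEATURE_SPACE if f not in chosen]
        best = max(candidates,
                   key=lambda f: log_likelihood(chosen + [f], estimate(chosen + [f])))
        chosen.append(best)
    return chosen, estimate(chosen)


if __name__ == "__main__":
    features, weights = select(3)
    print("selected features:", features)
    print("log-likelihood:", log_likelihood(features, weights))
```

Re-estimating all weights for every candidate feature, as this sketch does, becomes expensive as the feature space grows, which is the cost that faster selection techniques aim to reduce.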
With respect to certain applications, such as natural language processing, image processing, bioinformatics, transaction predictions, business process, predictive processing, and so on, Conditional Maximum Entropy (CME) modeling has become a well-established technique of statistical classification. One advantage of CME modeling is the ability to incorporate a variety of features in a uniform framework with a sound mathematical foundation. Because larger feature spaces tend to give better results, it is advantageous to include an unlimited number of features. However, simply increasing the number of features in a feature space without considering the relationship of additional features with existing features may not provide enough useful information. What is needed, therefore, is a feature generation method that increases the size of feature spaces in a deliberate manner to generate a large number of meaningful features.
Simply increasing the size of feature spaces can also cause an undue burden on the processing system. Including all or nearly all features may cause data overfitting, slow the predictive process, or make the resulting model too large for resource-constrained applications. On the other hand, present learning systems are often limited by the number of features a system is able to explore. To overcome this problem, various feature selection techniques have been developed to greatly speed up the feature selection process. One such method is the Selective Gain Computation (SGC) method, as described in U.S. Patent Application 20050021317, which is assigned to the assignees of the present invention, and which is hereby incorporated in its entirety by reference. However, like many other statistical modeling algorithms, such as boosting and support vector machine techniques, the SGC algorithm is generally limited by the quality of the features within the defined feature spaces. What is needed, therefore, is a feature generation method that provides a comprehensive set of features that can be used with developing feature selection processes that exploit large and ultra-large feature spaces.