Machine learning applications, such as natural language processing systems, typically analyze data that is both noisy and structurally complex. Statistical techniques are often employed to model the data being analyzed. A primary step in statistical modeling is selecting a useful subset of features, with appropriate weights, from a feature space.
Recent developments in the statistical modeling of various linguistic phenomena have shown that enlarging the feature space generally yields consistent performance improvements. Such improvements, however, are often limited by the number of features a system is able to explore. For applications such as natural language processing, image processing, bioinformatics, transaction prediction, business process modeling, predictive processing, and so on, Conditional Maximum Entropy (CME) modeling has become a well-established statistical classification technique. One advantage of CME modeling is its ability to incorporate a variety of features in a uniform framework with a sound mathematical foundation.
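As an illustration of the CME framework described above, the sketch below computes the conditional probability p(y|x) as an exponential model over weighted feature functions, normalized over the candidate labels. The function and variable names are hypothetical and chosen for clarity; this is a minimal sketch of the general form of a conditional maximum entropy model, not any particular system's implementation.

```python
import math

def cme_probability(features, weights, candidate_labels, x):
    """Conditional maximum entropy model:
    p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x),
    where each f_i is a feature function and w_i its weight."""
    scores = {}
    for y in candidate_labels:
        scores[y] = math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
    z = sum(scores.values())  # normalization constant Z(x)
    return {y: s / z for y, s in scores.items()}

# Example with one hypothetical binary feature: fires when label matches input.
match_feature = lambda x, y: 1.0 if y == x else 0.0
probs = cme_probability([match_feature], [2.0], ["a", "b"], "a")
```

In a uniform framework of this kind, adding a new feature only requires defining another feature function and estimating its weight, which is why CME models accommodate heterogeneous feature types so readily.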
Various feature selection techniques have been developed to greatly speed up the feature selection process. One such method is the Selective Gain Computation (SGC) method, described in U.S. Patent Application 20050021317, which is assigned to the assignees of the present invention and is hereby incorporated by reference in its entirety. However, like many other statistical modeling algorithms, such as boosting and support vector machines, the SGC algorithm is generally limited by the size of the defined feature space. Because larger feature spaces tend to give better results, it is advantageous to be able to include an unlimited number of features. Present techniques, however, are limited in the size of the feature space they can handle. What is needed, therefore, is a feature selection method that overcomes the feature space size limitation of present statistical modeling systems.
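The general idea behind gain-based feature selection of the kind discussed above can be sketched as a greedy loop with lazily re-evaluated gains: cached gains serve as upper bounds, so a candidate's gain is only recomputed when it surfaces at the top of a priority queue. This is a simplified illustration of selective gain computation in general, not the patented SGC algorithm itself; `gain_fn` and the candidate representation are hypothetical placeholders.

```python
import heapq

def greedy_select(candidates, gain_fn, k):
    """Greedily select k features by estimated gain.

    Assumes gains are non-increasing as features are added, so a stale
    cached gain is an upper bound on the true gain; candidates are only
    re-scored when they reach the top of the max-heap (lazy evaluation)."""
    heap = [(-gain_fn(c, []), c) for c in candidates]  # negate for max-heap
    heapq.heapify(heap)
    selected = []
    while heap and len(selected) < k:
        _, c = heapq.heappop(heap)
        fresh = gain_fn(c, selected)  # recompute gain against current selection
        if heap and -heap[0][0] > fresh:
            heapq.heappush(heap, (-fresh, c))  # stale gain; re-queue with update
        else:
            selected.append(c)
    return selected
```

The practical limitation noted above remains visible even in this sketch: the initial heap must hold one entry per candidate feature, so memory grows with the size of the defined feature space.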