The present invention relates generally to supporting business decisions through data analysis by way of enriching data through data mining, text mining, and automatic classification. More particularly, the invention provides a method and system for 1) automatic detection of change in the business processes to be analyzed; 2) accurate measurement of the performance of automatic classification of business process data; 3) automatic handling of semi-structured text in business process analysis; and 4) efficient and maintainable scripting of the data enrichment process. Business decisions generally require knowledge about properties of the business entities related to the decision. Such properties can be inferred by an automatic classifier that processes data associated with the entity. Parts of the data may be human-generated or free-form text. Other parts of the data may be machine-generated or semi-structured. It is beneficial to analyze both free-form text and semi-structured text data for business process analysis. While the enrichment process can be programmed in a number of existing programming languages and database query languages, it is advantageous to provide a specialized language for increased maintainability and faster development of the enrichment process. As an example of the enabling features of such a language, we describe SQXML, a language developed by Enkata Technologies, Inc. for this purpose. The business decision can relate to marketing, sales, procurement, operations, or any other business area that generates and captures real data in electronic form. Merely by way of example, the invention is applied to processing data from a hard disk drive manufacturer. It would be recognized, however, that the invention has a much wider range of applicability. For example, the invention can be applied to other operational and non-operational business areas such as manufacturing, financial services, insurance services, high technology, retail, consumer products, and the like.
Common goals of almost every business are to increase profits and improve operations. Profits are generally derived from revenues less costs. Operations include manufacturing, sales, service, and other features of the business. Companies spend considerable time and effort controlling costs to improve profits and operations. Many such companies rely upon feedback from customers or detailed analysis of company finances and/or operations. Most particularly, companies collect all types of information in the form of data. Such information includes customer feedback, financial data, reliability information, product performance data, employee performance data, and customer data.
With the proliferation of computers and databases, companies have seen an explosion in the amount of information or data collected. Using telephone call centers as an example, over one hundred million customer calls are received each day in the United States. Such calls are often categorized and then stored for analysis. Large quantities of data are often collected. Unfortunately, conventional techniques for analyzing such information are often time-consuming and inefficient. That is, such techniques are often manual and require much effort.
Accordingly, companies are often unable to identify certain business improvement opportunities. Much of the raw data, including voice and free-form text data, is in unstructured form, rendering the data almost unusable to traditional analytical software tools. Moreover, companies must often manually build and apply relevancy scoring models to identify improvement opportunities, and must associate raw data with financial models of the business to quantify the size of these opportunities. Identifying granular improvement opportunities would often require identifying complex multi-dimensional patterns in the raw data, which is difficult to do manually.
Examples of conventional automated techniques for analyzing such data include statistical modeling, support vector machines, and others. These modeling techniques have had some success. Unfortunately, certain limitations still exist. That is, statistical classifiers must often be established to carry out these techniques. Such statistical classifiers often become inaccurate over time and must be reformed. Conventional techniques for reforming statistical classifiers are often cumbersome and difficult to perform.
From the above, it is seen that techniques for processing information are highly desired.