1. Field of the Invention
The present invention relates to a technique for executing data mining algorithms whereby the algorithms are provided as a web service.
2. Description of the Related Art
Data and information pervade every aspect of our lives today. With vast improvements in processing power, access to the Internet, and similar advancements, more information and data are available than ever before for use by consumers, businesses, marketers, pollsters, and any other entity that might find it useful to analyze data.
Using customer information as an example, customer information, customer lists, and the like were recognized as extremely valuable corporate assets even before the computer revolution. Relatively recently, data mining was introduced as a technique that can intelligently and automatically transform data into information. Data mining is the search for relationships and global patterns that exist in large or small databases but are hidden among vast amounts of data. Data mining extracts previously unknown and potentially useful information (e.g., rules, constraints, correlations, patterns, signatures, and irregularities).
The data mining community has focused mainly on automated methods for extracting patterns and/or models from data. The state of the art in automated data mining is still at a fairly early stage of development, although progress in this area is certainly being made.
The primary goals of data mining in practice are prediction and description. Prediction involves using some variables or fields in the database to predict unknown or future values of other variables of interest. Description focuses on finding interpretable patterns that describe the data. The relative importance of prediction and description can vary considerably between data mining applications. In business, for example, one successful data mining method is known as “Market Basket Research.” Market Basket Research analyzes customer transactions for patterns, or “association rules,” that help in making business decisions (e.g., choosing sale items, designing coupons, arranging shelves); this is also known as association rules mining. Data mining finds application in many other fields as well. One area in which data mining is frequently used is fraud detection. Insurance companies, tax authorities, investment regulators, and the like frequently mine data related to their field to identify persons and/or organizations that may be committing fraudulent acts.
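To make “association rules” concrete: a rule such as “bread → milk” is commonly judged by its support (the fraction of transactions containing both items) and its confidence (the fraction of transactions containing the left-hand item that also contain the right-hand item). The following is a minimal sketch, assumed for illustration rather than taken from any product named here, that mines only single-item rules; full miners such as Apriori extend the same counting to larger itemsets. The basket data and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support, min_confidence):
    """Mine single-item rules lhs -> rhs from a list of market baskets.

    support(lhs, rhs)      = fraction of baskets containing both items
    confidence(lhs -> rhs) = baskets containing both / baskets containing lhs
    Returns sorted (lhs, rhs, support, confidence) tuples passing both thresholds.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # pairs come out as (a, b) with a < b
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

For the hypothetical baskets `[["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]` with a minimum support of 0.5 and a minimum confidence of 0.7, the only surviving rule is butter → bread (support 0.5, confidence 1.0): every basket with butter also contains bread, while bread → milk reaches only 2/3 confidence.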
For example, a tax authority can mine data on the individuals or companies under its jurisdiction to determine, based on taxpayer data, which taxpayers are most likely to be committing fraudulent acts, and then focus its investigative energy and resources on those taxpayers.
In data mining, an algorithm is often created that defines the desired mining; in practice, this algorithm can be quite complex. Commonly, the algorithm goes through each customer or entity record and computes a score for each entity. The score is then used to decide, for example, whether to investigate a taxpayer, market a product to a customer, stop payment of a health insurance claim, or investigate a clinic for services not rendered.
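Per-record scoring of this kind can be sketched as follows. This is a minimal illustration, assuming a logistic model; the feature names, weights, bias, and review threshold are all hypothetical and do not come from the source. A real model would be learned from historical data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    entity_id: str
    features: dict[str, float] = field(default_factory=dict)


# Hypothetical model parameters (illustrative only, not from any real system).
WEIGHTS = {"income_ratio": -2.0, "deduction_ratio": 3.0, "prior_flags": 1.5}
BIAS = -1.0


def score(record: Record) -> float:
    """Logistic score in (0, 1); higher means the entity more likely merits review."""
    z = BIAS + sum(w * record.features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag_for_review(records: list[Record], threshold: float) -> list[str]:
    """Score every record, then return IDs at or above the threshold, highest score first."""
    scored = sorted(((score(r), r.entity_id) for r in records), reverse=True)
    return [entity_id for s, entity_id in scored if s >= threshold]
```

Missing features default to 0.0, so a sparse record still receives a score. With a threshold of 0.5, a record is flagged exactly when its linear term `z` is non-negative.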
Typically, the data mining algorithm is embodied in an application that is external to the database. One data mining product that adopts this approach is the Intelligent Miner® product from International Business Machines (IBM). The external application “scores” the database using an existing model. Such applications use an SQL cursor to fetch each record, or tuple, to be scored sequentially. One example of a highly efficient technique for data mining large-scale relational databases using SQL is described in U.S. Pat. No. 6,484,163, incorporated herein fully by reference.
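Cursor-based, record-at-a-time scoring by an external application can be sketched with Python's built-in `sqlite3` module standing in for the database. The `claims` table, its columns, and the rule-based `score_claim` model are hypothetical. The sketch does not reproduce Intelligent Miner or the technique of U.S. Pat. No. 6,484,163.

```python
import sqlite3


def score_claim(amount: float, visits: int) -> float:
    """Hypothetical existing model: score in [0, 1]; large claims with many visits score higher."""
    return 0.7 * min(1.0, amount / 10_000) + 0.3 * min(1.0, visits / 20)


def score_table(conn: sqlite3.Connection) -> dict[int, float]:
    """Open an SQL cursor and fetch and score each tuple sequentially, outside the database."""
    scores: dict[int, float] = {}
    cur = conn.cursor()
    cur.execute("SELECT claim_id, amount, visits FROM claims")
    row = cur.fetchone()
    while row is not None:  # one round trip per tuple, as in cursor-based scoring
        claim_id, amount, visits = row
        scores[claim_id] = score_claim(amount, visits)
        row = cur.fetchone()
    cur.close()
    return scores


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, amount REAL, visits INTEGER)")
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?)", [(1, 10000.0, 20), (2, 500.0, 2)])
    print(score_table(conn))
    conn.close()
```

Every tuple crosses the database boundary before it is scored, and the model runs wherever this application has database access. The next paragraph identifies that coupling as a limitation.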
These known methods, while functioning very well with a database, require that the model be deployed inside the database. This is limiting, because the model is accessible only to those with sufficient access to the database to deploy the model therein. Further, computer resources in a database environment are finite, and the database environment is constrained by many communications protocols. Accordingly, it would be desirable to have a technique for executing data mining models as a web service, so that variable demand can be accommodated and the data mining process can be decoupled from the database.