1. Field of the Invention
The field of the invention is data processing, that is, methods and systems for financial, business practice, business management, or cost/price determinations.
2. Description of the Related Art
A data mining tool is computer software that analyzes data and discovers relationships, patterns, knowledge, or information from the data. Data mining is also referred to as knowledge discovery. Data mining tools attempt to solve the problem of users being overwhelmed by the volume of data that computers can collect. Data mining tools attempt to shield users from the unwieldy body of data by analyzing it, summarizing it, or drawing conclusions from the data that the user can understand. For example, one known computer software data mining product is IBM's "Intelligent Miner," which can be operated in several computing environments including AIX, AS/400, and OS/390. The IBM Intelligent Miner is an enterprise data mining tool, designed for client/server configurations and optimized to mine very large data sets, such as gigabyte data sets. The IBM Intelligent Miner includes a plurality of data mining techniques or tools used to analyze large databases and provides visualization tools used to view and interpret the different mining results.
An analytic application is a software application that inputs historical data collected from a production system over time, analyzes samples of this historical data, and outputs the findings back to the production system to help improve its operation. For example, an e-commerce server that manages an internet shopping site is a production system, and an analytic application might use historical data collected from the e-commerce server to report on what types of users are visiting the site and how many of them are actually buying products. The term "analytic application" is used throughout this specification to mean "analytic software application," referring to that category of software typically understood to be used directly by users to solve practical problems in their work.
Data mining is an important technology to be integrated into analytic applications. Data mining is data processing technology, comprising combinations of hardware and software, that dynamically discovers patterns in historical data records and applies properties associated with these records (e.g., likely to buy) to production data records that exhibit similar patterns. Use of data mining typically involves steps such as identifying a business problem to be solved, selecting a mining algorithm useful to solve the business problem, defining data schema to be used as inputs and outputs to and from the mining algorithm, populating input data schema with historical data, training the model based upon the historical data, and scoring production data by use of the model.
In prior art, however, with available data mining tools, the end user of an analytic application must be sufficiently skilled in data mining to accomplish all these tasks, some of which require substantial expertise in data mining. For applications such as e-commerce, which are being widely adopted by businesses of all sizes and in all commerce areas, it is difficult and expensive for every business using data mining to acquire substantial data mining expertise. It would be desirable and useful, therefore, for an analytic application to integrate data mining so as to reduce the need for end users to have special expertise in data mining as such.
Embodiments of the present invention include methods and systems in which elements of data mining, such as identifying a business problem to be solved, selecting a mining algorithm useful to solve the business problem, defining data schema to be used as inputs and outputs to and from the mining algorithm, and defining data mining models, are performed by an analytic application developer. An "analytic application developer" is a software developer that develops analytic software applications. Throughout this specification, the analytic application developer is described in contrast to end users. An "end user" is a person or entity that installs and uses an analytic application for purposes of scoring and analyzing actual production data. Analytic application developers create the analytic applications that end users use.
In typical embodiments of the invention, the analytic application developer identifies a set of interesting business problems capable of definition sufficient to support data mining solutions. The analytic application developer then selects data mining algorithms useful for solving the identified problems, defines data schema useful as inputs to and outputs from the selected mining algorithm, and defines data mining models. Because the mining algorithms, the data schema, and the mining models are selected, identified, and defined prior to involvement by any end user, the mining algorithms, data schema, and mining models are referred to as being "preselected" and "predefined."
In typical embodiments of the present invention, it is end users who carry out the data mining steps of populating input data schema with historical data, training the model based upon the historical data, and scoring production data by use of the model. Because the more difficult steps of defining business problems, preselecting mining algorithms, and predefining data schema are performed by an analytic application developer before an end user acquires the analytic application, the end user need only perform straightforward steps guided by such routine graphical user interface elements as mouse-clickable buttons, pull down menus, and wizards. The overall effect of the inventive method is to greatly reduce the data mining expertise needed by the end user.
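The division of labor described above can be illustrated with a minimal sketch. It is not the implementation of any embodiment: the class name PredefinedModel, the field names pages_viewed, cart_adds, and bought, and the nearest-centroid algorithm standing in for a preselected mining algorithm are all assumptions chosen for illustration. The developer fixes the business question, the schema, and the algorithm; the end user only populates the schema with historical data, trains, and scores.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def train_centroids(rows: Sequence[List[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Stand-in mining algorithm (assumption): one mean vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


@dataclass
class PredefinedModel:
    """Everything here is decided by the developer before shipping (hypothetical)."""
    business_question: str
    input_fields: Tuple[str, ...]   # predefined input data schema
    label_field: str                # property learned from historical data
    centroids: Dict[str, List[float]] = field(default_factory=dict)

    def _vector(self, record: Record) -> List[float]:
        # Raises KeyError if the record does not populate the predefined schema.
        return [float(record[name]) for name in self.input_fields]

    def train(self, historical: Sequence[Record]) -> None:
        rows = [self._vector(r) for r in historical]
        labels = [r[self.label_field] for r in historical]
        self.centroids = train_centroids(rows, labels)

    def score(self, record: Record) -> str:
        if not self.centroids:
            raise RuntimeError("model must be trained before scoring")
        vec = self._vector(record)

        def squared_distance(label: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(vec, self.centroids[label]))

        return min(self.centroids, key=squared_distance)


# Developer side: preselected and predefined, shipped with the application.
likely_buyer = PredefinedModel(
    business_question="Which visitors are likely to buy?",
    input_fields=("pages_viewed", "cart_adds"),
    label_field="bought",
)

# End-user side: populate with historical data, train, then score production data.
history = [
    {"pages_viewed": 2, "cart_adds": 0, "bought": "no"},
    {"pages_viewed": 3, "cart_adds": 0, "bought": "no"},
    {"pages_viewed": 12, "cart_adds": 2, "bought": "yes"},
    {"pages_viewed": 9, "cart_adds": 3, "bought": "yes"},
]
likely_buyer.train(history)
prediction = likely_buyer.score({"pages_viewed": 10, "cart_adds": 2})
```

In this sketch the end user never chooses an algorithm or designs a schema; the only inputs supplied after installation are the historical records and the production records to be scored.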
A useful key to simplifying the use of data mining in analytic applications is to make the analytic application domain-specific. "Domain" refers to a problem subject area, and "domain-specific" means that an analytic application is designed to operate on the basis of data related to a particular problem subject area, where the data has specific defined data elements with defined relations among the data elements. For example, e-commerce is a specific domain, and a domain-specific analytic application for e-commerce would accept and analyze only e-commerce data. For illustration purposes in this specification, e-commerce is chosen as the domain of interest.
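As an illustration of what "accept and analyze only e-commerce data" could mean in practice, the following sketch checks a record against a fixed set of defined data elements before it is admitted for analysis. The schema contents (customer_id, product_id, quantity, unit_price) and the function name schema_violations are hypothetical and are not taken from this specification.

```python
from typing import Any, Dict, List

# Hypothetical e-commerce order schema: defined data elements and their types.
ECOMMERCE_ORDER_SCHEMA: Dict[str, type] = {
    "customer_id": str,
    "product_id": str,
    "quantity": int,
    "unit_price": float,
}


def schema_violations(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of reasons the record does not fit the domain schema."""
    problems: List[str] = []
    for name, expected in schema.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in record:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems


order = {"customer_id": "c1", "product_id": "p9", "quantity": 2, "unit_price": 19.99}
sensor_reading = {"sensor_id": "s1", "temperature": 21.5}

order_problems = schema_violations(order, ECOMMERCE_ORDER_SCHEMA)
sensor_problems = schema_violations(sensor_reading, ECOMMERCE_ORDER_SCHEMA)
```

A record with an empty violation list fits the domain; any other record would be rejected before reaching a mining model.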
For a specific domain, embodiments of the present invention typically are used first to identify business problems that are applicable to that domain. Once the business problems that need data mining are identified, embodiments of the invention then typically are used to build an analytic application to solve these business problems. The analytic application developer can embed in the analytic application all data mining related knowledge needed for the solution, so that the end user of the application does not require data mining specific expertise.
The steps of the inventive method in an example business problem are discussed in detail in this specification. Process flow involved in steps of typical embodiments is described by the diagram given in FIG. 1. In typical embodiments, as mentioned above, the steps of defining business problems, preselecting mining algorithms, predefining data schema, and predefining data mining models are done by the analytic application developer, whereas only the steps of populating the input data schema with historical data, production training the model, and production scoring are left to the end user.
This specification describes predefined sets of business questions useful to the end users, and the predefined data schema that are needed to answer these business questions. This specification also describes data mining models that are predefined, tested, and shipped with a product, and that can then be trained and applied by the end users without needing data mining expertise.
A data mining model is usually defined to address a given business question based on a given input data schema. Data mining tools such as IBM's Intelligent Miner typically are generic, functioning independently of any application. Because data mining tools do not include business questions or the data schema end users would use, developers of data mining tools in the prior art do not supply predefined mining models.
Accordingly, in an integrated e-commerce analytic solution using general-purpose data mining tools such as Intelligent Miner, there is significant benefit in predefining mining models whenever possible, as this will enable end users to train and apply these models without requiring data mining expertise. Intelligent Miner provides a simple user interface to import predefined mining models and to train and apply these models without any knowledge of data mining. The steps for using Intelligent Miner to import, train, and apply can also be documented along with the predefined mining models to simplify the job of the end user.
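The idea of shipping a predefined mining model for later import can be sketched as follows. This is not Intelligent Miner's import format or interface, neither of which is described here; the JSON layout, the class name ModelDefinition, and the functions export_definition and import_definition are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass(frozen=True)
class ModelDefinition:
    """Hypothetical shipped description of a predefined mining model."""
    name: str
    business_question: str
    algorithm: str
    input_fields: List[str]
    output_field: str


REQUIRED_KEYS = {f.name for f in fields(ModelDefinition)}


def export_definition(definition: ModelDefinition) -> str:
    """Developer side: serialize the definition for shipment with the product."""
    return json.dumps(asdict(definition), indent=2, sort_keys=True)


def import_definition(text: str) -> ModelDefinition:
    """End-user side: load a shipped definition, rejecting incomplete files."""
    data = json.loads(text)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"definition is missing keys: {sorted(missing)}")
    return ModelDefinition(**data)


shipped = export_definition(
    ModelDefinition(
        name="likely_buyer",
        business_question="Which visitors are likely to buy?",
        algorithm="classification",
        input_fields=["pages_viewed", "cart_adds"],
        output_field="bought",
    )
)
imported = import_definition(shipped)
```

Because the definition carries the business question, algorithm choice, and schema together, the end user's staff would only need to supply data after import.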
There are several advantages to the present inventive method. When predefined mining models are available to the end users, end users can make use of their regular information technology staff to train and apply these mining models without having to first train the staff in mining technology and mining tools. This results in significant cost savings to end users.
An additional benefit is that a product vendor, by use of the method of the present invention, can build an e-commerce analytics product in the vendor's development shop. As a result, the vendor can ship several mining models ready to be used by end users straight out of the box, requiring no special expertise in data mining on the part of the end user's staff. This will add significant value to the vendor's product as it reduces end users' costs.
A still further benefit of the present invention is that third-party vendors can use the method of the invention to add mining models to an already available analytic product. In addition, consultants can use the inventive method to define and add new mining models at an end user site or even to the analytic product itself at the development site.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention, as illustrated in the accompanying drawings wherein like reference numbers represent like parts of the invention.