With the advent of the Internet, and especially electronic commerce (“e-commerce”) over the Internet, the use of data analysis tools, has increased. In e-commerce and other Internet and non-Internet applications, databases are generated and maintained that have large amounts of information. Such information can be analyzed, or “mined,” to learn additional information regarding customers, users, products, etc.
Data mining (also known as Knowledge Discovery in Databases—KDD) has been defined as “The nontrivial extraction of implicit, previously unknown, and potentially useful information from data.” Data mining can employ machine learning, statistical and visualization techniques to discover and present knowledge in a form that is easily comprehensible to humans.
One area relating to decision theory in which there is significant amount of research is decision trees. A decision tree data structure corresponds generally to an acyclic, undirected graph where nodes are connected to other respective nodes via a single path. The graph is acyclic in that there is no path that both emanates from a vertex and returns to the same vertex, where each edge in the path is traversed only once. A probabilistic decision tree is a decision tree that is used to represent a conditional probability distribution for a target variable given some set of predictor variables. As compared to a table, which is another way to represent a conditional probability distribution when all variables are discrete, a tree is generally a more efficient way of storing probabilities because of its ability to represent equality constraints within a conditional probability distribution.
A decision graph is a further generalization of a decision tree. Similar to a decision tree, a decision graph can represent equality constraints in a conditional probability distribution. In contrast to a decision tree, however, non-root nodes in a decision graph can have more than one parent. This characteristics enables a richer set of relationships to be represented by a decision graph than by a decision tree. For example, relationships between a non-root node and multiple parent nodes can be represented in a decision graph by corresponding edges interconnecting the non-root node with its parent nodes.
There are two traditional approaches for constructing statistical models, such as decision trees or decision graphs, namely, a knowledge-based approach and a data-based approach. Using the knowledge-based approach, a person (known as a knowledge engineer) interviews an expert in a given field to obtain the knowledge of the expert about the field of expertise of the expert. The knowledge engineer and expert first determine the distinctions of the world that are important for decision making in the field of the expert. These distinctions correspond to the variables in the domain of interest. For example, if a decision graph is to be used to predict the age of a customer based on the products that customer bought in a store, there would be a variable for “age” and a variable for all relevant products. The knowledge engineer and the expert next determine the structure of the decision graph and the corresponding parameter values that quantify the conditional probability distribution.
In the data-based approach, the knowledge engineer and the expert first determine the variables of the domain. Next, data is accumulated for those variables, and an algorithm is applied that creates one or more decision graphs from this data. The accumulated data comes from real world instances of the domain. That is, real world instances of decision making in a given field.
There has been much research in modeling techniques to facilitate analysis of time series data. One approach relates to the use of neural nets. While neural nets can provide reasonable predictive performance, they tend to be difficult to interpret and computationally expensive to learn. Further neural nets usually are implemented as black boxes, which provided little useful information about interrelationships between variables.
Other approaches for time series analysis include self-exciting threshold autoregressive models (SETAR), as disclosed in Threshold models in Nonlinear Time Series Analysis, Tong H., Springer-Verlag, New York (1983), and adaptive smooth threshold regressive models (ASTAR), as disclosed in Modeling time series by using mars, by Lewis, P., Ray, B., and Stevens, J. In Time series prediction, pp. 297-318, Addison Wesley, New York (1994). Both the SETAR and ASTAR models can be considered piece-wise linear models. When described in terms of a decision tree, the SETAR models are limited to a single split variable. The ASTAR models are obtained by the application of the well-known multiple adaptive regression splines (MARS) system to time-series data.