The present invention relates to computer systems for analyzing, and computing with, sets of data, such as, for example, extremely large data sets.
As computing power has grown, it has become increasingly practical to process data, and, in particular, large amounts of data, in new and useful ways. For example, the term “data base mining” has been used to describe the practice of searching vast amounts of data for commercially, medically, or otherwise important patterns, patterns which would probably have been impossible to find by human pattern matching, and which probably would have taken too long to find with prior generations of computer equipment.
For example, one common use of data base mining is for corporations to search through data bases containing records of millions of customers or potential customers, looking for data patterns indicating which of those customers are sufficiently likely to buy a given product to justify the cost of selecting them as targets of a direct marketing campaign. In such searches, not only are millions of records searched, but hundreds, or even thousands, of fields within each record. Such data base mining has proven much more successful in selecting which customers are most likely to be interested in a given new product than prior methods.
Similarly, data base mining can be used for scanning vast numbers of medical records to look for subtle patterns associated with disease; for scanning large numbers of financial transactions to look for behavior likely to be fraudulent; or to study scientific records to look for new causal relationships.
Because they often involve a tremendous number of records, and are often seeking patterns between a large number of fields per record, data base mining operations tend to require huge amounts of computation. This, in combination with the fact that most data base mining operations can be easily partitioned to run on separate processors, has made data base mining one of the first major commercial uses of massively parallel computers. But even when run on most commercially available parallel systems, many data base mining functions are relatively slow because of their tremendous complexity. Therefore there is a need to improve the speed at which such tasks can be performed.
Neural nets are a well known device for automatically selecting which patterns of values in certain source fields of records are likely to be associated with desired values in one or more target fields. A neural network normally includes an input layer comprised of a plurality of input nodes, an output layer of one or more output nodes, and, in hidden-layer networks, one or more so-called hidden layers, each comprised of one or more nodes. Hidden layers are hidden in the sense that they do not connect directly to any inputs or outputs.
The knowledge in a neural net is contained in its weights. Each node in the input layer or hidden layer contains a weight associated with its connection with each node in the next layer. Thus, in a typical hidden-layer network, each node in the input layer has a separate weight for its connection to each node in the hidden layer, and each node in the hidden layer has a separate weight for its connection to each node in the output layer. The value supplied to each given node in a given layer is supplied to each individual node in the successive layer, multiplied by the weight representing the connection between the given node and the individual node in the successive layer. Each node receiving such values generates an output, which is a function of the sum of the values supplied it. Usually the output is a non-linear function of the sum of values supplied to the node, such as a sigmoid function. The sigmoid function has the effect of making the output operate like an on-off switch whose output varies rapidly from a substantially “off” value to a substantially “on” value as the sum of the values supplied to the node crosses a small threshold region.
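The forward computation described above can be illustrated with a minimal sketch. The function names, the weight layout (weights[i][j] connecting node i of one layer to node j of the next), and the numeric weights are all assumptions made for illustration; bias terms are omitted because the description above does not mention them.

```python
import math

def sigmoid(s):
    """Non-linear output function: near 0 ("off") for very negative sums,
    near 1 ("on") for very positive sums, changing rapidly around s = 0."""
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(values, weights):
    """Pass the values of one layer to the next layer.

    weights[i][j] is the (hypothetical) weight on the connection from
    node i in the current layer to node j in the successive layer.
    """
    n_next = len(weights[0])
    outputs = []
    for j in range(n_next):
        # Each receiving node sums the weighted values supplied to it.
        total = sum(values[i] * weights[i][j] for i in range(len(values)))
        outputs.append(sigmoid(total))
    return outputs

def network_forward(inputs, w_input_hidden, w_hidden_output):
    """Forward pass through a network with one hidden layer."""
    hidden = layer_forward(inputs, w_input_hidden)
    return layer_forward(hidden, w_hidden_output)

# Illustrative weights only: 2 input nodes, 2 hidden nodes, 1 output node.
w_ih = [[4.0, -4.0],
        [4.0, -4.0]]
w_ho = [[3.0],
        [-3.0]]
outputs = network_forward([1.0, 0.0], w_ih, w_ho)
```

With these assumed weights the single output lies well toward the "on" end of the sigmoid's range, which illustrates the switch-like behavior described above.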
A common way of training the weights of a neural network is to take each record in a training set and apply the value of each of its source fields to a corresponding input of the net. The network's weights are then modified to decrease the difference between the resulting values generated at the network's one or more outputs and the actual values for the outputs' corresponding target fields in the record. There are a variety of well known methods for making such weight modifications, including back propagation, conjugate gradient, and quick propagation. The training process is normally repeated multiple times for all the training records until the sum of the differences between the generated and actual outputs approaches a relative minimum.
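As a rough illustration of such a training loop, the sketch below applies a simple gradient-descent weight update to a two layer net with one sigmoidal output. It is not a rendering of any particular method named above; the learning rate, the number of passes, the toy records, and the use of a constant final source field as a bias input are all assumptions chosen for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def predict(weights, source_fields):
    """Output of a two layer net with a single sigmoidal output node."""
    return sigmoid(sum(w * x for w, x in zip(weights, source_fields)))

def train(records, n_inputs, rate=0.5, passes=5000):
    """Repeatedly present every record and nudge the weights so that the
    generated output moves toward the record's target field value."""
    weights = [0.0] * n_inputs
    for _ in range(passes):
        for source_fields, target in records:
            out = predict(weights, source_fields)
            # Gradient of 0.5 * (out - target)**2 with respect to the
            # weighted sum feeding the output node.
            delta = (out - target) * out * (1.0 - out)
            for i, x in enumerate(source_fields):
                weights[i] -= rate * delta * x
    return weights

# Toy training set (hypothetical): two source fields plus a constant 1.0
# acting as a bias input; the target field is the logical OR of the two.
records = [
    ([0.0, 0.0, 1.0], 0.0),
    ([0.0, 1.0, 1.0], 1.0),
    ([1.0, 0.0, 1.0], 1.0),
    ([1.0, 1.0, 1.0], 1.0),
]
weights = train(records, n_inputs=3)
```

After training, comparing predict(weights, fields) against 0.5 for each record gives the "on" or "off" decision for that record.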
One of the problems with neural nets is that the amount of time to appropriately train them to recognize all of the possible source field patterns associated with desired target field values goes up very rapidly as the number of source or target fields does, and as the number of different types of source patterns which might be associated with a desired target does. Even with large parallel computer systems the amount of time required to properly train such networks to learn such complex sets of patterns is often prohibitive.
In an attempt to improve the speed at which neural networks can train, a new type of neural network has been proposed. These are so-called neural tree networks. These are decision trees, a well known type of classifying tool, in which a neural network is placed at each of the network's non-terminal nodes. In such trees, each non-terminal node is a two layer network, which trains much more rapidly than a hidden-layer network. The data applied to each non-terminal node is used to train the node's neural net. This is done in a training process which applies the source fields used in the overall classification process to the input nodes of the net and the one or more target fields used in that classification process to the output of the two layer net. Once the network has been trained over the training set, the data objects are split between the node's child nodes based on whether the one or more sigmoidal outputs of the trained net are “on” or “off” for each such data object. The data objects reaching the tree's terminal, or leaf, nodes are considered classified by the identity of the particular leaf node they reached.
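The routing of data objects through such a neural tree can be sketched as follows. The class names, the leaf labels, the fixed weights (which in practice would come from training), the bias term, and the choice of 0.5 as the boundary between "on" and "off" are assumptions made for illustration only.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

class Leaf:
    """Terminal node; its identity is the classification."""
    def __init__(self, label):
        self.label = label

class Node:
    """Non-terminal node holding a two layer net with one sigmoidal output."""
    def __init__(self, weights, bias, off_child, on_child):
        self.weights = weights
        self.bias = bias          # assumed bias term, not from the text
        self.off_child = off_child
        self.on_child = on_child

    def is_on(self, data_object):
        s = sum(w * x for w, x in zip(self.weights, data_object)) + self.bias
        return sigmoid(s) > 0.5   # assumed on/off boundary

def classify(tree, data_object):
    """Pass a data object down the tree until it reaches a leaf."""
    node = tree
    while isinstance(node, Node):
        node = node.on_child if node.is_on(data_object) else node.off_child
    return node.label

# Hypothetical tree over data objects with two source fields.
tree = Node([1.0, 1.0], -1.0,
            off_child=Leaf("A"),
            on_child=Node([1.0, -1.0], 0.0,
                          off_child=Leaf("B"),
                          on_child=Leaf("C")))
```

For example, classify(tree, [0.2, 0.3]) ends at leaf "A" because the root net is "off" for that object, while objects for which the root net is "on" are divided between leaves "B" and "C" by the second node.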
Such neural tree networks have the advantage of training much more rapidly than traditional neural networks, particularly when dealing with large complex classification tasks. However, they are not as discriminating as might be desired.
In general, a major issue in parallel computing is the division of the computational task so that a reasonable percentage of the computing power of multiple processors can be taken advantage of, and so the analytical power of the process is as high as possible. This issue is particularly important when it comes to many data base mining functions, such as the training of neural networks mentioned above or of other modeling tasks.
It is an object of the present invention to provide apparatuses and methods for more efficiently computing large amounts of data.
It is another object of the present invention to provide apparatuses and methods for efficiently finding patterns in data sets, particularly large data sets.
It is still another object of the present invention to provide apparatuses and methods for efficiently using and training neural networks to find patterns in data sets.
It is yet another object of the present invention to provide apparatuses and methods for more efficient parallel computing.
According to one aspect of the present invention a computer system with P processors receives data objects having N parameters. It divides an N-dimensional data space defined by the N parameters into M sub-spaces, where M is greater than or equal to P. This is done in such a manner that the boundaries between the resulting sub-spaces need not be orthogonal to the N dimensions. The system associates a different set of one or more sub-spaces with each of the P processors. It distributes data objects located in each sub-space to the sub-space's associated processor and causes each processor to perform a computational process on each of the data objects distributed to it.
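One way to picture this aspect is the sketch below, which is offered only as an illustration under stated assumptions. It divides the data space with a few oblique hyperplanes (an assumed partitioning scheme; the summary above does not prescribe one), numbers the resulting sub-spaces, and assigns them to processors round-robin (also an assumption). The "processors" are represented simply as lists, so no actual parallel execution is shown.

```python
def subspace_index(data_object, hyperplanes):
    """Number the sub-space containing data_object.

    Each hyperplane is a (normal, offset) pair; bit k of the index records
    on which side of hyperplane k the object lies. The normals need not be
    aligned with any axis, so the boundaries may be oblique.
    """
    index = 0
    for k, (normal, offset) in enumerate(hyperplanes):
        side = sum(n * x for n, x in zip(normal, data_object)) > offset
        index |= int(side) << k
    return index

def assign_subspaces(m, p):
    """Associate each of M sub-spaces with one of P processors (M >= P)."""
    return {s: s % p for s in range(m)}

def distribute(data_objects, hyperplanes, p):
    """Send each data object to the processor owning its sub-space."""
    m = 2 ** len(hyperplanes)
    owner = assign_subspaces(m, p)
    per_processor = [[] for _ in range(p)]
    for obj in data_objects:
        per_processor[owner[subspace_index(obj, hyperplanes)]].append(obj)
    return per_processor

# N = 2 parameters, M = 4 sub-spaces, P = 2 processors (all hypothetical).
hyperplanes = [([1.0, 1.0], 1.0), ([1.0, -1.0], 0.0)]
objects = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.2, 0.1]]
per_processor = distribute(objects, hyperplanes, p=2)
```

Each list in per_processor would then be handed to its processor, which performs the computational process on the data objects it received.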
According to another aspect of the invention, a computer system with P processors receives a set of data objects to be processed. A decision tree partitions the data set into at least M data sub-sets, where M is equal to or greater than P. A different set of one or more of the sub-sets is associated with each processor, and the data objects in each sub-set are sent to the associated processor for processing. In some embodiments, the process of using a decision tree to partition the data set is performed on fewer than P processors. In many embodiments, the decision criteria of the non-terminal nodes of the decision tree are trained on the data set, in a process where each non-terminal node both trains on and then divides between its children the data supplied to it.
In some embodiments, the non-terminal nodes are neural nets having hidden layers. In some embodiments, the decision criteria of the non-terminal nets can be automatically set to achieve a desired ratio between the number of data objects sent to each such node's child nodes. In some such embodiments, the system automatically configures the decision tree to have a number of leaf nodes which is an integer multiple of the number P of processors.
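A minimal sketch of how a decision criterion might be set to achieve a desired ratio follows. It assumes that each data object has already been given a numeric score by the node's net and that the criterion is a single threshold on that score; the function name and the midpoint rule are hypothetical, and with tied scores the achieved ratio is only approximate.

```python
def threshold_for_ratio(scores, fraction_on):
    """Pick a threshold so that about fraction_on of the scores exceed it.

    Objects scoring above the threshold would go to one child node and
    the remainder to the other.
    """
    ranked = sorted(scores, reverse=True)
    n_on = round(fraction_on * len(ranked))
    if n_on <= 0:
        return ranked[0]            # no score is strictly above the maximum
    if n_on >= len(ranked):
        return float("-inf")        # every score is above the threshold
    # Midpoint between the last "on" score and the first "off" score.
    return (ranked[n_on - 1] + ranked[n_on]) / 2.0

# Hypothetical net outputs for eight data objects.
scores = [0.1, 0.9, 0.4, 0.7, 0.3, 0.8, 0.2, 0.6]
quarter = threshold_for_ratio(scores, 0.25)
half = threshold_for_ratio(scores, 0.5)
on_quarter = [s for s in scores if s > quarter]
on_half = [s for s in scores if s > half]
```

Splitting at the half threshold at every level of a tree of depth d would give 2^d leaf sub-sets of roughly equal size, which is one way the leaf count could be matched to a multiple of P when P is a power of two.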
According to another aspect of the invention, a computer system divides an N-dimensional data space, having a separate dimension for each of N parameters associated with the data set, into M sub-spaces. It associates each of these M sub-spaces with a corresponding one of M hidden-layer neural networks, and uses the data objects in each of the M sub-spaces to train that sub-space's associated hidden-layer neural network. The resulting divisions need not be orthogonal to the N dimensions of the space.
According to another aspect of the invention, a computer system creates a decision tree having a neural network for each of its nodes, including a hidden-layer network for each of its terminal, or leaf, nodes. Each of the tree's non-terminal nodes uses the portion of the training data which is supplied to it to train its associated neural network and then uses that neural network, once trained, to determine which of the training data objects supplied to it should be supplied to each of its child nodes. In one embodiment, the net in each non-terminal node is trained to divide an N-dimensional space defined by parameters from the training data set into sub-spaces, and the data objects associated with each sub-space are routed to a different one of that non-terminal node's child nodes. In such an embodiment, each non-terminal node can be a two layer neural network which defines a single vector of weights in the N-dimensional space, and the data space is split by a plane perpendicular to that vector.
The portion of the training set supplied by the decision tree to each of its terminal, or leaf, nodes is used to train that node's corresponding neural network. In preferred embodiments, different leaf node networks are trained on different processors. In many embodiments, a copy of the entire decision tree, including the neural networks in both its non-terminal and leaf nodes, is stored on each of a plurality of processors. Then a set of new data objects is split into separate data partitions, one for each such processor. Finally, data objects from the partition associated with each processor are passed down through the copy of the complete decision tree stored on that processor. This causes each such data object to be routed to a given leaf node of the tree, at which point the hidden-layer neural network associated with the given leaf node will analyze the data object, such as by classifying it, or recording an estimated value for each of its target fields.
According to another aspect of the invention, a neural net tree has hidden-layer neural networks in its non-terminal nodes.
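The use of the trained tree on new data, as described above, can be sketched roughly as follows. This is an illustration only: the tuple representation of non-terminal nodes, the round-robin partitioning, and the two leaf functions (simple stand-ins for trained hidden-layer networks) are all assumptions, and the loop over processors runs sequentially rather than on real parallel hardware.

```python
import copy

def leaf_a(data_object):
    """Stand-in for the hidden-layer net at one leaf (hypothetical)."""
    return "A"

def leaf_b(data_object):
    """Stand-in for the hidden-layer net at another leaf (hypothetical)."""
    return "B"

def route_to_leaf(tree, data_object):
    """A non-terminal node is a tuple (weights, offset, off_child, on_child);
    anything else is treated as a leaf model."""
    node = tree
    while isinstance(node, tuple):
        weights, offset, off_child, on_child = node
        s = sum(w * x for w, x in zip(weights, data_object))
        # The split plane is perpendicular to the weight vector.
        node = on_child if s > offset else off_child
    return node

def analyze_new_data(tree, new_objects, p):
    """Give each of P processors its own tree copy and data partition."""
    tree_copies = [copy.deepcopy(tree) for _ in range(p)]
    partitions = [new_objects[i::p] for i in range(p)]
    results = []
    # Sequential stand-in for work done independently on each processor.
    for local_tree, partition in zip(tree_copies, partitions):
        results.append([route_to_leaf(local_tree, obj)(obj)
                        for obj in partition])
    return partitions, results

tree = ([1.0, 1.0], 1.0, leaf_a, leaf_b)
new_objects = [[0.0, 0.0], [2.0, 0.0], [0.5, 0.2], [1.0, 1.0]]
partitions, results = analyze_new_data(tree, new_objects, p=2)
```

Because every processor holds the complete tree, no data object needs to move between processors once the new data has been partitioned.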
According to another aspect of the invention, a computer system includes a neural network, such as one in the nodes of one of the above mentioned decision trees, which automatically causes a selected percent of data objects applied to the neural network to be selected for a given purpose.