Many commercial enterprises require that large volumes of data be processed in as short a time frame as possible. In recent years, businesses needing to process such large volumes of data have purchased very expensive, specialized multi-processor hardware, often referred to as mainframe computers, supercomputers or massively parallel computers. The cost of such hardware is often in the millions of dollars, with additional costs incurred by support contracts and the need to hire specialized personnel to maintain these systems. Not only is such supercomputing power expensive, but it does not afford the user much control over how any given task gets distributed among the multiple processors. How any computing task gets distributed becomes a function of the operating system of such a supercomputer.
In the field of data processing, often very similar operations are performed on different groups of data. For example, one may want to count the unique instances of a class, e.g., a group of data, for several different classes, know what the arithmetic mean of a given class is, or know what the intersection of two classes may be. In a supercomputing environment, one has to rely on the operating system to make sound decisions on how to distribute the various parts of a task among many central processing units (CPUs). Today's operating systems, however, are not capable of this kind of decision making in a data processing context.
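The three operations named above can be illustrated with a minimal sketch. The sample data, the class names "a" and "b", and the function names are hypothetical and chosen only for illustration; they are not taken from the described system.

```python
from statistics import mean

# Hypothetical sample data: each "class" is a named group of values.
classes = {
    "a": [3, 5, 5, 8],
    "b": [5, 8, 13],
}


def count_unique(values):
    """Count the distinct instances within one class."""
    return len(set(values))


def arithmetic_mean(values):
    """Arithmetic mean of one class."""
    return mean(values)


def intersection(left, right):
    """Values that appear in both classes, in sorted order."""
    return sorted(set(left) & set(right))


# The same operation is applied independently to each class, which is
# what makes this kind of work easy to divide among processors.
unique_counts = {name: count_unique(v) for name, v in classes.items()}
means = {name: arithmetic_mean(v) for name, v in classes.items()}
common = intersection(classes["a"], classes["b"])
```

Because each per-class result depends only on that class's own data, the per-class computations could in principle run on separate processors without coordination.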
Thus, there is a need for a system and method that overcomes these deficiencies.
According to the preferred embodiments, described is a system and method for allowing multiple processors on a network, for example, the central processing units (CPUs) of networked computers, to perform a varying number and type of data processing tasks. Data processing tasks are provided to any of the available CPUs on a network equipped for the system and method. The system and method choose the first available CPU for performance of the data processing task. The system and method provide the task-performing CPU with the minimum amount of data needed to complete the task and the necessary software instructions to complete the task.
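One way to picture first-available dispatch is a shared queue from which each idle worker pulls the next task. The sketch below is an illustration under stated assumptions, not the described implementation: threads stand in for networked CPUs, and run_tasks, the worker names, and the sample tasks are all hypothetical. Each task carries only its own data and the function to apply to it.

```python
import queue
import threading


def run_tasks(tasks, worker_count=3):
    """Hand each (task_id, func, data) tuple to whichever worker is free first."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def worker(name):
        # A worker that finishes early simply pulls the next pending task,
        # so every task goes to the first worker that becomes available.
        while True:
            try:
                task_id, func, data = pending.get_nowait()
            except queue.Empty:
                return
            outcome = func(data)
            with lock:
                results[task_id] = (name, outcome)

    threads = [
        threading.Thread(target=worker, args=(f"cpu-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Hypothetical tasks: each ships only the data it needs plus its instructions.
tasks = [
    ("sum", sum, [1, 2, 3]),
    ("max", max, [4, 9, 2]),
    ("unique", lambda values: len(set(values)), [1, 1, 2]),
]
results = run_tasks(tasks)
outcomes = {task_id: outcome for task_id, (_, outcome) in results.items()}
```

Which worker handles which task varies from run to run; only the outcomes are deterministic.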
More specifically, the preferred system and method allow for improved efficiency, in terms of both cost and time, of data processing. The user of the software considers a given data processing task, provides a set of definitions for the type of data required for each task, and specifies the task for a given group of data. The system and method then divide the input file into sub-task data files and ship each data file, together with its task specification, to any available computer on the network. The CPU of that computer performs the task and returns the completed result to the computer that requested the task.
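The divide, ship, and return flow described above might look roughly like the following sketch. It makes several assumptions that are not in the text: the input is a small CSV with hypothetical columns group and value, a local thread pool stands in for the computers on the network, and the names split_by_group, perform, and process are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical input file contents.
INPUT = """group,value
a,3
a,5
b,8
b,10
b,12
"""


def split_by_group(text):
    """Divide the input into one sub-task data set per group."""
    parts = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        parts[row["group"]].append(float(row["value"]))
    return dict(parts)


def perform(spec):
    """Runs on the worker: apply the named task to the shipped data."""
    group, task, values = spec
    operations = {"mean": mean, "count": len}
    return group, operations[task](values)


def process(text, task):
    """Split the input, ship each piece with its task, and collect the results."""
    specs = [(group, task, values)
             for group, values in split_by_group(text).items()]
    # The thread pool is a local stand-in for available networked computers.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(perform, specs))


group_means = process(INPUT, "mean")
group_counts = process(INPUT, "count")
```

Each worker receives only the values for its own group, which mirrors the idea of shipping the minimum data needed for the task.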
Thus, large amounts of data can be processed quickly by ordinary, commodity personal computers running conventional operating systems such as Windows NT or Unix. A small cluster of, for example, twelve dual-processor computers or twenty-four single-processor computers, running the software of the preferred embodiments, can equal, if not exceed, the performance of a supercomputer with an equivalent number of CPUs.