This invention relates to distributed data processing systems that use multiple processing unit groups, and in particular to a programmable streaming data processor that performs initial primitive functions before data is further handled by a more general purpose job processor.
Among the applications that continue to make the greatest demands on data processing systems are those that require management of massive amounts of information. Indeed, the need to efficiently access data stored in related files, a task most commonly handled by Data Base Management Systems (DBMSs), continues to drive development of complex but efficient system architectures. Present day DBMSs are used to manage many different forms of data, including not only field oriented records but also text, images, sound, video clips and similar less structured data. DBMSs are, therefore, now expected to provide an efficient, consistent, and secure method to store and retrieve data of varying types.
It is now common in high performance systems to distribute the processing load among multiple processors, and thus provide for processing of data in parallel. These systems take a data query, such as may be presented in Structured Query Language (SQL), and develop an optimized plan for parallel execution. One processor may be used as a dispatcher to analyze the query, set up communication links between the various parallel processors, instruct the processors as to how to carry out the query, and ensure that results are sent back to the server that initiated the query. In such a distributed environment, data is typically stored on an array of storage devices, such as disk drives. One or more computers attached to the disk drives are responsible for reading the data and analyzing it by executing portions of the query.
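The dispatcher arrangement described above can be illustrated with a minimal sketch. It is not taken from any particular system: the partition contents, the names `scan_partition` and `dispatch`, and the use of a thread pool to stand in for separate processors are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the records held on one
# storage device attached to one of the parallel processors.
PARTITIONS = [
    [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}],
    [{"id": 3, "amount": 700}, {"id": 4, "amount": 20}],
    [{"id": 5, "amount": 300}],
]


def scan_partition(records, predicate):
    """Worker step: read every record of one partition and keep the matches."""
    return [r for r in records if predicate(r)]


def dispatch(partitions, predicate):
    """Dispatcher step: send the same query fragment to every worker, then merge."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_results = pool.map(
            lambda part: scan_partition(part, predicate), partitions
        )
        merged = [r for part in partial_results for r in part]
    # Sort so the merged answer does not depend on worker scheduling.
    return sorted(merged, key=lambda r: r["id"])


# Roughly: SELECT * FROM t WHERE amount > 100
result = dispatch(PARTITIONS, lambda r: r["amount"] > 100)
```

Note that each worker in this sketch must still read its whole partition into memory before it can apply the predicate.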
Even though queries may be optimized for parallel processing in this manner, the problem with such a system is that data must still be retrieved from the disk and placed in a processor's memory before it can be analyzed. Only then can the processors operate on the data. Thus, although this approach offloads specific jobs from a single processor, valuable time is still spent retrieving and storing data among the distributed processors. Even if only a portion of the data retrieved is extraneous, the time spent fetching that portion is wasted.
The speed at which data analysis can be performed is limited by the speed at which the entire set of data can be transferred into the memory of one of the distributed processors and processed by its Central Processing Unit (CPU). Disks are inexpensive; thus, many disks can be used to store extremely large databases. Since all of them may be read in parallel, the effective data transfer rate of such a system can be made almost arbitrarily high. Usually the bandwidth of the communication network connecting the distributed processors is less than the aggregate data transfer rate of the disks. Furthermore, the time required by the CPUs to analyze the data retrieved from the disks is typically far longer than the time required to retrieve the data. Bottlenecks thus occur either in the communication network or in the CPU processing, but not on the disks themselves.
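The bottleneck argument above amounts to taking the minimum over the throughput of each stage. The following sketch makes that arithmetic explicit. The function name `bottleneck` and all of the rates are hypothetical figures chosen for illustration; none of them is taken from a measured system.

```python
def bottleneck(num_disks, disk_mb_s, network_mb_s, cpu_mb_s):
    """Return (stage_name, rate) for the slowest stage of the pipeline.

    Disks are read in parallel, so their rates add; the network and the CPU
    are each modeled as a single shared stage.
    """
    stages = {
        "disks": num_disks * disk_mb_s,
        "network": network_mb_s,
        "cpu": cpu_mb_s,
    }
    slowest = min(stages, key=stages.get)
    return slowest, stages[slowest]


# Illustrative numbers only: 100 disks at 50 MB/s each, a 1000 MB/s network,
# and CPUs that can analyze 400 MB/s in total.
stage, rate = bottleneck(num_disks=100, disk_mb_s=50, network_mb_s=1000, cpu_mb_s=400)
```

Under these assumed figures the disks could supply 5000 MB/s in aggregate, but the system as a whole is held to the CPU rate; adding more disks raises only the stage that was never the limit.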
Certain development efforts, known as active disk drives and/or intelligent drives, have attempted to relieve these bottlenecks by pushing processing from the network down to the disk elements themselves. Such efforts sought to place a processor directly on the disks, for example on a hardware interface card connected to a disk drive device. This assembly of custom hardware card and disk then acts as a high powered disk drive. By placing methods for intelligently filtering and retrieving data on the local disk, this approach reduces the load on a host computer's CPU. However, this approach requires custom disk assemblies, so industry standard disk drive interfaces must be modified. This increases the overall cost and complexity of installation.
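The effect of filtering at the disk rather than at the host can be sketched as follows. This is a simplified model under stated assumptions, not a description of any actual active disk product: the record layout, the 25 percent selectivity, and the names `host_side_filter` and `drive_side_filter` are hypothetical.

```python
# Hypothetical table of 1000 records; one in four belongs to region "EU".
RECORDS = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(1000)]


def host_side_filter(records, predicate):
    """Conventional path: every record crosses the interface, then the host filters."""
    transferred = list(records)  # all records are moved to host memory
    matches = [r for r in transferred if predicate(r)]
    return matches, len(transferred)


def drive_side_filter(records, predicate):
    """Active disk path: the drive filters first, so only matches are transferred."""
    transferred = [r for r in records if predicate(r)]
    return transferred, len(transferred)


def is_eu(record):
    return record["region"] == "EU"


host_matches, host_moved = host_side_filter(RECORDS, is_eu)
drive_matches, drive_moved = drive_side_filter(RECORDS, is_eu)
```

Both paths return the same 250 matching records, but the host side path moves all 1000 records to obtain them, while the drive side path moves only the 250 that satisfy the predicate.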