This invention relates to data processing systems that make use of multiple processing unit groups, and in particular to an asymmetric architecture that allows processing units to operate autonomously and asynchronously and that supports streaming processing of record data.
With continued development of low cost computing systems and proliferation of computer networks, the world continues to see an exponential growth in the amount and availability of information. Indeed, the Massachusetts-based Enterprise Storage Group has observed a doubling of information every few months. Demand for easy and efficient access to this ever-growing amount of digital information is another certainty. For example, World Wide Web traffic increased 300% in 2001 according to Forrester Research. Included among the applications that continue to make the greatest demands are systems for processing:
financial transactions data;
“click stream” data that encapsulates the behavior of visitors to web sites;
data relating to the operational status of public utilities such as electric power networks, communications networks, transportation systems and the like; and
scientific data supporting drug discovery and space exploration.
Greg Papadopoulos, the Chief Technical Officer of Sun Microsystems, Inc., has observed that the demand for access to decision support databases, referred to as Input/Output (I/O) demand, doubles every nine months. To put this in context, Moore's Law predicts that Central Processing Unit (CPU) power doubles only about every 18 months. In other words, the demand for access to information is growing at least twice as fast as the ability of a single CPU to process and deliver it.
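The comparison of growth rates can be made concrete. Assuming both quantities grow exponentially at the stated doubling periods (nine months for I/O demand, 18 months for CPU power), and writing D_0 and C_0 for their starting values (notation introduced here only for illustration), after t months:

```latex
D(t) = D_0 \, 2^{t/9}, \qquad C(t) = C_0 \, 2^{t/18}, \qquad
\frac{D(t)/D_0}{C(t)/C_0} = 2^{t/9 - t/18} = 2^{t/18}.
```

Over 18 months, demand therefore grows by a factor of 4 while CPU power grows by a factor of 2, and the gap between them itself doubles every 18 months.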
In a typical general purpose data processing system, data is stored on one or more mass storage devices, such as hard disk drives. One or more computers are then programmed to read data from the disks and analyze it; the programs may include special database software written for this purpose. The problem with such a general purpose system architecture, however, is that all the data must be retrieved from the disk and placed in a computer's memory before any operations can be performed on it. If any portion of the data retrieved is not actually needed, the time spent fetching it is wasted. Valuable time is thus lost in the mere process of retrieving and storing unnecessary data.
The speed at which the data analysis can be performed is typically limited by the speed at which the entire set of data can be transferred into a computer's memory and then examined by the CPU(s). Usually, the aggregate data transfer rate of the disks does not govern the speed at which the analysis can be performed. Disks are inexpensive, and as such, data can be spread across a large number of disks arranged to be accessed in parallel. The effective data transfer rate of a set of disks, collectively, can therefore be almost arbitrarily fast.
The bandwidth of an interface or communications network between the disks and the CPUs is also typically less than the aggregate data transfer rate of the disks. The bottleneck is thus in the communications network or in the CPUs, but not in the disks themselves.
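The bottleneck argument above can be expressed as a simple steady-state model. The following Python sketch assumes (this is not stated in the source) that each stage of a disk-to-link-to-CPU pipeline moves data at a fixed rate and that the slowest stage sets the overall pace. The function names and the figures in the example are hypothetical.

```python
def effective_throughput(disk_rates_mb_s, link_mb_s, cpu_mb_s):
    """Return the end-to-end scan rate (MB/s) of a disk -> link -> CPU pipeline.

    Assumed model: disks are read in parallel, so their rates add up, and the
    slowest of the three stages limits the whole pipeline.
    """
    aggregate_disk = sum(disk_rates_mb_s)
    return min(aggregate_disk, link_mb_s, cpu_mb_s)


def bottleneck(disk_rates_mb_s, link_mb_s, cpu_mb_s):
    """Name the stage that limits throughput under the same model."""
    stages = {
        "disks": sum(disk_rates_mb_s),
        "link": link_mb_s,
        "cpu": cpu_mb_s,
    }
    return min(stages, key=stages.get)


# Hypothetical figures: 40 disks at 50 MB/s each, a 1,000 MB/s link, and a CPU
# that can examine 600 MB/s of records.
disks = [50] * 40
print(effective_throughput(disks, 1000, 600))  # 600
print(bottleneck(disks, 1000, 600))            # cpu
```

Adding disks in this model raises only the first term of the minimum, which is why the limit ends up in the link or the CPUs rather than in the disks.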
It has long been recognized that achieving adequate performance and scalability in the face of vast and rapidly growing data requires a system architecture that employs multiple CPUs. The three most prevalent classes of so-called multiprocessing systems today are:
Symmetric Multiprocessing (SMP);
Asymmetric Multiprocessing (ASMP); and
Massively Parallel Processing (MPP).
But even these approaches have weaknesses that limit their ability to efficiently process vast amounts of data.
SMP systems consist of several CPUs, each with its own memory cache. Resources such as memory and the I/O system are shared by and are equally accessible to each of the processors. The processors in an SMP system thus constitute a pool of computation resources on which the operating system can schedule “threads” of executing code for execution.
Two weaknesses of the SMP approach impair its performance and scalability when processing very large amounts of data. The first is a limited ability to deliver data to the processors. With this architecture, the I/O subsystem and the memory bus are shared among all processors, yet their bandwidth is limited. Thus, when the volume of data is too high, the processors waste their speed waiting for data to arrive. The second problem is cache coherence. Each processor typically contains a cache memory for storing records so that they may be accessed faster. However, the more processors that are added to an SMP system, the more time must be spent synchronizing all of the individual caches when changes are made to the database. In practice, it is rare for SMP machines to scale linearly beyond about 64 processors.
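The first SMP weakness, a shared bus of fixed bandwidth, can be illustrated with a toy model. This Python sketch assumes (purely for illustration) that the bus bandwidth is split evenly among CPUs and that a CPU needing more data than its share sits idle; it does not model cache-coherence cost. The function name and all figures are hypothetical.

```python
def smp_utilization(n_cpus, bus_mb_s, per_cpu_demand_mb_s):
    """Fraction of time each CPU is busy when all CPUs share one bus.

    Assumed model: each CPU gets an equal share of the bus bandwidth, and any
    demand beyond that share is time spent waiting for data.
    """
    share = bus_mb_s / n_cpus
    return min(1.0, share / per_cpu_demand_mb_s)


# Hypothetical bus of 3,200 MB/s; each CPU could consume 200 MB/s.
for n in (8, 16, 32, 64):
    print(n, smp_utilization(n, 3200, 200))
```

In this hypothetical, utilization stays at 1.0 up to 16 CPUs and then halves each time the CPU count doubles, so the useful work done by the whole machine stops growing once the bus saturates.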
Asymmetric Multiprocessing (ASMP) systems assign specific tasks to specific processors, with a master processor controlling the system. This specialization has a number of benefits. Resources can be dedicated to specific tasks, avoiding the overhead of coordinating shared access. Scheduling is also easier in an ASMP system, where there are fewer choices about which processor to assign to a task. ASMP systems thus tend to be more scalable than SMP systems. One basic problem with asymmetry is that it can result in one processor being overloaded while others sit idle.
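The load-imbalance problem of a fixed ASMP assignment can be shown with a small sketch. Here a master routes each task type to the one processor dedicated to it; the task types, processor names, costs and the `ASSIGNMENT` table are all hypothetical and chosen only to illustrate the effect.

```python
# Hypothetical fixed assignment: the master routes each task type to the one
# processor dedicated to it.
ASSIGNMENT = {"io": "cpu0", "parse": "cpu1", "aggregate": "cpu2"}


def dispatch(tasks):
    """Return per-processor load (in task cost units) under fixed assignment."""
    load = {cpu: 0 for cpu in ASSIGNMENT.values()}
    for task_type, cost in tasks:
        load[ASSIGNMENT[task_type]] += cost
    return load


# A workload dominated by parsing overloads cpu1 while cpu0 and cpu2 sit idle.
tasks = [("parse", 5), ("parse", 5), ("parse", 5), ("io", 1), ("aggregate", 1)]
print(dispatch(tasks))  # {'cpu0': 1, 'cpu1': 15, 'cpu2': 1}
```

Scheduling is trivial because each task has exactly one destination, but that same rigidity prevents idle processors from absorbing the excess work.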
Massively Parallel Processing (MPP) systems consist of very large numbers of processors that are loosely coupled. Each processor has its own memory and storage devices and runs its own operating system. Communication between the processors of an MPP system is accomplished by sending messages over network connections. With no shared resources, MPP systems require much less synchronization than SMP and ASMP systems.
One weakness of the MPP model is that communication among processors occurs by passing messages over a network connection, which is a much slower technique than communication through shared memory. If frequent inter-processor communication is required, then the advantages of parallelism are negated by communication latency. Another problem with the MPP approach is that traditional programming models do not map cleanly onto message passing architectures. Approaches such as Common Object Request Broker Architecture (CORBA), which are designed to handle message passing, are considered awkward by some designers.
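As a rough illustration of the MPP model, the following Python sketch simulates nodes as threads. This is an assumption made to keep the example self-contained; real MPP nodes are separate machines with their own operating systems. Each node works only on its private partition and communicates with a coordinator solely by sending a message through a queue that stands in for the network. The names `node` and `run_mpp` are hypothetical.

```python
import queue
import threading


def node(node_id, local_rows, outbox):
    """One simulated MPP node: it reads only its own partition (private
    memory) and reports to the coordinator solely by sending a message."""
    partial = sum(local_rows)
    outbox.put((node_id, partial))


def run_mpp(partitions):
    inbox = queue.Queue()  # stands in for the network connection
    workers = [
        threading.Thread(target=node, args=(i, part, inbox))
        for i, part in enumerate(partitions)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # The coordinator sees only the messages, never the nodes' data.
    messages = [inbox.get() for _ in partitions]
    return sum(partial for _, partial in messages)


print(run_mpp([[1, 2, 3], [4, 5], [6]]))  # 21
```

Here each node sends a single message, so communication cost is small; a query that required many exchanges between nodes would pay network latency on every one of them.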
There have also been attempts over the years to use distributed processing approaches of various types. These began with proposals for “Database Machines” in the 1970s, for “Parallel Query Processing” in the 1980s, and for “Active Disks” and “Intelligent Disks” in the last five to ten years. These techniques typically place a programmable processor directly in a disk sub-assembly, or otherwise in a location that is tightly coupled to a specific drive. This approach pushes processing power down to the disks, and thus can be used to reduce the load on a host computer's CPU.
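The benefit of pushing processing down to the disks can be sketched by counting how many bytes cross the link to the host. In this Python example, the record layout, the fixed 100-byte record size and the function names are hypothetical; both paths return the same answer, but the disk-side filter ships only the matching records.

```python
# Hypothetical records: (customer_id, region, amount), assumed 100 bytes each.
RECORD_BYTES = 100


def host_side_scan(disk_records, predicate):
    """Conventional path: ship every record to the host, then filter."""
    shipped = list(disk_records)
    result = [r for r in shipped if predicate(r)]
    return result, len(shipped) * RECORD_BYTES


def disk_side_scan(disk_records, predicate):
    """'Active disk' path: a processor beside the drive filters first,
    so only matching records cross the link to the host."""
    shipped = [r for r in disk_records if predicate(r)]
    return shipped, len(shipped) * RECORD_BYTES


def is_east(record):
    return record[1] == "east"


records = [(i, "east" if i % 10 == 0 else "west", i * 2) for i in range(1000)]
rows_a, bytes_a = host_side_scan(records, is_east)
rows_b, bytes_b = disk_side_scan(records, is_east)
assert rows_a == rows_b  # same answer either way
print(bytes_a, bytes_b)  # 100000 10000
```

With one record in ten matching, the disk-side path moves a tenth of the data, which is exactly the wasted retrieval described for the general purpose architecture.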
More recently, system architectures have been adopted for parallel execution of operations that originate as standard database language queries. For example, U.S. Pat. No. 6,507,834 issued to Kabra et al. uses a multi-processor architecture to process Structured Query Language (SQL) instructions in a publish/subscribe model such that new entries in a database are automatically processed when added. As explained in the Abstract of that patent, a first processor is used as a dispatcher to execute optimized queries, set up communication links between operators, and ensure that results are sent back to the application that originated the query. The dispatcher merges results of parallel execution to produce a single set of output tuples that is then returned to a calling procedure.
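The general idea of a dispatcher that fans a query out and merges the partial results can be sketched as follows. This is not the method of the cited patent; it is a minimal illustration assuming a thread pool as the parallel executors, plain integers in place of tuples, and a merge of individually sorted partial results. The names `scan_partition` and `dispatcher` are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, threshold):
    """Operator run in parallel: filter one partition and sort its output."""
    return sorted(r for r in rows if r >= threshold)


def dispatcher(partitions, threshold):
    """Hypothetical dispatcher: fan the query out to parallel workers, then
    merge the sorted partial results into a single ordered output."""
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(scan_partition, partitions, [threshold] * len(partitions))
        )
    return list(heapq.merge(*partials))


print(dispatcher([[5, 1, 9], [7, 3], [8, 2, 6]], threshold=5))  # [5, 6, 7, 8, 9]
```

`heapq.merge` combines already sorted inputs lazily, so the dispatcher produces one ordered result set without re-sorting everything.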
U.S. Pat. No. 6,339,772 issued to Klein et al. discloses an SQL compiler and executor that support a streaming mode of operation. Again, with this architecture, “parent” and “child” nodes are assigned to execute portions of an SQL execution tree. Memory queues are also disposed between the nodes to permit intermediate storage of requests and fetched records.
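A streaming parent/child arrangement with a memory queue between the nodes can be sketched with two threads. This is an illustration of the general pattern, not the cited patent's implementation: the child scans records and pushes them into a bounded `queue.Queue`, and the parent consumes them as they arrive. The names, the end-of-stream marker and the queue depth are hypothetical.

```python
import queue
import threading

DONE = object()  # end-of-stream marker


def child_scan(source, out_q):
    """Child node: fetch records and stream them upward one at a time."""
    for record in source:
        out_q.put(record)  # blocks while the queue is full
    out_q.put(DONE)


def parent_filter_sum(in_q, predicate):
    """Parent node: consume records as they arrive, before the scan finishes."""
    total = 0
    while (record := in_q.get()) is not DONE:
        if predicate(record):
            total += record
    return total


def run_pipeline(source, predicate, depth=4):
    q = queue.Queue(maxsize=depth)  # bounded buffer between the two nodes
    producer = threading.Thread(target=child_scan, args=(source, q))
    producer.start()
    result = parent_filter_sum(q, predicate)
    producer.join()
    return result


print(run_pipeline(range(10), lambda x: x % 2 == 0))  # 20
```

The bounded queue holds at most `depth` fetched records at a time, so the child cannot run arbitrarily far ahead of the parent.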
Finally, U.S. Pat. No. 6,542,886 issued to Chaudhuri et al. discloses a database server that sequentially samples records that originate from a data stream in a pipelined query tree such that the system can sample over a “join” of two tuples without prior materialization or computation of the complete join operation.
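The notion of sampling over a join without materializing it can be illustrated with a swapped-in, simpler technique: generate join results one at a time from a hash index and feed them to a reservoir sampler (Algorithm R). This is not the cited patent's method; unlike it, this sketch still computes every join result, but never stores the complete join. All names are hypothetical.

```python
import random
from collections import defaultdict


def stream_join(left, right, key_l, key_r):
    """Yield joined pairs one at a time using a hash index on `right`;
    the full join result is never stored."""
    index = defaultdict(list)
    for rrow in right:
        index[key_r(rrow)].append(rrow)
    for lrow in left:
        for rrow in index.get(key_l(lrow), ()):
            yield (lrow, rrow)


def reservoir_sample(stream, k, rng=random):
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    sample = []
    for n, item in enumerate(stream):
        if n < k:
            sample.append(item)
        else:
            j = rng.randint(0, n)  # inclusive bounds: item kept with prob k/(n+1)
            if j < k:
                sample[j] = item
    return sample


orders = [(1, "a"), (2, "b"), (3, "a")]
customers = [("a", "Alice"), ("b", "Bob")]
pairs = stream_join(orders, customers, key_l=lambda o: o[1], key_r=lambda c: c[0])
print(reservoir_sample(pairs, k=2, rng=random.Random(0)))
```

Only the hash index and the k sampled pairs are held in memory at any time.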