1. The Field of the Invention
This invention relates generally to distributed parallel processing on a computer network, and more particularly to a method and apparatus for parallel processing digital rules in rule nets and slave translators which are interconnected by a global controller and bindery.
2. Background
Many organizations are now dependent on computers for the management of their data. Because of the mission-critical nature of computer applications such as web servers, accounting, point of sale, inventory and customer information, even small organizations need their computer systems to be running virtually all the time. Computer users desire their systems to be fast and reliable, but current von Neumann processors are coming close to the limit of their capacity. In addition, von Neumann processors are sometimes unstable and can crash. The increased processing requirements for virtual reality graphics and client-server network applications also justify the need for faster, more affordable and more stable computers.
One way to speed up a computer is to increase the speed of a single von Neumann processor. This has been occurring at a steady rate in the computer industry through miniaturization and circuit optimization. Another way to increase computer performance is to add more von Neumann processors to a single machine, which is known as parallel processing. Unfortunately, conventional parallel processing machines have throughput bottlenecks and software which is difficult to debug. It is also known that von Neumann processors on separate machines or nodes can be connected in configurations such as client-server or neural networks. Some neural net models have even been designed which can process digital information and return a result.
Although adding more processors may help increase the speed of a computer, it does not necessarily increase the reliability of the computer or its operating system. The popular operating systems on microcomputers, such as Macintosh, Windows, Windows NT, Unix, and NetWare, unpredictably fail from time to time. Overcoming single computer failures by using multiple computer systems as backups for systems that fail is known as redundancy.
One solution to single server failure is the concept of clustering. A cluster is a group of two or more servers (nodes) connected on a network which have high availability because they work together as one logical entity. When an independent server within the cluster fails, the workload of that server, and the services or applications it was providing, are distributed to the remaining servers in the cluster. Redundancy in a cluster provides high availability of data, network services and applications to users. The transfer of an application from a failed server to a remaining server is called failover. In clustering, the performance of the servers is also balanced, as the servers allocate the network load to match the hardware capabilities of each server in the cluster. In order for the data on hard disk drives to be visible to the various nodes in the cluster, a shared disk subsystem is connected to all nodes in the cluster. If data were stored on a local hard drive of one of the servers, that data would be unavailable when that server crashed. Distributed file systems allow all running servers to access the same data without corruption.
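The failover and load balancing behavior described above can be illustrated with a minimal sketch. The `Server` class, the `capacity` field, and the services-per-capacity balancing rule are assumptions made for illustration only; they are not taken from any particular clustering product.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity: int  # hypothetical relative measure of hardware capability
    services: list = field(default_factory=list)
    alive: bool = True


def least_loaded(servers):
    """Return the running server with the lowest services-per-capacity ratio."""
    running = [s for s in servers if s.alive]
    if not running:
        raise RuntimeError("no running servers left in the cluster")
    # Ties are broken by name so the choice is deterministic.
    return min(running, key=lambda s: (len(s.services) / s.capacity, s.name))


def fail_over(servers, failed_name):
    """Mark one server as failed and move its services to the survivors."""
    failed = next(s for s in servers if s.name == failed_name)
    failed.alive = False
    orphaned, failed.services = failed.services, []
    for service in orphaned:
        least_loaded(servers).services.append(service)


cluster = [
    Server("node-a", capacity=2, services=["web", "mail"]),
    Server("node-b", capacity=1, services=["db"]),
    Server("node-c", capacity=2, services=[]),
]
fail_over(cluster, "node-a")
for server in cluster:
    print(server.name, server.alive, server.services)
```

In this sketch the services of the failed node are assigned one at a time to whichever surviving node has the most spare capacity, which is one simple way to match load to hardware capability.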
Outside users want to see the cluster as a single logical entity. This single system image (SSI) is achieved through certain characteristics of the cluster configuration. First, the IP (Internet Protocol) addresses to which clients connect are allowed to move from one server to another as part of failover. These virtual IP addresses, along with the software applications which use them, are moved from node to node when necessary.
Group membership software detects which nodes in a cluster are running, and cluster resource management software decides where the cluster resources reside (e.g., IP addresses, running applications, disk subsystems). The decision as to which node gets a resource can be based on a cluster node preference list or some load balancing policy. The cluster resource management software performs failover, failback and resource migration to adjust the load of the software on each node or server.
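A resource placement decision of the kind described above might be sketched as follows. The function name `place_resource`, the node names, and the load figures are hypothetical; the sketch assumes a preference list is consulted first and a least-loaded policy is used only as a fallback.

```python
def place_resource(preference_list, running_nodes, load=None):
    """Choose the node that should hold a cluster resource.

    The first running node on the preference list wins. If none of the
    preferred nodes is running, fall back to the least-loaded running node.
    """
    for node in preference_list:
        if node in running_nodes:
            return node
    if not running_nodes:
        raise RuntimeError("no running nodes available for the resource")
    load = load or {}
    # sorted() makes tie-breaking deterministic (alphabetical by node name).
    return min(sorted(running_nodes), key=lambda n: load.get(n, 0))


virtual_ip_preferences = ["node-a", "node-b"]

# node-a is down: the virtual IP fails over to the next preferred node.
after_failure = place_resource(virtual_ip_preferences, {"node-b", "node-c"})

# node-a is running again: the virtual IP fails back to its first choice.
after_recovery = place_resource(
    virtual_ip_preferences, {"node-a", "node-b", "node-c"}
)

# No preferred node is running: use the load balancing policy instead.
fallback = place_resource(
    virtual_ip_preferences, {"node-c", "node-d"}, load={"node-c": 5, "node-d": 2}
)

print(after_failure, after_recovery, fallback)
```

Here the set of running nodes stands in for the output of the group membership software, and re-evaluating the placement whenever that set changes yields both failover and failback.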
In a clustering system, the network client must have reconnection logic so that the user cannot tell that, behind the scenes, a current connection to a server has failed and a new connection to the same IP address on another server has been made. A major drawback of reconnection is the loss of gray data. An example of gray data loss is a database transaction on the server which was neither completed nor stored on the shared disk subsystem, so that the incomplete transaction must be started over. If the gray data remains in memory, then the transaction can automatically pick up where it stopped without restarting. If the gray data was erased from memory, the transaction will have to be re-entered by the user when the application restarts on a remaining clustered server. Gray data recovery is a complex process and requires complex algorithms on the client side, in the database or on the servers. It is difficult for an application which runs on only one server at a time to prevent gray data loss because, inside the cluster, the servers (nodes) are individually known to the group membership software and the cluster resource management software, but they are not known to an individual application.
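The client-side reconnection described above can be illustrated with a simplified, in-process simulation. All class and function names here are hypothetical, and the sketch assumes the simplest recovery strategy: the client keeps each transaction in its own memory until the server acknowledges that it has been stored on the shared disk, and resubmits it after reconnecting to the same virtual IP address.

```python
class ConnectionLost(Exception):
    """Raised when the server fails before committing a transaction."""


class SharedDisk:
    def __init__(self):
        self.committed = {}  # transaction id -> payload


class Server:
    def __init__(self, name, disk, fail_before_commit=False):
        self.name = name
        self.disk = disk
        self.fail_before_commit = fail_before_commit

    def execute(self, txn_id, payload):
        if self.fail_before_commit:
            # The in-flight transaction is gray data: never written to disk.
            raise ConnectionLost(self.name)
        self.disk.committed[txn_id] = payload


class Client:
    def __init__(self, resolve_virtual_ip):
        self.resolve_virtual_ip = resolve_virtual_ip
        self.pending = {}  # transactions not yet acknowledged

    def submit(self, txn_id, payload, max_attempts=3):
        self.pending[txn_id] = payload  # keep a copy until acknowledged
        for _ in range(max_attempts):
            server = self.resolve_virtual_ip()
            try:
                server.execute(txn_id, payload)
            except ConnectionLost:
                continue  # reconnect to the same virtual IP and resubmit
            del self.pending[txn_id]
            return server.name
        raise ConnectionLost("transaction %s could not be committed" % txn_id)


disk = SharedDisk()
owners = [Server("node-a", disk, fail_before_commit=True), Server("node-b", disk)]


def resolve_virtual_ip():
    # Simulates the virtual IP moving to the next node after a failure.
    return owners.pop(0) if len(owners) > 1 else owners[0]


client = Client(resolve_virtual_ip)
served_by = client.submit("txn-1", {"debit": 100})
print(served_by, disk.committed, client.pending)
```

The user of this client never sees the failed connection, but only because the client itself retained the uncommitted transaction. A real system would also have to make resubmission idempotent, which is part of the complexity noted above.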
Current applications on a cluster are limited to running on one server. Of course, an application will fail over to another server if the server it is running on crashes, but it cannot automatically use the resources of all the servers in the cluster. Clusters provide a single system image (SSI) to users outside a group of network nodes, but an SSI from a software developer's perspective inside the cluster does not exist. Currently, therefore, programmers who desire to create applications which use the resources of more than one server in the cluster must organize their applications to directly address the separate servers. This is a very complex and daunting task because of the network knowledge required and the high complexity of the clustering resource management software. What is needed is a parallel, distributed application execution environment on a cluster of von Neumann processors which appears as a single system image (SSI) to a software developer.