1. The Field of the Invention
This invention relates generally to distributed parallel processing on a computer network, and more particularly to a method and apparatus for parallel processing digital rules in rule nets and slave translators which are interconnected by a global controller and bindery.
2. Background
Many organizations are now dependent on computers for the management of their data. Because of the mission-critical nature of computer applications like web servers, accounting, point of sale, inventory and customer information, even small organizations need their computer systems to be running virtually all the time. Computer users desire their systems to be fast and reliable, but current von Neumann processors are coming close to the limit of their capacity. In addition, von Neumann processors are sometimes unstable and can crash. The increased processing requirements for virtual reality graphics and client-server network applications also justify the need for faster, more affordable and more stable computers.
One way to speed up a computer is to increase a von Neumann single processor's speed. This has been occurring at a steady rate in the computer industry through miniaturization and circuit optimization. Another method of increasing computer performance is adding more von Neumann processors to a single machine, also known as parallel processing. Unfortunately, conventional parallel processing machines have throughput bottlenecks and software which is difficult to debug. It is also known that von Neumann processors on separate machines or nodes can be connected in configurations such as client-server or neural networks. Some neural net models have even been designed which can process digital information and return a result.
Although adding more processors may help increase the speed of a computer, it does not necessarily increase the reliability of the computer or its operating system. The popular operating systems on microcomputers such as Macintosh, Windows, Windows NT, Unix, and NetWare unpredictably fail from time to time. Using multiple computer systems as a backup for systems that fail, in order to overcome single computer failures, is known as redundancy.
One solution to single server failure is the concept of clustering. A cluster is a group of two or more servers (nodes) connected on a network which have high availability because they work together as one logical entity. When an independent server within the cluster fails, the workload of that server, and the services or applications it was providing, are distributed to the remaining servers in the cluster. Redundancy in a cluster provides high availability of data, network services and applications to users. The transfer of an application from a failed server to a remaining server is called failover. In clustering, the performance of the servers is also balanced as the servers allocate the network load to match the hardware capabilities of each server in the cluster. In order for the data on hard disk drives to be visible to the various nodes in the cluster, a shared disk subsystem is connected to all nodes in the cluster. If data were stored on a local hard drive of one of the servers, that data would be unavailable when that server crashed. Distributed file systems allow all running servers to access the same data without corruption.
Outside users want to see the cluster as a single logical entity. This single system image (SSI) is achieved through certain characteristics of the cluster configuration. First, the IP (Internet Protocol) addresses to which clients connect are allowed to move from one server to another as part of failover. These virtual IP addresses, along with the software applications which use them, are moved from node to node when necessary.
Group membership software detects which nodes in a cluster are running, and cluster resource management software decides where the cluster resources (e.g. IP addresses, running applications, disk subsystems) reside. The decision as to which node gets a resource can be based on a cluster node preference list or some load balancing policy. The cluster resource management software performs failover, failback and resource migration to adjust the load of the software on each node or server.
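The placement decision described above can be illustrated with a minimal sketch. It assumes a simple policy that the text does not specify: honor a per-resource preference list first, and otherwise fall back to the least-loaded running node. The function names `place_resource` and `fail_over`, the node names, and the integer load figures are all hypothetical and are not taken from any particular clustering product.

```python
from typing import Dict, List, Optional, Set


def place_resource(
    preference: List[str],
    alive: Set[str],
    load: Dict[str, int],
) -> Optional[str]:
    """Pick a node for one cluster resource (hypothetical policy).

    Honor the preference list first, then fall back to the least-loaded
    running node. Returns None when no node is running.
    """
    for node in preference:
        if node in alive:
            return node
    if not alive:
        return None
    # Sort by (load, name) so that ties resolve deterministically.
    return min(alive, key=lambda n: (load.get(n, 0), n))


def fail_over(
    placement: Dict[str, str],
    preferences: Dict[str, List[str]],
    alive: Set[str],
    load: Dict[str, int],
) -> Dict[str, Optional[str]]:
    """Re-place every resource whose current node is no longer running."""
    new_placement: Dict[str, Optional[str]] = {}
    for resource, node in placement.items():
        if node in alive:
            # The hosting node is still up, so the resource stays put.
            new_placement[resource] = node
        else:
            new_placement[resource] = place_resource(
                preferences.get(resource, []), alive, load
            )
    return new_placement


# Example: node_a has failed, so its virtual IP address moves to node_c.
placement = {"10.0.0.5": "node_a", "db": "node_b"}
preferences = {"10.0.0.5": ["node_a", "node_c"]}
alive = {"node_b", "node_c"}
load = {"node_b": 3, "node_c": 1}
print(fail_over(placement, preferences, alive, load))
```

In this sketch the group membership software is reduced to the `alive` set, and the load figures are not updated as resources move; a real resource manager would track both continuously.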
In a clustering system, the network client must have reconnection logic so that the user cannot tell that, behind the scenes, a current connection to a server has failed and a new connection to the same IP address on another server has been made. A major drawback of reconnection is the loss of gray data. An example of gray data loss is a transaction to a database on the server which was neither completed nor stored on the shared disk subsystem, so that the incomplete transaction has to be started over. If the gray data remains in memory, then the transaction can automatically pick up where it stopped without restarting. If the gray data was erased from memory, the transaction will have to be re-entered by the user when the application restarts on a remaining clustered server. Gray data recovery is a complex process and requires complex algorithms on the client side, in the database or on the servers. It is difficult for an application which runs on only one server at a time to prevent gray data loss because, inside the cluster, the servers (nodes) are individually known to the group membership software and the cluster resource management software but are not known to an individual application.
Current applications on a cluster are limited to running on one server. Of course, an application will fail over to another server if the server it is running on crashes, but it cannot automatically use the resources of all the servers in the cluster. Clusters provide a single system image (SSI) to users outside a group of network nodes, but an SSI from a software developer's perspective inside the cluster does not exist. Currently, programmers who desire to create applications which use the resources of more than one server in the cluster must organize their application to directly address the separate servers. This is a very complex and daunting task because of the network knowledge required and the high complexity of the clustering resource management software. What is needed is a parallel, distributed application execution environment on a cluster of von Neumann processors which appears as a single system image (SSI) to a software developer.
It is an object of the present invention to provide a distributed digital rule processor and method which create a true SSI for distributed software application execution, inside or outside a networked von Neumann processor cluster by broadcasting digital rules which are processed by rule nets and slave translators.
It is an object of the present invention to provide a distributed digital rule processor and method to allow the sharing of gray data between servers and the hiding of servers or von Neumann processors as slaves to a layer of digital rules.
It is another object of the invention to provide a distributed digital rule processor and method to create, store, and execute complex digital rules over a clustered network of von Neumann processors.
It is yet another object of the invention to provide such a distributed digital rule processor and method for executing digital rules and utilizing the high reliability benefits of a networked cluster of servers.
The distributed digital rule processor creates a single system image (SSI) and executes digital rules on a clustered von Neumann processor network. The processor has a plurality of rule nets, each having an ordered list of rules, and each rule has input variables and output variables. The rule nets can broadcast rules to other rule nets or slave translators. A plurality of slave translators execute the rules received from the rule nets and return the results and data from the executed rules to the calling rule net. A global controller, which has a global bindery, a global data memory, and a current broadcast state, is coupled to the rule nets and slave translators. A global rule distribution queue is coupled to the global controller to store pending digital rules and to broadcast rules to the rule nets and slave translators as signaled by the global controller.
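How these components might fit together can be illustrated with a small, single-process sketch. It is only an approximation under stated assumptions: the class and method names (`Rule`, `RuleNet`, `SlaveTranslator`, `GlobalController`, `register`, `broadcast`, `run`) are hypothetical, rules are identified by a plain string, the bindery is modeled as a list of registered translators, the broadcast state is a simple string, and the global data memory and all networking are omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict, List, Tuple


@dataclass
class Rule:
    name: str                                         # operation requested
    inputs: Tuple[Any, ...]                           # input variables
    outputs: List[Any] = field(default_factory=list)  # filled on execution


class SlaveTranslator:
    """Executes received rules by calling functions on its host."""

    def __init__(self, handlers: Dict[str, Callable[..., Any]]) -> None:
        self.handlers = handlers

    def can_execute(self, rule: Rule) -> bool:
        return rule.name in self.handlers

    def execute(self, rule: Rule) -> Rule:
        rule.outputs = [self.handlers[rule.name](*rule.inputs)]
        return rule


class RuleNet:
    """Holds an ordered list of rules and collects returned results."""

    def __init__(self, net_id: str, rules: List[Rule]) -> None:
        self.net_id = net_id
        self.rules = rules
        self.results: List[Rule] = []


class GlobalController:
    """Toy controller: a bindery plus a global rule distribution queue."""

    def __init__(self) -> None:
        self.bindery: List[SlaveTranslator] = []   # registered translators
        self.queue: Deque[Tuple[RuleNet, Rule]] = deque()
        self.broadcast_state = "idle"

    def register(self, translator: SlaveTranslator) -> None:
        self.bindery.append(translator)

    def broadcast(self, net: RuleNet) -> None:
        # Queue the net's rules in order, remembering the calling net.
        for rule in net.rules:
            self.queue.append((net, rule))

    def run(self) -> None:
        self.broadcast_state = "broadcasting"
        try:
            while self.queue:
                net, rule = self.queue.popleft()
                for translator in self.bindery:
                    if translator.can_execute(rule):
                        # Return the executed rule to the calling rule net.
                        net.results.append(translator.execute(rule))
                        break
                else:
                    raise LookupError(f"no translator for rule {rule.name!r}")
        finally:
            self.broadcast_state = "idle"


controller = GlobalController()
controller.register(SlaveTranslator({"add": lambda a, b: a + b}))
net = RuleNet("net1", [Rule("add", (2, 3)), Rule("add", (10, 5))])
controller.broadcast(net)
controller.run()
print([rule.outputs for rule in net.results])
```

The queue keeps the calling rule net alongside each pending rule so that results can be returned to the caller, which mirrors the return path described above.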
In one embodiment of the invention, a computer programmer compiles source code through a compiler into digital rules. These digital rules are stored in the rule nets. Slave translators are also provided to convert rule calls from the rule nets, which may include data, into API calls to be executed by the von Neumann host where a slave translator resides. The results of the API calls or von Neumann processor calls are then returned to the rule nets. A global controller has a global queue to arbitrate and store the rule calls between the rule nets and the slave translators. The global controller also stores global data and the current processing state. Each rule is a dynamic-length group of variables connected together, which includes at least the Function State number of the rule and the inputs and outputs of the rule.
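One way to picture a dynamic-length rule and its translation into an API call is the following sketch. The flat list layout (Function State number, input count, inputs, output slots), the `API_TABLE` mapping, and the helper names `make_rule` and `translate_and_call` are assumptions made for illustration; the text above does not define the actual encoding or the host API. The sketch also assumes each rule reserves at least one output slot.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical table mapping Function State numbers to host API calls.
API_TABLE: Dict[int, Callable[..., Any]] = {
    1: math.sqrt,
    2: max,
}


def make_rule(
    function_state: int, inputs: Sequence[Any], n_outputs: int = 1
) -> List[Any]:
    """Encode a rule as a flat, variable-length list:
    [function_state, n_inputs, *inputs, *output_slots]."""
    return [function_state, len(inputs), *inputs, *([None] * n_outputs)]


def translate_and_call(rule: List[Any]) -> List[Any]:
    """Slave-translator sketch: decode the rule, call the host API,
    and return a copy of the rule with its first output slot filled."""
    function_state, n_inputs = rule[0], rule[1]
    inputs = rule[2:2 + n_inputs]
    result = API_TABLE[function_state](*inputs)
    completed = list(rule)
    completed[2 + n_inputs] = result
    return completed


print(translate_and_call(make_rule(1, [16.0])))    # [1, 1, 16.0, 4.0]
print(translate_and_call(make_rule(2, [3, 9, 4])))  # [2, 3, 3, 9, 4, 9]
```

Because the input count is stored in the rule itself, rules with different numbers of inputs can share one queue and one decoding routine.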
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by the practice of the present invention. The objects and advantages of the invention may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims.