1. Field of the Invention
The present invention generally relates to computer systems, and more particularly to a method of distributing a multithread process among multiple workstations of a computer network, for improved, high-performance multithread data processing.
2. Description of the Related Art
The basic structure of a conventional computer system includes one or more processors which are connected to several input/output (I/O) devices for the user interface (such as a display monitor, keyboard and mouse), a permanent memory device for storing the computer's operating system and user programs (such as a magnetic hard disk), and a temporary memory device that is used by the processors to carry out program instructions (such as random access memory or RAM). The processors communicate with the other devices by various means, including a system bus or a direct channel.
When a user program runs on a computer, the computer's operating system (OS) first loads the main program file into system memory. The program file includes several objects (values) stored as data or text, and instructions for handling the data and other parameters which may be input during program execution. The processors use "logical addresses" to access the file objects, and these logical addresses correspond to physical addresses in RAM. Binding of instructions and data to physical memory addresses is accomplished by compiling the program file using relocatable code, which is indexed (linked) to physical memory by the OS loader during loading of the file.
A computer program can be broken down into a collection of processes which are executed by the processor(s). A process is a set of resources, including (but not limited to) logical addresses, process limits, permissions and registers, and at least one execution stream. The smallest unit of operation to be performed within a process is referred to as a thread. The use of threads in modern operating systems is well known. Threads allow multiple execution paths within a single address space (the process context) to run concurrently on a processor. This "multithreading" increases throughput in a multiprocessor system and provides modularity in a uniprocessor system.
In a single-tasking operating system, a computer processor executes computer programs or program subroutines serially; that is, no computer program or program subroutine can begin to execute until the previous computer program or program subroutine has terminated. This type of operating system does not make optimum use of the computer processor when an executing computer program or subroutine must await the occurrence of an external event (such as the availability of data or a resource), because processor time is wasted while it waits. This problem led to multitasking operating systems wherein each of the program threads performs a specific task. If a thread being executed must wait for the occurrence of an external event, i.e., the thread becomes "non-dispatchable," then its execution is suspended and the computer processor executes another thread of the same or a different computer program to optimize utilization of processor resources. Multitasking operating systems have also been extended to multiprocessor environments where threads of the same or different programs can execute in parallel on different computer processors.
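By way of illustration only, the suspension of a non-dispatchable thread can be sketched in Python as follows. The function name run_demo and the two event objects are hypothetical and chosen for this example; the sketch assumes nothing beyond the standard threading module and is not taken from any particular operating system.

```python
import threading


def run_demo():
    """Return the order in which two threads make progress."""
    log = []
    consumer_waiting = threading.Event()  # set once the consumer is about to block
    data_ready = threading.Event()        # stands in for the "external event"

    def consumer():
        log.append("consumer: waiting")
        consumer_waiting.set()
        data_ready.wait()                 # consumer is suspended until the event occurs
        log.append("consumer: resumed")

    def producer():
        consumer_waiting.wait()           # run only after the consumer has started waiting
        log.append("producer: running")   # useful work done while the consumer is blocked
        data_ready.set()                  # the external event occurs

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

While the first thread is blocked on the event, the second thread is dispatched and performs its work; the first thread resumes only after the event has been signaled.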
FIG. 1 illustrates multithreading in a prior art multiprocessor computer system 10. Computer system 10 includes one or more processors 12a, 12b and 12c which are connected to various I/O devices 14 and a memory device 16 (RAM) via a bus 18. Each processor includes a central processing unit (CPU) and one or more on-board caches. Each cache comprises a small amount of high-speed memory which stores a local copy of data utilized by its associated processor. A typical processor used for multithread processing is the PowerPC™ integrated circuit superscalar microprocessor manufactured by International Business Machines Corporation. When data requested by a processor is not resident within its associated cache, the processor will attempt to load the requested data from an optional L2 cache, or from global memory 16 (which may include one or more individual modules of physical memory).
Global memory 16 has a kernel portion with a set 20 of thread context fields for N threads associated with a particular process. Global memory 16 further includes a process context 22 in a user address space which contains all of the logical addresses for data and instructions used by the process. After a thread is created and prior to its termination, the thread will most likely utilize system resources to gain access to process context 22. Through the process context, process threads can share data and communicate with one another in a simple and straightforward manner.
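The sharing of data through a common process context can be sketched as follows, for illustration only. The dictionary named context is a hypothetical stand-in for process context 22, and the function name and parameters are assumptions made for this example.

```python
import threading


def run_shared_counter(num_threads=4, increments=1000):
    """Have several threads update one object in their common address space."""
    context = {"counter": 0}   # hypothetical stand-in for the shared process context
    lock = threading.Lock()    # serializes updates so that none are lost

    def worker():
        for _ in range(increments):
            with lock:
                context["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return context["counter"]


if __name__ == "__main__":
    print(run_shared_counter())
```

Because every thread refers to the same object, no copying or message exchange is needed; only a lock is used to keep concurrent updates consistent.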
Separate computers can be linked into vast networks which allow for the distribution of processing among various systems in the networks. Unfortunately, conventional systems only support distributing entire processes across the network, and do not support distribution of threads, since there is no common process context for the various systems in the network. This limitation causes programmers to use contorted mechanisms when writing distributed programs that need to work together to accomplish a task. These mechanisms often involve complicated communication mechanisms (such as sockets or pipes) that are very time-consuming to develop and difficult to debug. An alternative approach is to provide a common process context using a shared virtual memory system which maps a large virtual memory space globally to all processes in the system. This approach, however, causes problems in performance, security and integrity. It would, therefore, be desirable and advantageous to provide an improved method to allow execution of various threads of a single process on different systems in the network.
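The burden imposed by pipe-based communication between separate processes can be sketched as follows, for illustration only. The names CHILD_SOURCE and remote_sum are hypothetical, and JSON is merely an assumed encoding; the sketch runs both processes on one machine and uses only the standard subprocess, sys and json modules.

```python
import json
import subprocess
import sys

# Source code of the second process. It shares no memory with the parent,
# so it must decode its input from a pipe and encode its result onto another.
CHILD_SOURCE = r"""
import json, sys
numbers = json.loads(sys.stdin.read())
print(json.dumps(sum(numbers)))
"""


def remote_sum(numbers):
    """Sum a list in a separate process, exchanging data through pipes."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(numbers),   # serialize the arguments for the pipe
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)  # deserialize the reply


if __name__ == "__main__":
    print(remote_sum([1, 2, 3]))
```

Even this trivial exchange requires an agreed encoding, explicit serialization on both sides and error handling for the child process, none of which is needed when threads share a process context.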