This application is filed with three Appendices, which are a part of the specification and are herein incorporated by reference. The Appendices are:
Appendix A: Descriptions of QIO library routines for a shared memory queueing system.
Appendix B: A description of socket calls supported in a preferred embodiment of the invention.
Appendix C: A list of QIO events occurring in a preferred embodiment of the present invention.
1. Field of the Invention
This invention relates to operating system software and, more particularly, to a method and apparatus for increasing the efficiency of data transfer between processes and between processes and drivers in a data processing system.
2. Description of Related Art
Conventional multiprocessor computers and massively parallel processing (MPP) computers include multiple CPUs, executing the same instructions or executing different instructions. In certain situations, data passed between the processors is copied when it is passed from one processor to another. In conventional fault tolerant computers, for example, data is backed up and checkpointed between the CPUs in furtherance of the goals of fault tolerance, linear expandability, and massive parallelism. Thus, in fault tolerant computers, data is duplicated between CPUs and if one CPU fails, processing can be continued on another CPU with minimal (or no) loss of data. Such duplication of data at the processor level is highly desirable when used to ensure the robustness of the system. Duplication of data, however, can also slow system performance.
In some conventional systems, data is transferred between software processes by a messaging system in which data is physically copied from one process and sent to the other process. This other process can either be executing on the same CPU or on a different CPU. The messaging system physically copies each message and sends each message one at a time to the receiving process.
When the copied data is used for purposes of checkpointing between processors, for example, it is desirable that the data be physically copied. At other times, however, the data is merely passed between processes to enable the processes to communicate with each other. In this case, there is no need to physically copy the data when the processes reside in the same CPU. At such times, it may take more time to copy and transmit the data between processes than it takes for the receiving process to actually process the data. When data is transferred between processes executing on the same CPU, it is not efficient to copy the data sent between the processes.
Traditionally fault-tolerant computers have not allowed processes or CPUs to share memory under any circumstances. Memory shared between CPUs tends to be a “bottleneck” since one CPU may need to wait for another CPU to finish accessing the memory. In addition, if memory is shared between CPUs, and if one CPU fails, the other CPU cannot be assured of a non-corrupt memory space. Thus, conventionally, messages have been copied between processes in order to force strict data integrity at the process level.
On the other hand, passing data between processes by duplicating the data is time-consuming. To improve execution time, programmers tend to write larger processes that incorporate several functions, instead of breaking these functions up into more, smaller processes. By writing fewer, larger processes, programmers avoid the time-delays caused by copying data between processes. Large processes, however, are more difficult to write and maintain than smaller processes. What is needed is an alternate mechanism for passing data between processes in certain circumstances where duplication of data takes more time than the processing to be performed and where duplication of data is not critical for purposes of ensuring fault tolerance.
The present invention provides an apparatus and method for improving the efficiency of data transfer between processes and between processes and drivers in a fault tolerant, message based operating system. In the present invention, processes can communicate with each other through two distinct methods. First, processes can communicate with each other using a conventional messaging system, where data is copied each time it is transferred between processes. This first method is used primarily for functions relating to fault tolerance, linear expandability and parallelism where it is desirable, or at least acceptable, to duplicate the data being transferred. Second, processes can communicate with each other by using a shared memory queueing system (sometimes shortened to “shared memory”, “queued I/O” or “QIO”). This method is used primarily for functions relating to server processing, LAN protocol processing, and transmitting data between processes running on the same processor.
The shared memory queueing system allows processes executing on the same processor to transmit data without copying the data each time it is transferred. This increase in inter-process speed also makes it possible to divide the processes into small, functional modules. Process modularity can be “vertical,” e.g., a single large process can be broken down into several smaller processes with minimal time lost to transferring data between the processes. Process modularity can also be “horizontal,” e.g., various client processes can access one server process through the shared memory queueing system.