The Unix (tm licensed by X/Open Company, LTD) operating system and Linux operating system currently offer a Pipes control program to control sequencing of data processing by different applications. A programmer provides to the Pipes control interpreter program various program stages (or program functions) and a Pipes command to control sequencing of data between the stages. The Pipes command indicates which stage is entitled to request the output of another, specified stage. For example, a user can provide to the Pipes control program Stages A, B and C and issue the following Pipes command: “Stage A/Stage B/Stage C”. In response, the Pipes control program will form a Pipes application program. According to this Pipes application program, Stage A will generate output data and automatically send it to the Pipes control program. Upon request by Stage B to the Pipes control program, the Pipes control program will furnish to Stage B the output data from Stage A. Stage B will process the output data from Stage A, and automatically send its output data to the Pipes control program. Upon request by Stage C to the Pipes control program for data, the Pipes control program will furnish the output data from Stage B to Stage C. The format for the Pipes command and the interface between each stage and the Pipes control function are based on a predefined protocol. According to the Pipes control function protocol, each stage in the “Pipe” is ignorant of which other stage is the source or recipient of its data, and does not synchronize the data with the prior or subsequent stages. To synchronize the data means to coordinate access to and processing of the data. This simplifies programming of the stages and definition of the Pipes applications by the users. In Unix and Linux Pipes control programs, each stage in the Pipe can receive data from only one stage and can provide data to only one stage, i.e. “single-streaming”.
Also, a Unix or Linux Pipes application is limited to stages and control programs executing in the same real computer.
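The single-streaming arrangement described above can be illustrated with a minimal Python sketch. It is not the Unix or Linux implementation; the names `stage_a`, `stage_b`, `stage_c` and `run_pipeline` are hypothetical, and `run_pipeline` merely plays the role of the control program by handing each stage the output of the stage before it. Each stage sees only an input stream and has no knowledge of which stage produced it or which stage will consume its output.

```python
from typing import Callable, Iterable, Iterator, List

# A stage consumes a stream of records and produces a stream of records.
Stage = Callable[[Iterable[str]], Iterator[str]]


def stage_a(_: Iterable[str]) -> Iterator[str]:
    """Source stage: generates output data and ignores its (empty) input."""
    yield from ["pear", "apple", "fig"]


def stage_b(records: Iterable[str]) -> Iterator[str]:
    """Transforms each record without knowing which stage produced it."""
    for record in records:
        yield record.upper()


def stage_c(records: Iterable[str]) -> Iterator[str]:
    """Final stage: sorts whatever records it is handed."""
    yield from sorted(records)


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Hypothetical stand-in for the control program: connect each stage's
    output to the next stage's input, then drain the last stage."""
    stream: Iterable[str] = iter(())
    for stage in stages:
        stream = stage(stream)
    return list(stream)


# Equivalent in spirit to the command "Stage A/Stage B/Stage C".
result = run_pipeline([stage_a, stage_b, stage_c])
```

Because each stage receives exactly one input stream and produces exactly one output stream, the sketch is limited to single-streaming in the same way as the Pipes arrangement described above.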
International Business Machines Corporation has licensed an IBM z/VM operating system to provide a Virtual Machine environment in a real computer. To form a Virtual Machine environment, a base operating system (called “Control Program” or “CP” in IBM Virtual Machine operating systems) logically divides the physical resources (e.g. processor time, memory, etc.) of a real computer into different functional units. Each functional unit or “virtual machine” typically has all the physical resources to execute its own operating system (such as IBM VM/CMS operating systems, Linux (tm of Linus Torvalds) operating system or z/OS operating systems) and applications. Applications, guest operating systems and other programs execute in each virtual machine as if the programs were executing in separate real computers. In these respects, a virtual machine is similar to a logical partition or “LPAR”, which is another known technique to logically divide the physical resources of a computer into different functional units.
The IBM z/VM operating system provides a Pipeline control program in the IBM VM/CMS guest operating system, and IBM z/OS operating system provides a similar Pipeworks control program in its guest operating system. A user provides program stages to each control program and a Pipeline command or Pipeworks command, which is similar to the Pipes command. The known Pipeline control function and Pipeworks control function control sequencing of data between stages, according to the Pipeline or Pipeworks command. The Pipeline or Pipeworks command indicates which stage is entitled to request the output of another, specified stage. For example, a user can provide to the Pipeline control program Stages A, B and C and issue a “Stage A/Stage B/Stage C” command. In response, the Pipeline control program will form a Pipeline application program. According to this Pipeline application program, Stage A will generate output data and send it to the Pipeline control program. Upon request by Stage B to the Pipeline control program, the Pipeline control program will furnish to Stage B the output data from Stage A. Stage B will process this output data from Stage A, and automatically send its output data to the Pipeline control program. Upon request by Stage C to the Pipeline control program for data, the Pipeline control program will furnish the output data from Stage B to Stage C. The format for the Pipeline command and the interface between each stage and the Pipeline control function are based on a predefined protocol. According to the Pipeline control function protocol, each stage in the Pipeline is ignorant of which other stage is the source or recipient of its data, and does not synchronize the data with the prior or subsequent stages. This simplifies programming of the stages and definition of the Pipeline command. A Pipeline or Pipeworks application is limited to stages and control programs executing in the same virtual machine or real computer.
In many respects, the Pipeline and Pipeworks control programs are similar to the Pipes control program. However, as noted above, the Pipes control program only supports “single-streaming”, whereas the Pipeline and Pipeworks control programs support “single-streaming” and “multi-streaming”. In multi-streaming, a Pipeline stage or Pipeworks stage can receive data from one or more other stages and can provide data to one or more other stages. Often, different units of output from one stage are provided as input to more than one other stage in the “multi-streaming” arrangement so that the other stages can process the output from the one stage in parallel. To implement multi-streaming, the Pipeline control program provides special purpose stages that can either take multiple streams and convert them into one stream (“fan-in”) or take one stream and convert it into multiple streams (“fan-out”). This allows pipeline applications to be much more flexible than traditional pipes applications, thus enabling pipeline applications to perform a much wider set of tasks. An example of a Pipeline command for a multi-streaming output is as follows:
Pipe (endchar ?)
Literal “George Washington” /* define some data */
| a: fanout /* output data to multiple streams */
| > USA Presidents /* write output on first stream to a file */
? /* end of first stream */
| a: /* start second stream */
| > Bad Golfers /* write output on second stream to a different file */
An example of a Pipeline command for a multi-streaming input is as follows:
Pipe (endchar ?)
Literal “George Washington” /* define data for first stream */
| A: fanin /* input data from multiple streams */
| > USA Presidents /* output data to a file */
? /* end of first stream */
Literal “John Adams” /* define data for second stream */
| A: /* output second stream data to fanin stage */
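The fan-out and fan-in behavior in the two commands above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Pipeline control program's implementation: the functions `fanout` and `fanin` are hypothetical, the sketch assumes that fan-out copies each record to every output stream (as the first example suggests, where the same literal reaches both files), and it assumes that fan-in takes its input streams in order.

```python
from typing import Iterable, List


def fanout(records: Iterable[str], n_streams: int) -> List[List[str]]:
    """Convert one stream into n_streams streams.
    Assumption: every record is copied to each output stream."""
    streams: List[List[str]] = [[] for _ in range(n_streams)]
    for record in records:
        for stream in streams:
            stream.append(record)
    return streams


def fanin(streams: Iterable[Iterable[str]]) -> List[str]:
    """Convert several streams into one stream.
    Assumption: input streams are consumed one after another, in order."""
    merged: List[str] = []
    for stream in streams:
        merged.extend(stream)
    return merged


# Multi-streaming output: one literal is delivered to two downstream streams.
first_stream, second_stream = fanout(["George Washington"], 2)

# Multi-streaming input: two literals are merged into a single stream.
combined = fanin([["George Washington"], ["John Adams"]])
```

In the sketch, `first_stream` and `second_stream` correspond to the data written to the two files in the fan-out command, and `combined` corresponds to the data written to the single file in the fan-in command.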
Parallel processing was also known in non-piping environments. For example, an application has been divided into multiple parts to be run on multiple computers, where communications between the computers are used to synchronize the processing done by such a multi-part program. The purpose of such an arrangement was to provide parallel processing of independent parts of the program where the sequential execution of those parts would not provide sufficient throughput. Such a program is complex because it is difficult to determine exactly which parts of the program are independent and which parts require synchronization. In addition, managing the multiple parts and implementing the required synchronization is also difficult.
It was known in a non-piping environment to provide shared files in a shared memory accessible by different applications in different virtual machines in the same or different real computers. The non-piping applications in the different virtual machines can write data to the shared memory without identifying an authorized reader(s) of the data from the shared memory. The non-piping applications in the different virtual machines can read data from the shared memory without identifying an authorized writer(s) of the data to the shared memory. It was also known in a non-piping environment to serialize access to the data in the shared memory by providing a shared queue in the shared memory. It was known that these non-piping applications could process in parallel the data read from the shared queue, and return resultant data to the shared queue. It was also known in a non-piping environment to synchronize access to the data in the shared memory by a shared lock structure.
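The shared queue and shared lock arrangement described above can be sketched in Python, with threads standing in for the applications in different virtual machines. This substitution is an assumption made for illustration only; the names `worker` and `process_in_parallel` are hypothetical, and squaring each item is a placeholder for the real processing. Each worker reads from the shared queue without knowing who wrote the data, and a lock synchronizes access to the shared results.

```python
import queue
import threading
from typing import List


def worker(shared_in: "queue.Queue[int]", results: List[int],
           lock: threading.Lock) -> None:
    """Read items from the shared queue until it is empty, process each one,
    and return the result to shared storage under the shared lock."""
    while True:
        try:
            item = shared_in.get_nowait()  # the queue serializes access to items
        except queue.Empty:
            return
        processed = item * item  # placeholder for the real processing
        with lock:  # the lock synchronizes access to the shared results
            results.append(processed)


def process_in_parallel(items: List[int], n_workers: int = 3) -> List[int]:
    """Load all items into a shared queue, then let n_workers drain it."""
    shared_in: "queue.Queue[int]" = queue.Queue()
    for item in items:
        shared_in.put(item)

    results: List[int] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(shared_in, results, lock))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


squares = process_in_parallel([1, 2, 3, 4])
```

The order of `squares` depends on thread scheduling, so a caller that needs a fixed order would have to sort the results or tag each item with its position.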
An object of the present invention is to improve the versatility of a Pipes control program, Pipeline control program, Pipeworks control program and other such piping control programs.