UNIX® is a registered trademark referring to a computer operating system (“OS”) developed at Bell Labs around 1969, but the term has become associated with a number of operating systems that merely share some characteristics with the original OS. In the present disclosure, the word Unix will be used to denote UNIX® and UNIX-like operating systems, including BSD (a variant of UNIX), LINUX® (an independently developed OS with many points of similarity), Mac OS® X (an operating system derived from BSD that is commonly used on Macintosh® computers from Apple Computer, Inc. of Cupertino, Calif.), and other systems that encourage or support the pipelined data processing techniques described below.
In a Unix system, a variety of small, single-purpose (or limited-purpose) applications are usually provided, and sophisticated data manipulations can be accomplished by setting up a “pipeline” of these small applications, with each application performing one stage or step of the complete manipulation. Such a pipeline may be defined or expressed as a textual command:
data-generator|step-1|step-2| . . . |data-consumer          Listing 1
The vertical bars (“|”) in Listing 1 are pronounced “pipe” when the command is read aloud. The command above expresses a data processing pipeline in which a program named data-generator produces some sort of information, which is passed (as if through a pipe) to a second program, step-1, that performs a first manipulation. The manipulated data from step-1 is in turn passed to step-2 for further manipulation, and so on, until the processed data finally reaches data-consumer for disposition. For example, data-consumer may store the processed data in a file, print it, or operate a machine according to it.
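As an illustration of how a command in the form of Listing 1 divides into stages, the following is a minimal Python sketch using the standard shlex module. The function name parse_pipeline is hypothetical, and the sketch assumes only the “|” separator matters; other shell syntax (redirection, “&&”, and so on) is not interpreted. A known limitation of this sketch: a quoted lone bar such as '|' is also treated as a separator, because shlex in POSIX mode strips the quotes before the token is seen.

```python
import shlex
from typing import List


def parse_pipeline(command: str) -> List[List[str]]:
    """Split 'prog-a args | prog-b args | ...' into one argv list per stage.

    Hypothetical helper: handles quoting via shlex, treats only '|' as a
    stage separator, and rejects empty stages such as 'a || b' or 'a |'.
    """
    lexer = shlex.shlex(command, posix=True, punctuation_chars="|")
    lexer.whitespace_split = True  # split on whitespace and on '|'
    stages: List[List[str]] = [[]]
    for token in lexer:
        if token and set(token) == {"|"}:
            # shlex groups adjacent bars ('||') into one token;
            # every bar starts a new stage.
            stages.extend([] for _ in token)
        else:
            stages[-1].append(token)
    if any(not stage for stage in stages):
        raise ValueError(f"empty stage in pipeline: {command!r}")
    return stages
```

For example, parse_pipeline("data-generator|step-1|step-2|data-consumer") yields four single-element stages, and parse_pipeline("grep -v 'two words' | sort -r") yields [["grep", "-v", "two words"], ["sort", "-r"]].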
Applications, or “utilities,” that can be used in a data processing pipeline receive data from a predetermined source known as the “standard input” and send their results to a predetermined destination known as the “standard output.” Informational and error messages may be emitted on a third stream, the “standard error,” which systems often arrange to display to the user. A pipeline is constructed by connecting the standard output of one program to the standard input of the next program using an interprocess communication facility, such as the pipe created by the POSIX pipe() system call.
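The connection of standard output to standard input can be sketched in Python with the standard subprocess module, following the pattern the Python documentation gives for replacing a shell pipeline. This is a minimal sketch, not a description of any particular shell. The function name run_pipeline is hypothetical. The sketch assumes that the first stage (the data generator) reads nothing, that the last stage's output is simply captured rather than handed to a separate data consumer, and that standard error is inherited so messages still reach the user.

```python
import subprocess
from typing import List, Sequence, Tuple


def run_pipeline(stages: Sequence[Sequence[str]]) -> Tuple[bytes, List[int]]:
    """Run stages[0] | stages[1] | ... | stages[-1] without invoking a shell.

    Hypothetical helper: each stage's standard output is connected to the
    next stage's standard input through an OS pipe. Returns the last
    stage's standard output and every stage's exit status, in order.
    """
    if not stages:
        raise ValueError("a pipeline needs at least one stage")

    procs: List[subprocess.Popen] = []
    for argv in stages:
        # The first stage reads from /dev/null; each later stage reads the
        # pipe that carries the previous stage's standard output.
        upstream = procs[-1].stdout if procs else subprocess.DEVNULL
        proc = subprocess.Popen(list(argv), stdin=upstream, stdout=subprocess.PIPE)
        if procs:
            # Close the parent's copy of the read end so the child is the
            # only reader; if it exits early, the writer receives SIGPIPE.
            procs[-1].stdout.close()
        procs.append(proc)

    # Read the final stage's output until EOF and wait for it to exit,
    # then reap the earlier stages. Standard error is left inherited.
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return output, [proc.returncode for proc in procs]
```

For instance, run_pipeline([["printf", "b\\na\\nc\\n"], ["sort"]]) corresponds to the command printf 'b\na\nc\n' | sort and should return b"a\nb\nc\n" together with the exit statuses [0, 0]. Closing the parent's copy of each intermediate pipe matters: otherwise the parent would hold a reader open and an upstream stage would never learn that its downstream consumer had exited.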
Data pipelines provide an easy way to express a complicated sequence of manipulations from a command-line user interface (“UI”), in which a computer user types commands on a keyboard for execution. Such command-line interfaces (“CLIs”) were in widespread use for many years and have survived for certain applications despite the current popularity of graphical user interfaces (“GUIs”). CLIs are often easier to use for setting up Unix-style data processing pipelines, while GUIs provide a more intuitive paradigm for controlling large, monolithic applications with many built-in features and options.
Novel extensions to the traditional CLI pipeline setup syntax can provide easy control of distributed data processing operations.