UNIX® is a registered trademark referring to a computer operating system (“OS”) developed at Bell Labs around 1969, but the term has become associated with a number of operating systems that merely share some characteristics with the original OS. In the present disclosure, the word Unix denotes UNIX® and UNIX-like operating systems, including BSD (a variant of UNIX), LINUX® (an independently developed OS with many points of similarity), Mac OS® X (an operating system derived from BSD that is commonly used on Macintosh® computers from Apple Computer, Inc. of Cupertino, Calif.), and other systems that encourage or support the pipelined data processing techniques described below.
A Unix system usually provides a variety of small, single-purpose (or limited-purpose) applications, and sophisticated data manipulations can be accomplished by setting up a “pipeline” of these applications, each of which performs one stage or step of the complete manipulation. Such a pipeline may be defined or expressed as a textual command:
data-generator|step-1|step-2| . . . |data-consumer
Listing 1
The vertical bars (“|”) in Listing 1 are pronounced “pipe” when the command is read aloud. The command expresses a data processing pipeline in which a program named data-generator produces some sort of information, which is passed (as if through a pipe) to a second program, step-1, that performs a first manipulation. The output of step-1 is in turn passed to step-2 for further manipulation, and so on, until the processed data finally reaches data-consumer for disposition. For example, data-consumer may store the processed data in a file, print it, or operate a machine according to it. Information flowing through a pipeline is commonly (though not necessarily) represented as printable text characters, separated into larger units by delimiters such as newline characters.
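The stage-to-stage flow of Listing 1 can be modeled in a few lines of Python. This is a minimal in-process sketch, not a Unix pipeline: it swaps separate programs connected by OS pipes for Python generator functions, and the manipulations chosen for step_1 (upper-casing) and step_2 (sorting) are hypothetical placeholders. The function names simply mirror the program names in Listing 1, and each unit of data is one newline-delimited line of text.

```python
from typing import Iterable, Iterator


def data_generator() -> Iterator[str]:
    """Hypothetical source stage: emits newline-delimited units of text."""
    yield from ("delta\n", "alpha\n", "charlie\n", "bravo\n")


def step_1(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical first manipulation: upper-case each line as it arrives."""
    for line in lines:
        yield line.upper()


def step_2(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical second manipulation: sort the lines (reads all input first)."""
    yield from sorted(lines)


def data_consumer(lines: Iterable[str]) -> list[str]:
    """Hypothetical sink: collects the processed lines (a real one might write a file)."""
    return list(lines)


# Analogous to: data-generator | step-1 | step-2 | data-consumer
result = data_consumer(step_2(step_1(data_generator())))
```

Because generators are lazy, step_1 handles each line as soon as it is produced, much as a filter reading from a pipe would; step_2, like a sorting utility, must buffer its entire input before it can emit anything.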
Applications or “utilities” that can be used in a data processing pipeline receive data from a predetermined source known as the “standard input” and send their results to a predetermined destination known as the “standard output.” Informational and error messages may be emitted on a third stream, the “standard error,” which systems often arrange to display to the user. A pipeline is constructed by connecting the standard output of one program to the standard input of the next program using an interprocess communication facility. (Setting up a pipeline may be referred to as “plumbing” the data connections.) Most pipeline-compatible applications operate on unstructured data (e.g., a stream of bytes), singular data objects (e.g., a graphical image or a sound clip), or a plurality of delimiter-separated units of text such as words or lines.
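The plumbing step can be sketched with Python's standard subprocess module, following the pattern its documentation gives for replacing a shell pipeline. The helper name plumb is hypothetical, and the sketch assumes that the first command, like data-generator in Listing 1, produces data rather than reading it, so it is given an empty standard input. The example stages are small Python one-liners standing in for real Unix utilities so that the sketch runs wherever Python does.

```python
import subprocess
import sys  # used only by the example stages below


def plumb(commands: list[list[str]]) -> bytes:
    """Run commands as a pipeline and return the last stage's standard output.

    Each stage's standard output is connected, through an OS pipe, to the next
    stage's standard input. Standard error is not redirected, so informational
    and error messages still reach the user's terminal.
    """
    procs: list[subprocess.Popen] = []
    for argv in commands:
        # The first stage (the data generator) gets an empty standard input.
        upstream = procs[-1].stdout if procs else subprocess.DEVNULL
        procs.append(subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE))
        if len(procs) > 1:
            # Drop the parent's copy of the read end so the upstream stage
            # can receive SIGPIPE if the downstream stage exits early.
            procs[-2].stdout.close()
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    for proc in procs:
        proc.wait()
    return output


if __name__ == "__main__":
    # Hypothetical stand-ins for a generator, a sorting filter, and a case filter.
    generator = [sys.executable, "-c", "print('banana'); print('apple'); print('cherry')"]
    sort_lines = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
    to_upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    # Prints APPLE, BANANA, CHERRY on separate lines.
    print(plumb([generator, sort_lines, to_upper]).decode(), end="")
```

On a Unix system the same helper can connect real utilities; for instance, plumb([["ls"], ["sort", "-r"]]) corresponds roughly to the shell command ls | sort -r.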
Data pipelines provide an easy way to express a complicated sequence of manipulations from a command-line interface (“CLI”), in which a user types commands on a keyboard for the computer to execute. CLIs were in widespread use for many years and have survived for certain applications despite the current popularity of graphical user interfaces (“GUIs”). CLIs are often easier to use for setting up Unix-style data processing pipelines, while GUIs provide a more intuitive paradigm for controlling large, monolithic applications with many built-in features and options.
Novel extensions to the traditional CLI pipeline can permit structured data streams to feed or be used within a pipeline.