1. Field of the Invention
This invention relates to processing data in a single-processing environment by emulating a multi-processing environment. In particular, this invention relates to an object-oriented system employing a library of predefined objects or data structures which can be linked between a host application and a data source to create a stream-oriented data processing structure emulating a UNIX®-like pipeline data structure.
2. Description of the Related Art
The use of data processing pipelines in true multi-processing environments is well known. Examples of known multi-processing environments include both multiple-processor systems and high level systems where a single processor is able to support true multi-processing. The UNIX® operating system is often used in such multi-processing systems.
In such multi-processing environments, data processing pipelines are extremely useful for processing large, highly structured data blocks, such as those associated with image processing, database processing, or spreadsheet processing. In such data blocks, various data processing operations must be performed on each data element of the data blocks. Further, the various data processing operations are performed in a specific order.
In a multi-processing environment, data processing pipelines provide a very efficient method for processing the data blocks. In these data processing pipelines, each separate data processing operation is defined as a section of the pipeline. Each section is linked to one or both of the sections (the upstream and downstream sections) to which it is adjacent. The data processing operations thus form a chain of linked pipeline sections between a data source and a host application. In a computer having a number of independent processors, each pipeline section corresponds to one of the processors. In this case, each processor works independently, and the computer operating system controls the flow of data between processors and the memory allocation. While this efficiently processes the data, the overhead necessary to control the processors and the memory consumes a significant proportion of the system resources.
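As an illustration only, the chain of linked pipeline sections described above might be sketched as follows. The sketch assumes Python threads joined by FIFO queues as stand-ins for independent processors and operating-system data transfer; the names `run_section`, `run_pipeline`, and `_DONE` are hypothetical and are not taken from any known system.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of the data block


def run_section(operation, inbox, outbox):
    """One pipeline section: take an element from upstream, process it, pass it downstream."""
    while True:
        element = inbox.get()
        if element is _DONE:
            outbox.put(_DONE)  # propagate end-of-data to the next section
            return
        outbox.put(operation(element))


def run_pipeline(source, operations):
    """Link one section per operation between `source` and the caller (the host)."""
    # links[i] feeds section i; links[-1] feeds the host.
    links = [queue.Queue() for _ in range(len(operations) + 1)]
    workers = [
        threading.Thread(target=run_section, args=(op, links[i], links[i + 1]))
        for i, op in enumerate(operations)
    ]
    for worker in workers:
        worker.start()
    for element in source:
        links[0].put(element)
    links[0].put(_DONE)
    results = []
    while True:
        element = links[-1].get()
        if element is _DONE:
            break
        results.append(element)
    for worker in workers:
        worker.join()
    return results


processed = run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10])
```

Each section here sees only its upstream and downstream links, which mirrors the adjacency described above. The queues and thread scheduling stand in for the operating-system overhead the paragraph mentions.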
Likewise, in a computer having a single processor which can simultaneously run a number of different independent processing operations, or processes, each pipeline section corresponds to one of the independently-running processes. In this case, the operating system allocates the run-time of each process, the flow of data between each process and memory and the memory allocation. The overhead necessary to control the computer in this case consumes an even larger proportion of the system resources, as each process and its data must be swapped into and out of the processor each time it is run. Additionally, since the processes communicate through the operating system, dynamically altering the pipeline is difficult, if not impossible.
In general, the source of data for the pipeline can be, for example: 1) a spreadsheet providing financial information; 2) the records within a database file providing database information; or 3) image data generated by a conventional image scanner from an original document, a computer generated image, or the like. In contrast, the host application could be a graphics program for producing pie charts, bar charts or the like from processed financial data; an inventory, accounting, or merging program which uses processed database data; or an image forming apparatus for forming an image from the processed image data.
Regardless of the particular source of data or ultimate host application, the first, or upstream-most, section of the data processing pipeline is generally the data source for the pipeline. Alternatively, the data source for this first pipeline can be a second pipeline. In this case, a special branching or "fan-out" pipeline section of the second pipeline can be used to supply data to both the first pipeline and the downstream sections of the second pipeline. In either case, the first pipeline section obtains a data element for the pipeline from the source of data and makes the data element available to the immediately downstream, or second, pipeline section. The second pipeline section sequentially receives the data element from the first pipeline section, processes the data element, and passes it downstream to the next immediately downstream or third pipeline section. Simultaneously, in a true multi-processing environment, the first pipeline section obtains the next data element from the source of data and outputs it to the second pipeline section while the second or downstream-adjacent section of the data pipeline processes the data element received from the first pipeline section and outputs it to the third section of the data processing pipeline. Accordingly, as each data element is processed by one of the pipeline sections, it is output to the next downstream section until it is output to the host application.
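The routing performed by a "fan-out" section can be sketched as follows. This is a minimal illustration that assumes Python generators and the standard `itertools.tee` function as the branching mechanism; it shows only where the data elements go, not simultaneous execution on separate processors. All section names (`scanner`, `double`, `negate`, `add_one`) are hypothetical.

```python
import itertools


def scanner():
    """Hypothetical source of data: yields one data element at a time."""
    for value in [1, 2, 3]:
        yield value


def double(upstream):
    """Upstream section of the second pipeline."""
    for element in upstream:
        yield element * 2


def negate(upstream):
    """A section of the first pipeline, fed by the fan-out."""
    for element in upstream:
        yield -element


def add_one(upstream):
    """A downstream section of the second pipeline."""
    for element in upstream:
        yield element + 1


# Fan-out: the same elements are supplied to both the first pipeline and
# the downstream sections of the second pipeline.
branch_a, branch_b = itertools.tee(double(scanner()), 2)
first_pipeline = negate(branch_a)
second_downstream = add_one(branch_b)

first_out = list(first_pipeline)
second_out = list(second_downstream)
```

Note that `itertools.tee` buffers any elements one branch has consumed and the other has not, so memory use grows if the two branches drift far apart.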
In this way, the data can be efficiently and quickly processed in the multi-processing environment by associating one processing operation or processor with each section of the pipeline. This ensures that the pipeline is able to process the data block with the data throughput being limited only by the least efficient pipeline section and the inefficiencies caused by the overhead.
In contrast, in a single-processing environment, a variety of methods for processing the data are available, although all of them are extremely inefficient. For example, each data processing operation is applied to every data element of the data block before any other data processing operation is applied to any data element of the data block. That is, every element of the data block must be processed before the next data processing operation is performed on any of the already-processed data elements. Thus, the efficiency of the single-tasking data processing operation is proportional to the efficiency of each data processing operation.
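The block-at-a-time approach described above can be sketched as follows. The function name `process_block_at_a_time` and the `stores` counter are hypothetical; the counter is included only to make the repeated writes to intermediate storage visible and is not a measurement of any real system.

```python
def process_block_at_a_time(block, operations):
    """Apply each operation to the whole block before starting the next operation."""
    stores = 0  # illustrative count of element writes to intermediate storage
    current = list(block)
    for operation in operations:
        next_block = []
        for element in current:
            next_block.append(operation(element))
            stores += 1  # every element is written back once per operation
        current = next_block
    return current, stores


result, stores = process_block_at_a_time(
    [1, 2, 3], [lambda x: x + 1, lambda x: x * 10]
)
```

In this sketch, a block of three elements passed through two operations incurs six intermediate writes, since the entire block is stored again after every operation.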
Therefore, because data elements continually move from one data processing section of a pipeline to the next in a multi-processing environment, the necessary overhead required to manage the memory allocation of the data block is small in comparison to a single-processing environment, where each data element must be repeatedly read from and written to memory. That is, the overhead required to manage the necessary memory is substantial in a single-processing environment.