With the emergence of the Internet and the interconnection of devices used in nearly every aspect of modern life, data of almost limitless diversity has become available. However, because of the sheer volume of data, a user wishing to locate data of a particular type may not be able to find or interact with the desired data efficiently. Even if a user is able to locate data of interest, locating related data may be difficult. To address these limitations, data may be processed and linked to indicate related data among various sets of data. A user interacting with linked data may therefore efficiently locate and access related data.
Internet content may be thought of as data that has intrinsic value to a subset of users of web sites, internet client devices, and the like. This data can be configured to more efficiently address, and therefore be of greater value to, the subset of users. In many cases, this greater value is created through some type of data processing, typically a sequence of stages that may be implemented as a pipeline. A pipeline is a workflow process that includes multiple stages, each of which may process sets of data, for example by combining multiple sets of data into a single set through interlinking related data. Often, the output of one stage of a pipeline serves as input to multiple subsequent stages, each of which may represent the beginning of a new pipeline and/or a continuation of the same pipeline.
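The pipeline structure described above, in which the output of one stage serves as input to multiple subsequent stages, can be illustrated with a minimal sketch. The `Pipeline` class, the `interlink` function, the stage names, and the sample records are all hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from collections import defaultdict


class Pipeline:
    """Hypothetical minimal pipeline: named stages joined by directed edges."""

    def __init__(self):
        self.stages = {}                      # stage name -> callable
        self.downstream = defaultdict(list)   # stage name -> subsequent stage names

    def add_stage(self, name, func, after=None):
        """Register a stage; `after` names the stage whose output it consumes."""
        self.stages[name] = func
        if after is not None:
            self.downstream[after].append(name)

    def run(self, start, data):
        """Run `start` on `data`, then pass each output to every subsequent stage."""
        results = {}
        pending = [(start, data)]
        while pending:
            name, value = pending.pop(0)
            output = self.stages[name](value)
            results[name] = output
            for next_name in self.downstream[name]:
                pending.append((next_name, output))
        return results


def interlink(data_sets):
    """Combine several sets of (record_id, topic) pairs into one set keyed by topic."""
    merged = {}
    for records in data_sets:
        for record_id, topic in records:
            merged.setdefault(topic, []).append(record_id)
    return merged


pipeline = Pipeline()
pipeline.add_stage("interlink", interlink)
# Two subsequent stages both consume the output of the "interlink" stage.
pipeline.add_stage("count", lambda linked: {t: len(ids) for t, ids in linked.items()},
                   after="interlink")
pipeline.add_stage("topics", lambda linked: sorted(linked), after="interlink")

results = pipeline.run(
    "interlink",
    [[("a1", "sports"), ("a2", "news")], [("b1", "sports")]],
)
print(results["interlink"])  # {'sports': ['a1', 'b1'], 'news': ['a2']}
print(results["count"])      # {'sports': 2, 'news': 1}
print(results["topics"])     # ['news', 'sports']
```

Here the "count" and "topics" stages each receive the same interlinked output, and either could equally be treated as the beginning of a new pipeline or as a continuation of the same one.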
Because of the wide range of data available from the Internet, systems may employ a large number of pipelines to process the data through their various stages. In some systems, for example, pipelines are interconnected with other pipelines through interconnected stages, resulting in a large and intricate system of pipelines whose execution demands a significant amount of computer resources. Executing a pipeline may include executing the data processing services included in its stages, such as interlinking related data, and the like.
One previous method of addressing pipeline operation was a “schedule driven” model. In such a model, developers made a best guess of the time needed for each stage to execute in order to arrive at an expected execution time. Stages were then operated based on the expected execution time of previous stages. Under a schedule driven model, a stage in a pipeline processed data and transferred the processed data when a subsequent stage was available to process the data. The stage was therefore dependent on whether the subsequent stage was available to process the data. Additionally, the schedule driven model is problematic because situations may be encountered that do not conform to expectations, which may be based on performance assumptions made when constructing a pipeline. A pipeline using a schedule driven model therefore requires support systems and provision for manual start of stages in unscheduled instances. Use of a schedule driven model may thus be software, hardware, and user intensive, and may consume valuable resources.
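One weakness of a schedule driven model, namely that start times derived from estimated durations break down when a stage overruns its estimate, can be shown with a small simulation. This is only a sketch under simplifying assumptions: stages run strictly in sequence, time is measured in abstract units, and the function names, stage names, and durations are hypothetical. It does not model the availability of subsequent stages.

```python
def build_schedule(estimates):
    """Give each stage a start time equal to the summed estimates of earlier stages."""
    schedule = {}
    clock = 0
    for name, estimated_duration in estimates:
        schedule[name] = clock
        clock += estimated_duration
    return schedule


def find_missed_starts(estimates, actual):
    """Return the stages whose input was not ready at their scheduled start time."""
    schedule = build_schedule(estimates)
    missed = []
    previous_finish = 0
    for name, _ in estimates:
        start = schedule[name]
        if previous_finish > start:
            # The previous stage overran its estimate, so this stage cannot run
            # on schedule and would need a manual (unscheduled) start.
            missed.append(name)
            start = previous_finish
        previous_finish = start + actual[name]
    return missed


# Developers' best guesses for each stage, in execution order.
estimates = [("fetch", 10), ("interlink", 20), ("publish", 5)]
# Observed durations: "interlink" takes 35 units instead of the estimated 20.
actual = {"fetch": 10, "interlink": 35, "publish": 5}

print(build_schedule(estimates))              # {'fetch': 0, 'interlink': 10, 'publish': 30}
print(find_missed_starts(estimates, actual))  # ['publish']
```

In this example the "publish" stage is scheduled to start at time 30, but its input is not available until time 45, which is the kind of unscheduled instance that calls for support systems and manual intervention.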
Therefore, it would be desirable to provide a system and method for asynchronous pipeline operation.