1. Field of the Invention
The present invention generally relates to data processing and, more particularly, to managing execution of a workflow in successive workflow runs.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., an application, the operating system or an end user) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data, and so forth.
One of the issues faced by data mining and database query applications, in general, is the manner in which the data is processed prior to being presented to the end user. A number of software solutions support the use of multiple functional modules to process data as desired by the user, but managing the execution of those functional modules is difficult. For example, a query building tool will present the user with a list of functional modules that aid in building queries and analyzing query results. Oftentimes, execution of numerous functional modules is needed to compile the data in the desired state. Unfortunately, the selected functional modules need to be invoked individually by the user. This can be a very inconvenient and inefficient process for invoking multiple functional modules.
Current workflow technology provides the ability to call multiple functional modules in a specified order, but there is an accompanying drawback: users are required to perform data transformation each time data is passed from one functional module to another. For example, if four functional modules, FM1, FM2, FM3, and FM4, are called in that order and each successive functional module depends on a result set produced by the functional module executed immediately prior to it, data transformation would need to be performed by the user three separate times: between FM1 and FM2, between FM2 and FM3, and between FM3 and FM4.
Users typically employ two methods for performing the data transformation. One method comprises creating a custom program, or functional module, for extracting data from the result set produced by the first functional module and then formatting it in accordance with the requirements of the next functional module to be executed. For example, a custom program would be used to transform the result set produced by FM1 and prepare the data to be passed as input to FM2. Of course, this would need to happen with data produced by FM2, and again with FM3's result set. Another method consists of utilizing mapping tools to allow for the mapping of data fields from one program to the next. For example, the mapping tool would allow the user to map the output fields of FM1 to the input fields of FM2. The fields are mapped by users prior to execution of the programs. At runtime, data is transformed per the field mapping definitions. Both of these methods for performing data transformation are cumbersome and inefficient to use and depend heavily on user interaction.
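By way of illustration only, the field-mapping approach described above can be sketched as follows. The field names, the mapping table, and the bodies of fm1 and fm2 are hypothetical and are not taken from any particular mapping tool; the sketch merely shows a user-defined mapping being applied at runtime to the result set of FM1 before it is passed to FM2.

```python
# Hypothetical mapping, defined by the user before execution:
# output field of FM1 -> input field of FM2.
FM1_TO_FM2 = {"patient_id": "id", "test_result": "value"}


def transform(result_set, field_map):
    """Rename the mapped fields of each row; unmapped fields are dropped."""
    return [
        {dst: row[src] for src, dst in field_map.items()}
        for row in result_set
    ]


def fm1(_query):
    """Stand-in for FM1: returns a small, fixed result set."""
    return [
        {"patient_id": 1, "test_result": 7.5, "note": "a"},
        {"patient_id": 2, "test_result": 2.5, "note": "b"},
    ]


def fm2(rows):
    """Stand-in for FM2: expects rows with 'id' and 'value' fields."""
    return sum(row["value"] for row in rows)


# At runtime, the data is transformed per the field mapping definitions.
fm2_input = transform(fm1("IP1"), FM1_TO_FM2)
fm2_output = fm2(fm2_input)
```

In a four-module workflow, a comparable mapping would have to be defined by the user for each of the three hand-offs.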
Another shortcoming of the prior art is the manner in which repeated executions of functional modules in multi-step workflows are managed. By way of example, assume that FM1 is repeatedly executed for a given input, IP1, and produces the identical result set RS1 each time. That is, execution of FM1 is absolutely deterministic in that it produces the same result set for the same input. Despite this level of determinism, FM1 is nevertheless executed each time it is invoked with IP1 as input. This can be very unproductive and inefficient, particularly if each execution of FM1 is complex and requires a substantial amount of processing resources and time. Moreover, this frequently leads to user frustration, especially when running time-consuming multi-step workflows requiring execution of a large number of functional modules.
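The redundancy described above can be illustrated with a minimal sketch. The body of fm1, the value of IP1, and the execution counter are assumptions made purely for illustration; the sketch shows only that a deterministic module is fully re-executed on every run even though its result set never changes.

```python
execution_count = 0  # counts how many times FM1 actually runs


def fm1(ip):
    """Stand-in for a deterministic FM1: same input, same result set."""
    global execution_count
    execution_count += 1
    return sorted(ip)


IP1 = (3, 1, 2)  # hypothetical input, identical across workflow runs

# Three successive workflow runs, each invoking FM1 with IP1.
result_sets = [fm1(IP1) for _ in range(3)]

# Every run produced the identical result set RS1 ...
all_identical = all(rs == result_sets[0] for rs in result_sets)
# ... yet FM1 was executed in full each time (execution_count is 3).
```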
Therefore, there is a need for a technique for managing repeated executions of functional modules in multi-step workflows.