1. Field of the Invention
The present invention generally relates to data processing and, more particularly, to managing execution of data driven workflows.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., an application, the operating system or an end user) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data, and so forth.
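By way of illustration only, the following minimal sketch shows a requesting entity issuing SQL statements that add, change, and read records. It assumes Python's built-in `sqlite3` module and an in-memory database; the `parts` table, its columns, and the function name `run_example` are hypothetical and do not correspond to any product named above.

```python
import sqlite3


def run_example():
    """Issue insert, update, and select requests against a throwaway database."""
    conn = sqlite3.connect(":memory:")  # hypothetical in-memory database
    cur = conn.cursor()

    # Hypothetical table used only for this illustration.
    cur.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")

    # Requests that add records.
    cur.execute("INSERT INTO parts (name, qty) VALUES (?, ?)", ("bolt", 10))
    cur.execute("INSERT INTO parts (name, qty) VALUES (?, ?)", ("nut", 25))

    # A request that changes a record.
    cur.execute("UPDATE parts SET qty = qty + 5 WHERE name = ?", ("bolt",))
    conn.commit()

    # A query that reads records back.
    rows = cur.execute("SELECT name, qty FROM parts ORDER BY name").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, qty in run_example():
        print(name, qty)
```

The same statements could be issued against any SQL-capable DBMS; only the connection call would differ.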
One of the issues faced by data mining and database query applications, in general, is the manner in which the data is processed prior to being presented to an end user. A number of software solutions support the use of multiple functional modules to process data as desired by the user, but management of such functional modules is difficult. For example, a query building tool may present the user with a list of functional modules that aid in building queries and analyzing query results. Oftentimes, execution of numerous functional modules is needed to bring the data into the desired state. Unfortunately, the selected functional modules need to be invoked individually by the user, which makes invoking multiple functional modules inconvenient and inefficient.
Current workflow technology provides the ability to call multiple functional modules in a specified order. A workflow is a defined set of collective steps that are driven in a programmatic environment. Each step is associated with execution of a corresponding functional module in this environment based on processing of a previous step. The steps are well defined and connected by a specific process. The specific process is defined in the workflow by programmatic logic. Examples of workflows might include purchase order processing that affects production and warehouse management, an accounts payable process in which payment discounts are fully exploited, and complex quotation procedures under which customer requests are completed on time.
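By way of illustration only, a conventional workflow of this kind may be sketched as follows. Each step executes a functional module and then names the next step, so that the connecting process is fixed by programmatic logic. All step names, the `STEPS` table, and the `run_workflow` helper are hypothetical and are not taken from any particular workflow product.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A step takes the current state and returns (new state, name of next step or None).
Step = Callable[[dict], Tuple[dict, Optional[str]]]


def receive_order(state: dict) -> Tuple[dict, Optional[str]]:
    return {**state, "received": True}, "check_stock"


def check_stock(state: dict) -> Tuple[dict, Optional[str]]:
    # Programmatic logic decides which step follows.
    in_stock = state["quantity"] <= state["stock"]
    next_step = "ship" if in_stock else "backorder"
    return {**state, "in_stock": in_stock}, next_step


def ship(state: dict) -> Tuple[dict, Optional[str]]:
    return {**state, "status": "shipped"}, None


def backorder(state: dict) -> Tuple[dict, Optional[str]]:
    return {**state, "status": "backordered"}, None


# Hypothetical registry mapping step names to functional modules.
STEPS: Dict[str, Step] = {
    "receive_order": receive_order,
    "check_stock": check_stock,
    "ship": ship,
    "backorder": backorder,
}


def run_workflow(start: str, state: dict) -> Tuple[dict, List[str]]:
    """Run steps in the order dictated by each step's return value."""
    trace: List[str] = []
    current: Optional[str] = start
    while current is not None:
        trace.append(current)
        state, current = STEPS[current](state)
    return state, trace


if __name__ == "__main__":
    final_state, trace = run_workflow("receive_order", {"quantity": 3, "stock": 5})
    print(trace, final_state["status"])
```

In this sketch the order of execution is determined entirely by the transitions coded into each step, which is what distinguishes it from a data driven workflow.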
One shortcoming of the current workflow technology is the management of data driven workflows. In a data driven workflow, a given step is connected to another step only by the result set obtained upon execution of the given step, i.e., by the data itself. For instance, assume a workflow for auditing data, providing statistics, and profiling and normalizing the data. Here, the different steps are not connected by specific processes. Thus, in order to connect these different steps with each other, a user would need to include corresponding programmatic logic in the workflow. This, however, requires knowledge of the internal processing of the workflow and depends heavily on user interaction. Furthermore, a significant amount of time can be required to create the programmatic logic.
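By way of illustration only, the following sketch shows the kind of hand-written connecting logic described above. The steps (`audit`, `compute_statistics`, `normalize`) share nothing except the data passed between them, and the function `hand_wired_pipeline` stands in for the programmatic logic a user would have to supply. All names and the record layout are hypothetical, and the sketch illustrates the shortcoming rather than any solution to it.

```python
from statistics import mean
from typing import List, Optional, Tuple


def audit(rows: List[dict]) -> List[dict]:
    """Auditing step: drop records whose value is missing."""
    return [r for r in rows if r.get("value") is not None]


def compute_statistics(rows: List[dict]) -> dict:
    """Statistics step: summarize the audited records."""
    values = [r["value"] for r in rows]
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


def normalize(rows: List[dict], stats: dict) -> List[dict]:
    """Normalizing step: rescale values to the range 0..1 using the statistics."""
    span = stats["max"] - stats["min"]
    return [
        {**r, "value": (r["value"] - stats["min"]) / span if span else 0.0}
        for r in rows
    ]


def hand_wired_pipeline(rows: List[dict]) -> Tuple[List[dict], Optional[dict]]:
    """Glue logic the user must write, knowing what each step consumes and produces."""
    clean = audit(rows)
    if not clean:  # the user must also handle an empty result set by hand
        return [], None
    stats = compute_statistics(clean)
    return normalize(clean, stats), stats


if __name__ == "__main__":
    data = [{"id": 1, "value": 2.0}, {"id": 2, "value": None}, {"id": 3, "value": 6.0}]
    normalized, stats = hand_wired_pipeline(data)
    print(stats)
    print(normalized)
```

Writing `hand_wired_pipeline` requires the user to know the input and output of every step, which is the dependence on internal knowledge and user interaction noted above.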
Therefore, there is a need for an efficient technique for managing execution of data driven workflows.