In a data warehouse environment, a task is the smallest entity that can be run as a single unit. Clients in the data warehouse environment often need to run Extraction/Transformation/Loading (ETL) tasks in one or more job threads. Each job thread usually contains groups of related tasks, and each group must be completed before the next group can be started, so that the waiting tasks in the next group do not run multiple times.
A task is responsible for extracting data, transforming it, and loading it into the data warehouse. Clients usually have a group of tasks to perform on a regular basis. As an example, a client (or client application) may have three regions to account for and manage: Western, Eastern, and Central. The data for each region is distributed across databases. The client wishes to build a data warehouse for the transaction data collected during the day. Data must be pulled from each region, mapped, and processed. The results are then stored in the data warehouse. If the results of the Western region transactions depend on those of the Eastern region, the Eastern region data must be processed first and a dependency must be built into the processing instructions.
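As an illustration only, the regional dependency in this example can be expressed as a small dependency graph and ordered with a topological sort. The following is a minimal sketch, not a description of any particular warehouse product; the task names (`load_eastern`, `load_central`, `load_western`) and the helper `run_order` are hypothetical and were chosen for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks that must
# finish before that task may start.
dependencies = {
    "load_eastern": set(),
    "load_central": set(),
    "load_western": {"load_eastern"},  # Western results depend on Eastern
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

Any order the sorter returns places `load_eastern` ahead of `load_western`, while `load_central` may appear anywhere because nothing depends on it.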
The client must carefully plan and manually link these related tasks, within a group or across related groups, and attach the required run conditions to keep from loading the data warehouse with redundant data. Grouping tasks thus becomes a tedious manual chore for the client, and it becomes even more complicated when groups of tasks are scheduled to run on multiple overlapping schedules.
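The redundancy problem with overlapping schedules can be sketched in a few lines. The example below is a simplified assumption rather than an established method: it treats each schedule as an ordered list of task names and skips any task that has already run in the current cycle. The function `run_groups` and the task names are hypothetical.

```python
def run_groups(groups, run_task):
    """Run each group in order, skipping tasks already run in this cycle."""
    completed = set()
    executed = []
    for group in groups:
        for task in group:
            if task in completed:
                continue  # already run by an overlapping group
            run_task(task)
            completed.add(task)
            executed.append(task)
    return executed


# Two hypothetical schedules that share tasks.
daily = ["extract_eastern", "extract_western", "load_warehouse"]
hourly = ["extract_eastern", "load_warehouse"]

log = []
executed = run_groups([daily, hourly], log.append)
```

Here the shared tasks `extract_eastern` and `load_warehouse` run once even though both schedules list them. Doing this bookkeeping by hand for every pair of overlapping groups is the manual burden described above.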
Data warehouse clients need a relatively easy-to-use graphical client interface that allows them to group related tasks with predetermined run conditions and to run the grouped tasks automatically without running the same tasks multiple times. The need for such a system has heretofore remained unsatisfied.