The present disclosure relates generally to optimizing job schedules in a computing environment and, more particularly, to scheduling the execution of computational jobs based on the time dimension of reports dependent on such jobs.
Many large enterprises utilize a data warehouse to store consolidated business data to facilitate enterprise reporting, analysis, and decision-making processes. A data warehouse provides a sanitized repository of current and historical details for analytics, data mining, strategic planning, and reporting. Data generated by an enterprise's internal operations may be stored in the data warehouse and thereafter moved to domain-specific data marts to help generate analytical business intelligence (BI) reports. The BI reports may provide information about important trends, risk exposure, liabilities, and assets, for example.
The flow and transformation of information from the operational systems to the BI reports via data warehouses and data marts can be very complicated. The data may need to flow through data warehouses, staging databases, extract, transform, and load (ETL) processes, intermediate files, online analytical processing (OLAP) layers, data marts, file transfers, operational data stores, and reporting layers. The OLAP layer enables the end-user tools to translate the data into BI reports via a series of interdependent flows and processes.
A so-called extract, transform, and load tool (e.g., IBM InfoSphere DataStage®) may be used to create one or more ETL jobs to extract target data from operational systems and place the extracted data in a data warehouse, and further manage data movement from the warehouse to a data mart. Developing the warehouse, populating it, moving the data to a data mart and then creating the necessary BI reports, using a BI tool, are large and complex projects.
Typically, many dozens of human operators or software developers are needed to develop, test, and maintain the related ETL jobs and BI code needed to produce the final reports. In addition, business analysts, data stewards, data modelers, enterprise architects, and project managers dedicated to the reporting project may be needed. All of these, combined with the ETL and BI developers, result in very large teams of human operators.
Since ETL jobs have to run in a timely manner to ensure that the BI reports are generated based on up-to-date information, a common scheduling approach is to run every single ETL job in the system very frequently (e.g., every night). Running many thousands of jobs at such frequent intervals requires a significant amount of resources and is time consuming.
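By way of illustration only, the cost of the run-everything-nightly approach may be sketched as follows. The report names, job names, and report periods below are hypothetical and are not taken from any particular system; the second function is merely an assumed point of comparison (each job running only as often as its most frequently produced dependent report), not a description of any specific scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    name: str
    period_days: int   # hypothetical: how often the report is produced
    jobs: frozenset    # hypothetical: ETL jobs the report depends on


def nightly_runs(all_jobs, days):
    """Common approach: every ETL job runs once every night."""
    return len(all_jobs) * days


def runs_needed(reports, days):
    """Assumed comparison: each job runs only as often as the most
    frequently produced report that depends on it."""
    shortest = {}
    for report in reports:
        for job in report.jobs:
            current = shortest.get(job, report.period_days)
            shortest[job] = min(current, report.period_days)
    return sum(days // period for period in shortest.values())


# Hypothetical reports and job dependencies.
reports = [
    Report("daily_risk", 1, frozenset({"extract_trades", "load_risk_mart"})),
    Report("weekly_assets", 7, frozenset({"extract_trades", "load_asset_mart"})),
    Report("monthly_liabilities", 30,
           frozenset({"extract_ledger", "load_liability_mart"})),
]
all_jobs = set().union(*(r.jobs for r in reports))

days = 30
print("job runs, every job nightly:", nightly_runs(all_jobs, days))
print("job runs, by report period: ", runs_needed(reports, days))
```

With these assumed inputs, the nightly schedule executes every job on every day regardless of whether any dependent report is due, which is the source of the resource overhead noted above.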