The present disclosure relates to systems that run Extract, Transform, and Load (ETL) jobs and, more particularly, to a performance checking component for an ETL job.
ETL jobs extract data from various sources, transform the data, and load it into one or more targets. For example, an ETL job may extract data from various applications and databases. These source systems may be managed and operated by different functional entities within an enterprise, each of which uses its source systems for different purposes. The sources may be diverse systems hosted on different hardware at multiple locations, and they may organize and store data in different formats. After extracting the data, the ETL job typically transforms it into a single, homogeneous format using a variety of rules, functions, and algorithms. The transform stage may include multiple steps, which may themselves be hosted on different hardware. The transformed data is then loaded into one or more targets, such as a database or data warehouse. Where there is more than one target, the targets may likewise be hosted on different hardware at multiple locations.
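To make the three stages concrete, the following is a minimal, single-process sketch under stated assumptions; it is not the disclosed system. Two hypothetical sources hold the same kind of record in different formats (CSV and JSON, with different field names and units), a transform stage maps both onto one homogeneous `(customer_id, amount_cents)` schema, and a load stage writes the result to a target. All names (`extract_csv`, `extract_json`, `transform`, `load`, `run_etl_job`, the `payments` table) are hypothetical, and an in-memory SQLite database stands in for a target such as a data warehouse. In a real ETL job each stage may run on different hardware at different locations.

```python
import csv
import io
import json
import sqlite3
from decimal import Decimal
from typing import Dict, Iterable, List, Tuple

# Hypothetical source data: two systems store payment records in different
# formats, with different field names and units (cents vs. currency units).
CSV_SOURCE = "cust_id,amount_cents\n1,1250\n2,300\n"
JSON_SOURCE = '[{"customer": 3, "amount": "7.50"}]'


def extract_csv(text: str) -> List[Dict[str, str]]:
    """Extract stage: read rows from a CSV-formatted source."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text: str) -> List[Dict[str, object]]:
    """Extract stage: read rows from a JSON-formatted source."""
    return json.loads(text)


def transform(
    csv_rows: Iterable[Dict[str, str]],
    json_rows: Iterable[Dict[str, object]],
) -> List[Tuple[int, int]]:
    """Transform stage: map both source formats onto one homogeneous schema."""
    out: List[Tuple[int, int]] = []
    for row in csv_rows:
        # CSV source already stores integer cents.
        out.append((int(row["cust_id"]), int(row["amount_cents"])))
    for row in json_rows:
        # JSON source stores a decimal string in currency units; Decimal avoids
        # binary floating-point error when converting to cents.
        out.append((int(row["customer"]), int(Decimal(str(row["amount"])) * 100)))
    return out


def load(rows: Iterable[Tuple[int, int]], conn: sqlite3.Connection) -> None:
    """Load stage: write the transformed rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (customer_id INTEGER, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


def run_etl_job(conn: sqlite3.Connection) -> int:
    """Run extract, transform, and load in sequence; return rows loaded."""
    rows = transform(extract_csv(CSV_SOURCE), extract_json(JSON_SOURCE))
    load(rows, conn)
    return len(rows)


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    run_etl_job(target)
    print(target.execute("SELECT * FROM payments ORDER BY customer_id").fetchall())
    # → [(1, 1250), (2, 300), (3, 750)]
```

Even in this toy version, a slow job could be slow because of either extract step, the transform logic, or the target write; in a distributed deployment those stages are separated by networks and hosts.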
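ETL jobs are therefore complex. Because of this complexity, and because their components are distributed across multiple systems, it can be difficult to identify the source of a performance problem in an ETL job: a slowdown may originate in a source system, in any step of the transform stage, or in a target.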