The present invention relates generally to the field of data processing, and more particularly to a mainframe interface for an extract, transform, and load (ETL) process.
Extract, transform, and load (ETL) systems facilitate extracting data from various sources, transforming the extracted data to fit operational requirements, and loading the transformed data into a data repository, such as a database at a target location. In many cases, the extracted and accumulated data is in a different format than what is needed in the target data repository. The process of acquiring this data and converting it into useful, compatible, and accurate data is referred to as an ETL process.
In an ETL process, the extract phase acquires data from the source system(s). Data extraction can be as simple as copying a flat file from a database or as sophisticated as setting up interdependencies with remote systems that supervise the transportation of source data to the target system. The extracted source data is often temporarily stored as one or more relational database tables. The transform phase in the ETL process is typically made up of several stages and includes parsing data, converting data formats, and merging extracted source data to create data in a format suitable for the data repository, or target database(s). The load phase of the ETL process includes depositing the transformed data into the new data store (e.g., the data repository, data warehouse, data mart, etc.). The target database may be located on the same local computer as the data source, on a separate computer from the data source, or on a remote system, such as a mainframe computer.
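By way of illustration only, the three phases described above may be sketched as follows. The sketch assumes a small comma-separated flat file as the source and an in-memory SQLite database as the target; the names SOURCE_CSV, extract, transform, load, run_etl, and the payments table are hypothetical and are not drawn from any particular ETL system.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an extracted flat file.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,3
"""


def extract(text):
    """Extract phase: read raw rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform phase: parse fields and convert them to the target format."""
    return [
        (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        for row in rows
    ]


def load(records, conn):
    """Load phase: deposit the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run extract, transform, and load in order and return the loaded rows."""
    load(transform(extract(text)), conn)
    return conn.execute(
        "SELECT id, name, amount FROM payments ORDER BY id"
    ).fetchall()
```

For example, calling run_etl(SOURCE_CSV, sqlite3.connect(":memory:")) loads two normalized rows into the hypothetical payments table. A production ETL process would typically stage the extracted data and merge multiple sources, which this sketch omits.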
Mainframe computers process large amounts of data, such as census information, industry and consumer statistics, and financial transactions. Current mainframe computers are characterized by the redundancy of their internal design, extensive throughput capabilities, and backward compatibility with older software. Mainframe computers utilize proprietary operating systems for running applications, such as data processing. The operating system of a mainframe computer may include an interface for file transfer protocol (FTP) functions. FTP is a standard application-layer protocol that runs over transmission control protocol (TCP) and is used to transfer data sets and files over a network between a client host computer and a remote host server computer running an FTP server application. FTP users may authenticate with a username and password, or connect anonymously, depending on the mainframe server configuration.
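By way of illustration only, the two authentication modes described above may be sketched with the FTP client in the Python standard library (ftplib). The helper names login_args and fetch_file, and the anonymous password placeholder, are hypothetical; the host, remote file name, and credentials are supplied by the caller, and mainframe-specific data set naming is not addressed.

```python
from ftplib import FTP


def login_args(username=None, password=None):
    """Return the (user, passwd) pair to pass to FTP.login()."""
    if username is None:
        # Anonymous access; accepted only if the server configuration permits it.
        return ("anonymous", "anonymous@")
    return (username, password or "")


def fetch_file(host, remote_name, local_path, username=None, password=None):
    """Download one file in binary mode from a server running an FTP service."""
    user, passwd = login_args(username, password)
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=passwd)
        with open(local_path, "wb") as out:
            # RETR is the standard FTP command for retrieving a file.
            ftp.retrbinary(f"RETR {remote_name}", out.write)
```

Standard FTP transmits credentials unencrypted, so a deployment concerned with confidentiality might substitute ftplib.FTP_TLS; that substitution is an assumption of this sketch and is not part of the description above.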