A relational database is a collection of related data that can be represented by two-dimensional tables of columns and rows wherein information can be derived by performing set operations on the tables, such as join, sort, merge, and so on. The data stored in a relational database is typically accessed by way of a user-defined query that is constructed in a query language such as Structured Query Language (SQL).
Often it is useful to extract data from one or more sources, transform the data into some more useful form, and then load the results to a separate destination. A data warehouse, for example, is a central repository for all or significant parts of the data that an entity's various business systems collect and store (often in separate databases), the purpose of the data warehouse being to support data mining, decision support systems (DSS), and other data actions. Data from various sources is selectively extracted and organized on the data warehouse database for use by analytical applications and user queries. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access.
In the context of a data warehousing, and more generally for managing databases, extract-transform-load (ETL) refers to three separate functions of obtaining, processing, and storing data. The extract function reads data from a specified source database and extracts a desired subset of data. The transform function works with the acquired data—using rules or lookup tables, or creating combinations with other data—to convert it to the desired state as defined by the specific ETL tool. The load function is used to write the resulting data (either all of the subset or just the changes) to a destination database. Various and diverse ETL tools can be used for many purposes, including populating a data warehouse, converting a database of a specific type into a database of another type, or migrating data from one database to another.
In general, ETL tools operate to perform the aforementioned simple three-step process: (a) the ETL tool extracts the data from the source; (b) the ETL tool transforms the data according to its predefined functionality; and (c) the ETL tool loads the data to the destination. However, while basic transformations can be achieved with simple ETL tools, complex transformations require custom development of new ETL tools with specific and complex functionality—an approach that is resource intensive. While simple ETL tools might have broader usability and thus naturally lend themselves to widespread reuse, complex ETL tools do not lend themselves to reusability due to their high customization and narrow utility (and thus the frequent need to custom develop complex ETL tools when they are needed).
What is needed in the art is a means for developing complex transformation functionality that is more efficient than custom development of complex transformation tools.