1. Background and Relevant Art
Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., word processing, scheduling, accounting, etc.) that prior to the advent of the computer system were performed manually. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. Accordingly, the performance of many computing tasks is distributed across a number of different computer systems and/or a number of different computing environments.
Data provided to computer systems can come from any number of different sources, such as, for example, user input, files, databases, applications, sensors, etc. In some environments, computer systems receive (potentially large volumes of) data from a variety of different domains and/or verticals. Data can also be received in a variety of different formats.
Data provided to computer systems is often accessed using an extract, transform, and load (ETL) technique. ETL refers to a process that extracts data from data sources, transforms the data to fit operational needs, and loads the data into an end target. ETL systems can be used to integrate data from multiple varied sources, such as, for example, data from different vendors, data hosted on different computer systems, etc.
ETL is essentially an extract-and-then-store process. Prior to implementing an ETL solution, a user defines what (e.g., subset of) data is to be extracted from a data source and a schema of how the extracted data is to be stored. During the ETL process, the defined (e.g., subset of) data is extracted, transformed to the form of the schema (i.e., the schema is applied on write), and loaded into a data store. To access different data from the data source, the user has to redefine what data is to be extracted. To change how data is stored, the user has to define a new schema.
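The extract, transform, and load steps described above can be illustrated with a minimal sketch. The sketch is not drawn from any particular ETL product. The source records, the field names, the `readings` table, and the helper functions `extract`, `transform`, `load`, and `run_etl` are all hypothetical and chosen only for illustration, and an in-memory SQLite database stands in for the end target.

```python
import sqlite3

# Hypothetical source records (assumption); real sources could be files,
# databases, sensors, etc. Values arrive as unvalidated strings.
SOURCE = [
    {"id": "1", "name": " Ada ", "temp_f": "98.6", "vendor": "A"},
    {"id": "2", "name": "Grace", "temp_f": "212", "vendor": "B"},
]

# Defined by the user BEFORE the ETL process runs:
# which subset of the data to extract, and the schema it is stored under.
EXTRACT_FIELDS = ("id", "name", "temp_f")
SCHEMA = "CREATE TABLE readings (id INTEGER PRIMARY KEY, name TEXT, temp_c REAL)"


def extract(records, fields):
    """Keep only the predefined subset of fields from each source record."""
    return [{f: r[f] for f in fields} for r in records]


def transform(rows):
    """Reshape extracted rows to fit the target schema (schema on write)."""
    return [
        (
            int(r["id"]),
            r["name"].strip(),
            round((float(r["temp_f"]) - 32) * 5 / 9, 2),  # Fahrenheit to Celsius
        )
        for r in rows
    ]


def load(conn, rows):
    """Create the target table and write the transformed rows into it."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(records):
    """Run extract, transform, and load against an in-memory data store."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(records, EXTRACT_FIELDS)))
    return conn


if __name__ == "__main__":
    conn = run_etl(SOURCE)
    for row in conn.execute("SELECT id, name, temp_c FROM readings ORDER BY id"):
        print(row)
```

In this sketch the `vendor` field never reaches the data store because it was not part of the predefined extraction. Making it available would require changing both `EXTRACT_FIELDS` and `SCHEMA` and then running the process again.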
ETL is beneficial because it allows a user to access a desired portion of data in a desired format. However, ETL can be cumbersome as data needs evolve. Each change to what data is extracted and/or how the data is stored requires the ETL process to be restarted.