Organizations desire access to data in order to function efficiently. Advances in computer networks, data processing and data storage have greatly enhanced the ability to assemble large repositories of data and use this data for strategic planning, operational management and tactical decision making. Large organizations or enterprises, for example corporations, government agencies and private institutions, often obtain data from many sources, including database systems of record such as internal systems, transaction databases, accounting records, sales records, customer databases and/or third-party data providers. Moreover, most organizations spend a significant portion of their operating budget on human and information technology resources to maintain data and to build information technology solutions that provide access to that data.
Due to the volume, complexity and importance of the data relied upon by nearly every function of an enterprise, data management organizations (DMOs) play a critical role in the success of most modern enterprises. Among a DMO's most challenging and resource-intensive tasks is typically integrating data from various sources and in different formats into a set of production data, which may include centralized databases and other data sources. The production data should be technically valid, internally consistent, stable and reliable. More importantly, production data should be accessible in a form that is valid and useful to the enterprise and capable of interfacing with applications, which may include a variety of software modules, business analysis tools and information systems. As the number of data sources maintained by a DMO grows, so do the expense and complexity of maintaining production data.
The typical marketing department is an example of an organization within an enterprise that relies heavily on real-time, high-quality production data. To plan and execute effective campaigns, marketing departments access data that may be generated internally or acquired from third-party sources. For example, a typical marketing department may desire access to industry data, sales records, customer data, customer survey data, government regulations, competitor information, partner data and the like. This information is often time-sensitive, so an organization without real-time or near-real-time access to data often fails to accomplish its goals. For example, a marketing department trying to take advantage of a favorable market condition may miss the opportunity to advertise effectively if it lacks relevant and accurate data on target customers.
A typical method of managing and integrating data from multiple sources is commonly known by the acronym ETL, which stands for "extract," "transform" and "load." ETL is a set of methods used by DMOs to gather data from one or more data sources (extract), manipulate the data into a valid and useful format (transform) and put the data into production databases (load), where the data is accessed and manipulated by the organization's various information technology resources and applications. However, existing ETL systems and processes often fail to deliver timely, accurate and relevant data to meet an enterprise's needs. Therefore, a long-felt need exists for a system that reduces the time and costs associated with integrating new data sources into an enterprise data architecture.
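As a rough illustration only, the three ETL stages described above can be sketched in Python. The CSV source, the `sales` table, and the functions `extract`, `transform`, `load` and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for a production database. Real ETL systems involve far more validation, scheduling and error handling than this minimal sketch shows.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a CSV export from a sales system, with
# inconsistent whitespace, inconsistent casing and one invalid amount.
RAW_SALES_CSV = """customer_id,region,amount
 101 ,east,250.00
102,WEST,99.5
103,east,not-a-number
"""


def extract(source_text):
    """Extract: read raw rows (as dicts keyed by header) from a CSV source."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: normalize fields and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            customer_id = int(row["customer_id"].strip())
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows with a non-numeric id or amount
        region = row["region"].strip().lower()  # normalize region casing
        clean.append((customer_id, region, amount))
    return clean


def load(conn, records):
    """Load: insert validated records into a (hypothetical) production table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(source_text, conn):
    """Run the extract -> transform -> load pipeline end to end."""
    load(conn, transform(extract(source_text)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(RAW_SALES_CSV, conn)
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())
    # prints (2, 349.5): the row with an invalid amount was rejected
```

In this sketch the transform stage silently discards invalid rows; a production system would more likely quarantine and report them, which is part of what makes integrating each new data source costly.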