Analytical methods may be used to extract meaningful trends and patterns from sets of data. Business Intelligence (BI) Analytics refers to analytical methods, as applied by business enterprises, to extract trends and patterns from large datasets. These trends and patterns may subsequently be used to inform future business decisions.
The datasets considered by BI analytical methods may consist of hundreds of thousands or millions of data points. One example of a dataset considered by these analytical methods may be a record of user clicks on a website over a given time period. In this example, BI analytics may extract trends from clicking patterns to establish, in one instance, when a given user is most likely to be receptive to advertisements placed on the website. Due to the sheer number of data points, a data stream processed by BI analytical methods may measure several terabytes to several petabytes or more, and due to the storage space requirements, such datasets are often referred to as “big data.”
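The click-record example above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each click is a (user identifier, ISO 8601 timestamp) pair and that the hour of day in which a user clicks most often is a rough proxy for when that user may be receptive to advertisements. The function name `most_active_hour` and the sample records are hypothetical and do not describe any particular BI product.

```python
from collections import Counter, defaultdict
from datetime import datetime


def most_active_hour(clicks):
    """Return a mapping of user_id -> hour of day (0-23) with the most clicks.

    `clicks` is an iterable of (user_id, iso_timestamp) pairs. This record
    layout is an assumption made for the sketch, not a standard format.
    """
    clicks_per_hour = defaultdict(Counter)
    for user_id, timestamp in clicks:
        hour = datetime.fromisoformat(timestamp).hour
        clicks_per_hour[user_id][hour] += 1
    # most_common(1) yields the single (hour, count) pair with the highest count.
    return {
        user_id: hours.most_common(1)[0][0]
        for user_id, hours in clicks_per_hour.items()
    }


# Hypothetical sample data: two users over two days.
SAMPLE_CLICKS = [
    ("u1", "2024-01-05T09:05:00"),
    ("u1", "2024-01-05T09:40:00"),
    ("u1", "2024-01-05T21:10:00"),
    ("u2", "2024-01-05T21:00:00"),
    ("u2", "2024-01-06T21:30:00"),
    ("u2", "2024-01-06T08:00:00"),
]

peak_hours = most_active_hour(SAMPLE_CLICKS)
```

At the scale described above, a single in-memory pass of this kind would likely be impractical, which is one reason such datasets are typically handled by dedicated data warehousing systems.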
Conventional analytical methods and processes for analyzing and storing “big data” may be ineffective due to the size of the datasets and the associated memory requirements. Accordingly, several companies have specialized in building software, and supporting hardware, to receive, store, and analyze large datasets. One such company is Teradata® Corporation, which produces data warehousing software solutions. A data warehouse, also referred to as an enterprise data warehouse, is a repository of data that receives a stream of raw, unprocessed data, or previously-processed data, and processes the data by, in one implementation, Extract, Transform, and Load (ETL) processes before the data is stored. ETL refers to the extraction of data from a source, the transformation, or formatting, of the data, and the loading, or storing, of the data. Commercially available solutions, such as those produced by Teradata® Corporation, may be referred to as proprietary solutions. Open-source solutions may be available from, e.g., The Apache Software Foundation. Open-source solutions may be associated with a plurality of open-source software and supporting hardware solutions. Proprietary solutions and open-source solutions may each offer their own unique advantages.
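The three ETL stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how any commercial or open-source data warehouse implements ETL: the source is assumed to be CSV text, the transformation is assumed to consist of trimming whitespace and discarding rows with unparseable timestamps, and an in-memory SQLite database stands in for the warehouse. The table name `clicks`, the column names, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw source data; the third row has a malformed timestamp.
RAW_CLICKS = """user_id,timestamp,url
 u1 ,2024-01-05T09:15:00,/home
u2,2024-01-05T21:40:00,/products
u1,not-a-timestamp,/cart
"""


def extract(raw_text):
    """Extract: read raw rows from the (assumed) CSV source."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim fields, normalize timestamps, drop unparseable rows."""
    cleaned = []
    for row in rows:
        try:
            clicked_at = datetime.fromisoformat(row["timestamp"].strip())
        except ValueError:
            continue  # Skip rows whose timestamp cannot be parsed.
        cleaned.append(
            (row["user_id"].strip(), clicked_at.isoformat(), row["url"].strip())
        )
    return cleaned


def load(records, conn):
    """Load: store the formatted records in the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clicks "
        "(user_id TEXT, clicked_at TEXT, url TEXT)"
    )
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(raw_text, conn):
    """Run the three stages in order: extract, then transform, then load."""
    load(transform(extract(raw_text)), conn)


# Usage: an in-memory SQLite database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
run_etl(RAW_CLICKS, warehouse)
```

Keeping each stage in its own function mirrors the separation named by the acronym, so that, for example, a different source format would only require a different `extract` step.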
Apache Hadoop® is an example of an open-source framework that facilitates the use of distributed hardware for parallel processing of large datasets. Apache Hive is an example of an open-source data warehouse that builds upon the Hadoop® framework. Open-source solutions may include unique functionality relative to proprietary solutions. This functionality may include, for example, the ability to scale and expand to incorporate additional computational resources, such that the solution may be used with large collections of computer server clusters. Open-source solutions may also offer various cost savings due to their ability to run processes on non-specialized, commodity hardware and their ability to be implemented and utilized using a variety of programming languages. It will be appreciated, however, that proprietary solutions may offer some or all of these advantages as well. Accordingly, an enterprise may choose to implement both open-source and proprietary solutions to analyze big data.
Although open-source and proprietary solutions each offer various advantages, individual solutions may implement one or more unique formats, protocols, and the like. As a result, open-source solutions may not be compatible with proprietary solutions and vice versa. An enterprise, however, may wish to exchange data between an open-source solution and a proprietary solution. As an example, an enterprise may desire that a proprietary data warehouse solution be able to communicate with an open-source data warehouse solution and vice versa. As another example, a business may wish for employees familiar with a proprietary solution to be able to perform similar tasks on an open-source solution and vice versa.
Therefore, a need exists to establish compatibility between open-source data warehouse solutions and proprietary data warehouse solutions in order to utilize the unique advantages provided by each type of solution.