Analytical methods may be used to extract meaningful trends and patterns from sets of data. Business Intelligence (BI) Analytics refers to analytical methods, as applied by business enterprises, to extract trends and patterns from large datasets. These trends and patterns may subsequently be used to inform future business decisions.
The datasets considered by BI analytical methods may consist of hundreds of thousands or millions of data points. One example of a dataset considered by these analytical methods may be a record of user clicks on a website over a given time period. In this example, BI analytics may extract trends from clicking patterns to establish, in one instance, when a given user is most likely to be receptive to advertisements placed on the website. Due to the sheer number of data points, a data stream processed by BI analytical methods may measure several terabytes to several petabytes or more, and due to the storage space requirements, such datasets are often referred to as “big data.”
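By way of illustration only, the click-pattern example above may be sketched as follows. The sketch assumes a hypothetical click log of (user identifier, timestamp) pairs and uses the hour of day with the most clicks as a simple stand-in for when a user is most likely to be receptive. The record layout, the sample data, and the function name `most_active_hour` are assumptions made for this example, and a production system operating on terabytes of data would use distributed processing rather than an in-memory list.

```python
from collections import Counter
from datetime import datetime

# Hypothetical click log: (user_id, ISO-8601 timestamp) pairs.
CLICKS = [
    ("user_a", "2024-01-01T09:15:00"),
    ("user_a", "2024-01-01T20:05:00"),
    ("user_a", "2024-01-02T20:40:00"),
    ("user_b", "2024-01-02T09:30:00"),
    ("user_a", "2024-01-03T20:10:00"),
]


def most_active_hour(clicks, user_id):
    """Return the hour of day (0-23) in which the user clicked most often.

    Returns None when the log holds no clicks for the user.
    """
    hours = Counter(
        datetime.fromisoformat(timestamp).hour
        for uid, timestamp in clicks
        if uid == user_id
    )
    if not hours:
        return None
    # most_common(1) yields a one-element list of (hour, count).
    return hours.most_common(1)[0][0]
```

In this sketch, calling `most_active_hour(CLICKS, "user_a")` counts clicks per hour for that user and returns the hour with the highest count.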
Conventional analytical methods and processes for analyzing and storing “big data” may be ineffective due to the size of the datasets and the associated memory requirements. Accordingly, several companies have specialized in building software, and supporting hardware, to receive, store, analyze, and schedule processing of large datasets. One such company is Computer Associates, Inc., which produces workflow controller software solutions, otherwise referred to as workflow automation tools, or workflow controllers. One example of a workflow controller produced by Computer Associates, Inc. is Automation AE (AutoSys® Edition). A workflow controller is one or more processes, and associated hardware, for scheduling, tracking, and reporting on, among other functions, computational tasks or computational processes executed by one or more computer systems that are in communication with the workflow controller. In one implementation, a workflow controller may schedule computational tasks to be carried out by one or more distributed computer systems, or one or more extract, transform, and load (ETL) processes to be carried out by an ETL processing module, and the like.
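The scheduling, tracking, and reporting functions of a workflow controller described above may be illustrated with the following minimal sketch. It is not the interface of any commercial or open-source workflow controller. The class name `WorkflowController`, its methods, and the status strings are hypothetical and chosen only for this example. The sketch assumes that tasks run in a single process and that each task is registered after the tasks on which it depends.

```python
class WorkflowController:
    """Toy controller: schedules tasks, tracks their status, and reports it."""

    def __init__(self):
        self._tasks = {}   # task name -> callable to execute
        self._deps = {}    # task name -> names of prerequisite tasks
        self.status = {}   # task name -> pending / succeeded / failed / skipped

    def add_task(self, name, func, depends_on=()):
        # Assumption: prerequisites are registered first. This keeps the
        # registration order a valid execution order and rules out cycles.
        for dep in depends_on:
            if dep not in self._tasks:
                raise ValueError(f"unknown dependency: {dep}")
        self._tasks[name] = func
        self._deps[name] = tuple(depends_on)
        self.status[name] = "pending"

    def run(self):
        """Execute tasks in registration order and return a status report."""
        for name, func in self._tasks.items():
            # Skip a task when any prerequisite did not succeed.
            if any(self.status[dep] != "succeeded" for dep in self._deps[name]):
                self.status[name] = "skipped"
                continue
            try:
                func()
                self.status[name] = "succeeded"
            except Exception:
                self.status[name] = "failed"
        return dict(self.status)
```

In this sketch, a task whose prerequisite fails is marked as skipped rather than executed, so the report returned by `run` distinguishes tasks that failed from tasks that were never attempted.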
Commercially available solutions may be referred to as proprietary solutions. Open-source software solutions may be available from, e.g., The Apache Software Foundation. Open-source software solutions may be associated with a plurality of open-source software and supporting hardware solutions. Proprietary solutions and open-source solutions may each offer their own unique advantages.
Apache Hadoop® is an example of an open-source solution framework that facilitates the use of distributed hardware for parallel processing of large datasets. Apache Hive is an example of an open-source data warehouse that expands upon the Hadoop® framework. Apache Oozie is an example of an open-source workflow controller that also expands upon the Hadoop framework. Open-source solutions may include unique functionality relative to proprietary solutions. This functionality may include, for example, large scalability and expansion to include increased computational resources, such that the solution may be scaled for use with large collections of computer server clusters. Open-source solutions may also offer various cost savings to enterprises due to their ability to run processes on non-specialized, commodity hardware. Furthermore, open-source solutions may be implemented and utilized using a variety of programming languages. It will be appreciated, however, that proprietary solutions may similarly offer some or all of these advantages as well. Accordingly, an enterprise may choose to implement both open-source and proprietary solutions to analyze big data.
A data warehouse, also referred to as an enterprise data warehouse, is a repository of data, whereby a stream of raw, unprocessed data, or previously processed data, is received by the data warehouse and processed by, in one implementation, ETL processes before being stored. Although open-source and proprietary solutions each offer various advantages, individual solutions may implement one or more unique formats, protocols, and the like. As a result, open-source solutions may not be compatible with proprietary solutions and vice versa. As an example, a compatibility issue may arise if an enterprise desires to communicate open-source workflows to a proprietary workflow controller, or if an open-source workflow controller is to receive proprietary workflows from a proprietary workflow controller.
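As a non-limiting illustration of the ETL processing mentioned above, the following sketch extracts raw records, transforms them by normalizing names and discarding malformed rows, and loads the result into a table. An in-memory SQLite database stands in for a data warehouse here. The sample rows, the table name `click_totals`, and the function names are assumptions made for this example only.

```python
import sqlite3

# Hypothetical raw input, as it might arrive before any processing.
RAW_ROWS = [
    {"user": " Alice ", "clicks": "3"},
    {"user": "bob", "clicks": "not-a-number"},  # malformed; dropped in transform
    {"user": "Carol", "clicks": "7"},
]


def extract():
    """Extract: return a copy of the raw records."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: normalize user names and drop rows with invalid counts."""
    cleaned = []
    for row in rows:
        try:
            click_count = int(row["clicks"])
        except ValueError:
            continue  # discard rows whose count is not an integer
        cleaned.append((row["user"].strip().lower(), click_count))
    return cleaned


def load(conn, rows):
    """Load: store the cleaned rows in the stand-in warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS click_totals "
        "(user_name TEXT, click_count INTEGER)"
    )
    conn.executemany("INSERT INTO click_totals VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order against the given connection."""
    load(conn, transform(extract()))
```

In this sketch, `run_etl(sqlite3.connect(":memory:"))` runs all three stages, after which only the well-formed rows are present in `click_totals`.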
Therefore, a need exists to establish compatibility between open-source workflow controllers and proprietary workflow controllers in order to utilize the unique advantages of each.