Data analytics refers to techniques and processes for qualitatively and quantitatively evaluating data so as to create value (e.g., by enhancing productivity or obtaining insight into a business). Often, extremely large and complex data sets (referred to as big data) are generated around an item of interest, such as the functioning of a business or market segment. Through data analytics, information is extracted, categorized, and processed to obtain insight and value from the data, such as by identifying meaningful or significant patterns and trends.
A number of different data analytics tools and techniques currently exist, generally in the form of computer-executable software programs. Due to the amount of processing required to perform a meaningful analysis on big data, it is impractical (or impossible) for a human to perform data analytics without the assistance of a computational device. Existing data management and analytics solutions are therefore hard-coded into separate software platforms, such as Greenplum, Amazon Redshift, Spark Streaming, Spark with Scala, and PostgreSQL. Each of these platforms provides valuable tools for performing data analytics, but software provided by different vendors is generally incompatible. Users typically create workflows that perform numerous operations on a given platform, enabling the user to perform the same type of analysis repeatedly on different data sets. This requires that the user understand the interface of the chosen software platform, and often requires that the user be capable of writing executable code (e.g., in a scripting language) to accomplish necessary tasks on the platform. As a result, users must spend significant time learning and becoming comfortable with each separate software platform that they wish to use.
However, the various software platforms offer different features and may be offered at different prices. As a result, users periodically wish to adopt a new software platform, or a platform with which they have not previously worked. To migrate from one software package to another, the user must recreate the entire workflow, which typically involves obtaining assistance from a programmer to write unique code compatible with the new software package. Further, it is practically impossible for a user to utilize multiple software products in a single workflow, as hard-coded translation scripts are needed for each separate software package. The user must either be capable of programming the scripts himself or herself or obtain assistance from a developer, and significant programming time is required to create or modify each such composite workflow. Such hard-coded and fragmented solutions are difficult and expensive to maintain and are exposed to technology and skill obsolescence risks. For example, a user must devote significant time to learning new software platforms, or spend money to hire someone who is competent in a new platform. Hard-coded workflows are also difficult to migrate to newer technologies that may become available in the future, particularly if the original creator of the workflow (i.e., the programmer) is unavailable or does not recall how the workflow was created.
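The migration burden described above can be illustrated with a minimal, hypothetical sketch. The table name, column names, and rendering functions below are illustrative assumptions rather than any actual product API: the same logical analysis (average sales by region) must be re-expressed for each platform, so a composite workflow requires hand-written translation code for every backend it touches.

```python
# Hypothetical sketch: one logical analysis, two incompatible renderings.
# Neither output string is usable on the other platform, so migrating a
# workflow means rewriting every step for the new backend.

def to_postgresql(table: str, group_col: str, agg_col: str) -> str:
    """Render the analysis as a PostgreSQL query string."""
    return (
        f"SELECT {group_col}, AVG({agg_col}) AS avg_{agg_col} "
        f"FROM {table} GROUP BY {group_col};"
    )

def to_spark_scala(table: str, group_col: str, agg_col: str) -> str:
    """Render the same analysis as a Spark-with-Scala snippet."""
    return (
        f'spark.table("{table}")'
        f'.groupBy("{group_col}")'
        f'.agg(avg("{agg_col}"))'
    )

print(to_postgresql("sales", "region", "amount"))
# SELECT region, AVG(amount) AS avg_amount FROM sales GROUP BY region;
print(to_spark_scala("sales", "region", "amount"))
# spark.table("sales").groupBy("region").agg(avg("amount"))
```

Even in this toy form, each additional platform requires another hand-coded rendering function, which is the fragmentation and maintenance cost the paragraphs above describe.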
Accordingly, a need exists for a single, comprehensive technological solution providing both data management tools (e.g., data acquisition, data integration, data quality, business rules, and data governance) and analytics tools (e.g., reporting and analytical models) that enable integration with multiple proprietary software platforms provided by multiple vendors.