Designing a successful analytics product involves consideration of a number of factors, including performance (e.g., response time), total cost of ownership, and/or availability of existing packages. Currently, a number of different systems exist to support analytics, such as distributed file systems (e.g., Hadoop systems), distributed systems that support an open source programming language (e.g., R systems), and systems that support in-memory technology. For example, the Hadoop system may be a system supporting massive parallelism with a relatively low cost of ownership, but may be subject to the limitations of a shared-nothing architecture, relatively lower performance compared with many modern in-memory analytics applications, and a lack of query languages. The R system may be a statistical computing package with more than 3000 available packages, but with limited scalability in terms of parallelism and the handling of large data sets, as well as lower performance since it is a disk-based system. The in-memory system may have the highest performance in terms of response time. Also, the in-memory system may fully utilize multi-core infrastructure to ensure full parallelism of complex analytics, and may provide a flexible query language for database queries as well as the capability to include non-SQL stored procedures of any kind, such as C/C++, R, and binary code that may invoke external systems. However, in-memory database systems are, in general, more expensive than other systems and have fewer new applications and analytical packages available.
As such, each of these analytic systems has its own advantages and disadvantages. Generally, customers demand a system that offers high performance at an affordable price. Building an analytics system using only one of the above-described systems restricts the system to the limitations of the underlying technology. For example, the Hadoop system has a lower cost of ownership but lacks the high performance and language capability of the in-memory system, whereas building the system using an in-memory system may not be cost-effective for processing large sets of raw data. Also, the R system may lack the capacity to process large sets of raw data as well as the scalability/parallelism needed for high performance.