As the global population rapidly comes online and becomes media-enabled, the volume of “cross-connections” between data points managed by an enterprise (e.g., data received from social media sources, professional media sources, organizational data repositories, and the like) will grow correspondingly. The corresponding data graphs, including connected edges and vertices, grow super-exponentially as the number of data points increases. This data growth poses a significant problem for enterprises seeking efficient yet practical methods for managing risk, analyzing large amounts of data, and forming predictions based on large volumes of both enterprise-generated and externally sourced data. Often, this rapid growth makes it intractable to analyze the data and to form predictions based on the analyzed data with regard to critical business functions.
Currently, many challenges exist for real-time processing of large-volume data repositories (e.g., big data repositories), particularly in generating predictive models based on one or more data mining algorithms. For example, the predictive analytics (e.g., predictive models, and the like) that may be used for processing large-volume data repositories and transforming their data into one or more user presentations are most often performed using data silos, relational databases, or other non-big data technologies (e.g., columnar databases, SQL appliances, and the like). These non-big data technologies handle only a limited variety of data and may be limited to structured data analysis, as opposed to analysis of text data and/or log data, and may require expensive data management support. Further, these non-big data technologies may still impose strict limits on data growth rates and/or on the volumes of data that may be analyzed, stored, or otherwise transformed. As such, a need has been recognized for a big data processing system that processes large volumes of data in near real time to perform complex graph-based real-time business analytics using big data processing solutions (e.g., an open source cluster computing framework, a proprietary cluster computing framework, an open source graphing API for use with the open source cluster computing framework, and the like).