Field
The present disclosure relates to data analysis. More specifically, the present disclosure relates to a method and system for efficient data analysis for detecting information of interest.
Related Art
The exponential growth of computing power has made it possible to extract information of interest, such as shopping preferences, social media activities, medical referrals, and e-mail traffic patterns, using efficient data analysis. Such data analysis requirements have brought with them an increasing demand for efficient computation. As a result, equipment vendors race to build larger and faster computing devices with versatile capabilities, such as graph clustering, to calculate information of interest efficiently. However, the computing capability of a computing device cannot grow indefinitely. It is limited by physical space, power consumption, and design complexity, to name a few factors. Furthermore, computing devices with higher capability are usually more complex and expensive. More importantly, because an overly large and complex computing device often does not provide economy of scale, simply increasing the capability of a computing device may prove economically unviable.
One way to meet this challenge is to increase the efficiency of the data analysis tools used for extracting information of interest from a large and arbitrary data set. However, increasing the efficiency of data analysis for such a large data set can also increase the complexity of the analysis tools, and overly complex tools are typically not suitable for large-scale real-life deployment. Hence, efficient data analysis techniques must also be viable for real-life deployment.
Graph clustering is a tool for analyzing large data sets. Typically, the elements in the data set are represented as vertices and the relationships between the elements are represented as edges in a graph. Graph clustering finds clusters (i.e., groups) of similar elements such that the vertices in one cluster are more densely interconnected to each other than they are to vertices in other clusters. In this way, the data set can be viewed at a more abstract level, which allows a computing device to determine the information of interest.
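The density property described above can be illustrated with a minimal sketch. The example below is not a clustering method from this disclosure; it only checks, for a given grouping, that vertices within each cluster are more densely interconnected than vertices across clusters. The function names (edge_density, cross_density), the toy vertices, and the edge list are hypothetical, and the graph is assumed to be undirected with no self-loops or duplicate edges.

```python
def edge_density(vertices, edges):
    """Fraction of possible vertex pairs inside `vertices` that share an edge."""
    vertices = set(vertices)
    possible = len(vertices) * (len(vertices) - 1) // 2
    if possible == 0:
        return 0.0
    present = sum(1 for u, v in edges if u in vertices and v in vertices)
    return present / possible


def cross_density(group_a, group_b, edges):
    """Fraction of possible pairs (one vertex per group) that share an edge."""
    group_a, group_b = set(group_a), set(group_b)
    possible = len(group_a) * len(group_b)
    if possible == 0:
        return 0.0
    present = sum(
        1
        for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )
    return present / possible


# Hypothetical toy data set: elements are vertices, relationships are edges.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),  # relationships among a, b, c
    ("d", "e"), ("d", "f"), ("e", "f"),  # relationships among d, e, f
    ("c", "d"),                          # a single relationship linking the groups
]
clusters = [{"a", "b", "c"}, {"d", "e", "f"}]

intra = [edge_density(c, edges) for c in clusters]
inter = cross_density(clusters[0], clusters[1], edges)

# A grouping matches the description when every intra-cluster density
# exceeds the density of edges running between the clusters.
is_valid_clustering = all(d > inter for d in intra)
```

In this toy graph each cluster is fully connected internally, while only one of the nine possible cross-cluster pairs is connected, so the grouping satisfies the density property.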
While graph clustering brings many desirable features to data analysis, some issues remain unsolved in efficiently obtaining information of interest from large and arbitrary data sets.