The present invention relates to a computer for big data analytics using graph data, and a graph data generation method therefor.
Big data analytics, in which useful knowledge (information) is extracted from a vast amount of data (big data) obtained from the Web, sensors, or the like, has been gaining attention. Big data analytics is designed to extract, as knowledge, correlations and patterns among items hidden within the data by comprehensively applying data analysis techniques, including statistics, pattern recognition, and artificial intelligence, to the vast amount of data. Big data analytics is sometimes referred to as data mining because it mines underlying information hidden in data. Techniques for big data analytics include, for example, correlation analysis, regression analysis, and principal component analysis used in statistics, as well as pattern recognition, machine learning, and clustering.
In order to obtain useful knowledge in big data analytics, the vast amount of data needs to be analyzed. However, as the amount of data to be analyzed becomes larger and the methods for data analysis become more complicated, the processing time, memory usage, and the like impose an excessive load on hardware resources, which is problematic. In particular, in the field of social infrastructure, results are expected to be output efficiently within a limited amount of time by using limited resources.
For example, basic statistical data analysis techniques such as correlation analysis and principal component analysis generate indicators (feature amounts, items) from big data and obtain correlations between the indicators. The correlations among m indicators are given as an m-by-m correlation matrix, and correlation analysis and principal component analysis are executed by operations on the correlation matrix. However, because the matrix operations are executed with respect to all elements, it is necessary to store the data for all of the elements. Accordingly, a system that handles big data may perform substantially inefficiently in terms of calculation amount and memory usage. As a result, storing and processing big data (a correlation matrix) having a large number of indicators imposes a large load on hardware resources.
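The growth in storage described above can be illustrated with a minimal sketch. It assumes Pearson correlation and a dense row-major layout; the function names `correlation_matrix` and `dense_entries` and the sample data are hypothetical and are not taken from any cited publication.

```python
import math
import random


def correlation_matrix(rows):
    """Return the dense m-by-m Pearson correlation matrix of n-by-m data."""
    n = len(rows)
    m = len(rows[0])
    # Per-indicator (column) mean and population standard deviation.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        for j in range(m)
    ]
    corr = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            cov = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
            corr[i][j] = cov / (stds[i] * stds[j])
    return corr


def dense_entries(m):
    """Number of elements a dense m-by-m correlation matrix must store."""
    return m * m


# Hypothetical sample: n = 50 rows, m = 4 indicators.
random.seed(0)
data = [[random.random() for _ in range(4)] for _ in range(50)]
corr = correlation_matrix(data)
```

Because every pair of indicators yields one element, the number of stored elements grows with the square of m. Under these assumptions, `dense_entries(10_000)` is 100,000,000 elements, regardless of how many of those correlations are actually significant.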
As a method to compress and efficiently process big data, U.S. Unexamined Patent Application Publication No. 2001/0011958 A discloses a technique that reduces the cost of communicating and storing data by converting big data using a multivariate data analysis method, and by compressing and reconfiguring the big data. The method disclosed therein includes a step of acquiring an m-by-m correlation matrix from original data of m items in n rows, a step of obtaining eigenvalues and eigenvectors of the correlation matrix, a step of obtaining a factor loading matrix from the eigenvalues and the eigenvectors, a step of generating an l-by-p random matrix, a step of obtaining an l-by-m intermediate data matrix by multiplying the random matrix by the factor loading matrix, and a step of obtaining a reconfigured l-by-m data matrix by scaling the columns of the intermediate data for the n samples and m indicators. In this way, the publication discloses a technique capable of reducing the cost of communicating and storing data by reconfiguring the data.
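The sequence of steps summarized above may be sketched as follows. This is only an illustration of the summary, not the implementation of the cited publication. It assumes NumPy, a square correlation matrix among the m items, factor loadings formed as eigenvectors scaled by the square roots of the p largest eigenvalues, a standard-normal random matrix, and a final scaling that restores each column's original mean and standard deviation. The function name `reconfigure` and all parameter values are hypothetical.

```python
import numpy as np


def reconfigure(data, p, l, seed=0):
    """Sketch: build l-by-m reconfigured data from n-by-m original data."""
    rng = np.random.default_rng(seed)
    # Correlation matrix among the m items (columns): shape (m, m).
    corr = np.corrcoef(data, rowvar=False)
    # Eigen-decomposition of the symmetric matrix (eigenvalues ascending).
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Keep the p largest eigenpairs.
    top = np.argsort(eigvals)[::-1][:p]
    # Assumed factor loadings: eigenvector times sqrt(eigenvalue), as (p, m).
    loadings = (eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))).T
    # l-by-p random matrix (standard normal is an assumption).
    random_matrix = rng.standard_normal((l, p))
    # l-by-m intermediate data matrix.
    intermediate = random_matrix @ loadings
    # Assumed scaling: restore each column's original spread and mean.
    return intermediate * data.std(axis=0) + data.mean(axis=0)


# Hypothetical original data: n = 200 samples, m = 5 indicators.
original = np.random.default_rng(1).standard_normal((200, 5))
out = reconfigure(original, p=2, l=30)
```

In this sketch only the p retained eigenpairs and the per-column statistics are needed to regenerate data, which is where the reduction in communication and storage cost would come from. The m-by-m correlation matrix itself still has to be formed in full.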