The present invention relates generally to the field of information processing technology and, in particular, to a computer processing method and system for network data.
With the development of information technology, and of network technology in particular, information is transferred between individual information nodes. As a result, a large amount of network data reflecting the relationships between information nodes now exists on networks. Network data of this volume and scale gives rise to many technical analysis requirements, chief among them finding the relationships between these information nodes. Examples include detecting nodes that exhibit abnormal behavior in a network and filtering junk e-mail.
However, existing technology proves inadequate, or even unusable, when processing large-scale network data involving many nodes, for example when the number of nodes in the network data to be processed reaches 10^5 or more. FIG. 1 shows a performance estimate for a community detection method, which is currently a technical hotspot (for details, see reference document [1] Y. Zhang, J. Wang, Y. Wang, L. Zhou. Parallel Community Detection on Large Networks with Propinquity Dynamics. ACM SIGKDD '09, pp. 997-1005, expressly incorporated herein by reference in its entirety for all purposes). The data set processed in this estimate consists of three months of post records from a Bulletin Board System (BBS) website, in which relationships between users are established by replies to posts. The method is implemented and run on a Hadoop MapReduce platform composed of six x86 cluster machine nodes in total, with an average CPU of dual-core 1.66 GHz and average memory of 4 GB. As FIG. 1 shows, when the number of users increases to 200,000, the processing time rises rapidly to more than 27 hours, and if the data scale continues to grow, the processing time increases exponentially. Clearly, the above method cannot process network data at such a scale.
Thus, it is desirable to provide a computer processing method and system for network data.