As the internet becomes an increasingly important channel for acquiring all kinds of information, the total amount of information stored on the internet also continues to expand at high speed. According to a study by IBM, 90% of all the data ever produced by human civilization was generated in the past two years. Moreover, the volume of data generated worldwide is expected to reach 44 times its current level by 2020. Therefore, as dependence on internet information and internet applications grows, people increasingly need internet data mining services (services that extract useful information from massive data or databases) to continually improve the efficiency with which massive internet information is used.
Internet information includes many data types (such as text, pictures, video, audio and structured data). However, in internet information presented in the form of webpages, textual information is not only the most important content but also the basic means by which users organize the other kinds of data. Therefore, the top priority of an internet information data mining service is to extract mining results valuable to the user from all kinds of structured information (such as structured summaries), semi-structured information (such as website information) and non-structured information (such as linked plain text information).
The main technical feature of data mining is to perform extraction, transformation, analysis and other modeling operations on massive data in a database, and to extract key data that helps users make decisions. However, within massive internet textual information, structured data has the highest value density and the smallest data volume, while semi-structured webpage information and non-structured (plain text) information have the lowest value density and the largest data volume. Therefore, internet information data mining services have a huge market but are very difficult to carry out, and a systematic method of general practical value has not yet been established.
The present invention provides a universal internet information data mining method that performs comprehensive and systematic data mining on structured, semi-structured and non-structured textual information on the internet and provides mining results valuable to the user.