1. Field of the Invention
The invention relates to a data construction method and system, and more particularly to a method and system for constructing data tags based on a concept relation network.
2. Description of the Related Art
With the widespread use of communication networks and information technology, digital documents are being produced and accumulated at an ever faster pace, creating growing problems and requirements for the management, organization, access, and utilization of those documents. In response, research areas such as “Automatic Information Organization and Subject Analysis for Digital Documents” and “Text Knowledge Discovery” have emerged, drawing on information retrieval, natural language processing, machine learning, and related fields.
Knowledge discovery (KD) technology supports the development of next-generation database management and information systems through its ability to extract new, insightful information embedded in large heterogeneous databases and to formulate that information as knowledge.
Depending on the characteristics of the data, knowledge discovery is divided into data mining and text mining. Data mining handles structured data, in which every record shares common fields stored in a database. Text mining handles unstructured data, which has no common structure across records. Knowledge discovery collects, sorts, and transforms the data, performs the mining process, and then presents and analyzes the results using techniques such as association, classification, clustering, summarization, prediction, and sequence analysis.
Because their data characteristics differ, data mining and text mining involve different steps and processing details. Data mining is the practice of sorting through large amounts of data and picking out relevant information. It is typically used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It has been described as “the nontrivial extraction of implicit, previously unknown, and potentially useful information from data” and “the science of extracting useful information from large data sets or databases”.
Text mining, sometimes referred to as text data mining, generally refers to the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with adding some derived linguistic features, removing others, and subsequently inserting the result into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
Data mining and text mining can further be applied to construct data tags, for example browsing tags for Internet search, through hierarchical concept space construction. Hierarchical concept space construction is applied to collaborative tagging in folksonomy classification, where a hierarchical concept space is built by estimating the relation intensity between tags.
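As an illustration only, relation intensity between two tags can be estimated from how often they are applied to the same items. The sketch below assumes the Jaccard coefficient (co-occurrences divided by items carrying either tag) as the intensity measure, and a simple greedy rule that attaches each tag to its most strongly related, more frequent tag to form the hierarchy. Neither choice is specified by this document; the function names `tag_statistics`, `relation_intensity`, and `build_hierarchy` and the 0.3 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_statistics(tagged_items):
    """Count how often each tag, and each unordered tag pair, appears across items."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_items:
        unique = sorted(set(tags))  # ignore duplicate tags on one item
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # pairs are stored sorted
    return tag_counts, pair_counts


def relation_intensity(a, b, tag_counts, pair_counts):
    """Assumed measure (Jaccard): co-occurrences / items carrying either tag."""
    both = pair_counts[tuple(sorted((a, b)))]
    either = tag_counts[a] + tag_counts[b] - both
    return both / either if either else 0.0


def build_hierarchy(tag_counts, pair_counts, threshold=0.3):
    """Hypothetical greedy rule: parent = most related, more frequent tag above threshold."""
    parents = {}
    for tag in tag_counts:
        best, best_score = None, threshold
        for other in tag_counts:
            if other == tag or tag_counts[other] <= tag_counts[tag]:
                continue  # only more frequent tags may act as parents
            score = relation_intensity(tag, other, tag_counts, pair_counts)
            if score > best_score:
                best, best_score = other, score
        parents[tag] = best  # None means the tag becomes a root concept
    return parents


items = [
    ["programming", "python"],
    ["programming", "python", "tutorial"],
    ["programming", "java"],
    ["programming", "java"],
    ["python", "tutorial"],
]
tag_counts, pair_counts = tag_statistics(items)
parents = build_hierarchy(tag_counts, pair_counts)
print(parents)
# {'programming': None, 'python': 'programming', 'tutorial': 'python', 'java': 'programming'}
```

Normalizing intensities to the range [0, 1] keeps them comparable across tags of very different popularity, which is what lets a single threshold decide where each tag attaches.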
Conventional tag construction methods have the following drawbacks. “Tag Organization Methods and Systems” incurs higher maintenance cost. “Visual Tags for Search Results Generated from Social Network Information” does not provide a weighting concept, which makes searching difficult. In “Improving Search and Exploration in The Tag Space for Automated Tag Clustering”, the numerical values are not normalized, so the cost of maintaining the tree structure is higher.
Accordingly, the invention provides a method and system for constructing data tags based on a concept relation network, in which the concept space is maintained incrementally to reduce the time and system resources otherwise spent recalculating tag count values, relations, and weightings.
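A minimal sketch of what incremental maintenance can look like, under stated assumptions: tag counts and tag-pair co-occurrence counts are updated only for the tags of the item being added or removed, while relation intensity (assumed here to be the Jaccard coefficient) and tag weight (assumed here to be the fraction of items carrying the tag) are derived from those counters on demand. The class `IncrementalTagSpace` and its methods are hypothetical and are not taken from the invention itself.

```python
from collections import Counter
from itertools import combinations


class IncrementalTagSpace:
    """Hypothetical store that keeps tag statistics current as items change."""

    def __init__(self):
        self.item_count = 0
        self.tag_counts = Counter()   # tag -> number of items carrying it
        self.pair_counts = Counter()  # sorted (tag, tag) -> co-occurrence count

    def _apply(self, tags, delta):
        # Touch only the counters for this item's tags: O(k^2) for k tags,
        # independent of how many items are already stored.
        unique = sorted(set(tags))
        for tag in unique:
            self.tag_counts[tag] += delta
            if self.tag_counts[tag] == 0:
                del self.tag_counts[tag]
        for pair in combinations(unique, 2):
            self.pair_counts[pair] += delta
            if self.pair_counts[pair] == 0:
                del self.pair_counts[pair]
        self.item_count += delta

    def add_item(self, tags):
        self._apply(tags, +1)

    def remove_item(self, tags):
        # Assumes the same tag list was previously passed to add_item.
        self._apply(tags, -1)

    def weight(self, tag):
        """Assumed normalized weight: fraction of stored items carrying the tag."""
        return self.tag_counts[tag] / self.item_count if self.item_count else 0.0

    def relation(self, a, b):
        """Assumed Jaccard relation intensity, derived on demand from the counters."""
        both = self.pair_counts[tuple(sorted((a, b)))]
        either = self.tag_counts[a] + self.tag_counts[b] - both
        return both / either if either else 0.0


space = IncrementalTagSpace()
space.add_item(["programming", "python"])
space.add_item(["programming", "java"])
print(space.relation("programming", "python"))  # 0.5
print(space.weight("programming"))               # 1.0
space.add_item(["python", "tutorial"])
space.remove_item(["programming", "java"])
print(space.relation("programming", "python"))  # 0.5
print("java" in space.tag_counts)                # False
```

Because relations and weights are computed from the stored counters only when queried, adding or removing an item never triggers a pass over the full collection; only the counters for that item's tags change.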