1. Technical Field
The present teaching relates to methods, systems, and programming for classifying information. In particular, the present teaching is directed to methods, systems, and programming for classifying information based on content over time, such as through the assignment of identifiers.
2. Discussion of Technical Background
The advancement of the Internet has made a tremendous amount of information accessible to users located anywhere in the world. With this explosion of information, new issues have arisen. Much effort has been put into organizing the vast amount of information so that it can be searched more effectively and systematically. Along that line, different techniques have been developed to automatically or semi-automatically categorize content on the Internet into different topics and to organize that content in, e.g., a hierarchical fashion. Imposing organization and structure on content has led to more meaningful search and has promoted more targeted commercial activities. For example, associating a piece of content with a designated topic identifier often greatly facilitates the presentation of information that is more on point and relevant. However, variations in content or changes to incoming data, such as misspellings or content not previously associated with a topic, may disrupt the efficacy of existing systems for categorizing data.
Another important issue is how to identify useful information within massive amounts of available content in order to link different pieces of information in a more meaningful manner. For example, certain processing and enriching systems tasked with identifying relationships between pieces of information content take in source objects from various feeds, find duplicates, and merge them to create a composite object. These composite objects are often associated with an identifier. System users expect that identifiers will persist over time, meaning that all new information related to a certain composite object will be properly categorized as such, but this is not the case. New information content relevant to existing content is constantly being created, and existing solutions fail to incorporate such changes into existing categorization systems. Whereas information may be categorized through content processing and deduplication, there is presently no solution for maintaining deduplication decisions over time.
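By way of illustration only, the identifier persistence problem described above may be sketched as follows. The sketch assumes one naive scheme, in which duplicates are detected by a normalized name and composite identifiers are numbered afresh on every processing run; it is not intended to represent any particular existing system. All names in it (SourceObject, CompositeObject, normalize, build_composites, the feed names, and the sample records) are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SourceObject:
    feed: str  # hypothetical feed name
    name: str  # hypothetical descriptive field used for matching


@dataclass
class CompositeObject:
    identifier: int
    sources: list = field(default_factory=list)


def normalize(name: str) -> str:
    # Assumed duplicate key: case-folded name with whitespace collapsed.
    return " ".join(name.casefold().split())


def build_composites(source_objects):
    """Merge duplicates into composites, numbering identifiers per run."""
    next_id = count(1)
    composites = {}
    for obj in source_objects:
        key = normalize(obj.name)
        if key not in composites:
            composites[key] = CompositeObject(identifier=next(next_id))
        composites[key].sources.append(obj)
    return composites


# First run: two feeds describe the same entity and are merged.
first_run = build_composites([
    SourceObject("feed_a", "Joe's Pizza"),
    SourceObject("feed_b", "joe's  pizza"),
])

# Second run: new content arrives ahead of the existing entity.
second_run = build_composites([
    SourceObject("feed_c", "City Bakery"),
    SourceObject("feed_a", "Joe's Pizza"),
    SourceObject("feed_b", "joe's  pizza"),
])

print(first_run["joe's pizza"].identifier)   # identifier in the first run
print(second_run["joe's pizza"].identifier)  # identifier in the second run
```

In this sketch the same composite object receives a different identifier in the second run merely because new content arrived, which is the kind of non-persistence that users do not expect.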