The present invention relates generally to the field of information mining and, more specifically, to identifying emerging concepts in unstructured text streams.
Identification of emerging trends in unstructured text streams is an important area of interest because of the vast amount of data created daily on the World Wide Web, in particular in web logs (blogs). Automatically identifying emerging concepts is a fast way to identify these trends. Mining such data to detect emerging trends that are relevant to an individual or organization is a rapidly growing industry.
Prior art approaches to detecting emerging trends in text articles such as blogs have focused on detecting an increased frequency of words or phrases (features) within recent blogs when compared to older blogs. These word or phrase features are typically presented to the user as new “events”. One weakness of this approach is that it may result in a very large collection of such words or phrases, with the same underlying events, and even the same articles, repeated across features. Another weakness is that each event is labeled with just a word or phrase feature, which provides little contextual information about the event, for example whether it is a new event within a larger, continuing event.
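By way of illustration only, the frequency-comparison approach described above might be sketched as follows. This is a minimal sketch and is not taken from any particular prior art system: the function names `tokenize` and `emerging_features`, the thresholds `min_ratio` and `min_count`, and the add-one smoothing are all assumptions chosen for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def emerging_features(recent_docs, older_docs, min_ratio=2.0, min_count=2):
    """Flag words whose relative frequency is higher in recent documents.

    min_ratio and min_count are hypothetical thresholds for this sketch.
    Returns (word, ratio) pairs sorted by descending ratio.
    """
    recent = Counter(w for doc in recent_docs for w in tokenize(doc))
    older = Counter(w for doc in older_docs for w in tokenize(doc))
    recent_total = sum(recent.values()) or 1
    older_total = sum(older.values()) or 1

    flagged = []
    for word, count in recent.items():
        if count < min_count:
            continue  # ignore words that are too rare to be meaningful
        recent_rate = count / recent_total
        # Add-one smoothing (an assumption) avoids dividing by zero for
        # words that never appear in the older documents.
        older_rate = (older[word] + 1) / (older_total + 1)
        ratio = recent_rate / older_rate
        if ratio >= min_ratio:
            flagged.append((word, ratio))
    return sorted(flagged, key=lambda pair: -pair[1])


if __name__ == "__main__":
    recent_docs = [
        "battery recall announced",
        "battery recall expands",
        "new phone review",
    ]
    older_docs = [
        "new phone review",
        "phone camera review",
        "new laptop review",
    ]
    for word, ratio in emerging_features(recent_docs, older_docs):
        print(word, round(ratio, 2))
```

In this toy input, both "battery" and "recall" are flagged as separate features even though they stem from the same two articles and the same underlying event, which illustrates the repetition across features noted above.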