Information processing systems are increasingly expected to handle large amounts of data such as news data, client information, and stock data. Users of such databases find it increasingly difficult to search for desired information quickly, effectively, and with sufficient accuracy. Timely, accurate, and inexpensive detection of new topics and/or events in large databases can therefore provide very valuable information to many types of businesses. These include stock control; futures and options trading; news agencies that cannot afford reporters posted worldwide but could afford to dispatch a reporter quickly once a new event is detected; and Internet-based or other fast-paced businesses that need to know new and important information about competitors in order to succeed.
Conventionally, detecting and tracking new events in enormous databases is expensive, elaborate, and time-consuming work, because those searching the database usually need to hire additional staff to monitor it.
Most recent detection and tracking methods used by search engines represent the data in the database with a vector model in order to cluster the data. In vector space models, each document in the database under consideration is modeled by a vector, each coordinate of which represents an attribute of the document. Ideally, only attributes that help distinguish documents from one another during information retrieval are incorporated into the attribute space. In a Boolean model, each coordinate of the vector is zero (when the corresponding attribute is absent) or unity (when the corresponding attribute is present). Many refinements of the Boolean model exist. The most commonly used are term weighting models, which take into account how frequently an attribute (e.g., a keyword) appears or where it appears (e.g., a keyword in the title, a section header, or the abstract). In the simplest retrieval and ranking systems, each query is also modeled by a vector in the same manner as the documents.
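The following Python sketch illustrates the vector space representations described above: a Boolean vector, a simple term-frequency vector with an optional location-based boost, and ranking of documents against a query vector built the same way. It is an illustrative sketch only. Whitespace tokenization, the helper names, the `title_boost` factor of 2.0, and the use of cosine similarity as the ranking measure are all assumptions chosen for the example, not details taken from the text.

```python
import math
from collections import Counter


def boolean_vector(tokens, vocabulary):
    """Boolean model: 1 if the term (attribute) is present in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def tf_vector(tokens, vocabulary, title_tokens=(), title_boost=2.0):
    """Term-weighting refinement: weight each term by its frequency of appearance.

    Terms that also appear in the (hypothetical) title are multiplied by
    title_boost, an assumed factor that models location-based weighting.
    """
    counts = Counter(tokens)
    title = set(title_tokens)
    return [counts[term] * (title_boost if term in title else 1.0) for term in vocabulary]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs, vocabulary, vectorize=boolean_vector):
    """Model the query as a vector in the same way as the documents, then rank by similarity."""
    query_vec = vectorize(query_tokens, vocabulary)
    scores = [(name, cosine(query_vec, vectorize(tokens, vocabulary)))
              for name, tokens in docs.items()]
    # Highest similarity first; ties keep the documents' original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "stock prices fall as trading slows".split(),
        "d2": "new event detected in news stream".split(),
        "d3": "stock trading news".split(),
    }
    # The attribute space here is simply every distinct token in the collection.
    vocabulary = sorted({term for tokens in docs.values() for term in tokens})
    for name, score in rank(["stock", "news"], docs, vocabulary):
        print(name, round(score, 3))
```

With the Boolean model, document `d3` ranks first for the query "stock news" because it is the only document containing both query terms. Passing `vectorize=tf_vector` switches the same ranking to frequency-based weights without changing anything else, which reflects how term weighting refines, rather than replaces, the Boolean representation.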