Recent database systems handle increasingly large amounts of data such as, for example, news data, client information, and stock data. As such databases grow, it becomes increasingly difficult to search for desired information quickly and effectively with sufficient accuracy. Therefore, timely, accurate, and inexpensive detection of new topics and/or events in large databases may provide very valuable information for many types of businesses including, for example, stock control, futures and options trading, news agencies, which could then dispatch a reporter quickly without having to keep a large number of reporters posted worldwide, and businesses based on the Internet or other fast-paced activities, which need to know major and new information about competitors in order to succeed.
Conventionally, retrieval, detection, and identification of documents in an enormous database is expensive, elaborate, and time-consuming work, mostly because a searcher of the database needs to hire extra persons to monitor it.
Recent retrieval, detection, and identification methods used for search engines mostly use a vector space model for the data in the database in order to cluster the data. These conventional methods generally construct a vector f (kwd1, kwd2, . . . , kwdn) corresponding to the data in the database. The vector f has a dimension equal to the number of attributes, such as kwd1, kwd2, . . . , kwdn, that are attributed to the data. The most commonly used attributes are keywords, i.e., single keywords, phrases, names of person(s) and place(s), and time/date stamps. Usually, a binary vector space model is used to create the vector f mathematically, in which kwd1 is replaced with 0 when the data does not include kwd1 and with 1 when the data includes kwd1. Sometimes, a weight factor is combined with the binary model to improve the accuracy of the search. Such a weight factor includes, for example, the number of times a keyword appears in the data.
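
The binary model and the appearance-count weighting described above can be illustrated with a minimal sketch in Python. The function names, the sample keyword list, and the sample document are hypothetical and chosen only for illustration. The sketch also assumes single-word keywords and simple lowercase tokenization, whereas the attributes mentioned above may additionally include phrases, names, and time/date stamps.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def binary_vector(text, keywords):
    """Return the vector f with 1 where the keyword occurs in the text and 0 otherwise."""
    tokens = set(tokenize(text))
    return [1 if kwd in tokens else 0 for kwd in keywords]

def count_vector(text, keywords):
    """Return the vector f weighted by the number of times each keyword appears."""
    tokens = tokenize(text)
    return [tokens.count(kwd) for kwd in keywords]

# Hypothetical attribute list kwd1..kwd3 and a hypothetical document.
keywords = ["stock", "market", "election"]
doc = "Stock prices fell as the stock market reacted to the news."

print(binary_vector(doc, keywords))  # [1, 1, 0]
print(count_vector(doc, keywords))   # [2, 1, 0]
```

In this sketch the dimension of each vector equals the number of keywords, so every document in the database would map to a point in the same n-dimensional space, which is what allows the data to be clustered.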