In recent years, with the development and widespread use of computers, a wide variety of data has been computerized, and such data is used across many industries. Proposed applications include marketing research based on computerized records of product purchase behavior, and prediction of stock price fluctuations based on computerized economic indicators and the like. When the amount of computerized data is enormous, however, it is difficult to select only the desired data. Technologies such as data search have therefore been used conventionally.
The following documents are considered:
[Patent Document 1] Japanese Patent Laid-Open Publication No. 2003-256477
[Non-Patent Document 1] V. Pestov, On the geometry of similarity search: dimensionality curse and concentration of measure, Information Processing Letters, 73:47-51, 2000
[Non-Patent Document 2] Michael E. Houle, SASH: a spatial approximation sample hierarchy for similarity search, IBM Tokyo Research Laboratory Research Report RT-0446, 18 pages, Feb. 18, 2002
[Non-Patent Document 3] “To the world of randomized algorithm, information processing based on probabilistic algorithm” written by Osamu Watanabe, homepage on the Internet “http://www.statp.is.tohoku.ac.jp/kazu/SMAPIP/2003/tutorial/index.html”
[Non-Patent Document 4] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge, 1995
[Non-Patent Document 5] Y. Yang and X. Liu, A re-examination of text categorization, Proc. of the 22nd Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval, Morgan Kaufmann, 1999
[Non-Patent Document 6] Michael E. Houle, Navigating Massive Data Sets via Local Clustering, IBM Tokyo Research Laboratory Research Report RT-0518, 15 pages, Mar. 5, 2002
[Non-Patent Document 7] G. Salton, The SMART Retrieval System-Experiments in Automatic Document Processing, Prentice-Hall, Englewood Cliffs, N.J., USA, 1971
[Non-Patent Document 8] Chavez, E., Navarro, G., Baeza-Yates, R. and Marroquin, J. L., Satisfying general proximity/similarity queries with metric trees, Inf. Proc. Lett. 40, pp. 175-179, 1991
[Non-Patent Document 9] Navarro, G., Searching in metric spaces by spatial approximation, in Proc. of String Processing and Information Retrieval, pp. 141-148, 1999
A basic problem in data search is the k-nearest neighbor search, which finds the k pieces of data nearest to a given query. When the set of data to be searched is massive, or when the number of dimensions of the parameters describing the data is large, an enormous computation time is required to obtain exactly the k pieces of data nearest to the query. Approximate solution methods have therefore been proposed that process the k-nearest neighbor search approximately within a realistic computation time (see Patent Document 1, Non-Patent Document 2, and Non-Patent Document 6). For example, the technology of Patent Document 1 has been proposed as an effective countermeasure against the “dimensionality curse” described in Non-Patent Document 1.
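To make the cost of the exact search concrete, the following is a minimal sketch of a brute-force k-nearest neighbor search. It is an illustration only and is not the method of any cited document: the function name `knn_exact`, the use of Euclidean distance, and the sample points are assumptions introduced here.

```python
import heapq
import math


def knn_exact(data, query, k):
    """Return the k points of `data` nearest to `query`.

    Assumption: Euclidean distance is the dissimilarity measure.
    Every point is compared with the query, so a single query costs
    on the order of n * d operations for n points of dimension d.
    """
    return heapq.nsmallest(k, data, key=lambda p: math.dist(p, query))


# Hypothetical two-dimensional data set and query.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
nearest = knn_exact(points, (0.9, 0.9), k=2)
```

Because the whole data set is scanned for each query, the running time grows with both the number of pieces of data and the number of dimensions, which is the cost that the approximate methods mentioned above seek to avoid.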
Meanwhile, an amplification technique is known as a method for improving the results of a non-deterministic algorithm, that is, an algorithm whose result may differ from one execution to the next. For example, in a computation that determines whether the product of matrix A and matrix B equals matrix C, the methods of Non-Patent Documents 3 and 4 may return a wrong answer with a certain probability, but that probability can be made very small. The determination can thus be made with less computational effort than a method that always returns the right answer.
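One well-known randomized check of this kind is Freivalds' algorithm, sketched below. Whether Non-Patent Documents 3 and 4 present the check in exactly this form is an assumption; the function names `freivalds_trial` and `probably_equal`, the default of 20 trials, and the sample matrices are introduced here for illustration. Each trial multiplies by a random 0/1 vector r and compares A(Br) with Cr, which avoids forming the full product AB. A single trial can wrongly accept with probability at most 1/2, and repeating independent trials (the amplification) reduces that probability to at most 2^-t for t trials.

```python
import random


def freivalds_trial(A, B, C, r):
    """Check A(Br) == Cr for one 0/1 vector r (matrices as lists of rows)."""
    Br = [sum(b * x for b, x in zip(row, r)) for row in B]
    ABr = [sum(a * y for a, y in zip(row, Br)) for row in A]
    Cr = [sum(c * x for c, x in zip(row, r)) for row in C]
    return ABr == Cr


def probably_equal(A, B, C, trials=20, seed=None):
    """Amplified check: accept only if every independent trial passes.

    If AB == C the answer is always True. If AB != C, each trial passes
    with probability at most 1/2, so a wrong True is returned with
    probability at most 2 ** -trials.
    """
    rng = random.Random(seed)
    n_cols = len(C[0])
    for _ in range(trials):
        r = [rng.randint(0, 1) for _ in range(n_cols)]
        if not freivalds_trial(A, B, C, r):
            return False  # a failing trial proves AB != C
    return True


# Hypothetical 2 x 2 example; C_bad differs from AB in one entry.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]

fooled = freivalds_trial(A, B, C_bad, [1, 0])  # this r misses the bad entry
caught = freivalds_trial(A, B, C_bad, [0, 1])  # this r exposes it
```

For n x n matrices each trial costs on the order of n^2 operations, compared with n^3 for the straightforward product, which is the saving in computational effort referred to above. The pair `fooled` and `caught` shows why a single trial is not enough and why repetition is used.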
According to the technology described in Patent Document 1 and the like, computation time is reduced by restricting the search for the k pieces of data nearest to a query to a narrower range than the entire set of data to be searched. The computation time and the accuracy of approximation are therefore in a trade-off relationship, and the trade-off ratio is approximately constant. For example, to improve the accuracy of approximation, the range of the search must be widened, which increases the computation time. In view of this problem, an object of the present invention is to improve this trade-off ratio, for example by improving the accuracy of approximation without increasing the computation time.
Non-Patent Document 5 and Non-Patent Documents 7 to 9 are described later.