Hitherto, in order to detect, from an unknown input signal, a pattern which is substantially the same as an already known pattern, or to evaluate the similarity between two signals, judgment of similarity or coincidence of data has been conducted in all technical fields to which signal processing is related, such as acoustic processing technology, image processing technology, communication technology, and radar technology. In general, in order to detect similar data, a known technique represents data as feature vectors and judges similarity by the magnitude of the distance or angle (correlation) between them.
In particular, the so-called full search, in which the similarity between the input value and every candidate is determined and the data at the shortest distance is then selected, is a technology which is simple and has no detection omissions, and is frequently used in the case where the data quantity is small. However, in the case where, e.g., a portion similar to an input image or input sound is retrieved from a large quantity of accumulated images or sounds, the dimension of the feature vectors per second is large and retrieval is conducted over feature vectors which have been accumulated for ten to several hundred hours, so that there is the problem that the retrieval time becomes enormous when such a simple full search is performed.
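By way of illustration only, the full search described above may be sketched as follows. The function name `full_search` and the use of Euclidean distance are assumptions made for this sketch and are not taken from any cited publication; an angle (correlation) measure could be substituted.

```python
import math

def full_search(query, candidates):
    """Compare the query with every candidate and return the
    (index, distance) of the candidate at the shortest distance.
    Euclidean distance is an assumption of this sketch."""
    best_index, best_distance = -1, math.inf
    for i, candidate in enumerate(candidates):
        d = math.dist(query, candidate)  # requires Python 3.8+
        if d < best_distance:
            best_index, best_distance = i, d
    return best_index, best_distance
```

The number of distance calculations equals the number of registered vectors, which is why the retrieval time grows in proportion to the quantity of accumulated data.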
On the other hand, in order to retrieve a large quantity of data in such cases that complete-coincidence (exact-match) retrieval of coded data, e.g., document retrieval, is conducted, high speed operation technology such as binary tree search or the hash method is used. In accordance with this technology, data are stored in advance in an ordered state, so that comparison with branches or tables that differ from the input data can be omitted at the time of retrieval, thereby realizing high speed operation. However, in the case where a physical signal, e.g., an image or sound, is taken as the subject, since distortion and/or noise essentially exist in the data, it is rare that coded data completely coincide with each other. As a result, in the case where such high speed operation technology is used, a large number of detection omissions would take place. In addition, since the data are essentially multi-dimensional, there is the problem that it is difficult to impose a unique ordering on the data in advance.
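The detection omission caused by exact-match retrieval may be illustrated by the following minimal sketch, which assumes a hash table keyed directly on the vector values. The helper names `build_table` and `exact_lookup` are hypothetical and serve only this illustration.

```python
def build_table(vectors):
    """Register vectors in a hash table keyed on their exact values."""
    return {tuple(v): i for i, v in enumerate(vectors)}

def exact_lookup(table, query):
    """Return the index of a registered vector that coincides
    completely with the query, or None if there is none."""
    return table.get(tuple(query))

table = build_table([[1.0, 2.0], [3.0, 4.0]])
hit = exact_lookup(table, [3.0, 4.0])    # identical data is found
miss = exact_lookup(table, [3.0, 4.01])  # slightly distorted data is not
```

A single lookup replaces all distance calculations, but any distortion or noise in the query changes the key, so a vector that is very close to a registered vector is not detected.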
In view of the above, there is proposed, in the Japanese Patent Publication Laid Open No. H08-123460, a technology in which, at the time of data registration, plural vectors close in distance are grouped and the grouped vectors are represented by one representative vector. At the time of retrieval, the distance between the input vector and each representative vector is calculated first, and comparison with all vectors within a group is conducted only with respect to groups whose representative vector is close in distance. This permits similar (analogous) vector retrieval to be performed at high speed while reflecting distortion of multi-dimensional vectors.
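The two-stage retrieval described above may be sketched, under stated assumptions, as follows. This is not the implementation of the cited publication: the grouping is assumed to be given, the representative vector is assumed to be the mean of its group, and the names `register`, `grouped_search` and the parameter `probes` are hypothetical.

```python
import math

def centroid(vectors):
    """Mean vector of a group (assumed choice of representative)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def register(groups):
    """groups: list of groups, each a list of vectors already grouped
    by closeness. Returns a list of (representative, members) pairs."""
    return [(centroid(g), g) for g in groups]

def grouped_search(query, index, probes=1):
    """Rank groups by distance from the query to their representative,
    then compare the query with all members of the closest `probes`
    groups only."""
    ranked = sorted(index, key=lambda entry: math.dist(query, entry[0]))
    best, best_distance = None, math.inf
    for _, members in ranked[:probes]:
        for v in members:
            d = math.dist(query, v)
            if d < best_distance:
                best, best_distance = v, d
    return best, best_distance
```

Only the members of the selected groups are compared with the input vector, so the number of distance calculations is reduced relative to the full search.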
Further, there is proposed, in the Japanese Patent Publication Laid Open No. 2001-134573, a technology in which vectors are encoded and indexed by short codes, thereby suppressing the increase in the number of distance calculations and permitting high speed retrieval of similar (analogous) data.
However, in the technology described in the above-described Japanese Patent Publication Laid Open No. H08-123460, there was the problem that suitable grouping and selection of the representative vector are required at the time of registration, so that the registration operation becomes troublesome. Moreover, there was also the problem that, since there is no guarantee at the time of retrieval that, e.g., the registered vector which is least distant from the input vector belongs to the group represented by the representative vector which is least distant from the input vector, the operation for determining the group to be retrieved becomes troublesome.
Further, in the technology described in the above-described Japanese Patent Publication Laid Open No. 2001-134573, there was the problem that the distance relationship between vectors is lost when encoding is performed, or becomes a complicated, non-additive or non-monotonic relationship, so that the mechanism of registration and/or retrieval becomes troublesome.
Here, since image and/or sound are essentially time-series data, it is desirable that registration is conducted on a real time basis, and that the time order can be reflected at the time of retrieval. In other words, techniques which require the registration operation to reorder the time series, and/or which require redistribution (reshuffling) of already registered data or their indexes at the time of registration, as in the case of the technologies described in the above-described Japanese Patent Publication Laid Open No. H08-123460 and Japanese Patent Publication Laid Open No. 2001-134573, are in some instances not suitable for retrieval of time-series data.
That is, there is desired a mechanism in which retrieval is performed in a time far shorter than that of the full search while satisfying the following conditions:
(a) the structural simplicity and the robustness with respect to distortion of the full search are not lost;
(b) registration and/or deletion are conducted in real time; and
(c) registration or deletion requires no operation on other already registered data.