1. Field of the Invention
The invention relates to methods and apparatus for pattern recognition, and particularly, though not exclusively, speech recognition.
The invention is particularly concerned with monitoring temporal sequences of input vectors so as to recognise particular patterns.
2. Description of Related Art
In this specification, an "N dimensional vector" comprises a group of N values, each value corresponding to a respective dimension of the vector. The values may be represented by analogue signals or digitally. Such a vector has a magnitude which may be, for example, defined by the square root of the sum of the squares of the values, and also a direction in N dimensional space. For simplicity, throughout this specification, scalar quantities (except for N) will be represented by lower case letters, and vector quantities by upper case letters.
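As an illustration only, the magnitude and direction described above can be sketched as follows. The function names `magnitude` and `direction` are hypothetical and do not appear in the specification, and the Euclidean definition is just the example the text gives ("the square root of the sum of the squares of the values").

```python
import math


def magnitude(vector):
    """Euclidean magnitude: square root of the sum of the squared values."""
    return math.sqrt(sum(v * v for v in vector))


def direction(vector):
    """Unit-length vector pointing the same way in N dimensional space."""
    m = magnitude(vector)
    if m == 0.0:
        raise ValueError("a zero vector has no direction")
    return [v / m for v in vector]


x = [3.0, 4.0]          # a 2 dimensional vector (N = 2)
print(magnitude(x))     # 5.0
print(direction(x))     # [0.6, 0.8]
```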
Vectors of this type can in particular be derived from an analysis of human speech. Thus, an analogue signal representing a series of speech sounds can be regularly sampled and the content of each sample can be represented in terms of a vector comprising a set of feature values corresponding, for example, to the amplitude of respective frequencies within the sample.
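By way of a hedged sketch, one way of turning a block of signal samples into such a feature vector is to take the amplitudes of a few discrete Fourier transform bins. The specification does not say how the feature values are obtained; the use of a direct DFT, the block length of 16, and the name `spectral_vector` are all assumptions made for this example.

```python
import cmath
import math


def spectral_vector(frame, n_features):
    """Amplitudes of DFT bins 1..n_features for one block of samples.

    Assumption: a plain DFT stands in for whatever spectral analysis
    is actually used to obtain the feature values.
    """
    n = len(frame)
    features = []
    for k in range(1, n_features + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        features.append(abs(acc) / n)
    return features


# Hypothetical input: a pure tone falling exactly on bin 2 of a 16-sample block.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
vec = spectral_vector(frame, 4)
print([round(v, 3) for v in vec])  # the second feature (bin 2) dominates
```

Each call yields one N dimensional vector (here N = 4); applying it to successive blocks of the signal gives a temporal sequence of vectors.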
A paper entitled "Clustering, Taxonomy, and Topological Maps of Patterns" by T. Kohonen in Proceedings of the Sixth International Conference on Pattern Recognition, October 1982, pages 114-128 describes an approach for the statistical representation of empirical data. Sets (vectors) of input data are applied successively, and in parallel, to each of a number of processing units regarded as forming a two-dimensional array; each unit produces a single output proportional to the degree of matching between the particular input vector and an internal vector associated with that unit. An adaptation principle is defined so that a succession of input vectors, which form a statistical representation of the input data, causes changes in the internal vectors. For each input vector, this works by:
(1) identifying the unit whose internal vector is most similar to the input vector (e.g. the unit having the smallest Euclidean distance);
(2) defining a neighbourhood within the array, around this unit;
(3) changing the internal vectors of those units belonging to this neighbourhood, the direction of change being such that the similarity of those internal vectors to the input vector is increased.
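The three steps above can be sketched, under stated assumptions, as a single adaptation step. The square (Chebyshev) neighbourhood, the linear update rule with a `rate` parameter, and all function names are illustrative choices for this example, not details taken from Kohonen's paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(units, x):
    """Step (1): array position whose internal vector is closest to x.

    `units` maps (row, column) positions to internal vectors.
    """
    return min(units, key=lambda pos: euclidean(units[pos], x))


def neighbourhood(units, centre, radius):
    """Step (2): positions within `radius` of `centre` (assumed square shape)."""
    return [pos for pos in units
            if max(abs(pos[0] - centre[0]), abs(pos[1] - centre[1])) <= radius]


def adapt(units, x, radius, rate):
    """Step (3): move each neighbourhood vector a fraction `rate` towards x."""
    winner = best_matching_unit(units, x)
    for pos in neighbourhood(units, winner, radius):
        w = units[pos]
        units[pos] = [wi + rate * (xi - wi) for wi, xi in zip(w, x)]
    return winner


# Hypothetical 2 x 2 array with 2 dimensional internal vectors.
units = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0],
         (1, 0): [0.0, 1.0], (1, 1): [1.0, 1.0]}
winner = adapt(units, [0.9, 0.1], radius=0, rate=0.5)
print(winner, units[winner])
```

With `radius=0` only the best-matching unit is changed; a larger radius also draws its neighbours in the array towards the input vector.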
As this "self-organisation" process proceeds, the size of the neighbourhood is progressively reduced; the magnitude of the adjustments may also decrease. At the conclusion of this process, the internal vectors of the array define a mapping of the input vector space onto the two-dimensional space. Kohonen trained such an array using manually-selected speech samples of certain stationary Finnish vowel phonemes (selected to exclude those including transients), the input vectors each consisting of fifteen spectral values, and found that it mapped the phonemes into the two-dimensional array space.
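A minimal sketch of such a training run follows, with the neighbourhood radius and the adjustment rate both decreasing over time. The linear schedules, the 4 x 4 array, the two toy input vectors, and every name in the code are assumptions made for illustration; they are not Kohonen's actual parameters or data.

```python
import math
import random


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def radius_at(t, steps, start_radius):
    """Neighbourhood radius, shrinking linearly to 0 (assumed schedule)."""
    return int(start_radius * (1.0 - t / steps) + 0.5)


def rate_at(t, steps, start_rate):
    """Adjustment magnitude, decreasing linearly (assumed schedule)."""
    return start_rate * (1.0 - t / steps)


def train(units, inputs, steps, start_radius=2, start_rate=0.5):
    """Apply the input vectors in turn, adapting the array at each step."""
    for t in range(steps):
        x = inputs[t % len(inputs)]
        radius = radius_at(t, steps, start_radius)
        rate = rate_at(t, steps, start_rate)
        winner = min(units, key=lambda pos: distance(units[pos], x))
        for pos in list(units):
            if max(abs(pos[0] - winner[0]), abs(pos[1] - winner[1])) <= radius:
                w = units[pos]
                units[pos] = [wi + rate * (xi - wi) for wi, xi in zip(w, x)]
    return units


def mean_error(units, inputs):
    """Average distance from each input to its best-matching internal vector."""
    return sum(min(distance(w, x) for w in units.values())
               for x in inputs) / len(inputs)


rng = random.Random(0)  # fixed seed so the run is repeatable
units = {(r, c): [0.5 + rng.uniform(-0.05, 0.05) for _ in range(2)]
         for r in range(4) for c in range(4)}
inputs = [[0.1, 0.1], [0.9, 0.9]]  # toy stand-ins for spectral vectors

before = mean_error(units, inputs)
train(units, inputs, steps=200)
after = mean_error(units, inputs)
print(before, after)  # the error after training should be the smaller
```

Starting with a wide neighbourhood lets large regions of the array move together, while the final steps, with a radius of 0, adjust individual units only.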