In pattern recognition or retrieval tasks, a database of empirical data to be processed (images, signals, documents, etc.) is commonly represented as a set of vectors x.sub.1, . . . , x.sub.n of numeric feature values. Examples of feature values include the number of times a word occurs in a document, the coordinates of the center of mass of black pixels in an image, or the set of spectral coefficients output by a speech analyzer.
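By way of illustration only, the word-count example of a feature vector might be sketched as follows. The helper name term_count_vector, the three-word vocabulary, and the sample documents are hypothetical and chosen for brevity; whitespace tokenization is likewise an assumption, not something the foregoing description prescribes.

```python
from collections import Counter


def term_count_vector(text, vocabulary):
    """Return how often each vocabulary word occurs in `text`.

    The result has one entry per vocabulary word, in vocabulary order.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical toy data: d = 3 features, n = 3 instances.
vocabulary = ["cat", "dog", "fish"]
documents = ["the cat saw the dog", "dog chased dog", "fish"]

# Each document becomes one instance vector x_i of numeric feature values.
vectors = [term_count_vector(doc, vocabulary) for doc in documents]
```

Each entry of vectors plays the role of one instance vector x.sub.i, with one component per feature.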
In the art, the desired processing of sampled empirical data or instances is often implemented using linear models. The knowledge required to carry out the processing is captured in a vector of numeric weights, q=(q.sub.1, q.sub.2, . . . , q.sub.d). Each instance x.sub.i is processed by taking the dot product of the weight vector q with that instance's vector of feature values. The weights in q may be set by hand, or by supervised learning from training data, so that the products q.multidot.x.sub.i satisfy some desired goal. In classification problems, the goal is that the value of q.multidot.x.sub.i is high when x.sub.i is "related" to or lies within a particular class represented by q. In regression problems, the goal is that q.multidot.x.sub.i approximates some numeric value associated with instance x.sub.i.
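The linear model just described might be sketched as follows. This is a minimal illustration rather than any particular system's implementation; the function names dot and score_instances and the numeric values are assumptions introduced here.

```python
def dot(q, x):
    """Dot product of a weight vector q with an instance vector x."""
    if len(q) != len(x):
        raise ValueError("q and x must have the same number of features")
    return sum(qj * xj for qj, xj in zip(q, x))


def score_instances(q, instances):
    """Apply the weight vector q to every instance vector in turn."""
    return [dot(q, x) for x in instances]


# Hypothetical weights and instances with d = 3 features.
q = [1.0, -1.0, 0.5]
instances = [
    [2.0, 0.0, 2.0],  # expected to score high for the class represented by q
    [0.0, 3.0, 0.0],  # expected to score low
]
scores = score_instances(q, instances)  # [3.0, -3.0]
```

In a classification setting the scores would then be compared with one another or with a threshold; in a regression setting each score would itself be the estimate for the corresponding instance.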
In some cases, computing the proximity of one instance to another can be cast in linear form. For instance, if q and x are each normalized to have a Euclidean norm of 1.0, so that they are unit length vectors, then q.multidot.x is the cosine of the angle between them, a commonly used proximity measure. (In this case, the roles of instance vector and weight vector are interchangeable.)
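The cosine measure can be sketched in the same way: each vector is scaled to unit Euclidean length and the dot product of the results is taken. The names normalize and cosine are illustrative assumptions, as is the choice to reject zero vectors, for which the cosine is undefined.

```python
import math


def normalize(v):
    """Scale v so that its Euclidean norm is 1.0."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [c / norm for c in v]


def cosine(q, x):
    """Cosine of the angle between q and x, as a dot product of unit vectors."""
    qn, xn = normalize(q), normalize(x)
    return sum(a * b for a, b in zip(qn, xn))
```

Because the dot product is symmetric, cosine(q, x) and cosine(x, q) agree, which reflects the interchangeable roles of instance vector and weight vector noted above.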
These pattern recognition and retrieval techniques are applied, for instance, in statistical information retrieval systems. Documents and queries are represented as vectors of terms or features. The user query can be treated as a sample document, whose similarity to database documents must be found (see G. Salton and M. J. McGill, "Introduction to Modern Information Retrieval", McGraw-Hill, New York 1983), or it can be used as evidence toward setting the parameters of a probabilistic classification function, which is then applied to database documents (see S. E. Robertson and K. S. Jones, "Relevance Weighting of Search Terms", Journal of the American Society for Information Science, May-June 1976). In both cases, linear models are widely used and continue to be developed (see D. D. Lewis, et al., "Training Algorithms for Linear Text Classifiers", Proceedings of the 19.sup.th Annual International ACM SIGIR Conference, 1996).
Applying a weight vector q to a database of instance vectors means computing the vector-by-matrix product q.sup.T A.sup.T, where A is a matrix whose rows {x.sub.1, . . . , x.sub.n } are the instance vectors. The ith entry of q.sup.T A.sup.T is the dot product q.multidot.x.sub.i, which corresponds to the similarity of the instances represented by q and x.sub.i. Computing the cosine proximity measure among all pairs of instances, with each row of A normalized to unit length, is equivalent to computing the matrix product AA.sup.T. Calculations of this sort are common in the art. In text retrieval, for example, the number of instances (documents) can be 10.sup.4 to 10.sup.6 and the number of features (words or phrases) may be 100,000 or more. Such calculations are expensive in terms of computing resources, even when sparse matrix and indexing techniques are utilized. Dense instances with hundreds of features are also common, for example, with factor analytic text representations or in image retrieval. There is therefore a need for ever more efficient recognition and retrieval technology.
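To make the cost concrete, the two products can be sketched with plain nested lists. This is a naive dense illustration under stated assumptions, not an efficient or representative implementation: the names apply_weights, all_pairs, and dense_multiply_count are hypothetical, and no sparse matrix or indexing technique is applied.

```python
def apply_weights(q, A):
    """Compute q^T A^T: entry i is the dot product of q with row x_i of A."""
    return [sum(qj * aij for qj, aij in zip(q, row)) for row in A]


def all_pairs(A):
    """Compute A A^T: entry (i, k) is the dot product of rows x_i and x_k.

    If every row of A has unit Euclidean length, entry (i, k) is the
    cosine proximity of instances i and k.
    """
    return [apply_weights(row, A) for row in A]


def dense_multiply_count(n, d):
    """Multiplications needed by apply_weights for n dense instances of d features."""
    return n * d


# Hypothetical toy matrix with n = 3 instances and d = 2 features.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [1.0, 2.0]
scores = apply_weights(q, A)  # [1.0, 2.0, 3.0]
```

With n = 10.sup.6 documents and d = 100,000 features, dense_multiply_count gives 10.sup.11 multiplications for a single weight vector if every entry is treated as dense, and the all-pairs product AA.sup.T repeats that work for each of the n rows.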