Representing documents and words in vector spaces, and analyzing the structure of language with the mathematics of statistics, combinatorics, and linear algebra, has already proved to be a powerful tool in the field of natural language processing. Vector or matrix representations of word frequency within documents allow software to assess the likely meaning of documents in various ways without anyone having to read them. Mathematical transformations of such matrices, informed by statistical theory concerning language structure, can enable a textual search algorithm to find synonyms and related phrases for a search query, helping a user navigate the vast corpus of text available electronically on the internet and in collections of ebooks and other documents. Most users of the internet have benefited from such algorithms, albeit unknowingly. Marketers have been able to find thousands of opinions concerning products, scattered among hundreds of millions of documents on various subjects, and compile them in intuitively clear ways, making it almost unnecessary to read any particular document.
Several different ways of representing documents as collections of vectors have developed since the inception of this discipline. In term-document matrices, the rows represent terms (e.g., individual words) present in some document set, and the columns represent the documents containing those terms. Other matrices represent words in their rows and some form of word context in their columns. Finally, there is a kind of matrix that represents terms symmetrically in both the rows and the columns, and in which each cell contains a number representing a relationship, such as co-occurrence, between the cell's row term and column term. This last kind will be referred to herein as a term-association matrix.
These matrices, the vector sets they contain, and the vector spaces they represent provide engineers with a wealth of information that may be used to analyze text.
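As a concrete illustration of two of the matrix types described above, the following minimal Python sketch builds a term-document matrix and a term-association matrix from a toy corpus. Everything in it is an assumption made for illustration: the three-sentence corpus, the whitespace tokenizer, the function names, and in particular the choice to measure association as the number of documents in which two terms co-occur. Other definitions of co-occurrence (for example, within a sliding window of words) are equally possible.

```python
from itertools import combinations


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw term counts."""
    terms = sorted({t for d in docs for t in tokenize(d)})
    row_of = {t: i for i, t in enumerate(terms)}
    matrix = [[0] * len(docs) for _ in terms]
    for col, doc in enumerate(docs):
        for t in tokenize(doc):
            matrix[row_of[t]][col] += 1
    return terms, matrix


def term_association_matrix(td):
    """Symmetric term-by-term matrix.

    Assumption: the association between two terms is the number of
    documents in which both appear. The diagonal is left at zero.
    """
    n_terms = len(td)
    n_docs = len(td[0]) if td else 0
    assoc = [[0] * n_terms for _ in range(n_terms)]
    for col in range(n_docs):
        present = [row for row in range(n_terms) if td[row][col] > 0]
        for a, b in combinations(present, 2):
            assoc[a][b] += 1
            assoc[b][a] += 1
    return assoc


# Toy corpus, invented purely for this example.
docs = ["the cat sat", "the dog sat", "the cat ran"]
terms, td = term_document_matrix(docs)
assoc = term_association_matrix(td)

for term, row in zip(terms, td):
    print(f"{term:>4} {row}")
```

In this sketch each row of `td` is the vector for one term and each column is the vector for one document, while `assoc` has one row and one column per term. A production system would typically use sparse storage and a more careful tokenizer.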
Natural language processing is still a new and maturing field. That there remains much to be discovered is obvious to anyone who has been exposed to the stilted conversation of automated phone systems, as compared to the facile manipulation of language within the grasp of an ordinary human mind. Furthermore, there remains a dearth of products that can analyze documents and produce useful mathematical representations in a timely manner.