The present invention relates to the field of parsing and interpreting natural language text. Deciding which combinations of elements are possible in a natural language (language modeling), deciding what the syntactic relationships are between elements in a given language string (parsing), and deciding, or even representing, the information expressed by a given natural language string (language understanding) are all fundamental and largely unsolved natural language processing problems. Current approaches can be roughly divided into symbolic, statistical, and distributed:
    1) Symbolic or rule-based methods (rules about symbols), which seek a combination of rules that describes a given string in terms of a set of symbols.
    2) Statistical methods, which optimize the information over large numbers of simple observations to make predictions about a given string.
    3) Distributed or memory-based methods, which keep the large numbers of simple observations and use them to define or recognize elements of natural language in terms of assemblies.
Rule-based methods keep a list of all possible relationships between all possible classes of language tokens; at processing time they look up each token in a dictionary and attempt to decide which of its many possible classes, in which of the many possible combinations, best describes the given string. E.g.,
All possible classes (dictionary):
    the<-DET    rain<-N    in<-PREP    Spain<-N    . . .
All possible relations:
    NP<-DET+N    PP<-PREP+N    NP<-NP+PP    . . .
Analysis: ((The (DET)+rain (N))+(in (PREP)+Spain (N)))
The above bracketing, denoting a sequence of rule combinations, defines a dependency tree or structure between the elements, called the “parse tree”.
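The lookup-and-combine procedure described above can be sketched as follows. This is a minimal illustration only, using the toy dictionary and rules from the example; the greedy leftmost combination strategy and all names here are illustrative assumptions, not part of the invention.

```python
# Illustrative dictionary and rules from the "The rain in Spain" example.
DICTIONARY = {"The": "DET", "rain": "N", "in": "PREP", "Spain": "N"}

# Each rule rewrites a pair of adjacent categories into a parent category.
RULES = {("DET", "N"): "NP", ("PREP", "N"): "PP", ("NP", "PP"): "NP"}

def parse(tokens):
    """Repeatedly combine adjacent constituents until no rule applies."""
    # Start with (category, bracketing) pairs from dictionary lookup.
    items = [(DICTIONARY[t], t) for t in tokens]
    changed = True
    while changed and len(items) > 1:
        changed = False
        for i in range(len(items) - 1):
            pair = (items[i][0], items[i + 1][0])
            if pair in RULES:
                # Replace the matched pair by its parent, keeping the bracketing.
                bracket = (items[i][1], items[i + 1][1])
                items[i:i + 2] = [(RULES[pair], bracket)]
                changed = True
                break
    return items

result = parse(["The", "rain", "in", "Spain"])
# The nested tuples in `result` mirror the bracketing of the parse tree:
# ("NP", (("The", "rain"), ("in", "Spain")))
```

The nested bracketing returned by `parse` is exactly the dependency structure that the analysis line above denotes.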
Statistical methods also keep a list of all possible relationships between all possible classes, but they simplify model building by considering, in general, more and simpler relationships, and use statistics to summarize regularities and optimize predictions. E.g., from observations such as in+Spain and on+Spain, posit a statistical variable PREP, where “in” and “on” are members of PREP and “Spain” follows PREP with probability P(“Spain”|PREP). The analysis is as with rules, but now each combination (branch of the tree) has a probability.
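The estimation step above can be sketched in a few lines. The class PREP and the tiny bigram corpus below are illustrative assumptions for the sake of the example, not real data.

```python
from collections import Counter

# Illustrative class and corpus (not real data): members of PREP, and a few
# observed bigrams of the kind described in the text.
PREP = {"in", "on"}
bigrams = [("in", "Spain"), ("on", "Spain"), ("in", "France"), ("at", "home")]

# Count what follows any member of PREP, then normalise to a probability.
follows = Counter(w2 for (w1, w2) in bigrams if w1 in PREP)
total = sum(follows.values())
p_spain_given_prep = follows["Spain"] / total  # P("Spain" | PREP) = 2/3 here
```

With these toy counts, “Spain” follows a member of PREP in two of the three relevant bigrams, so P(“Spain”|PREP) = 2/3.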
Distributed methods also build classes among more and simpler relationships, but they don't summarize the information. They gain in flexibility and robustness by representing classes directly in terms of collections. E.g., from in+Spain and on+Spain, posit a paradigmatic vector class PREP (paradigmatic: sets of alternatives, as opposed to syntagmatic: sets of consecutive elements), where “in” and “on” are examples of PREP and “Spain” is a component of PREP. (Or inversely, defining the vector in terms of equivalents rather than contexts, posit a vector class PREP where “in” and “on” are components of PREP, and PREP is defined as the set of all things which precede “Spain”, . . . ). The analysis is as with rules, but partial matches between vector classes are possible. “Vector” is a well-known expression in natural-language processing; for present purposes a vector may be briefly described as a list.
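The paradigmatic grouping described above can be sketched as follows. The corpus and the membership criterion (sharing the contexts “Spain” and “France”) are illustrative assumptions; the point is only that the class is represented directly as a collection, not summarized into a symbol.

```python
from collections import defaultdict

# Illustrative bigram observations (not real data).
bigrams = [("in", "Spain"), ("on", "Spain"), ("in", "France"),
           ("on", "France"), ("at", "home")]

# For each word, collect the set of contexts it appears before.
# This set is the word's "vector" in the sense of the text (a list).
contexts = defaultdict(set)
for w1, w2 in bigrams:
    contexts[w1].add(w2)

# "in" and "on" share following contexts, so they fall into the same
# paradigmatic class; inversely, "Spain" is a component of that class.
prep_class = [w for w in contexts if contexts[w] >= {"Spain", "France"}]
```

Because the class is kept as a collection of examples rather than a single symbol, a new word can match it partially, by sharing only some contexts, which is the flexibility referred to above.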
What distributed models gain in flexibility and robustness of representation (partial matches between classes), however, they lose by being unwieldy to describe (sets instead of symbols), and for all their advantages in modeling static concepts there is no consensus on how they might be advantageous for modeling combinations of concepts (syntax or grammar, which is usually still done by keeping a list of all possible relationships between all possible classes).
The interesting thing about all these prior art models for language processing is that no one has yet been able to compile a truly comprehensive list of all possible relationships between all possible classes of language tokens. And even probabilities only minimize the errors which arise from incomplete information (often called the “sparse data problem”); they don't eliminate them. The models don't quite seem to fit the problem. The present status of natural language processing might justifiably be compared with the status of artificial flight before the discovery of the airfoil.
Vectors of associative properties have become a popular method of representing the grammar and meaning of words, and sometimes word sequences (U.S. Pat. No. 6,173,261 is herein incorporated by reference). But hitherto this has mainly been because of their flexibility and robustness, not generally because of their generative power. I think this power is necessary. In simple terms, the failure in accuracy of the prior art can be expressed as a failure to explain why the expression “strong tea” is preferred over the expression “powerful tea”, or why we tend to say “apples and oranges” instead of “oranges and apples”, and likewise “bacon and eggs”, “chalk and cheese”, and any one of a number of gradations of syntactic restrictiveness between these and what are recognized as errors in traditional grammar. E.g., from the literature:
    It's/that's easier said than done
    I'm terribly sorry to hear that
    You can't believe a word he says
    I see what you mean
    sad to say
    time of day
    in advance
    (verb) the un(verb)able
    . . .
The issue for accurate syntax modeling is why we are comfortable, even familiar, with these examples, but less so with, say:
    That is easier spoken than done
    I am terribly happy to hear that
    You can't believe the word he says
    I see the thing you say
    sad to mention
    time of week
    in forward
Not only is the boundary subtle, but it is fuzzy (there are degrees of distinction). In a classical rule-based or statistical processor we would need a class for every such distinction (and degree of distinction). The power of combinations of examples provides a more practical solution. While we cannot imagine listing classes for each distinction, it is easy to imagine producing a unique combination of examples which distinguishes each, and which provides a fuzzy distinction. “That is easier said than done”, if used often enough, can define its own class and explain itself, all the while providing elements which can form other classes and explain broader regularities in expressions of the type “That is ______ than ______”, or “That ______ easier”. “Strong tea” and “powerful tea” might be distinguished because the distribution of word associations associated with “strong” is different from that associated with “powerful”, in detail, if not generalities.
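The last point can be sketched numerically. The association counts below are invented for illustration (they are not corpus measurements); the sketch only shows how a difference in association vectors could grade “strong tea” above “powerful tea”.

```python
# Hypothetical association vectors (invented counts, for illustration only):
# how often each adjective was observed with each noun.
associations = {
    "strong": {"tea": 12, "coffee": 9, "wind": 4},
    "powerful": {"engine": 11, "computer": 7, "argument": 5},
}

def affinity(adjective, noun):
    """Share of the adjective's associations accounted for by this noun."""
    vector = associations[adjective]
    return vector.get(noun, 0) / sum(vector.values())

# "strong tea" outscores "powerful tea" because "tea" is a component of the
# association vector of "strong" but not of "powerful".
strong_tea = affinity("strong", "tea")
powerful_tea = affinity("powerful", "tea")
```

Because the score is a ratio rather than a yes/no class membership, the distinction is graded, which matches the fuzzy boundary described above.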
While current distributed systems have the power to describe such subtleties of representation, they are limited by the perception that grammatical groupings represent immutable qualities, that classes are to be found, not forced.