1. Field of the Invention
The invention generally relates to a method and system for making decisions based on a statistical interpretation of sign or symbol relationships. In particular, the invention relates to a method and system for making decisions based on a recursive statistical interpretation of linguistic sign and symbol relationships.
2. Description of the Prior Art
There have been a variety of approaches to developing devices that interpret linguistic information. One known approach is to model language by defining rules of grammar that the system applies. In such systems, concepts such as nouns and verbs must be codified, in software or by some other means, for the system to understand human language. One advantage of such a system is that the designers' knowledge is supplied to the system through code. However, such systems are limited by the models used to create them.
Another approach is to allow the device itself to interpret the data, learning on its own the fundamentals of grammar or of symbol relationships. This approach effectively eliminates designer-based limitations. However, such systems have been computationally complex: the possible grammars or permissible symbol relationships produce an exponentially growing number of computations.
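As a rough illustration of that growth (an aside, not part of the invention), consider just one source of it: the number of distinct ways to group an ordered sequence of n symbols into a full binary tree is the Catalan number C(n-1), where C(m) = binom(2m, m) / (m + 1), which grows on the order of 4^n / n^(3/2). The short Python sketch below counts these groupings; the function names are hypothetical.

```python
from math import comb


def catalan(m: int) -> int:
    """m-th Catalan number: C(m) = binom(2m, m) / (m + 1), always an integer."""
    return comb(2 * m, m) // (m + 1)


def bracketings(num_symbols: int) -> int:
    """Number of full binary trees whose leaves are num_symbols symbols kept in order."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    return catalan(num_symbols - 1)


# Hypothetical demo: groupings for sentences of increasing length.
for n in (2, 5, 10, 20):
    print(n, bracketings(n))
```

A two-word sentence has exactly one grouping, a ten-word sentence already has 4862, and a twenty-word sentence has 1767263190, before any grammatical labels are even considered.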
An object of the current invention is to provide an automated system and method for making decisions that are independent of the constraints of any specific language or other system of symbolic representation. A further object is to provide such a system that does not require unreasonably long computation times or unreasonably large amounts of memory.
The autognome (gnome) is a device that makes decisions by statistically analyzing the relationships between signs or symbols, preferably on two levels. In the preferred embodiment, the signs the gnome analyzes are alphanumeric characters that form text. The analyzed text can come from a variety of sources, e.g., scanned documents or voice recognition devices. The gnome can be used to make decisions in virtually any context; examples include responding to queries about menu items for a cafe and responding to e-mail inquiries.
A preferred autognomic decision-making system includes a sensing module, a dyadic morphologic module, a dyadic taxemic module, a triadic taxemic module and a pseudo deduction module. The sensor component receives sets of training and query data in a prespecified format, identifies elemental symbols, and defines delimiters, preferably of two orders, in the sensed data. First order delimiters define first order sets of sequential elemental symbols, and second order delimiters define second order sets of sequential first order sets.
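The two-order segmentation performed by the sensor can be sketched in Python as follows. This is a minimal illustration, assuming that elemental symbols are single characters, that delimiters are supplied as two character sets, and that delimiters themselves are discarded; the function name `sense` is hypothetical.

```python
from typing import List, Set


def sense(text: str, first_order: Set[str], second_order: Set[str]) -> List[List[str]]:
    """Split text into second order sets (lists) of first order sets (strings).

    Hypothetical sketch: any character not in either delimiter set is treated
    as an elemental symbol. A second order delimiter also ends the current
    first order set.
    """
    second_order_sets: List[List[str]] = []
    current_set: List[str] = []      # first order sets of the current second order set
    current_symbols: List[str] = []  # elemental symbols of the current first order set
    for ch in text:
        if ch in first_order or ch in second_order:
            if current_symbols:
                current_set.append("".join(current_symbols))
                current_symbols = []
            if ch in second_order and current_set:
                second_order_sets.append(current_set)
                current_set = []
        else:
            current_symbols.append(ch)
    # Flush anything left over when the text does not end with a delimiter.
    if current_symbols:
        current_set.append("".join(current_symbols))
    if current_set:
        second_order_sets.append(current_set)
    return second_order_sets


print(sense("Do you serve tea? We do.", {" ", ","}, {".", "?", "!"}))
```

For the sample query this yields `[['Do', 'you', 'serve', 'tea'], ['We', 'do']]`, i.e. two second order sets of words.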
The dyadic morphologic component receives the sequential elemental symbols identified by the sensor component and evaluates the sequential relationships of elemental symbols and of sets of elemental symbols within first order sets. For training data, the morphologic component identifies a most statistically significant set of subsets of each first order set of elemental symbols as a token associated with that first order set. For query data, the morphologic component identifies the most statistically significant set or sets of subsets of each first order set of elemental symbols that correspond to training-generated tokens, and identifies those corresponding tokens as tokens associated with that first order set of query data.
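The statistic used to rank sets of subsets is not given here. As one possible stand-in, the Python sketch below counts every short substring of the training words and then splits a word into contiguous pieces with the highest product of relative frequencies, using dynamic programming. Only pieces seen in training are allowed, which loosely mirrors the requirement that query tokens correspond to training-generated tokens. The scoring rule, the `max_len` parameter and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from functools import lru_cache
from typing import Iterable, Tuple


def substring_counts(words: Iterable[str], max_len: int = 4) -> Counter:
    """Count every contiguous substring (up to max_len symbols) of the training words."""
    counts: Counter = Counter()
    for w in words:
        for i in range(len(w)):
            for j in range(i + 1, min(len(w), i + max_len) + 1):
                counts[w[i:j]] += 1
    return counts


def best_segmentation(word: str, counts: Counter) -> Tuple[str, ...]:
    """Split word into contiguous pieces maximizing summed log relative frequency.

    Hypothetical stand-in scoring. Only pieces with a nonzero training count
    are allowed; an empty tuple means no such split exists.
    """
    total = sum(counts.values())

    @lru_cache(maxsize=None)
    def best(start: int) -> Tuple[float, Tuple[str, ...]]:
        if start == len(word):
            return 0.0, ()
        best_score, best_parts = -math.inf, ()
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            c = counts.get(piece, 0)
            if c == 0:
                continue  # piece never seen in training
            rest_score, rest_parts = best(end)
            score = math.log(c / total) + rest_score
            if score > best_score:  # a -inf rest (no valid completion) never wins
                best_score, best_parts = score, (piece,) + rest_parts
        return best_score, best_parts

    return best(0)[1]


counts = substring_counts(["serve", "served", "serving", "tea"])
print(best_segmentation("serves", counts))
```

Because each piece is capped at `max_len` symbols, longer words are necessarily split, and a word containing a symbol never seen in training yields no token at all.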
The dyadic taxemic component receives representations of the sequential first order sets of elemental symbols and evaluates the sequential relationships of first order sets, and of subsets of sequential first order sets, within each second order set. The dyadic taxemic component identifies a most statistically significant tree of subsets of each second order set that includes all the elements of the second order set, as well as each subtree included within that tree. In some instances, no single tree is identified for a particular second order set; in that case, two or possibly more most statistically significant trees that share no element of the second order set, but that collectively include all of its elements, are identified together with their subtrees.
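The patent text does not specify the statistic used to rank trees. As one possible stand-in, the Python sketch below scores a binary tree over a second order set by summing, over its internal nodes, log(1 + n), where n is how often that node's word span occurred contiguously in training sentences, and finds the best-scoring tree with CKY-style dynamic programming. The scoring rule and all names are assumptions, and the sketch always returns a single tree; the case of two or more disjoint trees is not modeled.

```python
import math
from collections import Counter
from functools import lru_cache
from typing import List, Sequence, Tuple, Union

# A tree is either a single first order set (a word) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def span_counts(sentences: Sequence[Sequence[str]]) -> Counter:
    """Count every contiguous span of two or more words in the training sentences."""
    counts: Counter = Counter()
    for s in sentences:
        for i in range(len(s)):
            for j in range(i + 2, len(s) + 1):
                counts[tuple(s[i:j])] += 1
    return counts


def best_tree(sentence: Sequence[str], counts: Counter) -> Tree:
    """Highest-scoring binary tree covering every word (hypothetical log(1 + n) score)."""
    if not sentence:
        raise ValueError("empty second order set")
    words = tuple(sentence)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[float, Tree]:
        if j - i == 1:
            return 0.0, words[i]
        node_score = math.log1p(counts.get(words[i:j], 0))
        best_score, best_t = -math.inf, words[i]
        for k in range(i + 1, j):  # try every split point of the span
            left_score, left = best(i, k)
            right_score, right = best(k, j)
            if left_score + right_score > best_score:
                best_score, best_t = left_score + right_score, (left, right)
        return node_score + best_score, best_t

    return best(0, len(words))[1]


def subtrees(tree: Tree) -> List[Tree]:
    """The tree itself followed by every subtree it contains, leaves included."""
    out: List[Tree] = [tree]
    if isinstance(tree, tuple):
        out += subtrees(tree[0]) + subtrees(tree[1])
    return out


training = [["the", "cafe", "serves", "tea"], ["the", "cafe", "opens"], ["serves", "tea"]]
t = best_tree(["the", "cafe", "serves", "tea"], span_counts(training))
print(t)
```

With this toy training data the sketch groups the frequent pairs first, returning `(('the', 'cafe'), ('serves', 'tea'))`, whose seven subtrees are the four words, the two pairs and the whole tree.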
The triadic taxemic component receives the most statistically significant trees corresponding to each second order set identified by the dyadic taxemic module and evaluates the sequential relationship of the subtree elements of each tree. For each second order set, the triadic taxemic component identifies one or more most statistically significant groupings of subtree elements called percepts as tokens with respect to the corresponding second order set.
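How percepts are scored is not detailed in this section. Purely as an illustrative assumption, the Python sketch below treats each internal subtree of a tree (written as nested tuples, as a dyadic taxemic stage might produce) as a candidate grouping, counts in how many training trees each subtree occurs, and returns the k most frequent subtrees of a query tree as its percepts. This frequency ranking is a simplification, not the triadic analysis itself, and the names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def internal_subtrees(tree: Tree) -> List[Tree]:
    """All subtrees that group two or more words (leaves are excluded)."""
    if isinstance(tree, str):
        return []
    return [tree] + internal_subtrees(tree[0]) + internal_subtrees(tree[1])


def train_subtree_counts(trees: Iterable[Tree]) -> Counter:
    """Number of training trees in which each internal subtree occurs."""
    counts: Counter = Counter()
    for t in trees:
        counts.update(set(internal_subtrees(t)))
    return counts


def percepts(tree: Tree, counts: Counter, k: int = 2) -> List[Tree]:
    """Up to k most frequent known subtrees of tree (hypothetical ranking rule)."""
    candidates = set(internal_subtrees(tree))
    # Sort by descending count, breaking ties deterministically by repr.
    ranked = sorted(candidates, key=lambda s: (-counts.get(s, 0), repr(s)))
    return [s for s in ranked[:k] if counts.get(s, 0) > 0]


training_trees = [
    (("the", "cafe"), ("serves", "tea")),
    (("the", "cafe"), "opens"),
    ("serves", "tea"),
]
counts = train_subtree_counts(training_trees)
print(percepts((("the", "cafe"), ("serves", "tea")), counts))
```

Here the two pairs that recur across training trees outrank the whole-sentence tree, which occurs only once.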
The dyadic and triadic modules may all be variations of a single generalized semiotic processing module. In operation, the gnome can call the generalized semiotic module and provide it with instructions and parameters so that it operates as a dyadic or triadic, morphologic or taxemic module. This allows the configuration of the gnome to be readily altered depending on the application in which it is used.
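One way to picture a single generalized module configured by parameters is sketched below in Python. The `SemioticConfig` and `SemioticModule` classes, the `max_span` parameter and the `build_gnome` helper are hypothetical; the sketch shows only how one class could be instantiated in the three roles named above, not how each role processes data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ARITIES = ("dyadic", "triadic")
LEVELS = ("morphologic", "taxemic")


@dataclass(frozen=True)
class SemioticConfig:
    """Hypothetical parameter set selecting the role of the generic module."""
    arity: str
    level: str
    max_span: int = 8  # hypothetical limit on elements per set

    def __post_init__(self) -> None:
        if self.arity not in ARITIES:
            raise ValueError(f"unknown arity: {self.arity}")
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")


@dataclass
class SemioticModule:
    """One generic module; its config decides which role it plays."""
    config: SemioticConfig
    knowledge: Dict[object, int] = field(default_factory=dict)  # filled during training

    @property
    def role(self) -> str:
        return f"{self.config.arity} {self.config.level}"


def build_gnome(configs: List[SemioticConfig]) -> List[SemioticModule]:
    """Instantiate the generic module once per requested role, in pipeline order."""
    return [SemioticModule(c) for c in configs]


pipeline = build_gnome([
    SemioticConfig("dyadic", "morphologic"),
    SemioticConfig("dyadic", "taxemic"),
    SemioticConfig("triadic", "taxemic"),
])
print([m.role for m in pipeline])
```

Reconfiguring the gnome for another application then amounts to passing a different list of configurations.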
In the preferred embodiment, a pseudo deduction module receives identified tokens, preferably from both the dyadic morphologic module and the triadic taxemic module, and stock answers or response categories associated with respective sets of training data. The pseudo deduction module associates each stock answer or response category with the tokens generated from the evaluation of one or more respective sets of training data associated with that answer or category. The pseudo deduction module then evaluates tokens generated from a set of sensed query data and identifies a statistically most significant stock answer or response category associated with the generated query data tokens.
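The association of response categories with tokens, and the selection of the most significant category for a query, can be illustrated with a simple probabilistic scorer. The Python sketch below swaps in a multinomial naive Bayes classifier with add-alpha smoothing; that choice, the `PseudoDeduction` class and the sample category labels are assumptions, not the statistic used by the invention.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


class PseudoDeduction:
    """Hypothetical stand-in: naive Bayes over tokens per response category."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.category_counts: Counter = Counter()  # training sets per category
        self.vocab: Set[str] = set()

    def train(self, tokens: Iterable[str], category: str) -> None:
        """Associate the tokens of one training set with its response category."""
        tokens = list(tokens)
        self.token_counts[category].update(tokens)
        self.category_counts[category] += 1
        self.vocab.update(tokens)

    def scores(self, tokens: List[str]) -> Dict[str, float]:
        """Log-probability score of each category given the query tokens."""
        total_sets = sum(self.category_counts.values())
        v = len(self.vocab)
        out: Dict[str, float] = {}
        for cat, n_sets in self.category_counts.items():
            counts = self.token_counts[cat]
            denom = sum(counts.values()) + self.alpha * v
            s = math.log(n_sets / total_sets)
            for t in tokens:
                s += math.log((counts[t] + self.alpha) / denom)
            out[cat] = s
        return out

    def best(self, tokens: List[str]) -> str:
        """The statistically most significant category for the query tokens."""
        sc = self.scores(tokens)
        return max(sc, key=sc.get)


pd = PseudoDeduction()
pd.train(["hours", "open"], "HOURS")
pd.train(["tea", "menu"], "MENU")
pd.train(["coffee", "menu"], "MENU")
print(pd.best(["open"]), pd.best(["tea"]))
```

Smoothing keeps a query token unseen for a category from ruling that category out entirely.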
In the preferred embodiment, the prespecified data is in linguistic form: the sensor component identifies linguistic symbols as elemental symbols, spaces and punctuation as first order delimiters, and selected sentence punctuation (i.e., periods, question marks, and exclamation points) as second order delimiters. First order sets are therefore generally words, and second order sets are generally sentences or sentence phrases. The sensor also preferably identifies artificial delimiters based on a selected maximum word or sentence length.
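Artificial delimiters based on maximum lengths might work as in the Python sketch below, which assumes the sensed data is already a list of sentences, each a list of words. It assumes that an over-long word is cut into consecutive pieces of at most `max_word_len` symbols and that an over-long sentence is cut into consecutive runs of at most `max_sentence_len` words; the function and parameter names are hypothetical.

```python
from typing import List


def apply_artificial_delimiters(
    sentences: List[List[str]], max_word_len: int, max_sentence_len: int
) -> List[List[str]]:
    """Insert artificial first and second order delimiters at fixed maximum lengths."""
    if max_word_len < 1 or max_sentence_len < 1:
        raise ValueError("maximum lengths must be positive")
    out: List[List[str]] = []
    for sentence in sentences:
        # Artificial first order delimiters: cut long words into fixed-size pieces.
        words: List[str] = []
        for w in sentence:
            words.extend(w[i:i + max_word_len] for i in range(0, len(w), max_word_len))
        # Artificial second order delimiters: cut long sentences into fixed-size runs.
        for i in range(0, len(words), max_sentence_len):
            out.append(words[i:i + max_sentence_len])
    return out


print(apply_artificial_delimiters([["a", "bcdefg"], ["x"]], 3, 2))
```

In this example the six-letter word becomes two three-letter first order sets, which in turn pushes the first sentence past two words and splits it, giving `[['a', 'bcd'], ['efg'], ['x']]`.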
In operation, the gnome first analyzes a training corpus, i.e., training data associated with a set of response categories, in a training mode. During training mode the gnome creates a knowledge database in the dyadic and triadic modules. The gnome is then switched to a performance mode to receive inquiries. In response to a query, the gnome selects a statistically most appropriate response from the set of response categories, based on the knowledge data generated during training. A response can then be provided containing a selected stock answer, a routing, or a combination thereof that is mapped to the selected response category. However, in the preferred embodiment, if the most statistically appropriate response category does not meet a prespecified criterion, a response indicating that the gnome is unable to provide a satisfactory answer is provided to the inquirer.
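The training and performance modes, together with the fallback when no category meets the criterion, are sketched below in Python. For brevity, this minimal sketch assumes tokens are plain words and scores a category by the fraction of query tokens it saw in training; the `Gnome` class, the `min_score` threshold and the fallback text are all hypothetical stand-ins for the module outputs and criterion described above.

```python
from collections import defaultdict
from typing import Dict, List, Set

FALLBACK = "Sorry, I am unable to provide a satisfactory answer."  # hypothetical text


class Gnome:
    """Hypothetical two-mode wrapper: learn in training mode, answer in performance mode."""

    def __init__(self, stock_answers: Dict[str, str], min_score: float = 0.5) -> None:
        self.stock_answers = stock_answers  # response category -> stock answer
        self.min_score = min_score          # hypothetical prespecified criterion
        self.mode = "training"
        self.knowledge: Dict[str, Set[str]] = defaultdict(set)

    def train(self, tokens: List[str], category: str) -> None:
        if self.mode != "training":
            raise RuntimeError("train() is only allowed in training mode")
        self.knowledge[category].update(tokens)

    def start_performance(self) -> None:
        self.mode = "performance"

    def answer(self, tokens: List[str]) -> str:
        if self.mode != "performance":
            raise RuntimeError("answer() is only allowed in performance mode")
        query = set(tokens)
        if not query or not self.knowledge:
            return FALLBACK

        def score(cat: str) -> float:
            # Fraction of query tokens associated with this category during training.
            return len(query & self.knowledge[cat]) / len(query)

        best = max(self.knowledge, key=score)
        return self.stock_answers[best] if score(best) >= self.min_score else FALLBACK


g = Gnome({"HOURS": "We open at 7am.", "MENU": "Please see our menu."})
g.train(["open", "hours", "when"], "HOURS")
g.train(["tea", "coffee", "menu"], "MENU")
g.start_performance()
print(g.answer(["when", "open"]))
print(g.answer(["tea", "parking", "where"]))
```

The second query matches only one of its three tokens, so its best score (1/3) falls below the threshold and the fallback is returned instead of a weakly supported answer.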
As will be apparent to those skilled in the art, one of the major advantages of the gnome's analysis technique is that it is equally applicable to languages other than English, for example French and German, to symbol-based languages such as Chinese and Japanese, and to non-language symbolic environments.