The present invention relates to computer systems storing natural language concepts and to a dictionary or database of meanings associated with symbols useful for storing natural language concepts. Most dictionaries serve people who are looking up loosely defined meanings or spellings for specific words. A few dictionaries serve computational linguists, in order to automatically identify grammatical structures within simple sentences. Although grammatical structure conveys some meaning within sentences, the full meaning of sentences extends to deeper levels than grammar. To help identify those deeper meanings, computational linguists devise systems to store meanings in ways which differentiate abstract general meanings from concrete specific meanings (see Steven Pinker, The Language Instinct, ©1994, page 219).
The ability to differentiate abstract from concrete meaning confers significant advantages when parsing and interpreting conversational dialog. In sentences, references to pronouns and implied objects can be resolved by linking them to structures of deeper meaning. Within the flow of conversation, abstract meanings are associated with implied objects in a manner affected by the changing context of the conversation. For overall conversations, links to abstract meanings complement links to concrete meanings. A conversation linking only abstract meanings is overly vague. A conversation linking only concrete meanings omits links to the organizing essences and presumptions which are a crucial part of deeper meanings. The following conversation (from Steven Pinker, The Language Instinct, ©1994, page 227) shows the significance of deeper meaning in normal conversation:
Woman: I'm leaving you.
Man: Who is he?
A grammatical structure without links to deeper meaning about human behaviors would not comprehend the situation described by the above conversation. On the other hand, conversations linking only to deeper meaning also can be incomplete. The following conversation shows how a conversation weakens when deprived of concrete information:
Son: How come you have more money than I do?
Father: Son, you have to buy low and sell high.
Son: How did you know what to buy?
Father: Early to bed and early to rise makes a man healthy, wealthy and wise.
Without linking the aphorisms to specific concrete actions, the father's responses are hollow and incomplete. A system for differentiating between abstract and concrete meaning can provide a basis for identifying the hollowness of the father's responses. The following conversation has a more balanced range of abstraction:
Tourist: How do I get to the train station? Is it hard to get to?
Commuter: No, it's always easy this time of day. You can take the 40A bus, in about ten minutes.
Tourist: Where do I get it?
Commuter: Right here at this bus stop.
The concepts of `hard to get to` and `always easy` above are general abstract concepts, and the concepts of `the 40A bus` and `ten minutes` are concrete specific concepts linked to the general concepts by a mid-level concept `at this bus stop`. Thus the above conversation conveys a range of abstraction which has a quality of completeness and continuity. A system for quantifying the difference between abstract and concrete meaning would also provide a basis for quantifying the qualities of completeness and continuity in all three of the above conversations, by tracking the range of abstraction to which each conversation links.
In a system which must parse and interpret conversational dialog, the need to quantify the difference between abstract and concrete meaning frequently arises, and yet that need is typically unfulfilled. The spreading use of computers has caused this deficiency to be commonplace, particularly in the standard graphical user interfaces between humans and computers.
Most current human-to-computer conversational interfaces contain primarily concrete meaning. As a result, such conversations are dominated by literal commands issued from the human and dominated by statements of literal facts issued from the computer. When such conversations run into snags, the snags remain until the human connects the utility of a specific literal command to some abstract general need unrecognized by the computer.
If these conversational interfaces freely utilized and accurately tracked varying levels of abstract general needs, humans could issue commands at various levels of abstraction, and the computer would be able to link those commands into a context of coherent semantic abstraction hierarchies, displaying the linked hierarchy to resolve any ambiguities. Such an interface would be far more robust than current systems when reporting and handling unexpected errors, since errors would be linked to precise points within semantic abstraction hierarchies, rather than being presented within ambiguous contexts such as beeping pop-up dialog boxes or chronological message logs.
Computer-to-computer interfaces also have deficiencies typical of conversations based upon overly literal meaning. The interfaces between computer hardware are mostly based upon protocols of literal meaning, and interfaces between computer software are mostly based on application program interfaces of almost purely literal meaning. The difficulty and necessity of hand-coding error-handlers and protocols for computer-to-computer interfaces have created enormous systems integration problems. Systems integration programming teams typically arrive at solutions by diagnosing faults in, and remapping, the causal links between literal commands and general systems integration goals. By linking each command to a precise point within semantic abstraction hierarchies headed by general systems integration goals, the work of systems integration programming teams could be clarified and automated.
In prior art, abstract meanings have been manually defined for parsing deeper meaning from text. Early case-based reasoning systems such as CYRUS (see Janet Kolodner, Retrieval and Organizational Strategies in Conceptual Memory, ©1984, page 34) have used a limited number of manually defined abstract verbs such as PTRANS. PTRANS stands for an abstract verb of motion which requires an actor, an object, a source and a destination. PTRANS is a template of reusable meaning for a broad range of events. Events such as shipping a package overnight, walking from a bathroom to a bedroom and returning books to the library all can inherit meaning from PTRANS.
In CYRUS, these manually defined abstract events provide a means to recognize the similarity between events. For instance, shipping, walking, and returning events can all be linked to a common definition of PTRANS, thus creating an index tree headed by PTRANS. Whenever an event must be recalled involving an actor, an object, a source and a destination, CYRUS can narrow the search to events connected with PTRANS, thus speeding up the search. CYRUS also conserves memory by using links to PTRANS which take up less memory than storing the PTRANS template in each inheriting event such as shipping, walking or returning.
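The indexing and memory-sharing idea described above can be illustrated with a minimal sketch. It is not the CYRUS implementation; the class names `AbstractEvent` and `Event`, the `recall` helper, and the role fillers are hypothetical and chosen only for illustration. Each concrete event stores a link to the PTRANS template rather than a copy of it, and recall is restricted to the events indexed under that template.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractEvent:
    """A reusable template such as PTRANS (hypothetical representation)."""
    name: str
    roles: tuple
    # Index of the concrete events that link to this template.
    instances: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Event:
    """A concrete event that links to its template instead of copying it."""
    name: str
    template: AbstractEvent
    fillers: dict

    def __post_init__(self):
        # Every role required by the template must be filled.
        missing = set(self.template.roles) - set(self.fillers)
        if missing:
            raise ValueError(f"{self.name} is missing roles: {sorted(missing)}")
        self.template.instances.append(self)


def recall(template, **constraints):
    """Search only the events indexed under the given template."""
    return [e for e in template.instances
            if all(e.fillers.get(k) == v for k, v in constraints.items())]


PTRANS = AbstractEvent("PTRANS", ("actor", "object", "source", "destination"))

# Illustrative fillers only; they are not taken from the source.
Event("shipping", PTRANS, {"actor": "courier", "object": "package",
                           "source": "warehouse", "destination": "customer"})
Event("walking", PTRANS, {"actor": "person", "object": "person",
                          "source": "bathroom", "destination": "bedroom"})
Event("returning", PTRANS, {"actor": "reader", "object": "books",
                            "source": "home", "destination": "library"})

found = recall(PTRANS, destination="library")
```

In this sketch the role list is stored once on the template, so the three events share it through a reference, and `recall` never examines events that are not linked to PTRANS.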
CYRUS also creates new generalizations by comparing features of incoming events to the features of previous events. If the same features repeatedly appear in incoming events, CYRUS then creates a new generalization event containing the common features, and replaces those common features inside other events with links to the generalization event.
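The generalization step described above can be sketched as follows, under simplifying assumptions: events are flat dictionaries of feature names to values, a feature is treated as common only when every event in the group shares it, and the link to the generalization is stored under a hypothetical `isa` key. The function name `generalize` and the sample data are illustrative and are not drawn from CYRUS.

```python
def generalize(events, name):
    """Factor the features shared by every event into one generalization event.

    events maps an event name to its feature dictionary and is modified
    in place. Returns the new generalization, or None if nothing is shared.
    """
    shared = set.intersection(*(set(f.items()) for f in events.values()))
    if not shared:
        return None
    generalization = dict(shared)
    for features in events.values():
        # Replace the common features with a link to the generalization.
        for key in generalization:
            del features[key]
        features["isa"] = name
    events[name] = generalization
    return generalization


# Illustrative data only.
events = {
    "trip1": {"actor": "diplomat", "destination": "Cairo", "purpose": "negotiation"},
    "trip2": {"actor": "diplomat", "destination": "Geneva", "purpose": "negotiation"},
}
general = generalize(events, "diplomatic-trip")
```

After the call, each trip keeps only its distinguishing destination plus the link, and the shared actor and purpose are stored once in the generalization event.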
After creating such generalization events, CYRUS then tracks their certainty, since CYRUS may have over-generalized. CYRUS tracks this certainty by comparing the number of events which successfully utilize the generalization to the number of events which would conflict with the generalization by having some but not all its features. When conflicts appear frequently, CYRUS computes a lowered certainty for the generalization-concept. When a generalization-concept falls below a certainty threshold, it is removed, causing CYRUS to re-map connections away from the failed generalization-concept.
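The certainty bookkeeping described above might be approximated as shown below. The text states only that supporting events are compared with conflicting events; the specific ratio used here, the treatment of the no-evidence case, and the threshold value of 0.5 are assumptions made for the sake of a runnable sketch.

```python
def certainty(generalization, events):
    """Share of relevant events that fit the generalization.

    An event supports the generalization when it has all of its features,
    and conflicts with it when it has some but not all of them. Events
    sharing no features are ignored. The ratio itself is an assumption.
    """
    wanted = set(generalization.items())
    supports = conflicts = 0
    for features in events:
        matched = wanted & set(features.items())
        if matched == wanted:
            supports += 1
        elif matched:
            conflicts += 1
    total = supports + conflicts
    # With no evidence either way, assume the generalization still holds.
    return supports / total if total else 1.0


THRESHOLD = 0.5  # hypothetical cut-off below which a generalization is removed

# Illustrative data only.
generalization = {"actor": "diplomat", "purpose": "negotiation"}
events = [
    {"actor": "diplomat", "purpose": "negotiation", "destination": "Cairo"},
    {"actor": "diplomat", "purpose": "negotiation", "destination": "Geneva"},
    {"actor": "diplomat", "purpose": "vacation", "destination": "Maine"},
    {"actor": "tourist", "purpose": "sightseeing", "destination": "Rome"},
]
score = certainty(generalization, events)
keep = score >= THRESHOLD
```

Here two events support the generalization, one conflicts with it, and one is unrelated, so the generalization stays above the assumed threshold and is retained.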
Although CYRUS remembers generalization failures as notable events, and afterwards avoids such generalizations, CYRUS has no methods to utilize generalization failures as evidence of abstract conceptual activity. CYRUS's methods dealing with failure are thus consistent with assumptions that all current art shares: that failure is to be avoided, that certainty makes information more useful and that uncertainty makes information less useful.
These assumptions guide current systems builders to treat uncertain information like unrefined mineral ore. Current art attempts to refine large quantities of uncertain information into small quantities of certain information, resulting in large quantities of rejected information which are information "tailings" regarded as useless by the refinement process. However, in the present invention, by recycling concepts associated with failure, useful information about the abstract structure of concepts can be acquired.
In current art there are many systems which rely upon manually defined abstraction hierarchies for parsing meaning from text. Parsing usages of such hierarchies were described by Janet Kolodner in Retrieval and Organizational Strategies in Conceptual Memory, ©1984, on pages 346-349. Parsing systems have been developed which combine definitions of abstraction hierarchies with definitions of augmented transition networks, for example, U.S. Pat. No. 4,914,590 granted Apr. 3, 1990 to Loatman et al. Systems also have been developed utilizing specific formats for storing abstraction hierarchies, for example, U.S. Pat. No. 4,868,733 granted Sep. 19, 1989 to Fujisawa et al. By carefully constructing user interfaces to describe existing abstraction hierarchies, these systems help people to edit the existing abstraction hierarchies.
Some systems use abstraction hierarchies to compute the degree of similarity between concepts (see Janet Kolodner, Case-Based Reasoning, ©1993, page 346), by using `specificity values` assigned to each node of abstraction hierarchies, seeking a `most specific common abstraction` between two symbols. In such systems, highly abstract concepts have less specific numbers assigned, and highly concrete concepts have more specific numbers assigned, the `specificity values` being a span of numbers such as fractions (0.2, 0.5, 0.7, etc.) ranging from zero to one in size.
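A search for the `most specific common abstraction` can be sketched over a small, manually defined hierarchy. The node names, the hand-assigned values in `SPECIFICITY`, and the choice to report the specificity of the common abstraction as the similarity score are all assumptions for illustration, with larger values taken to mean more specific nodes.

```python
# Hypothetical hierarchy: each node maps to its parent (None marks the root).
PARENT = {
    "thing": None,
    "event": "thing",
    "motion": "event",
    "purchase": "event",
    "shipping": "motion",
    "walking": "motion",
}

# Manually assigned specificity values between zero and one (illustrative).
SPECIFICITY = {
    "thing": 0.0,
    "event": 0.2,
    "motion": 0.5,
    "purchase": 0.7,
    "shipping": 0.7,
    "walking": 0.7,
}


def ancestors(node):
    """Return the node followed by each of its abstractions up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def most_specific_common_abstraction(a, b):
    """Pick the shared abstraction with the highest specificity value."""
    shared = set(ancestors(a)) & set(ancestors(b))
    return max(shared, key=SPECIFICITY.__getitem__)


def similarity(a, b):
    """Assumed scoring: the specificity of the common abstraction."""
    return SPECIFICITY[most_specific_common_abstraction(a, b)]
```

With these values, shipping and walking meet at `motion` and score 0.5, while shipping and purchase meet only at `event` and score 0.2. Every value in this sketch is entered by hand, so keeping the values consistent as the hierarchy grows depends entirely on human judgment.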
However, there are no means in prior art to automatically assign `specificity values` to symbolic nodes using as input only the changes in topology of links within a semantic network, as does the present invention. By automatically assigning `specificity values` to symbolic nodes, the consistency of `specificity values` can be maintained automatically, preventing errors of human judgment which creep into large manually defined linguistic structures.
The extreme complexity of linguistic structures (most commercially useful natural language dictionaries have over 100,000 entries) prevents any one person from defining the whole linguistic structure. When multiple people define a large linguistic structure, slight inconsistencies in their viewpoints for interpreting the linguistic structure cause dramatic exponential increases in the population of inconsistencies within the linguistic structure.
The present invention deals with these inconsistencies by using symptoms of inconsistencies to re-assess `specificity values` and properly restructure semantic abstraction hierarchies based on the latest assessment of `specificity values`.