A dialog system has a text or audio interface, allowing a human to interact with the system. Particularly advantageous are ‘natural language’ dialog systems that interact using a language syntax that is ‘natural’ to a human. A dialog system is a computer or an Interactive Voice Response (IVR) system that operates under the control of a dialog application that defines the language syntax, and in particular the prompts and grammars of the syntax. For example, IVRs, such as Nortel's Periphonics™ IVR, are used in communications networks to receive voice calls from parties. An IVR is able to generate and send voice prompts to a party and receive and interpret the party's voice responses made in reply. However, the development of a dialog system is cumbersome and typically requires expertise in both programming and the development of grammars that provide language models. Consequently, the development process is often slower than desired.
One approach to reducing the time and expertise required to develop natural language dialog systems is to use processes whereby a relatively small amount of data describing the task to be performed is provided to a development system. The development system can then transform this data into system code and configuration data that can be deployed on a dialog system, as described in the specification of International Patent Application No. PCT/AU00/00651 (“Starkie (2000)”), incorporated herein by reference. However, one difficulty of this process is that the development system needs to make numerous assumptions, some of which may result in the creation of prompts that, while understandable to most humans, could be expressed in a manner that is easier to understand. For example, a prompt may be created that asks a person to provide the name of the company whose stocks they wish to purchase. The development system might create a prompt such as “Please say the company”, whereas the phrase “Please say the name of the company whose stocks you wish to purchase” may be more understandable to a human interacting with the dialog system.
As described in Starkie (2000), another approach for reducing the time and expertise requirements for developing a natural language dialog system is to use processes whereby developers provide examples of sentences that a human would use when interacting with the dialog system. A development system can convert these example sentences into a grammar that can be deployed on a computer or IVR. This technique is known as grammatical inference. Successful grammatical inference results in the creation of grammars that:
        (i) cover a large proportion of the phrases that people will use when interacting with the dialog system;
        (ii) attach the correct meaning to those phrases;
        (iii) cover only a small number of phrases that people won't use when interacting with the dialog system; and
        (iv) require the developer to provide a minimal number of example phrases.
The use of grammatical inference to build a dialog system is an example of development by example, whereby a developer can specify a limited set of examples of how the dialog system should behave, rather than developing a system that defines the complete set of possible examples.
Thus a development system can be provided with a list of example sentences that a human would use in reply to a particular question asked by a dialog system. These example sentences can be defined by a developer, or obtained by recording or transcribing the interactions between a human and a dialog system when the dialog system has failed to understand the sentence that the human has used. In addition, a development system can be provided with a list of interactions between a human and a dialog system, using a notation that lists the sentences in the order they are spoken or written and indicates whether the dialog system or the human is speaking (or writing). This is referred to as an example interaction. Similarly, an example interaction can be defined by recording or transcribing the interactions between two or more humans, or between a human and a dialog system when the dialog system has failed to understand the sentence that the human has used. A benefit of this technique is that example interactions are understandable to anybody who understands the language contained within them. In addition, most people would be capable of creating example interactions of desired behaviour. There is also the benefit that example interactions describe specific behaviours, given a set of inputs, and therefore provide test cases for the behaviour of the dialog system. As they document specific behaviour, there is also a reduced risk of errors being introduced in the specification of the dialog system for the behaviour listed in the example interactions. Example interactions are also ideal forms of documentation to describe the behaviour of the dialog system to others.
Example interactions can be annotated to include high level descriptions of the meaning of a sentence. This annotation might include the class of the sentence, and any key pieces of information contained in the phrase, known as slots. For example, the sentence “I want to buy three hundred acme bolt shares” might be annotated to signify that the class of the sentence is buy_stocks as opposed to sell_stocks, and that the quantity slot of the sentence is 300, while the stockname slot is “acme bolt”.
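As an illustration only, an annotated example interaction of this kind might be held in a program as an ordered list of turns, each optionally carrying a sentence class and its slots. The `Turn` and `Annotation` types and the wording of the system prompt below are hypothetical; neither Starkie (2000) nor this description prescribes a particular data layout.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Annotation:
    """High-level meaning of a sentence: its class plus any slot values."""
    sentence_class: str
    slots: dict = field(default_factory=dict)


@dataclass
class Turn:
    """One sentence of an example interaction, in the order it is spoken."""
    speaker: str                          # "system" or "human"
    text: str
    annotation: Optional[Annotation] = None


# A two-turn example interaction; only the human reply is annotated.
interaction = [
    Turn("system", "what would you like to do"),   # hypothetical prompt wording
    Turn("human", "i want to buy three hundred acme bolt shares",
         Annotation("buy_stocks", {"quantity": 300, "stockname": "acme bolt"})),
]
```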
A grammatical inference process for developing an interactive development system is described in Starkie (2000). The grammatical inference process generates the example sentences used to infer the grammar, and the process is capable of generalising the inferred grammar so that it can be used to generate many more phrases than the training examples used to infer the grammar. A limitation of existing grammatical inference processes is that, given a set of training sentences that the grammar is required to generate, referred to as positive examples, there is always more than one grammar that could generate those sentences. It can therefore be proven mathematically that the grammatical inference process cannot infer the grammar exactly. One approach to overcoming this problem is to enable the developer to sample the inferred grammar and provide additional sentences to guide the grammatical inference process towards the correct grammar. It can be proven that, even under these circumstances, it is still not possible for the grammatical inference process to be guaranteed to eventually infer the correct grammar.
However, it is possible for the inference process to eventually infer the exact solution over one or more iterations if either of two approaches is used: only a subset of all possible context-free languages is learnt, or the developer provides additional, grammatically incorrect sentences that should not be generated by the grammar, referred to as negative examples. A process that can do this is referred to as an identification in the limit process. Either approach is advantageous if it reduces the amount of development required to build the grammars. In addition, the developer can guide the grammatical inference by providing positive and negative examples even if they don't know what the underlying grammar should be. All that is required is that they can identify whether a given sentence should or should not be covered by the grammar. This is not surprising, because humans create the training examples and the exact model of language used by humans when formulating sentences is not known.
As described in Gold, E. M. [1967] Language identification in the limit, in Information and Control, 10(5):447-474, 1967 (“Gold”), it was demonstrated in 1967 that the grammars used to model natural languages at that time could not be learnt deterministically from example sentences generated by that grammar alone, but that it was possible for a language to be learnt from both example sentences generated from that grammar, referred to as positive examples, and examples of bad sentences that are not generated from that grammar, referred to as negative examples.
Gold's findings contradicted the findings of psycholinguists that children are rarely informed of their grammatical errors, yet children do eventually learn natural languages. To reconcile this contradiction, Gold suggested that, even if the classes of grammars known at that time could not be learnt from arbitrarily presented text, there might be ways in which these grammar classes could be restricted in such a way that they could be learnt.
As described in Angluin, D. [1982] Inference of Reversible Languages, in Journal of the Association for Computing Machinery 29, p 741-765 (“Angluin”), it was subsequently shown that some classes of grammar could be learnt from positive example sentences alone, the most notable of which is the class of K-Reversible regular languages. Angluin also described a process for inferring K-Reversible regular languages. However, this class of grammar is not powerful enough to describe some of the constructs found in human language.
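For the simplest case, k = 0, the flavour of such a process can be sketched as follows: build a prefix-tree automaton from the positive examples, give it a single accepting state, and then keep merging states until the automaton is deterministic both forwards and in reverse. This is a simplified, unoptimised reconstruction for illustration, not Angluin's published pseudocode, and the function names are hypothetical.

```python
def infer_zero_reversible(samples):
    """Infer a zero-reversible automaton from positive samples (simplified sketch).

    Requires at least one sample. Returns (transitions, start, finals), where
    transitions maps (state, symbol) -> state.
    """
    # 1. Prefix-tree acceptor: one state per distinct prefix of the samples.
    trans, finals, size = {}, set(), 1   # state 0 is the start state
    for sample in samples:
        state = 0
        for sym in sample:
            if (state, sym) not in trans:
                trans[(state, sym)] = size
                size += 1
            state = trans[(state, sym)]
        finals.add(state)

    parent = list(range(size))           # union-find forest over states

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
        return ra != rb

    # 2. A zero-reversible automaton has exactly one accepting state.
    first, *rest = sorted(finals)
    for f in rest:
        union(first, f)

    # 3. Merge until there is one successor per (state, symbol) and one
    #    predecessor per (state, symbol); repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        succ, pred = {}, {}
        for (s, sym), d in trans.items():
            s, d = find(s), find(d)
            if (s, sym) in succ:
                changed |= union(succ[(s, sym)], d)
            else:
                succ[(s, sym)] = d
            if (d, sym) in pred:
                changed |= union(pred[(d, sym)], s)
            else:
                pred[(d, sym)] = s

    merged = {(find(s), sym): find(d) for (s, sym), d in trans.items()}
    return merged, find(0), {find(f) for f in finals}


def accepts(automaton, word):
    """Run the inferred deterministic automaton over a word."""
    trans, state, finals = automaton
    for sym in word:
        if (state, sym) not in trans:
            return False
        state = trans[(state, sym)]
    return state in finals


zr = infer_zero_reversible(["ab", "aab"])
```

From the two positive examples “ab” and “aab” the sketch generalises to the language a*b, so it also accepts “b” and “aaab” while still rejecting “a” and “ba”.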
Sakakibara, Y. [1992] Efficient Learning of context-free grammars from positive structural examples, in Information and Computation, 97: 23-60 (“Sakakibara”), defined a subset of context-free grammars that could be inferred from positive (in the sense of positive examples described above) unlabelled derivation trees, together with a process for doing so. An unlabelled derivation tree is a parse tree in which the names of the non-terminals attached to the internal nodes of the tree are unknown. The processes described in Sakakibara, and also in Oates, T., Devina D., Bhat, V. [2001], Learning k-reversible Context-free grammars from Positive Structural Examples, available at http://citeseer.nj.nec.com/544938.html, can only be applied when the structure of the grammar is partially known.
However, no subclass of context-free grammars has yet been identified that can be deterministically learnt from unlabelled examples. Instead, most prior art processes use some probabilistic or heuristic bias.
Van Zaanen, M. [2001], Bootstrapping Structure into Language: Alignment-Based Learning, PhD Thesis, The University of Leeds School of Computing (“Van Zaanen”), describes a new unsupervised learning framework known as alignment-based learning, which is based upon the alignment of sentences and a notion of substitutability described in Harris, Z. S. [1951], Structural Linguistics, University of Chicago Press, Chicago Ill., USA and London, UK, 7th (1966) edition, formerly entitled: Methods in Structural Linguistics. The technique involves the alignment of pairs of sentences in a corpus of sentences. Sentences are partitioned into substrings that are common and substrings that are not. An assumption of the technique is that the common substrings are generated by a common rule, and the portions of the sentences that are not common can be represented by rules that are interchangeable. For instance, consider the two sentences:
        Bert is baking [a biscuit]X1    (1)
        Ernie is eating [a biscuit]X1    (2)
Using alignment-based learning, a learner might align the two sentences such that the phrase “a biscuit” is identified as being common to both, and therefore conclude that both occurrences are generated by the same rule. Similarly, the learner may conclude that the phrases “Bert is baking” and “Ernie is eating” are interchangeable, resulting in the rules:
        S->X2 X1
        X2->bert is baking
        X2->ernie is eating
        X1->a biscuit
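The alignment step can be sketched for the simplest case, in which two sentences are aligned only by their longest common word prefix and suffix, and the differing middle portions are treated as interchangeable. This is a deliberate simplification: Van Zaanen's learner considers many more alignments, and the helper names and non-terminal labels below are hypothetical.

```python
def align_pair(s1, s2):
    """Align two sentences by their longest common word prefix and suffix.

    Returns (prefix, middle1, middle2, suffix) as lists of words.
    """
    a, b = s1.split(), s2.split()
    shortest = min(len(a), len(b))
    p = 0
    while p < shortest and a[p] == b[p]:
        p += 1
    q = 0
    while q < shortest - p and a[-1 - q] == b[-1 - q]:
        q += 1
    return a[:p], a[p:len(a) - q], b[p:len(b) - q], a[len(a) - q:]


def rules_from_pair(s1, s2):
    """Common parts become shared rules; the differing parts become
    alternatives of one interchangeable non-terminal (X2)."""
    prefix, mid1, mid2, suffix = align_pair(s1, s2)
    rules, start = [], []
    if prefix:
        start.append("X0")
        rules.append(("X0", " ".join(prefix)))
    start.append("X2")
    rules += [("X2", " ".join(mid1)), ("X2", " ".join(mid2))]
    if suffix:
        start.append("X1")
        rules.append(("X1", " ".join(suffix)))
    return [("S", " ".join(start))] + rules


for lhs, rhs in rules_from_pair("bert is baking a biscuit", "ernie is eating a biscuit"):
    print(f"{lhs}->{rhs}")
```

Applied to “bert is baking a biscuit” and “bert is baking a cake”, the same prefix-and-suffix rule would align “bert is baking a” and propose “biscuit” and “cake” as interchangeable, rather than “a biscuit” and “a cake”, which foreshadows the alignment ambiguity discussed below.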
In this notation, each line represents a rule whereby the symbol on the left hand side can be expanded into the symbols on the right hand side of the rule. Symbols are defined as either terminal or non-terminal symbols. A non-terminal symbol is a symbol that can be expanded into other symbols. A non-terminal can appear on either the left hand side or the right hand side of a rule, and always begins with an upper case letter. In contrast, a terminal symbol cannot appear on the left hand side of a rule, and always begins with a lower case letter. The non-terminal “S” is a special non-terminal that represents an entire sentence.
If a third phrase is introduced as follows:
        Bert is baking [a cake]X1    (3)
The substring “Bert is baking” may then be identified as being common to both examples (1) and (3), resulting in the addition of the rule:
        X1->a cake
The resultant grammar can now be used to generate an additional phrase:
        Ernie is eating a cake    (4)
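The rule notation above is simple enough to interpret directly. The following sketch, assuming a non-recursive grammar and the upper-case convention for non-terminals, parses the rules learnt from sentences (1) to (3) and enumerates every sentence they generate, which includes phrase (4). The function names are illustrative only.

```python
from itertools import product


def parse_rules(lines):
    """Parse rules written as 'Lhs->sym sym ...' into {lhs: [rhs tuples]}."""
    grammar = {}
    for line in lines:
        lhs, rhs = line.split("->")
        grammar.setdefault(lhs.strip(), []).append(tuple(rhs.split()))
    return grammar


def is_nonterminal(symbol):
    # Convention from the text: non-terminals begin with an upper-case letter.
    return symbol[:1].isupper()


def expand(grammar, symbol):
    """Return every word sequence derivable from symbol (non-recursive grammars only)."""
    if not is_nonterminal(symbol):
        return [[symbol]]
    results = []
    for rhs in grammar[symbol]:
        options = [expand(grammar, s) for s in rhs]
        for combo in product(*options):
            results.append([word for part in combo for word in part])
    return results


grammar = parse_rules([
    "S->X2 X1",
    "X2->bert is baking",
    "X2->ernie is eating",
    "X1->a biscuit",
    "X1->a cake",
])
sentences = sorted(" ".join(words) for words in expand(grammar, "S"))
print(sentences)
```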
Alignment-based learning suffers from a series of problems. The first of these problems is that two strings can often be aligned in multiple ways, and selecting the correct alignments to identify constituents is nondeterministic. For instance, consider the two phrases:
        From england to sesame street new york
        From sesame street new york to australia
A large number of alignments are possible; two interesting ones to consider are:
        from  england  to  sesame  street  new  york  -   -
        from  -        -   sesame  street  new  york  to  australia

and

        from  england  -       -    -     to  sesame     street  new  york
        from  sesame   street  new  york  to  australia  -       -    -

where “-” marks a gap in the alignment.
The first of these alignments requires 2 deletions and 2 insertions, compared to 2 substitutions, 3 insertions and 3 deletions for the latter. Despite requiring a greater number of insertions, deletions and substitutions, the second alignment would result in the following grammar:
        S->from Place to Place
        Place->england
        Place->sesame street new york
        Place->australia
This grammar closely reflects the structure of English, and thus it is clear that using the alignments that minimize the number of insertions, deletions and substitutions is not always the most desirable strategy.
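The operation counts quoted above can be checked mechanically. The sketch below computes a word-level Levenshtein distance (unit cost for each insertion, deletion and substitution) and counts the operations in an explicit alignment written with “-” for gaps. It confirms that the first alignment is a minimum-cost alignment (cost 4) while the second costs 8, even though the second yields the better grammar. The helper names are hypothetical.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def columns(top, bottom):
    """Pair up two equal-length rows of words; '-' marks a gap (None)."""
    def cell(word):
        return None if word == "-" else word
    return [(cell(x), cell(y)) for x, y in zip(top.split(), bottom.split())]


def count_operations(alignment):
    """Return (insertions, deletions, substitutions) for an alignment of
    the first phrase (top row) against the second (bottom row)."""
    ins = sum(1 for x, y in alignment if x is None and y is not None)
    dels = sum(1 for x, y in alignment if x is not None and y is None)
    subs = sum(1 for x, y in alignment if None not in (x, y) and x != y)
    return ins, dels, subs


s1 = "from england to sesame street new york"
s2 = "from sesame street new york to australia"
first = columns("from england to sesame street new york - -",
                "from - - sesame street new york to australia")
second = columns("from england - - - to sesame street new york",
                 "from sesame street new york to australia - - -")
print(edit_distance(s1.split(), s2.split()),
      count_operations(first), count_operations(second))
```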
A second problem of alignment-based learning is that it can result in overlapping constituents. This undesirable situation arises because substrings common to two phrases are not guaranteed to be generated from the same rules. For instance, consider the following three training examples:
        oscar sees the apple.
        big bird throws the apple.
        big bird walks.
Aligning the first two sentences can result in the creation of the following rules:
        X1->oscar sees
        X1->big bird throws
Aligning the last two sentences can result in the creation of the following rules:
        X2->throws the apple
        X2->walks
The sentence “big bird throws the apple” thus contains the constituents “big bird throws” and “throws the apple”. These constituents overlap, and if the sentence is generated by a context-free grammar, then any single derivation of the sentence can contain only one of these constituents.
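The conflict can be stated precisely in terms of word spans: two constituents cannot both appear in one context-free parse tree if their spans overlap without one containing the other. A minimal check, with hypothetical helper names, is:

```python
def find_span(words, phrase):
    """Return the (start, end) word indices of the first occurrence of phrase."""
    target = phrase.split()
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return i, i + len(target)
    raise ValueError(f"{phrase!r} not found")


def crossing(span1, span2):
    """True if the spans overlap without one containing the other.

    Nested or disjoint spans can coexist in one parse tree; crossing spans cannot.
    """
    (a, b), (c, d) = span1, span2
    return a < c < b < d or c < a < d < b


words = "big bird throws the apple".split()
x1 = find_span(words, "big bird throws")   # from aligning sentences 1 and 2
x2 = find_span(words, "throws the apple")  # from aligning sentences 2 and 3
print(x1, x2, crossing(x1, x2))
```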
A third problem with alignment-based learning is that substrings used interchangeably in one part of the language are not guaranteed to be interchangeable everywhere. For instance, consider the following three sentences:
        that bus is white
        that bus is going downtown
        john bought some white ink
Aligning the first two sentences can result in the creation of the following two rules:
        X1->white
        X1->going downtown
If it is assumed that substrings used interchangeably in one part of the language can be interchanged everywhere, then the following would be expected to be a legitimate English sentence, when in fact it is not:
        john bought some going downtown ink.
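The over-generation can be reproduced by naively substituting every interchangeable phrase for every other wherever it occurs. The sketch below does exactly that with the X1 alternatives learnt from the first two sentences; the function name is hypothetical, and a practical learner would need some way to restrict where such a substitution applies.

```python
def substitute_everywhere(sentences, alternatives):
    """Naively replace any occurrence of one alternative with every alternative."""
    out = set()
    for sentence in sentences:
        words = sentence.split()
        for phrase in alternatives:
            target = phrase.split()
            n = len(target)
            for i in range(len(words) - n + 1):
                if words[i:i + n] == target:
                    for other in alternatives:
                        out.add(" ".join(words[:i] + other.split() + words[i + n:]))
    return out


generated = substitute_everywhere(
    ["that bus is white", "that bus is going downtown", "john bought some white ink"],
    ["white", "going downtown"],   # the X1 alternatives
)
print(sorted(generated))
```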
As described in Starkie (2000), dialog systems are required to understand the meaning of sentences presented to them as either spoken or written sentences. Traditionally, spoken dialog systems use attribute grammars to attach meanings to sentences in the form of key value pairs. Attribute grammars were first described by D. E. Knuth in “Semantics of context-free languages”, Mathematical Systems Theory 2(2): 127-45 (1968). Most commercial speech recognition systems, such as those from Nuance and Scansoft, use attribute grammars to attach meanings to sentences, and the W3C Speech Recognition Grammar Specification (SRGS) standard, described at http://www.w3.org/TR/speech-grammar, is an attribute grammar formalism.
Attribute grammars attach meanings to sentences in the form of key value pairs. For example, the expression:
        i'd like to fly from melbourne to sydney
can be represented by the attributes:
        {op=bookflight from=melbourne to=sydney}
The values of attributes can be arbitrarily complex data structures, including attributes, lists, lists of attributes, numbers and strings. As described in B. Starkie, Inferring attribute grammars with structured data for natural language processing, in Grammatical Inference: Algorithms and Applications; 6th International Colloquium, ICGI 2002, Berlin, Germany: Springer-Verlag (“Starkie (2002)”), all instances of arbitrarily complex data structures can be represented by one or more unstructured attributes using the same notation used in the “C” and JavaScript programming languages to assign values to members of complex data structures. For instance, a data structure with n unstructured elements, such as a date, can be represented as n unstructured attributes:
        date.day=1 date.month=january date.year=2004
Similar notations are described in Starkie (2002) for structures containing structured elements, lists, numbers and concatenated strings. For that reason the following description is limited to the inference of grammars that convert between sentences and unstructured attributes. It will be apparent to those skilled in the art that the process can be extended to infer grammars that can convert between sentences and arbitrarily complex data structures using the techniques described in Starkie (2002).
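The flattening of a structured value into unstructured attributes can be sketched for the case of nested records. The sketch below assumes values are plain Python dictionaries and does not cover the list, number and string-concatenation notations of Starkie (2002); it turns the date example into dotted attribute names. The function name is hypothetical.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts into dotted attribute names, e.g. date.day=1.

    Only nested dicts are handled here; lists and other structures are not.
    """
    if isinstance(value, dict):
        flat = {}
        for key, sub in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(sub, name))
        return flat
    return {prefix: value}


attrs = flatten({"date": {"day": 1, "month": "january", "year": 2004}})
line = " ".join(f"{k}={v}" for k, v in attrs.items())
print(line)  # prints: date.day=1 date.month=january date.year=2004
```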
An alternative grammar formalism for attaching data-structures to sentences and vice-versa is the unification grammar. The most commonly used unification grammar is the Definite Clause Grammar (DCG) that forms part of the Prolog programming language, as described in ISO/IEC 13211-1 Information technology—Programming languages—Prolog—Part 1: General core, New York, N.Y., International Organisation for Standardization (“ISO 1995”). Depending upon the exact form of attribute grammar and unification grammar, most attribute grammars can be transformed into unification grammars, but some unification grammars cannot be rewritten as attribute grammars without the loss of some information.
It is desired to provide a grammatical inference system and process that alleviate one or more of the above difficulties, or at least provide a useful alternative.