(In the following description, numbers in brackets following a term refer to corresponding numbers in the GLOSSARY at the end of the description.)
The present invention relates to a system for recognizing sentence patterns (VPs) and grammatical cases (grammatical functions) which appear in sentences of a source language[1], i.e., input sentences, in syntactic and semantic analyses of the sentences. This invention is intended particularly for natural language understanding systems and machine translation (MT) systems which are categorized under natural language processing (NLP) systems.
Before a presentation of the summary and details of the invention, the following definitions of a sentence pattern (VP) and grammatical cases will be presented.
Sentence patterns (VPs) are basically a classification of sentences in terms of a predicate verb and the grammatical functions which its arguments[2] bear, such as subject (SUB), indirect object (IOB), direct object (DOB), complement (COMP), and adverbial (A); see Quirk et al. (1972) 343ff. More intuitively, these patterns are subdivisions and extensions of the natural classification of "five basic sentence patterns" as shown below:
______________________________________
Pattern  Structure              Verb type (V)                 Example
VP1      SUB + V                Complete Intransitive Verb    Fire burns.
VP2      SUB + V + COMP         Incomplete Intransitive Verb  He became a merchant.
VP3      SUB + V + DOB          Complete Transitive Verb      He likes English best.
VP4      SUB + V + IOB + DOB    Dative Verb                   He gave me a book.
VP5      SUB + V + DOB + COMP   Incomplete Transitive Verb    He made us happy.
______________________________________
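As an illustration only, the five basic sentence patterns above can be written down as a small lookup structure. The names `BASIC_PATTERNS` and `patterns_with` are hypothetical and do not appear in this description; the sketch merely restates the table in a machine-readable form.

```python
# Hypothetical encoding of the five basic sentence patterns listed above.
# Each pattern maps to its sequence of grammatical functions, with V marking
# the position of the predicate verb.
BASIC_PATTERNS = {
    "VP1": ("SUB", "V"),                 # Fire burns.
    "VP2": ("SUB", "V", "COMP"),         # He became a merchant.
    "VP3": ("SUB", "V", "DOB"),          # He likes English best.
    "VP4": ("SUB", "V", "IOB", "DOB"),   # He gave me a book.
    "VP5": ("SUB", "V", "DOB", "COMP"),  # He made us happy.
}


def patterns_with(function):
    """Return the sorted names of the patterns that contain the given function."""
    return sorted(name for name, slots in BASIC_PATTERNS.items() if function in slots)
```

For example, `patterns_with("DOB")` lists the patterns whose predicate takes a direct object.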
Hornby (1975) and Palmer (1951) use the term "verb pattern" instead, since the sentence pattern can be regarded as the "subcategorization" of the predicate verb in terms of the grammatical functions, as is clear from the above definition. Some other grammarians use "clause patterns" to avoid confusion with a sentence in a broad sense; see Quirk et al. (1972).
There is a slight difference between the notions of "sentence patterns" and "verb patterns." The former term applies to the grammatical functions expressed on the sentence surface, rather than to the functions determined by the predicate verb. For example, the sentence pattern SUB+V+DOB changes to SUB+V in passivization[3]. The value of the latter term, however, stays the same (i.e., a complete transitive verb: SUB+V+DOB), because its characteristic as a subcategorization is invariable through transforms.
This application uses the term "sentence pattern", with "VP" as its abbreviation, implying the subcategorization of the predicate/predicative, because the invariable portion of the predicate-argument structure is focused upon. The term "sentence pattern" is used instead of "verb pattern" because it is a more accepted term in grammar, and because the concept of subcategorization applies not only to the verb but also to the other parts of speech, e.g., adjectives, nouns, prepositions, and adverbs.
This application uses the most widely accepted terms and definitions for functional categories of arguments, as listed below[4].
______________________________________
Subject (SUB): A nominal argument which controls the inflection of the predicate verb in the form of NOM + V. NOM is an argument with nominal function.
  Characteristics:
    1. Becomes the first candidate for the antecedent of a reflexive pronoun; and
    2. Takes nominative case in the declension.
Direct Object (DOB): A nominal argument which receives performance expressed by the predicate verb in the form of V + NOM.
  Characteristics:
    1. Usually expresses inanimate things;
    2. Takes objective (accusative) case in pronominal expression; and
    3. Located after a verb and an indirect object.
Indirect Object (IOB): A nominal argument which is related through a DOB in the form of V + NOM.
  Characteristics:
    1. Usually expresses a human being;
    2. Takes objective (dative) case in pronominal declension; and
    3. Located before a direct object.
Complement (COMP): An argument with nexus relationship with the subject/object.
  Characteristics:
    1. Impassivizable; and
    2. Takes nominative/objective case in pronominal declension, respectively.
Adverbial (A): An adverbial argument. This category includes a prepositional phrase (PP), an adverbial particle (PCL)[5], an adverbial clause (CONJ), and an adverb proper (ADV).
  Characteristics:
    1. Typically a space adverbial; and
    2. Relatively fixed compared with other optional adverbials.
______________________________________
Several other criteria are needed besides these definitions since there are arguments which can be classified in more than one category in the same sentence according to these definitions. There are supplementary criteria with respect to passivizability, wh-word categories in interrogative and relative expressions, mobility in word-order, typical semantics, etc.
Sentence patterns (VPs) are further subcategorized, down to lexically dependent levels, according to syntactic and semantic constraints on various aspects of a pattern, as shown by the works of Hornby (1975) and Palmer (1951) and by lexicographical works such as the Oxford Advanced Learner's Dictionary of Current English (OALD) and the Longman Dictionary of Contemporary English (LDOCE). Syntactic constraints include syntactic forms[6], transform possibilities[7], relationships among arguments, and so forth. Semantic constraints include inhibited semantics and preference semantics on the arguments.
To accommodate this further subcategorization, the term "sentence pattern (VP)" is redefined as follows:
a subcategorization of the predicate/predicative in terms of grammatical functions (grammatical cases) of its arguments with syntactic and semantic constraints such as syntactic forms, separability, mobility, and inhibitional and preference semantics on the arguments.
a means for providing a slot table, i.e., a table which has all VPs available to a predicate in an input sentence arranged in plural rows, and which has arguments of the VPs expressed as slot conditions in slots positioned in a finite number of positions; and
a means for processing a plurality of VPs in parallel, and which includes a means for comparing features of candidates for arguments of VPs in the input sentence with slot conditions of the slots that belong to a selected position on the slot table through all available VPs, by which only VPs with slots whose conditions match a candidate's features are recognized as matched VPs, and by which only grammatical cases that have been described in the slots of the matched VPs are recognized as matched grammatical cases.
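The slot table and the position-wise comparison described above might be pictured with the following minimal sketch. It is an assumption-laden illustration, not the claimed means: the feature names (`cat`, `case`, `animate`), the use of `None` for an empty slot, and the function names are all hypothetical, and each row is assumed to have at least as many positions as there are candidates.

```python
def satisfies(candidate, conditions):
    """True if every feature named in the slot conditions has that value in the candidate."""
    return all(candidate.get(feature) == value for feature, value in conditions.items())


def match_position(table, position, candidate, active):
    """Compare one candidate with the slots at one position through all active VPs.

    Returns {vp_name: grammatical_case} for the VPs whose slot matched.
    """
    matched = {}
    for name in active:
        slot = table[name][position]
        if slot is None:  # hypothetical marker: this VP has no argument here
            continue
        case, conditions = slot
        if satisfies(candidate, conditions):
            matched[name] = case
    return matched


def recognize(table, candidates):
    """Process all VPs in parallel, one position at a time."""
    active = list(table)
    cases = {name: [] for name in table}
    for position, candidate in enumerate(candidates):
        matched = match_position(table, position, candidate, active)
        active = [name for name in active if name in matched]
        for name in active:
            cases[name].append(matched[name])
    return {name: cases[name] for name in active}


# Illustrative rows for "He gave me a book." (assumed features, three positions).
SUB = ("SUB", {"cat": "NP", "case": "nom"})
SLOT_TABLE = {
    "VP3": [SUB, ("DOB", {"cat": "NP", "case": "obj"}), None],
    "VP4": [SUB,
            ("IOB", {"cat": "NP", "case": "obj", "animate": True}),
            ("DOB", {"cat": "NP", "case": "obj", "animate": False})],
}
CANDIDATES = [
    {"cat": "NP", "case": "nom", "animate": True},   # He
    {"cat": "NP", "case": "obj", "animate": True},   # me
    {"cat": "NP", "case": "obj", "animate": False},  # a book
]
```

In this sketch, each candidate is compared only with the slots in its own column, and the grammatical cases are read off the slots of whichever VPs remain matched.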
When only the aspect of grammatical functions is concerned, i.e., disregarding other syntactic and semantic constraints, the term "grammatical case pattern" is used instead in this application.
The term "grammatical case" is used for "grammatical functions" of arguments in this application because the notion of grammatical "case" is basically the same as that of the grammatical function, though there are some differences in how the categories are set up. The grammatical term "case" applies to "formal" categories of nouns/noun-equivalents for their syntactic relations with the predicate or with other nouns/noun-equivalents (e.g., nominative, dative, accusative, and genitive). The "grammatical functions" described above are "functional" categories that take the argument's "form" into consideration. The only difference is the focus on different aspects (form vs. function) of the same phenomenon. The adjective "grammatical" is used in order to distinguish it from "semantic cases" (i.e., semantic roles), which have been proposed by Ch. J. Fillmore (1968).
This application regards a prepositional phrase (PP), an adverbial particle (PCL), and a clause with a conjunction (CONJ) as VP elements because these are subcategorized for by the predicate verb in relation to its meanings, just like the other VP elements.
By this definition, a VP can be summarized as having the following major characteristics: first, a VP presents a characteristic as a structure, and all VP arguments are directly dominated by the predicate in syntax; second, it depends on both the predicate and its meaning; third, it shows objective word-order among arguments; fourth, it shows syntactic constraints on arguments, such as syntactic forms and transform possibilities; and fifth, it shows semantic constraints on the arguments, such as inhibited semantics and typical semantics.
Pre-existing VP recognition methods will be shown below, and their problems will be discussed.
In syntactic analysis systems, two conventional methods are known for recognizing VPs of input sentences. One method is the ordinary one that involves "backtracking": a single VP at a time is hypothesized from the stock of VPs available to a predicate, with backtracking to the next VP, until a VP without a matching failure is detected. The arguments of the hypothesized VP are compared, in order, with the candidates in the input sentence. The other method is to provide all VPs available to a predicate in the form of a "list," and to compare all arguments of all available VPs with each candidate in the sentence in a depth-first manner. The VP or VPs with maximum satisfaction and with fewest gaps are selected. These methods are described below in detail.
In the former method, one VP is hypothesized among VPs available[8] to a predicate, and under this hypothesis, the syntactic analysis of the input sentence is carried out by comparing feature values[9] of a candidate[10] with the feature-values of arguments of the hypothesized VP. The comparison is carried out for each of the corresponding features of the candidate and the argument for all features described.
If the candidate can satisfy[11] any argument, the next candidate is compared with the rest of the arguments of the VP. If this next candidate can satisfy any argument, the following next candidate is compared with the rest of the arguments of the VP.
When there is still an argument which has to be satisfied but there is no appropriate candidate available in the input sentence, or when there is a candidate which has to be recognized as an argument but there is no argument available in the focused VP, pattern matching is regarded as a failure, and backtracking[12] is triggered: the process goes back to the point where the ambiguity of the VP appeared, and another VP is hypothesized in place of the focused VP from among the rest of the VPs available to the predicate.
This operation is repeated until a VP is found in which all arguments of the hypothesized VP match the candidates, and this VP is recognized as the VP of the predicate in the input sentence.
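The backtracking method described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the prior-art implementation: features are modeled as plain dictionaries, each candidate greedily takes the first remaining argument it satisfies, and the feature names and VP contents are hypothetical.

```python
def satisfies(candidate, argument):
    """True if every feature described for the argument has the same value in the candidate."""
    return all(candidate.get(feature) == value for feature, value in argument.items())


def match_vp(arguments, candidates):
    """Try to match all candidates against the arguments of one hypothesized VP."""
    remaining = list(arguments)
    for candidate in candidates:
        for i, argument in enumerate(remaining):
            if satisfies(candidate, argument):
                del remaining[i]  # argument satisfied; go on with the rest
                break
        else:
            return False  # a candidate is left with no argument available
    return not remaining  # an unsatisfied argument also counts as a failure


def recognize_by_backtracking(available_vps, candidates):
    """Hypothesize VPs one by one in the given trial order; return (VP name, number of trials)."""
    trials = 0
    for name, arguments in available_vps:
        trials += 1
        if match_vp(arguments, candidates):
            return name, trials
        # matching failure: backtrack and hypothesize the next available VP
    return None, trials


# Illustrative data for "He gave me a book." (assumed features and trial order).
SUB = {"cat": "NP", "case": "nom"}
GIVE_VPS = [
    ("VP3", [SUB, {"cat": "NP", "case": "obj"}]),
    ("VP4", [SUB,
             {"cat": "NP", "case": "obj", "animate": True},     # IOB
             {"cat": "NP", "case": "obj", "animate": False}]),  # DOB
]
CANDIDATES = [
    {"cat": "NP", "case": "nom", "animate": True},   # He
    {"cat": "NP", "case": "obj", "animate": True},   # me
    {"cat": "NP", "case": "obj", "animate": False},  # a book
]
```

With this trial order, the VP3 hypothesis fails because "a book" finds no argument left, so one backtracking step is needed before VP4 is accepted; the result therefore depends on the order in which the VPs are tried.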
In the latter method, all VPs available to a predicate in the input sentence are expressed as a list, and arguments of these VPs are regarded as "slots." Constraints on an argument are expressed as "slot conditions" in a slot on the list. All arguments (slots) of all available VPs are compared with each candidate (filler) in the sentence in a depth-first manner. This operation is repeated for the other candidates, and the VP or VPs with maximum satisfaction and with fewest gaps are recognized to be the VP of the input sentence.
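A rough sketch of the list method follows, again under assumptions: slots are modeled as dictionaries of conditions keyed by a hypothetical case label, each filler is offered to the still-open slots of every VP, and the VP with the most filled slots and the fewest gaps is selected. The comparison counter is added only to make the cost of the method visible.

```python
def satisfies(filler, conditions):
    """True if every slot condition is met by the filler's feature values."""
    return all(filler.get(feature) == value for feature, value in conditions.items())


def recognize_by_list(vp_list, fillers):
    """Compare every filler with the open slots of all VPs; return (best VP, comparisons)."""
    open_slots = {name: dict(slots) for name, slots in vp_list.items()}
    comparisons = 0
    for filler in fillers:
        for name, slots in open_slots.items():
            for case in list(slots):  # iterate over a copy; a slot may be removed
                comparisons += 1
                if satisfies(filler, slots[case]):
                    del slots[case]  # slot filled by this filler
                    break

    def score(name):
        gaps = len(open_slots[name])
        filled = len(vp_list[name]) - gaps
        return (filled, -gaps)  # maximum satisfaction first, then fewest gaps

    return max(vp_list, key=score), comparisons


# Illustrative list for "He gave me a book." (assumed features).
SUB = {"cat": "NP", "case": "nom"}
VP_LIST = {
    "VP3": {"SUB": SUB, "DOB": {"cat": "NP", "case": "obj"}},
    "VP4": {"SUB": SUB,
            "IOB": {"cat": "NP", "case": "obj", "animate": True},
            "DOB": {"cat": "NP", "case": "obj", "animate": False}},
}
FILLERS = [
    {"cat": "NP", "case": "nom", "animate": True},   # He
    {"cat": "NP", "case": "obj", "animate": True},   # me
    {"cat": "NP", "case": "obj", "animate": False},  # a book
]
```

Unlike the backtracking sketch, no trial order is involved, but every filler is compared against slots in every VP on the list, so the number of comparisons grows with the number of VPs and slots.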
Since the grammatical cases can be obtained after a VP is identified as a result of the analysis, the identification of the grammatical case for an argument has not been necessary for recognizing a VP. Therefore, no system for recognizing the grammatical case exists in the prior art, since its significance has not been recognized and the proper information has not been implemented in the system.
In the first method, the selection of VPs depends on the order in which the available VPs are hypothesized. If the trial order starts from the VPs which have the maximum number of arguments among the available ones, with a view to preferentially recognizing this kind of VP, the number of backtracking steps increases, since this kind of VP does not necessarily appear frequently. This is because frequently appearing VPs have relatively fewer arguments. When the trial order follows the frequency of the VPs, there is a possibility of failing to recognize an argument of a VP that should be recognized.
The second method is free from the problem of the first method. However, it requires a fairly large number of comparisons of the candidates' features with the slot conditions. In handling one list of VPs, the number of such comparisons is roughly several times the number of VPs. Therefore, there is a problem that considerable man-hours and processing time are needed.
Moreover, an even greater problem is found in both methods: linguistic characteristics observed in arguments' positions which are determined in relation to the predicate are not distinguished. The relative positions of arguments have the roles of signifying the subject of predication, possibilities of transforms, restriction of extension, and so forth. Control of these roles becomes possible only when the relative position can be managed through all available VPs.
Therefore, the development of a system that can easily reflect these linguistic characteristics is needed.
Further, it has become clear that grammatical cases are concepts common to all natural languages, that they are closely related to the declension of pronouns, and that they are useful as expressions of semantic roles for arguments of a VP in relation to the predicate, and as information for the analysis and generation of a sentence. Therefore, there has been a long felt need for the development of a system for effectively recognizing the grammatical cases.