The present invention relates to data transformation by a data processor, and more particularly to transformation of an input symbol string into an output term and of data of one structure into data of another structure, both performed using externally supplied transformation rules. The invention can be applied, for example, to machine translation, programming language conversion such as compiling, program interpretation, symbolic execution, automatic theorem proving and the like.
In the above-mentioned processes, the data transformation is in many cases performed through tree-structured data as an intermediate representation. For example, in a compiler, an input sentence given as a character string is transformed into a syntax tree expressing the sentence structure as a tree, and semantic analysis, optimization and code generation are performed on this tree-structured data, whereby the input sentence is transformed into object machine code. This process may be regarded as one in which the input symbol string is transformed according to given rules. Most prior-art devices that perform such transformation are designed around individual transformation rules. Consequently, a differently designed device must be prepared for each different set of transformation rules.
An example of transforming means independent of individual transformation rules is software generally called a parser generator, which implements a parsing apparatus to perform syntactic analysis according to an inputted grammar.
In syntactic analysis, the grammar is usually described in terms of a set P of relations called grammatical rules, rewriting rules, productions, etc., a set T of terminal symbols, a set N of non-terminal symbols and a start symbol s. Each grammatical rule consists of a left side and a right side, and states that the symbol string on the left side can be rewritten into the symbol string on the right side. A non-terminal symbol is a symbol that can be rewritten by the set P of the above-mentioned relations, and a terminal symbol is a symbol that cannot be rewritten by the set P. The start symbol s is a special non-terminal symbol: if the input sentence conforms to the grammar, the start symbol s can be rewritten into this input sentence by applying the rules of the set P to the symbol s.
One basic class of grammars is the context-free grammar. A context-free grammar is a grammar in which the left side of each grammatical rule belonging to the set P is restricted to one non-terminal symbol, and the right side is restricted to a sequence of zero or more terminal symbols and/or non-terminal symbols, that is, a grammar in which each grammatical rule is expressed as follows:
$A \rightarrow a, \quad A \in N, \quad a \in (N \cup T)^{*}$
where "( )*" represents a set of strings of zero or more elements belonging to either N or T.
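The definition above can be illustrated with a minimal sketch. The toy grammar, its symbol names and the helper `is_context_free` are assumptions made for illustration only; they do not appear in the cited prior art.

```python
# Hypothetical toy grammar; all symbol names are illustrative assumptions.
N = {"sentence", "noun_phrase", "verb_phrase"}   # non-terminal symbols
T = {"every", "man", "loves", "mary"}            # terminal symbols
START = "sentence"                               # start symbol s

# P: each rule is (left side, right side); the right side is a tuple of symbols.
P = [
    ("sentence", ("noun_phrase", "verb_phrase")),
    ("noun_phrase", ("every", "man")),
    ("noun_phrase", ("mary",)),
    ("verb_phrase", ("loves", "noun_phrase")),
]


def is_context_free(rules, nonterminals, terminals):
    """True if every rule has one non-terminal on the left side and a
    sequence of zero or more symbols of N or T on the right side."""
    symbols = nonterminals | terminals
    return all(
        left in nonterminals and all(sym in symbols for sym in right)
        for left, right in rules
    )
```

A rule whose left side is a string of two symbols, for example, fails this check, since such a rule is outside the context-free class.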
In general, a grammar handled by a prior-art parser generator is subject to strong restrictions on its grammatical rules. In exchange for a simpler parsing mechanism, such a grammar has the disadvantage that the description of sentence structure, i.e., the structure of the syntax tree, cannot be chosen freely. In particular, the description of a grammar having recursive structure is significantly restricted.
By contrast, in DCG (Definite Clause Grammar), the grammar is described in the form of a context-free grammar, and the grammatical rules are treated as a Prolog program and executed by a Prolog processing system, whereby the syntactic analysis is carried out. Details of DCG are disclosed in F. Pereira and D. Warren, "Definite Clause Grammars for Language Analysis - A Survey of the Formalism and a Comparison with Augmented Transition Networks", Artificial Intelligence, vol. 13, pp. 231-278 (1980), and an outline thereof will now be described.
DCG expresses the grammar as clauses of first-order predicate logic. Under this construction, parsing a sentence in a language is interpreted as proving a theorem under an axiom system which describes this language. Further, under the view that a set of clauses is a program, such theorem proving can be performed automatically. In this sense, a grammar described in DCG is at the same time an axiom system under first-order predicate logic, and can further be executed as a Prolog program. The execution of this program is an implementation of a top-down parser for that language.
The notation of DCG extends the notation of the context-free grammar in the following two points. In the following description, a predicate symbol is expressed by a string of lowercase letters, and a variable symbol is expressed by a string of letters starting with a capital letter.
(1) A non-terminal symbol can have an argument.
Example:
    context-free grammar: np
    DCG: np(X, S)
(2) On the right side of the grammatical rule, not only a list of non-terminal symbols or terminal symbols but also a procedure call can be written.
Example:
    context-free grammar: name → NAME
    DCG: name(name(W)) → [W], {is_name(W)}
wherein "NAME" appearing on the right side of the context-free grammar is a terminal symbol representing a specific character string belonging to the phrase category "name", and "{is_name(W)}" on the right side of the DCG rule represents a call to the predicate "is_name(W)", which returns the value "true" when the character string "W" is a name.
For example, when a series of grammatical rules starting with
    sentence(s(NP, VP), SO, S) → noun_phrase(NP, SO, S1), verb_phrase(VP, S1, S).
(wherein SO, S1 and S represent positions in a word string) and a dictionary are given as DCG, and the grammatical rules and the dictionary are executed as a Prolog program on the input sentence
    [every,man,loves,mary]
the term
    sentence(theta,[every,man,loves,mary],[ ])
becomes true. In the above term, "theta" is the term
    s(np(det(every),n(man),rel(nil)),vp(tv(loves),np(name(mary))))
which is equivalent to a syntax tree.
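The way such rules yield a syntax tree can be sketched as follows. This is not a Prolog execution but a minimal top-down imitation in which each non-terminal takes a start position and yields a term together with an end position, in the manner of the position arguments of the rule above. The rules other than "sentence", the dictionary entries and all function names are assumptions chosen only to cover the example sentence.

```python
# Dictionary entries are assumptions chosen to cover the example sentence.
LEXICON = {"det": {"every"}, "n": {"man"}, "tv": {"loves"}, "name": {"mary"}}


def word(category, tokens, s0):
    """Terminal: consume one word of the given category at position s0."""
    if s0 < len(tokens) and tokens[s0] in LEXICON[category]:
        yield (category, tokens[s0]), s0 + 1


def noun_phrase(tokens, s0):
    # np(Det, N, Rel): the relative clause is always empty in this sketch.
    for det, s1 in word("det", tokens, s0):
        for noun, s2 in word("n", tokens, s1):
            yield ("np", det, noun, ("rel", "nil")), s2
    # np(Name)
    for name, s1 in word("name", tokens, s0):
        yield ("np", name), s1


def verb_phrase(tokens, s0):
    for verb, s1 in word("tv", tokens, s0):
        for obj, s in noun_phrase(tokens, s1):
            yield ("vp", verb, obj), s


def sentence(tokens, s0=0):
    # Mirrors the rule for sentence(s(NP, VP), SO, S) given in the text.
    for subject, s1 in noun_phrase(tokens, s0):
        for predicate, s in verb_phrase(tokens, s1):
            yield ("s", subject, predicate), s


def show(term):
    """Render a nested tuple as a term string such as s(np(...),vp(...))."""
    if isinstance(term, str):
        return term
    functor, *args = term
    return f"{functor}({','.join(show(a) for a in args)})"


def parse(tokens):
    """Return term strings for all parses that consume the whole input."""
    return [show(tree) for tree, end in sentence(tokens) if end == len(tokens)]
```

Under these assumptions, `parse(["every", "man", "loves", "mary"])` returns a single term string identical to the term "theta" above, and an input that does not conform to the rules returns an empty list.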
Introducing the argument and the procedure call as described above further makes the following manipulations feasible.
(1) Conditions can be given using the procedure call.
Example:
    date(D,M) → month(M), [D], {integer(D), 0<D, D<32}.
In this example, the grammatical rule states that the date of the D-th day of the M-th month is written as a phrase month(M) representing the month followed by the symbol D. The rule calls three predicates: "integer(D)", which is true when D is an integer; "0<D", which is true when D is positive; and "D<32", which is true when D is less than 32. D is thereby restricted to an integer larger than 0 and less than 32.
(2) Information depending on the context can be transported and tested using the argument.
Example:
    sentence(s(NP,VP),SO,S) → noun_phrase(N,NP,SO,S1), verb_phrase(N,VP,S1,S).
    . . .
    noun(N,n(Root),SO,S) → [SO,W,S], {is_noun(W,N,Root)}.
    trans_verb(N,tv(Root),SO,S) → [SO,W,S], {is_trans(W,N,Root)}.
    . . .
    is_noun(man,singular,man).
    is_trans(like,plural,like).
In this example, the argument N carrying the number is supplied to "noun_phrase", "verb_phrase" and the other non-terminal symbols affected by the number, and the number (singular or plural) given by the dictionary is transported up to "noun_phrase" and "verb_phrase". The numbers transported by "noun_phrase" and "verb_phrase" are compared at the level of "sentence", and the predicate "sentence" becomes true only when they agree.
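The two manipulations above, a condition given by a procedure call and context information carried in an argument, can be sketched as follows. The functions `date_ok` and `agrees` are hypothetical names, and the dictionary entries "men" and "likes" are assumptions added so that both agreement outcomes can be shown; only "man" and "like" come from the example.

```python
# Hypothetical dictionary: word -> (number, root). "man" and "like" follow the
# example in the text; "men" and "likes" are assumptions added for illustration.
NOUNS = {"man": ("singular", "man"), "men": ("plural", "man")}
TRANS_VERBS = {"like": ("plural", "like"), "likes": ("singular", "like")}


def date_ok(d):
    """Condition of the date rule: integer(D), 0 < D, D < 32."""
    return type(d) is int and 0 < d < 32


def agrees(noun, verb):
    """True when the numbers carried by the noun and the verb coincide."""
    if noun not in NOUNS or verb not in TRANS_VERBS:
        return False
    noun_number, _root = NOUNS[noun]
    verb_number, _root = TRANS_VERBS[verb]
    return noun_number == verb_number
```

In a DCG the comparison in `agrees` is not written explicitly: using the same variable N in both "noun_phrase" and "verb_phrase" makes unification perform it. The explicit comparison here is a substitute for that mechanism, not a reproduction of it.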
Summarizing the above, DCG has the following features.
(1) The grammar is an extension of the context-free grammar.
(2) A parsing apparatus is implemented by executing DCG as a Prolog program.
(3) The result of the syntactic analysis is a term given as the argument of the non-terminal symbol used in the syntactic analysis, and an output term different from a simple syntax tree can be obtained depending on what is given as the argument.
(4) A procedure call can be described in the grammatical rule.
(5) A grammatical rule depending on context can be described.
The above-mentioned DCG enhances the syntactic analysis ability and the language conversion ability by introducing the argument and the procedure call. However, since DCG describes the grammatical rules and the dictionary as first-order predicate logic and executes them as a Prolog program, the grammatical rules and the dictionary form a single body and cannot be separated. As a result, maintenance of the dictionary (correction, supplementation, deletion or the like) becomes inconvenient, and sharing the dictionary between the syntactic analysis and other processing becomes difficult.
Also, processing as a Prolog program is of the top-down type. As a result, a left-recursive grammatical rule cannot be processed satisfactorily. In other words, if a grammatical rule is given in which the non-terminal symbol on the left side appears at the left end of the right side, the DCG processing will not terminate.
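The non-termination can be illustrated with a minimal sketch of naive top-down expansion. The left-recursive grammar expr → expr "+" "n" | "n", the function names and the depth limit are all assumptions; the limit exists only so that the sketch halts and reports the problem instead of recursing indefinitely.

```python
# Hypothetical left-recursive grammar: expr -> expr "+" "n" | "n".
def expr(tokens, pos, depth=0, limit=50):
    """Naive top-down expansion of expr; returns the end position or None.

    The depth limit is an assumption added only so that the sketch halts;
    without it the first alternative would call itself without end.
    """
    if depth > limit:
        raise RecursionError("expr called itself without consuming input")
    # First alternative: expr -> expr "+" "n". The recursive call is made at
    # the same position, so no input has been consumed before recursing.
    end = expr(tokens, pos, depth + 1, limit)
    if end is not None and tokens[end:end + 2] == ["+", "n"]:
        return end + 2
    # Second alternative: expr -> "n". It is never reached, because the
    # first alternative never returns.
    if tokens[pos:pos + 1] == ["n"]:
        return pos + 1
    return None


def terminates(tokens):
    """Report whether the naive expansion finishes within the depth limit."""
    try:
        expr(tokens, 0)
        return True
    except RecursionError:
        return False
```

Even the one-word input `["n"]`, which the grammar accepts, makes `terminates` return `False`, because the first alternative is tried at the same input position on every call.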
Further, the main object of DCG is to transform an input sentence into a syntax tree. In order to transform data of one structure into data of another structure using DCG, the structured input data must first be decomposed into a character string, which must then be transformed into the structured output data. In this procedure, part of the structural information originally possessed by the input data, such as the number of components, is lost, and the transformation efficiency deteriorates.