Many methods are known for analyzing and generating text sentences by computer. These methods can be roughly classified into two groups, depending on whether the analysis and generation of text sentences are based on rules established by human beings or on knowledge acquired through statistical learning. Methods in the former group perform processing using a sufficiently wide variety of knowledge, whereas methods in the latter group improve accuracy by using a sufficiently large amount of simple knowledge.
To analyze text correctly and generate good text sentences, it is desirable to use a wide variety of knowledge, such as knowledge obtained from surface information appearing within or across sentences, knowledge described in dictionaries, and linguistic knowledge.
However, the former methods require very complicated rules because they handle many kinds of knowledge. As the rules grow more complex, conflicts among them become more likely, and in some cases such conflicts are difficult to resolve.
In the latter methods, using a wide variety of knowledge often leads to overtraining, and a large amount of learning data is needed to avoid it. If a wide variety of knowledge is used and learning is performed properly, processing accuracy can be improved. With a few exceptions, however, the use of a wide variety of knowledge has not been considered in the latter methods.
The inventors of the present invention have proposed a new model for text sentence analysis and generation based on statistical learning. The details of this technique are disclosed in Japanese Unexamined Patent Application No. 2002-334076. This technique is mainly based on the maximum entropy principle and can efficiently handle a wide variety of knowledge without falling into overtraining. Experiments have shown that this technique achieves higher accuracy than conventional statistical methods. This work established how to efficiently use knowledge obtained from learning data, dictionary knowledge, linguistic knowledge, and so on, and what kinds of knowledge should be used in text sentence analysis and generation.
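In the maximum entropy framework, a conditional probability p(y | x) is modeled as the exponential of a weighted sum of feature functions, normalized over all labels; because each piece of knowledge can be encoded as a separate feature, such models can combine many kinds of knowledge in one probability estimate. The following is a minimal sketch of that general model form only, assuming binary features. It does not reproduce the model of the cited application: the feature definitions, the weights, and the label names ("depends", "does_not_depend") are hypothetical, and weight estimation from training data (and any smoothing used against overtraining) is omitted.

```python
import math
from typing import Callable, Dict, List

# A binary feature function f_i(x, y): 1 if it fires for context x and label y, else 0.
Feature = Callable[[Dict[str, str], str], int]


def maxent_prob(context: Dict[str, str], labels: List[str],
                features: List[Feature], weights: List[float]) -> Dict[str, float]:
    """p(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x), the maximum entropy (log-linear) form."""
    scores = {y: sum(w * f(context, y) for f, w in zip(features, weights)) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    unnorm = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(unnorm.values())  # partition function Z(x)
    return {y: v / z for y, v in unnorm.items()}


# Hypothetical features mixing two knowledge sources for a dependency decision:
# a surface cue (the particle on the modifier) and dictionary knowledge (head part of speech).
features: List[Feature] = [
    lambda x, y: int(x["particle"] == "wo" and y == "depends"),
    lambda x, y: int(x["head_pos"] == "verb" and y == "depends"),
]
weights = [1.2, 0.8]  # hypothetical values; in practice these are learned from training data

probs = maxent_prob({"particle": "wo", "head_pos": "verb"},
                    ["depends", "does_not_depend"], features, weights)
print(probs)
```

Here both hypothetical features fire for the label "depends", giving it a total score of 2.0 against 0.0, so its probability is e^2 / (e^2 + 1).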
A specific example of a text generation system is disclosed in Japanese Unexamined Patent Application No. 2003-196280, filed by the present applicant. In this system, when one or more keywords are input, text sentences containing the input keywords are extracted from a database, and morphological analysis and syntactic structure analysis are performed on the extracted text sentences. A text sentence including the keywords is then generated based on the analysis results.
In a system disclosed in Japanese Unexamined Patent Application No. 2003-271592, word-unit candidates are generated from input keywords, and dependency relationships among the word-unit candidates are assumed. A text sentence candidate is generated according to the assumed dependency relationships. This method allows a natural text sentence to be produced from a small number of keywords.
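The candidate-generation idea described above (word-unit candidates built from keywords, assumed dependency relationships, and sentence candidates built from those dependencies) can be illustrated with a small sketch. This is a hypothetical illustration, not the method of the cited application: the romanized Japanese keywords, the particle list PARTICLES, the assumption that every noun word unit depends on the final verb, and the scores in DEP_SCORES are all assumptions made for the example.

```python
from itertools import product
from typing import Dict, List, Tuple

Dependency = Tuple[str, str]  # (modifier word unit, head word unit)

# Hypothetical particles that may follow a noun keyword to form a word unit.
PARTICLES = ["ga", "wo", "ni", "de"]

# Hypothetical plausibility scores for assumed dependencies; a real system
# would estimate such scores from data rather than list them by hand.
DEP_SCORES: Dict[Dependency, float] = {
    ("kanojo ga", "iku"): 2.0,
    ("kouen ni", "iku"): 1.5,
    ("kouen de", "iku"): 0.5,
}


def word_unit_candidates(keyword: str) -> List[str]:
    """Attach each candidate particle to a noun keyword to form word-unit candidates."""
    return [f"{keyword} {p}" for p in PARTICLES]


def generate_candidates(nouns: List[str], head: str) -> List[Tuple[str, List[Dependency]]]:
    """Enumerate sentence candidates, assuming every noun word unit depends on the head."""
    candidates = []
    for units in product(*(word_unit_candidates(n) for n in nouns)):
        deps = [(u, head) for u in units]  # assumed dependency: each unit -> head
        sentence = " ".join([*units, head])  # Japanese places the head last
        candidates.append((sentence, deps))
    return candidates


def score(deps: List[Dependency]) -> float:
    """Sum the plausibility of each assumed dependency (unknown pairs score 0)."""
    return sum(DEP_SCORES.get(d, 0.0) for d in deps)


# Keywords: kanojo ("she"), kouen ("park"), iku ("go").
candidates = generate_candidates(["kanojo", "kouen"], "iku")
best_sentence, best_deps = max(candidates, key=lambda c: score(c[1]))
print(best_sentence)  # kanojo ga kouen ni iku
```

With two noun keywords and four particles the sketch enumerates 16 candidates; because the number of candidates grows exponentially with the number of keywords, a practical system would likely need to prune them, which this sketch does not attempt.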
The techniques described above generate a text sentence in a certain language, for example Japanese, from keywords in the same language; they do not generate a text sentence in a language different from that of the keywords. That is, the known techniques generate a text sentence in the same language as the input keywords based on a monolingual corpus, and the method disclosed in Japanese Unexamined Patent Application No. 2003-271592 mentioned above has not been applied to generating a text sentence in a language different from that of the input keywords.
Machine translation is known as a technique for outputting a text sentence in a language different from the language of an input text sentence. In machine translation, in general, an input text sentence in a source language is analyzed, and a translation in a target language is generated from the analysis result.
If a natural text sentence could be output from keywords alone, without the user having to input a full text sentence, it would become much more convenient for users to communicate with one another.
In recent years, it has become easy for large numbers of people around the world to communicate with one another via networks. However, a language barrier remains, which makes it difficult for people who speak different languages to communicate. Although great advances have been made in machine translation, commercially available machine translation systems do not yet perform well enough to allow users who speak different languages to communicate easily with one another.
Thus, there is a need for a target language text sentence generation method that eliminates the language barrier and allows users in various countries to communicate easily with one another.
In view of the above, an object of the present invention is to provide a target language text sentence generating method and a target language text sentence generating apparatus that generate a natural text sentence in a target language different from a source language, based on one or more source-language keywords given by a user.