Text generation is one of the fundamental technologies used in a variety of natural language processing applications, such as machine translation, summarization, and dialogue systems. Recently, many corpora have become available, and these corpora are therefore used for generating natural text. One typical example is a language model used in machine translation, which translates a source language into a target language.
For example, Japanese Patent Application No. 2001-395618 by the present inventors discloses a text generation system in which replaced words and phrases in a target language are ordered into the most likely sequence so as to generate the target-language text. In general, the input to a language model is a word set, and the primary function of the language model is to sort these words.
Such a known system assumes that natural language text can be generated by sorting the input words in a word set. That is, the word set for generating natural text must be supplied by a translation model without excess or shortage.
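The word-ordering role of the language model described above can be illustrated by a minimal sketch. The bigram probabilities, vocabulary, and helper names below are hypothetical values invented purely for illustration; an actual system would estimate such probabilities from a corpus and use a search procedure rather than exhaustive enumeration.

```python
from itertools import permutations

# Hypothetical bigram probabilities (illustrative values only).
# <s> and </s> mark the start and end of a sentence.
BIGRAM = {
    ("<s>", "the"): 0.9,
    ("the", "cat"): 0.8,
    ("cat", "sleeps"): 0.7,
    ("sleeps", "</s>"): 0.9,
}

def score(sequence):
    """Product of bigram probabilities; unseen bigrams get a small floor."""
    prob = 1.0
    tokens = ["<s>"] + list(sequence) + ["</s>"]
    for a, b in zip(tokens, tokens[1:]):
        prob *= BIGRAM.get((a, b), 1e-4)
    return prob

def order_words(word_set):
    """Return the permutation of the word set with the highest model score."""
    return max(permutations(sorted(word_set)), key=score)

best = order_words({"cat", "sleeps", "the"})
print(list(best))  # the permutation judged most likely by the toy model
```

Note that such a model can only arrange the words it is given: if the supplied word set is missing a word or contains a spurious one, no ordering yields natural text, which is precisely the assumption discussed above.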
However, this assumption requires a large translation corpus. Even when the source language is Japanese, for which a relatively rich corpus exists, the above-described known method sometimes cannot provide satisfactory text generation, depending upon the state of the translation corpus and of the corpus of the target language.
Additionally, although the system of the above-described patent document can supply some missing words, this capability is only supplementary and cannot efficiently supply the associated words.
This problem is not limited to machine translation; in general, it arises in any form of text generation. Similarly, if the source-language text is incomplete, for example, because it is the result of erroneous OCR or erroneous speech recognition, accurate text generation cannot be achieved, which is a problem.