1. Field of the Invention
The present invention relates to an automatic translation system, and more particularly, to an apparatus and method for automatic translation customized for documents in a restrictive domain.
2. Discussion of Related Art
Machine translation, or automatic translation, systems have been studied since the invention of the computer. However, despite this long development history, and judging from the current market, it can hardly be said that any automatic translation system provides satisfactory output quality to users in general domains.
This is because, as the web environment developed, conventional automatic translation systems such as web document translation systems were designed for documents containing widely varying expressions and vocabulary. For this reason, the most basic elements of automatic translation, such as vocabularies and transformation rules or patterns, have been difficult to construct completely, owing to linguistic characteristics.
Accordingly, serious errors have occurred, such as words not found in the dictionary, input exceeding the coverage of the analysis rules, and missing transformation data. Consequently, actual output quality falls well short of a commercial level, which has blocked commercialization of such automatic translation systems.
The various problems arising in unrestricted domains naturally prompted attempts to narrow the scope of automatic translation to a restrictive domain. For the purpose of commercialization, this was a reasonable target given the state of automatic translation technology at the time.
In particular, in the patent domain, which is an example of a restrictive domain, the number of patent applications filed and registered worldwide each year has been increasing rapidly. In addition, interest in foreign patents as well as domestic patents has been growing in the global era. Currently, most patent documents are translated by professional translators. Accordingly, individuals who do not belong to a company have difficulty searching for and producing patent documents in a foreign language. Companies also face difficulty due to the increasing cost and time required for patent document translation.
Meanwhile, the problems described below arise when a document in a restrictive domain, such as a patent document, is translated using knowledge built for a general domain.
First, the most important knowledge for automatic translation generally includes vocabulary, analysis rules/patterns, and transformation rules/patterns. When a document in the patent domain is translated using this conventional knowledge, the first problem that arises is unknown words. That is, patent documents use extensive technical terminology from various fields such as electrical engineering, electronics, chemistry, physics, and computers. Moreover, even common terms tend to take on a different meaning in a patent document.
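The unknown-word problem can be illustrated with a minimal sketch. It assumes a toy general-domain word list and a simple regular-expression tokenizer; the names `unknown_words` and `general_dictionary` and the sample sentence are hypothetical and are not taken from any actual translation system.

```python
import re

def unknown_words(text, dictionary):
    """Return the tokens of `text` that are absent from `dictionary`.

    Tokenization is a simplifying assumption: lowercase alphabetic
    runs, optionally joined by hyphens.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [token for token in tokens if token not in dictionary]

# Hypothetical general-domain vocabulary (illustrative only).
general_dictionary = {"the", "is", "to", "a", "of", "connected"}

sentence = "The photodiode is connected to the cathode of a thyristor."
missing = unknown_words(sentence, general_dictionary)
print(missing)  # ['photodiode', 'cathode', 'thyristor']
```

In this toy case the technical terms are exactly the words that the general-domain dictionary fails to cover, which is the situation described above.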
Second, patent documents frequently contain certain expressions that are rarely used in other domains. Thus, application of conventional syntax rules or patterns used in the general domain results in a coverage problem.
Third, in automatic translation, the longer a sentence is, the more ambiguous its structure becomes. Thus, analysis time increases significantly and structure analysis performance declines. Because sentences of hundreds of words are often found in patent documents, it is not easy to analyze and translate a patent document without an appropriate process for handling long sentences.
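The growth of structural ambiguity with sentence length can be illustrated by counting the possible binary bracketings of a word sequence, which is given by the Catalan numbers. This is only a rough sketch: it counts unlabeled binary tree shapes, not the parses that any particular analysis grammar would actually admit, and the function name `binary_bracketings` is hypothetical.

```python
from math import comb

def binary_bracketings(num_words):
    """Count the binary bracketings of a sequence of `num_words` words.

    This equals the Catalan number C(n) with n = num_words - 1,
    computed as comb(2n, n) // (n + 1).
    """
    if num_words < 1:
        raise ValueError("num_words must be at least 1")
    n = num_words - 1
    return comb(2 * n, n) // (n + 1)

for length in (4, 10, 20):
    print(length, binary_bracketings(length))
# 4 5
# 10 4862
# 20 1767263190
```

Even under this simplified count, the number of candidate structures grows far faster than the sentence length, which is consistent with the increase in analysis time noted above for sentences of hundreds of words.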