Industry and academia both feel a strong need for effective natural language processing (NLP). One goal of natural language processing is to enable automated systems such as computers to perform functions on an input of natural human language. Achieving this would greatly expand the capabilities of computing environments in a broad range of applications. However, despite substantial investigation by workers in artificial intelligence and linguistics, effective natural language processing has remained elusive. Additionally, different attempted solutions have been developed for, and applied to, different applications, causing inconsistencies that prevent NLP interaction between applications.
Furthermore, there are special problems in trying to develop NLP systems for certain languages that use non-alphabetic writing systems. One such language is Chinese, which uses a largely logographic writing system in which thousands of characters are used, each functioning as a logogram. That is, each character represents a concept rather than a particular sound, in contrast to the letters of an alphabetic writing system such as that used for English and other Western languages. A single character may represent a word, or two or more characters may together represent a single word. Additionally, the characters are traditionally written in a continuous string, without the spacing that typically separates one word from the next in alphabetic writing systems. This adds an extra layer of ambiguity relative to languages written alphabetically: the ambiguity in the proper boundaries between words within a continuous string of logograms, which may be one or several to a word. This ambiguity has posed a formidable additional obstacle to NLP systems in languages using logographic writing systems, as opposed to those using alphabetic writing systems. Still other languages are written with a substantially syllabic writing system, in which each character represents a syllable. For example, Japanese is written with a mixture of logographic (kanji) and syllabic (hiragana and katakana) characters. The hiragana characters sometimes give hints on how to separate words and phrases, while the kanji and katakana characters likely would not, so that Japanese also presents an additional layer of ambiguity not encountered in NLP with Western writing systems.
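By way of illustration only, the word-boundary ambiguity described above can be made concrete with a short sketch. The function `segmentations` and the five-entry `LEXICON` below are hypothetical and are not drawn from any particular NLP system; the sketch simply enumerates every way an unspaced character string can be divided into words found in a given lexicon. The sample string is a commonly cited ambiguous example, in which the first three characters may be read either as a two-character word followed by the start of another word, or as a single three-character word.

```python
from functools import lru_cache

# Hypothetical toy lexicon, assumed for illustration only:
# 研究 "research", 研究生 "graduate student", 生命 "life",
# 命 "life; fate", 起源 "origin".
LEXICON = frozenset({"研究", "研究生", "生命", "命", "起源"})


def segmentations(text, lexicon=LEXICON):
    """Return every way to split `text` into words drawn from `lexicon`."""
    max_len = max(len(word) for word in lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the string yields one (empty) completion.
        if start == len(text):
            return [[]]
        results = []
        # Try every candidate word that begins at position `start`.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


if __name__ == "__main__":
    # The same six characters admit more than one word-by-word reading.
    for words in segmentations("研究生命起源"):
        print(" | ".join(words))
```

With this toy lexicon the six-character string yields two candidate segmentations, and nothing in the string itself indicates which is intended. A practical system would need additional information, such as context or statistics, to choose between them; the sketch does not attempt that step.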
Therefore, there is a persistent need for better methods and systems of natural language processing, particularly for languages that use non-alphabetic writing systems.
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.