The present invention relates to computerized language systems. In particular, the present invention relates to dictionaries used in computerized language systems.
Computerized language systems include a wide array of computer-implemented functions that manipulate language to improve communication between a computer and a user. Examples include text-to-speech and speech-to-text converters, as well as natural language systems. In each of these systems, the computer must be able to determine the syntax of a sentence. In speech systems, the syntax allows the computer to identify the proper tonal inflection for the speech. In natural language systems, the syntax allows the computer to identify the key words in a sentence.
To determine syntax in a sentence, computerized language systems rely on dictionaries that list valid words for a particular language. Preferably, each dictionary entry indicates the word's part of speech and its stem, also known as its lemma. For example, a dictionary entry for "wash" would indicate that the word is a noun and a verb, while the entry for "elate" would indicate that the word is only a verb.
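By way of illustration only, a dictionary entry of the kind described above might be represented as in the following minimal sketch. The names `Entry`, `LEXICON`, and `lookup` are hypothetical and are not taken from any particular system. The two sample entries carry the parts of speech given in the example above, and each base form is assumed to be its own lemma.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Entry:
    """One dictionary entry: the word's lemma and its valid parts of speech."""
    lemma: str
    parts_of_speech: FrozenSet[str]


# Hypothetical two-word dictionary keyed by surface form.
# Assumption: each base form listed here is its own lemma.
LEXICON: Dict[str, Entry] = {
    "wash": Entry(lemma="wash", parts_of_speech=frozenset({"noun", "verb"})),
    "elate": Entry(lemma="elate", parts_of_speech=frozenset({"verb"})),
}


def lookup(word: str) -> Optional[Entry]:
    """Return the entry for a word, or None if the word is not listed."""
    return LEXICON.get(word.lower())
```

In such a sketch, a word absent from the dictionary simply yields no entry, so a calling system would need some other means of handling it.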
In the art, such dictionaries are built by hand. Building them by hand requires a great deal of time, which greatly increases the cost of producing computerized language systems for the various languages of the world.