This invention relates generally to modifying a computer database and specifically to adding a word to a lexical transducer in a computer system.
Computers are commonly used in language-related applications such as text searching, spell checking, on-line dictionaries, speech recognition, and automatic writing, each of which involves representing and accessing words within a computer system. As is true of most computer applications, the manner in which data is represented within the computer (the "database") is a major factor in determining the speed, efficiency, and versatility of the application.
In order for a computer language application to be effective, the database employed by the application generally needs to be large. The database should include most of the "base" forms of words used in the language and also "surface" forms derived from the base forms of those words. For example, plural forms of singular words, verb forms of nouns, etc., must be represented if the database is to be a complete one. Surface and base forms of words should be associated with each other, so that, for example, a reference to the word "swim" (base form) will access forms such as "swam," "swum," "swims," "swimming" and "swimmer" (surface forms).
A database of word forms is created through a "compilation" process. This involves starting with base forms and generating the various surface forms by using grammatical rules. Applying rules to base forms to generate surface forms is complex and time-consuming because the irregularities of natural language require numerous rules. Often the rules have many exceptions or apply only to certain types of words. In addition to generating the words, the process translates them into a computer-understandable representation and stores them as the completed database. The steps of generation and translation are referred to as "compiling" and are performed by software called a "compiler."
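As a rough illustration of the generation step, the following minimal Python sketch applies a handful of rules, with an exception table, to a list of base forms. The rule set, the feature tags ("3sg", "past", "gerund"), and the function names are assumptions made for illustration only; they are not taken from any particular compiler, and the spelling rule is deliberately crude.

```python
VOWELS = "aeiou"

# Hypothetical exception table, consulted before the regular rules.
IRREGULAR = {("swim", "past"): "swam"}

def gerund(base):
    # Double a final consonant after a single vowel: swim -> swimming.
    if (len(base) >= 3 and base[-1] not in VOWELS
            and base[-2] in VOWELS and base[-3] not in VOWELS):
        return base + base[-1] + "ing"
    # Drop a final "e": bake -> baking.
    if base.endswith("e"):
        return base[:-1] + "ing"
    return base + "ing"

# Hypothetical regular rules, one per feature tag.
RULES = {
    "base": lambda b: b,
    "3sg": lambda b: b + "s",
    "past": lambda b: b + "ed",
    "gerund": gerund,
}

def compile_lexicon(bases):
    """Generate surface forms and map each one back to (base, feature)."""
    lexicon = {}
    for base in bases:
        for feature, rule in RULES.items():
            surface = IRREGULAR.get((base, feature)) or rule(base)
            lexicon[surface] = (base, feature)
    return lexicon

lexicon = compile_lexicon(["swim", "walk"])
```

In this sketch, adding a new base form means calling `compile_lexicon` again over the whole word list, which mirrors the cost of rerunning a real compilation.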
Some compilers output a database in a specific form called a "finite state transducer" ("FST"). An FST is a finite state automaton in which state transitions (arcs) are labelled by a pair of symbols and not by a single symbol as in a simple finite state automaton (or, equivalently, finite state machine, "FSM"). A special form of an FST is a "lexical transducer" ("LT"). An LT is a specialized FST that maps base forms to inflected surface forms and vice versa.
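The pair-labelled arcs can be sketched as follows. This is a toy transducer under stated assumptions: the state numbers, the "+Past" tag, and the `transduce` function are hypothetical, and the empty string stands in for an empty (epsilon) symbol. Each arc carries an upper (base-side) symbol and a lower (surface-side) symbol, so the same network can be traversed in either direction.

```python
EPS = ""  # empty symbol: the arc consumes or emits nothing on that side

# state -> list of (upper_symbol, lower_symbol, target_state)
ARCS = {
    0: [("s", "s", 1)],
    1: [("w", "w", 2)],
    2: [("i", "i", 3), ("i", "a", 5)],  # i:a encodes the swim/swam vowel change
    3: [("m", "m", 4)],
    5: [("m", "m", 6)],
    6: [("+Past", EPS, 7)],             # the tag exists only on the base side
}
FINAL = {4, 7}

def transduce(symbols, side, state=0, out=()):
    """Yield each output string for the input symbol sequence.

    side=0 reads upper labels and writes lower labels (base -> surface);
    side=1 reads lower labels and writes upper labels (surface -> base).
    """
    if not symbols and state in FINAL:
        yield "".join(out)
    for arc in ARCS.get(state, []):
        read, write, target = arc[side], arc[1 - side], arc[2]
        if read == EPS:
            # Follow the arc without consuming any input.
            yield from transduce(symbols, side, target, out + (write,))
        elif symbols and symbols[0] == read:
            yield from transduce(symbols[1:], side, target, out + (write,))

surface = list(transduce(["s", "w", "i", "m", "+Past"], 0))
base = list(transduce(list("swam"), 1))
```

Here `surface` holds the surface forms generated from the base form plus tag, and `base` holds the analyses recovered from the surface form "swam", which illustrates the "vice versa" property of a lexical transducer.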
It is desirable to provide an end user of the application with only the database and not the compiler used to create it. However, an end user often needs to modify the database. Most commonly, the user may wish to add a word to the database. Traditionally, the user would have to supply the word to the compiler as new data and run the entire compilation process again to obtain a database that includes the new word.
Accordingly, an invention which allows an end user to add to, or modify, an existing compiled database without performing a compilation procedure is desirable.