1. Field
The present specification generally relates to systems, computer-program products, and methods for annotating documents and, more particularly, to systems, computer-program products, and methods for annotating documents by resolving abbreviated text to the expanded forms found in one or more controlled vocabularies.
2. Technical Background
Electronic text documents may be annotated with information. Annotations may be provided in metadata, for example. Markup languages, such as XML, may be utilized to provide additional information regarding an electronic text document beyond the original text. In some cases, an electronic text document is annotated with information regarding the subject matter discussed within the electronic text document.
In text documents, such as scientific text documents, there is a strong tendency to economize on words and space by use of abbreviation. For example, a common pattern in text documents mentioning species is to abbreviate the genus name to its initial letter. For instance, “Bacillus anthracis” is abbreviated as “B. anthracis,” and “Zaprionus indianus” is abbreviated as “Z. indianus.”
Another pattern, which is semantically distinct from but syntactically identical to the genus-name abbreviation described above, is the abbreviation of names of people. For example, the name “Stuart Hall” may be abbreviated as “S. Hall.”
Expansion of abbreviated term patterns not only enhances recall of entities defined in a controlled vocabulary (e.g., a thesaurus), but also improves precision in automatically identifying concepts described within the text document. For example, during automatic text analysis, “D. melanogaster” might be incorrectly identified as the standalone term “melanogaster,” which is unrelated to Drosophila.
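By way of illustration only, the expansion described above may be sketched as follows. This is a minimal sketch under stated assumptions and not the method of the present specification: the vocabulary contents, the regular expression, and the names VOCABULARY, ABBREVIATION, and expand_abbreviations are hypothetical and chosen solely for this example. The sketch matches an initial letter followed by a period and a word, and expands the match only when exactly one vocabulary entry fits.

```python
import re

# Hypothetical controlled vocabulary of full multi-word terms (illustrative only).
VOCABULARY = [
    "Bacillus anthracis",
    "Zaprionus indianus",
    "Drosophila melanogaster",
]

# An initial capital letter, a period, whitespace, then a word, e.g. "B. anthracis".
# Personal names such as "S. Hall" match the same syntactic pattern; they are
# told apart only by the vocabulary lookup below.
ABBREVIATION = re.compile(r"\b([A-Z])\.\s+([A-Za-z]+)\b")


def expand_abbreviations(text, vocabulary=VOCABULARY):
    """Replace an abbreviated term with a vocabulary entry when exactly one fits."""

    def resolve(match):
        initial, second = match.group(1), match.group(2)
        candidates = [
            term
            for term in vocabulary
            if term.split()[0].startswith(initial) and term.split()[1:] == [second]
        ]
        # Expand only when the match is unambiguous; otherwise keep the original text.
        return candidates[0] if len(candidates) == 1 else match.group(0)

    return ABBREVIATION.sub(resolve, text)
```

In this sketch, “B. anthracis” and “D. melanogaster” would be expanded to their full forms, whereas “S. Hall” would be left unchanged because no entry in the assumed vocabulary corresponds to it.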
Accordingly, a need exists for alternative methods of annotating electronic text documents that expand abbreviations into their full multi-word form.