Abbreviations are used extensively in a variety of domains as replacements for their expansions. As used in this specification, an abbreviation refers to a shortened form of a phrase that is derived from letters of the phrase. For example, “UN” is an abbreviation for “United Nations,” “XML” is an abbreviation for “extensible Markup Language,” “PAC” is an abbreviation for “Political Action Committee,” and “codec” is an abbreviation for “Coder/Decoder” or “Compressor/Decompressor.” An abbreviation may or may not be an acronym, which is generally considered to be an abbreviation that is derived from the initial letters of a phrase. For example, “PAC” is an acronym derived from “Political Action Committee,” and “Sgt.” is an abbreviation, but not an acronym, derived from “Sergeant.”
Abbreviations allow a writer in a specific domain to convey concepts efficiently without burdening a reader experienced in the domain with the full expansion. For example, in the field of biochemistry the abbreviation “DNA” conveys the concept of “deoxyribonucleic acid” without the cumbersome expansion. Because abbreviations are so pervasive in some domains, it can be difficult for a reader, even one experienced in the domain, to know the expansions of all of them. In addition, many abbreviations have multiple expansions, both within a single domain and across domains. For example, in the domain of human resources, the abbreviation “PTO” may stand for “Paid Time Off” or “Personal Time Off.” The abbreviation “PTO” may also stand for “Power Take-Off” in the domain of farm machinery, for “Patent and Trademark Office” in the domain of patent law, and for “Parent Teacher Organization” in the domain of education.
Many web-based abbreviation dictionaries are available to assist users in determining the meaning of abbreviations. To use these dictionaries, a user inputs the abbreviation and receives possible expansions for the abbreviation. One abbreviation dictionary reports 73 possible expansions for the abbreviation “PTO.” These abbreviation dictionaries are typically built by human editors who review published documents for abbreviations and their expansions. To ensure that the abbreviation dictionaries are up to date, some dictionaries allow users to submit new abbreviations and their expansions. Some of these abbreviation dictionaries may have editors who carefully review each submission and determine whether it should be added to the dictionary, while other abbreviation dictionaries simply add submissions to the dictionary without any editorial review. At least one abbreviation dictionary uses information retrieval techniques and heuristics to identify expansions for abbreviations.
The use of human editors to maintain an abbreviation dictionary presents several problems. First, it can be time-consuming and expensive to create an abbreviation dictionary manually. Second, because so many documents are published every day, it is virtually impossible for human editors to identify all new abbreviations and their expansions by hand. The use of heuristics to maintain an abbreviation dictionary automatically also presents problems. First, the accuracy of the identified expansions depends on the effectiveness of the heuristics, which in general do not perform as well as a human editor. Second, it is difficult to extend such heuristics to new types of abbreviations, such as “XML” for “extensible Markup Language,” in which a letter of the abbreviation is taken not from the initial letter of a word in the expansion but from a non-initial letter (here, the “x” of “extensible”).
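To illustrate why initial-letter heuristics miss abbreviations such as “XML,” consider a minimal sketch of one such heuristic. The function name and its rules are hypothetical and are not the method of any particular dictionary. It assumes that an expansion appears immediately before a parenthesized abbreviation, that it contributes exactly one word per abbreviation letter, and that each word's first letter must match the corresponding abbreviation letter.

```python
import re
from typing import Optional


def find_expansion_by_initials(text: str, abbrev: str) -> Optional[str]:
    """Hypothetical initial-letter heuristic: locate '(ABBREV)' in the text,
    take the len(abbrev) words immediately before it, and accept them as the
    expansion only if their first letters spell the abbreviation."""
    pos = text.find(f"({abbrev})")
    if pos == -1:
        return None
    words = re.findall(r"[A-Za-z]+", text[:pos])
    n = len(abbrev)
    if len(words) < n:
        return None
    candidate = words[-n:]
    # Compare only the first letter of each word; non-initial letters are ignored.
    if all(w[0].lower() == c.lower() for w, c in zip(candidate, abbrev)):
        return " ".join(candidate)
    return None


if __name__ == "__main__":
    print(find_expansion_by_initials("The Political Action Committee (PAC) met.", "PAC"))
    print(find_expansion_by_initials("Data is stored in extensible Markup Language (XML).", "XML"))
```

The first call recovers “Political Action Committee.” The second returns `None`, because the “X” in “XML” comes from a non-initial letter of “extensible,” which this rule never examines. Supporting such cases would require additional, harder-to-tune rules.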