Acronyms and abbreviations are shortened forms of words or common phrases. An abbreviation is a shortened or contracted form of a word or phrase that is used to represent the whole (e.g., “Dr.”, which can stand for “Doctor” or “Drive”, among others). An acronym is an abbreviation formed from the initial letters of other words; it may be pronounced letter by letter (e.g., IBM) or as a word (e.g., NASA).
A problem common to both acronyms and abbreviations is that a single short form may have more than one, and often many, possible expansions, which makes the intended expansion ambiguous. Context is the backdrop against which a human reader understands the intended meaning. For example, the acronym CIA has many possible expansions, two well-known ones being Central Intelligence Agency and Culinary Institute of America.
In the sentence below:
“The former CIA officer accused of revving an electric drill near the head of an imprisoned terror suspect has returned to U.S. intelligence as a contractor”,
a human reader would identify that the intended expansion of the term CIA is “Central Intelligence Agency” using the context of the surrounding words for disambiguation. Here the terms “imprisoned”, “terror”, “suspect”, “intelligence” and “officer” are relevant to this disambiguation.
In the sentence below:
“Two Certified Master Chefs from the CIA have designed and tested more than 100 kitchen essentials that meet highest professional standards in gourmet cuisine”,
a human reader would identify that the intended expansion of the term CIA is “Culinary Institute of America” using the context of the surrounding words for disambiguation. Here the terms “Master Chef”, “kitchen”, “gourmet” and “cuisine” are relevant to this disambiguation.
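The two CIA examples above can be mimicked, very roughly, by counting how many context words in a sentence match a cue list for each candidate expansion. The following is only a minimal sketch under stated assumptions: the cue sets are hand-picked from the two example sentences, and the names CONTEXT_CUES, tokenize and disambiguate are hypothetical, introduced here purely for illustration. It is not the method of any particular system.

```python
import re

# Hypothetical cue words per expansion, hand-picked from the two example sentences.
CONTEXT_CUES = {
    "Central Intelligence Agency": {"imprisoned", "terror", "suspect", "intelligence", "officer"},
    "Culinary Institute of America": {"master", "chefs", "kitchen", "gourmet", "cuisine"},
}

def tokenize(text):
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def disambiguate(sentence, cues=CONTEXT_CUES):
    """Return the expansion whose cue words overlap most with the sentence.

    Returns None when no cue word appears, i.e. the context gives no evidence.
    """
    words = tokenize(sentence)
    scores = {expansion: len(words & cue_set) for expansion, cue_set in cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

SENTENCE_1 = ("The former CIA officer accused of revving an electric drill near the head "
              "of an imprisoned terror suspect has returned to U.S. intelligence as a contractor")
SENTENCE_2 = ("Two Certified Master Chefs from the CIA have designed and tested more than 100 "
              "kitchen essentials that meet highest professional standards in gourmet cuisine")

if __name__ == "__main__":
    print(disambiguate(SENTENCE_1))
    print(disambiguate(SENTENCE_2))
```

A fixed cue list of this kind only works for the sentences it was built from. It is shown to make the role of context concrete, not as a practical disambiguator.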
An additional dimension to ambiguity is the question of whether a token in text is an abbreviation (to be expanded) or a legitimate word that happens to have the same characters as an abbreviation. For instance, “WAS” could be a word, the past tense of “be”, or an abbreviation for “WebSphere Application Server”. Whether “WAS” should be treated as an abbreviation and expanded again depends on the context.
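The word-versus-abbreviation decision can be illustrated in the same spirit. The sketch below treats an upper-case “WAS” as an abbreviation only when the surrounding words contain at least one cue for the software-product reading. The cue set ABBREVIATION_CUES and the function is_abbreviation are hypothetical and chosen for illustration only. Letter case alone is not assumed to be sufficient, since text written entirely in capitals would otherwise be misread.

```python
import re

# Hypothetical cue words suggesting the software-product reading of "WAS".
ABBREVIATION_CUES = {"websphere", "server", "deploy", "deployed", "cluster", "application"}

def is_abbreviation(sentence, token="WAS", cues=ABBREVIATION_CUES):
    """Guess whether `token` is used as an abbreviation in `sentence`.

    The token must occur in exactly the given (upper) case, and at least one
    other word in the sentence must be a cue word.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    if token not in words:
        # Lower-case "was" is never a candidate.
        return False
    context = {w.lower() for w in words if w != token}
    return len(context & cues) > 0

if __name__ == "__main__":
    print(is_abbreviation("We deployed the application on WAS last week"))
    print(is_abbreviation("IT WAS A DARK NIGHT"))
```

In the second sentence the token is upper-case but no cue word is present, so it is treated as the ordinary word.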
For humans, the task of disambiguation is most often simple, straightforward and natural. Automated computer systems, on the other hand, have a great deal of difficulty extracting the intended meaning of acronyms and abbreviations during Natural Language Processing (NLP). The problem is exacerbated when the NLP task is performed in open and broad domains, as opposed to narrow domains of discourse, because the number of alternative interpretations is very large.