Natural language processing is the science of making computers interpret instructions or information in the way that a person does. Consider as an example the task of setting the temperature of an oven for baking. Practically anyone can understand the spoken instruction, “set the oven to three hundred and fifty degrees.” Furthermore, variations such as “set the umm burner, I mean oven, to three hundred and fifty degrees” or “set the oven to, you know, like three hundred and fifty degrees” are understood perfectly by people.
A computer-controlled oven, however, has difficulty knowing which parts of the spoken instructions to ignore, even if it is able to convert the sounds of speech into text with perfect accuracy. How is the computer supposed to interpret “umm burner, I mean oven”? What does a person mean by “you know, like”?
The filled pauses (“umm”), parenthetical expressions (“you know”), incorrect grammar and speech repairs (“burner, I mean oven”) of natural speech are stumbling blocks for computers trying to find meaning in the spoken language of people. Researchers in natural language processing have taken the approach that the simplest way for a computer to handle these stumbling blocks is to delete them. If the computer could be trained to ignore “umm burner, I mean” or “you know, like” in the transcribed speech above, then the remaining words would be easier for a conventional text parser to interpret.
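The deletion idea can be illustrated with a toy, rule-based filter applied to the oven examples. This is only a sketch under stated assumptions: the word lists, the single-word repair rule, and the function name `strip_disfluencies` are hypothetical and hand-written for these two sentences, whereas a practical system would learn what to delete from annotated data.

```python
# Hypothetical, hand-picked cues for illustration only; not from any real system.
FILLED_PAUSES = {"umm", "um", "uh"}


def strip_disfluencies(utterance: str) -> str:
    """Delete filled pauses, a few parentheticals, and one-word speech repairs."""
    # Normalize case and treat commas as plain word separators.
    tokens = utterance.lower().replace(",", " ").split()
    # Drop filled pauses first so they do not sit between a word and its repair.
    tokens = [t for t in tokens if t not in FILLED_PAUSES]

    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i:i + 2] == ["i", "mean"]:
            # Assumption: the word just before "I mean" is the one being repaired.
            if kept:
                kept.pop()
            i += 2
        elif tokens[i:i + 2] == ["you", "know"]:
            i += 2  # parenthetical expression
        elif tokens[i] == "like":
            i += 1  # naive: would also delete a meaningful "like"
        else:
            kept.append(tokens[i])
            i += 1
    return " ".join(kept)


cleaned = strip_disfluencies(
    "set the umm burner, I mean oven, to three hundred and fifty degrees"
)
print(cleaned)
```

Both example variations reduce to the plain instruction, which a conventional text parser can then handle. The naive treatment of "like" shows why fixed rules are brittle and why trained detectors are preferred.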
A great deal of effort has been put into developing automatic systems for identifying the parts of spoken sentences that a computer would be better off ignoring for certain purposes, or that could be put to use for other purposes. More specifically, systems have been developed to identify so-called edited words in transcribed speech; i.e., words that a computer should not bother trying to understand.
Charniak and Johnson [Eugene Charniak and Mark Johnson, “Edit Detection and Parsing for Transcribed Speech”, Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 118-126 (2001) (incorporated herein by reference and hereinafter referred to as “C&J”)] presented a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words.
To evaluate the performance of different methods and systems for natural language processing, many researchers, including Charniak and Johnson, use the Switchboard corpus provided by the Linguistic Data Consortium. The Switchboard corpus is an extensive set of transcribed telephone conversations that have been tagged by human annotators. The performance of a natural language processing system may be evaluated by comparing the results it generates with those recorded by humans.
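One common way to make such a comparison concrete is to score per-word decisions with precision, recall, and F-score. The sketch below assumes, purely for illustration, that both the system output and the human annotation are available as one boolean per word (True meaning "edited"); the function name `edited_word_scores` and the sample labels are hypothetical and are not taken from the corpus.

```python
def edited_word_scores(predicted, gold):
    """Return (precision, recall, f_score) for per-word edited/not-edited labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must label the same words")

    true_pos = sum(1 for p, g in zip(predicted, gold) if p and g)
    false_pos = sum(1 for p, g in zip(predicted, gold) if p and not g)
    false_neg = sum(1 for p, g in zip(predicted, gold) if g and not p)

    # Guard against division by zero when nothing was predicted or annotated.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Words: set the umm burner I mean oven to
# Illustrative human labels: "umm burner I mean" are edited.
gold = [False, False, True, True, True, True, False, False]
# Illustrative system output: misses "umm", wrongly flags "oven".
predicted = [False, False, False, True, True, True, True, False]

precision, recall, f_score = edited_word_scores(predicted, gold)
print(precision, recall, f_score)
```

Here three of the four flagged words are correct and three of the four annotated words are found, so precision, recall, and F-score all equal 0.75.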
Improving the performance of natural language processing systems depends in part on designing better disfluence identifiers, of which edited-word detectors are a prime example. A disfluence identifier operates with a model, which may comprise a statistically weighted set of features that act like clues to help find disfluences such as edited words.
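A statistically weighted feature set can be pictured as a simple linear scorer: each clue that fires for a word contributes its weight, and the total is turned into a probability that the word is edited. The following is a minimal sketch, not the model of any particular system. The feature names, the weights, and the 0.5 threshold are all invented for illustration; in practice the weights would be estimated from annotated data.

```python
import math

# Hypothetical weights, invented for illustration; real weights are learned.
WEIGHTS = {
    "bias": -2.0,                # most words are not edited
    "is_filled_pause": 2.5,      # the word is "umm", "um", or "uh"
    "next_is_repair_cue": 3.0,   # the word is followed by "I mean"
    "same_as_next_word": 1.5,    # the word is immediately repeated
}
FILLED_PAUSES = {"umm", "um", "uh"}


def extract_features(tokens, i):
    """Return the set of clue names that fire for the word at position i."""
    features = {"bias"}
    if tokens[i] in FILLED_PAUSES:
        features.add("is_filled_pause")
    if tokens[i + 1:i + 3] == ["i", "mean"]:
        features.add("next_is_repair_cue")
    if i + 1 < len(tokens) and tokens[i + 1] == tokens[i]:
        features.add("same_as_next_word")
    return features


def edited_probability(tokens, i):
    """Sum the weights of the active features and squash with a logistic function."""
    score = sum(WEIGHTS[name] for name in extract_features(tokens, i))
    return 1.0 / (1.0 + math.exp(-score))


def detect_edited(tokens, threshold=0.5):
    """Flag each word whose edited probability reaches the threshold."""
    return [edited_probability(tokens, i) >= threshold for i in range(len(tokens))]


tokens = ["set", "the", "umm", "burner", "i", "mean", "oven"]
flags = detect_edited(tokens)
print(list(zip(tokens, flags)))
```

With these made-up weights only "umm" and "burner" are flagged; the cue words "i mean" are not, because no feature in this toy set covers them. Gaps of this kind are what better feature sets aim to close.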
Creating better models and feature sets is a technical area that is ripe for innovation. Advances in the field come from researchers' deep understanding of, and ability to combine seemingly counterintuitive insights in, linguistics, statistics, and computer algorithms, among other disciplines.