Speech systems depend on discrete vocabularies of words, which form the atomic units of recognition results. English dictionaries typically contain no more than one hundred thousand words. However, once proper names, locations, and other non-dictionary words are included, speech recognition systems typically employ vocabularies ten to one hundred times larger, depending on resource availability and system-specific requirements. These large speech vocabularies, or dictionaries, generally include garbage words of poor quality. Under current systems, speech vocabulary cleanup and garbage word removal are performed, in part, by expensive word-by-word human audits.