Speech indexing is the process of using Automatic Voice Recognition (AVR) to create a searchable database of the content of an audio recording, such as a recorded telephone call. Once created, this database may be used to analyze the contents of the recording. Take, for example, a telephone polling agency that needs to make sure that each pollster is adhering to a predefined script so that the results are consistent or may be tracked over time. (For example, “As a likely voter, are you more or less likely to vote Republican?” may yield a different answer than “Are you less or more likely to vote Republican if you vote?”) By creating a searchable database of what was said on each polling call, key questions and answers can be checked individually: searching the database for a key question returns the exact time at which that question was asked. The playback of the call may then be advanced to that time, and a human listener can confirm that the question was asked properly and the response was recorded accurately. This saves time for the person checking the accuracy of each call, who would otherwise need to listen to the entire call to find the key question and answer.
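The lookup described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Word` record, the `find_phrase` function, and the sample transcript with its timestamps are all hypothetical, and the sketch assumes the AVR engine has already produced a list of recognized words with their start times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word (hypothetical record layout)."""
    text: str
    start: float  # seconds from the start of the recording


def normalize(token):
    """Lowercase a token and strip surrounding punctuation."""
    return token.lower().strip(".,?!\"'")


def find_phrase(transcript, phrase):
    """Return the start time of every occurrence of `phrase` in the transcript."""
    target = [normalize(t) for t in phrase.split()]
    if not target:
        return []
    words = [normalize(w.text) for w in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i].start)
    return hits


# Assumed recognizer output for part of a polling call (made-up timestamps).
transcript = [
    Word("Hello", 0.0),
    Word("As", 12.4), Word("a", 12.6), Word("likely", 12.7), Word("voter,", 13.1),
    Word("are", 13.6), Word("you", 13.8), Word("more", 14.0), Word("or", 14.3),
    Word("less", 14.5), Word("likely", 14.9), Word("to", 15.2), Word("vote", 15.4),
    Word("Republican?", 15.8),
]

# Times (in seconds) to which playback could be advanced for review.
key_question_times = find_phrase(transcript, "as a likely voter")
```

A reviewer would then start playback at each returned time instead of listening to the whole call. A practical index would likely use a more efficient structure than this linear scan; the scan is used here only to keep the sketch short.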
Unfortunately, AVR engines that are efficient and fast are also more prone to making errors. Accordingly, speech indexing systems that rely upon these efficient AVR engines may produce false entries in the database. For example, a speech indexing system using an efficient AVR engine may misclassify the spoken phrase “likely repeat voter” as “likely Republican voter.” This may cause the resulting speech index to have multiple entries for the phrase “likely Republican voter” when it was spoken only once. Accordingly, a person checking the call for the phrase “likely Republican voter” would need to listen to two parts of the call instead of just one. Therefore, there is a need in the art for improvements to the accuracy of speech indexing systems that use efficient AVR engines.
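The effect of such a misclassification on the index can be illustrated with a small Python sketch. Everything in it is assumed for illustration: the `phrase_times` helper, the word timings, and the simulated recognition error are hypothetical and do not represent any particular AVR engine.

```python
def phrase_times(indexed_words, phrase):
    """Return the start time of each occurrence of `phrase`.

    `indexed_words` is a list of (word, start_seconds) pairs.
    """
    target = phrase.lower().split()
    texts = [word.lower() for word, _ in indexed_words]
    return [
        indexed_words[i][1]
        for i in range(len(texts) - len(target) + 1)
        if texts[i:i + len(target)] == target
    ]


# What was actually said on the call (made-up timestamps).
spoken = [
    ("likely", 40.0), ("repeat", 40.4), ("voter", 40.9),
    ("likely", 95.0), ("Republican", 95.4), ("voter", 96.1),
]

# Simulated output of a fast engine that mishears "repeat" as "Republican".
recognized = [("Republican" if w == "repeat" else w, t) for w, t in spoken]

true_hits = phrase_times(spoken, "likely Republican voter")
indexed_hits = phrase_times(recognized, "likely Republican voter")
```

Here `true_hits` holds one time while `indexed_hits` holds two, so the reviewer is sent to an extra location in the call that does not contain the phrase.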