Speech recognition systems used to facilitate hands-free data entry face two problems: incorrectly recognizing a word because the acoustic model is not well matched to the input audio, and falsely rejecting a word because of low confidence in its accuracy. These problems result in adverse consequences such as incorrect data entry, reduced user productivity, and user frustration.
The false-rejection problem has been addressed in a previous invention (i.e., U.S. Pat. No. 7,865,362), which is incorporated herein by reference in its entirety. That invention improved a speech recognition system's performance by adjusting an acceptance threshold based on knowledge of a user's expected response at a point in a dialog. The response, however, was considered in its entirety. That is, only when all of the words in the response hypothesis matched all of the words in the expected response could the acceptance threshold for each hypothesis word be adjusted. This approach does not adequately address responses that contain multiple pieces of information (e.g., bin location and quantity picked) spoken without pausing. In these cases, discounting the entire response because of a mismatch in one part is too severe.
Therefore, a need exists for a speech recognition system that uses a more flexible comparison between the expected response and the hypothesis to adjust its performance or adapt its library of models. The present invention embraces using parts of the hypothesis independently. This approach eliminates scenarios in which acceptable hypothesis parts are rejected, or are not considered for adaptation, because of mismatches in other parts of the hypothesis.
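The difference between comparing a response in its entirety and comparing its parts independently can be illustrated with a minimal sketch. It is not the implementation of either invention. The names `Word`, `BASE_THRESHOLD`, `RELAXED_THRESHOLD`, `thresholds_whole`, `thresholds_by_part`, and `accepted` are hypothetical, the threshold values are arbitrary, and the sketch assumes that a match lowers the acceptance threshold and that each expected part occupies a fixed position in the hypothesis.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


# Hypothetical values: the source does not specify thresholds.
BASE_THRESHOLD = 0.60     # used when no expected-response match is found
RELAXED_THRESHOLD = 0.40  # assumed adjustment applied on a match


def thresholds_whole(hyp, expected):
    """Whole-response comparison: adjust only if every word matches."""
    matched = [w.text for w in hyp] == expected
    threshold = RELAXED_THRESHOLD if matched else BASE_THRESHOLD
    return [threshold] * len(hyp)


def thresholds_by_part(hyp, expected_parts):
    """Part-wise comparison: each expected part is checked independently.

    Assumes parts appear in order at fixed positions in the hypothesis.
    """
    thresholds = []
    pos = 0
    for part in expected_parts:
        segment = hyp[pos:pos + len(part)]
        matched = [w.text for w in segment] == part
        threshold = RELAXED_THRESHOLD if matched else BASE_THRESHOLD
        thresholds.extend([threshold] * len(segment))
        pos += len(part)
    # Hypothesis words beyond the expected parts keep the base threshold.
    thresholds.extend([BASE_THRESHOLD] * (len(hyp) - len(thresholds)))
    return thresholds


def accepted(hyp, thresholds):
    """A word is accepted when its confidence meets its threshold."""
    return [w.confidence >= t for w, t in zip(hyp, thresholds)]


# Expected response: bin location "three five", quantity "two".
expected_parts = [["three", "five"], ["two"]]
expected_whole = [word for part in expected_parts for word in part]

# Hypothesis in which the quantity was recognized as "ten".
hyp = [Word("three", 0.50), Word("five", 0.45), Word("ten", 0.55)]

whole_result = accepted(hyp, thresholds_whole(hyp, expected_whole))
part_result = accepted(hyp, thresholds_by_part(hyp, expected_parts))
```

In this example the whole-response comparison leaves every word at the base threshold because the quantity does not match, so the correctly recognized bin location is rejected along with it. The part-wise comparison relaxes the threshold for the bin location only, so those two words are accepted while the mismatched quantity is still held to the base threshold.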