Evaluating spoken grammar skills is a primary component of evaluating the overall spoken English skills of a candidate or individual. In addition, growing interest in learning and improving spoken English, coupled with the rising popularity of the Internet, has fueled interest in the area of computer-assisted language learning (CALL). Most existing CALL approaches focus on evaluating pronunciation and/or syllable stress and do not address spoken grammar evaluation. Additionally, existing CALL system evaluations are conducted by human assessors, leading to subjectivity, lack of scalability, higher costs, etc.
In traditional approaches, to evaluate spoken grammar skills, a candidate is asked to speak on a given topic, and a human assessor evaluates the candidate based on the type and frequency of the grammatical errors the candidate commits. However, such approaches are difficult to automate for two reasons: the accuracy of current automatic speech recognition (ASR) systems is relatively low for spontaneous free speech, and the language model (LM), which plays an important role in ASR, significantly reduces the probability of recognizing grammatically incorrect sentences.
In existing automatic approaches, a prompt that might contain a grammatical error is played to a candidate. The candidate is expected to detect any grammatical mistake and record the corresponding grammatically correct sentence, which can then be decoded by an ASR system with an LM that includes a pre-selected set of sentences. However, such approaches can still result in recognition errors. For example, an error can occur when two or more sentences in the LM are acoustically close to each other (for example, “he kill a snake” versus “he killed a snake”). In this case, it is highly likely that a sentence other than the one actually spoken is recognized.
Problems can also occur in such approaches when a candidate speaks a sentence that is not present in the LM, but the ASR system nevertheless recognizes it, with high confidence, as one of the sentences present in the LM. This can happen when the spoken sentence is acoustically similar to one of the sentences present in the LM.
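The two failure modes described above can be illustrated with a minimal sketch. It is not part of any described system: it assumes, purely for illustration, that character-level edit distance is a crude stand-in for acoustic closeness, and the names `edit_distance`, `confusable_pairs`, `closest_lm_sentence`, and the threshold `max_distance` are hypothetical. A real ASR system compares acoustic features, not spellings.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (crude proxy for acoustic closeness)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def confusable_pairs(sentences, max_distance=2):
    """Return pairs of LM sentences that are close enough to be easily confused.

    max_distance is an assumed, illustrative threshold.
    """
    return [(s, t) for s, t in combinations(sentences, 2)
            if edit_distance(s, t) <= max_distance]


def closest_lm_sentence(spoken, sentences):
    """Mimic a closed-set decoder: always return some LM sentence, even if
    the spoken sentence is not in the LM."""
    best = min(sentences, key=lambda s: edit_distance(spoken, s))
    return best, edit_distance(spoken, best)


# Hypothetical pre-selected sentence set for one prompt.
lm_sentences = ["he kill a snake", "he killed a snake", "she saw a bird"]

# Failure mode 1: two LM sentences are close to each other.
pairs = confusable_pairs(lm_sentences)

# Failure mode 2: a sentence outside the LM is mapped onto a nearby LM sentence.
match, distance = closest_lm_sentence("he kills a snake", lm_sentences)
```

Under these assumptions, `pairs` contains only the “he kill a snake” / “he killed a snake” pair, and the out-of-LM sentence “he kills a snake” is mapped onto “he kill a snake”, mirroring the situation in which a sentence other than the one actually spoken is recognized.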