Assessment of the content correctness of a response is often performed in education and in other domains. Such an assessment may be needed, for example, when a language proficiency test is administered to aspiring teachers who are non-native English speakers. The spoken responses elicited by the test prompts may vary in their degree of predictability. At the highly predictable end of the spectrum, the examinee may be asked to read a passage aloud; at the other end, the examinee may be asked to provide an open-ended, spontaneous response, such as stating an opinion on an issue. Between these extremes are moderately predictable responses, which are typically shorter and more constrained by the context of the item stimuli and test prompts than open-ended responses are (e.g., the examinee may be asked to instruct a class of students to open their textbooks to page 55). Moderately predictable responses of this type are typically scored manually, which is often costly and time-consuming and lacks objectivity. The problem is further exacerbated when a large number of responses must be scored.