The increasing availability and performance of computer-based testing have greatly increased the feasibility of assessing language proficiency. However, doubts remain regarding the feasibility of assessing speaking proficiency. Recognizing the speech of language learners is particularly difficult because learners may struggle to articulate their thoughts and can exhibit highly accented speech. Moreover, speech recognition alone is insufficient to characterize the speaking proficiency of language learners from a communicative perspective. In other words, characterizing speaking proficiency requires more than adequate comprehensibility of the speech: the content and qualitative aspects of the speech can also be important in evaluating speaking proficiency from a communicative perspective.
Available computerized speaking assessment systems have not adequately elicited the full range of individual and interactive speaking performances in which language educators are interested. Nor have such technologies captured the complexities of those performances or the inferences that human evaluators make about them. Accordingly, in order to fully characterize speaking proficiency, task design (the nature of the test question), evidence identification (scoring), and evidence aggregation (psychometric modeling) need to be closely coordinated. Collectively, these three processes and related principles constitute the elements of assessment design.
Task design typically occurs during a test development phase. For example, in an evidence-centered design context, items are explicitly designed to elicit the evidence called for by the goals of the assessment, such as assessing speaking proficiency from a communicative perspective. Importantly, the process does not begin until the evidentiary implications of the goals of the assessment are well understood. Computer-based delivery of speaking proficiency assessments has been criticized as a hindrance to eliciting such evidence because of limitations in the types of questions that can be presented and the responses that can be elicited.
Even assuming that computer-deliverable tasks can be designed to appropriately elicit the evidence called for in an assessment of speaking proficiency, such tasks must still be scored appropriately. Current systems have not adequately developed automated procedures for identifying evidence of speaking proficiency in cases where the content of responses cannot be reasonably anticipated (i.e., spontaneous or high-entropy speech). Finally, psychometric models are needed to aggregate responses to several prompts and update the current estimate of speaking proficiency.
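The aggregation step mentioned above can be illustrated with a minimal sketch. The sketch assumes a one-parameter (Rasch) item response theory model, which is one common psychometric choice but is not specified by the text; the prompt difficulties and item scores below are hypothetical values for illustration only.

```python
import math

def rasch_update(theta, responses, difficulties, lr=0.5, iters=50):
    """Estimate a proficiency parameter (theta) from scored responses to
    several prompts under a one-parameter (Rasch) IRT model, by gradient
    ascent on the log-likelihood. `responses` are 0/1 item scores;
    `difficulties` are per-prompt difficulty parameters (hypothetical)."""
    for _ in range(iters):
        # Gradient of the Rasch log-likelihood with respect to theta:
        # sum over items of (observed score - predicted probability).
        grad = sum(x - 1.0 / (1.0 + math.exp(-(theta - b)))
                   for x, b in zip(responses, difficulties))
        theta += lr * grad
    return theta

# Three prompts of increasing difficulty; the examinee answers the two
# easier ones correctly and misses the hardest, so the estimate settles
# between the second and third difficulty values.
theta = rasch_update(theta=0.0, responses=[1, 1, 0],
                     difficulties=[-1.0, 0.0, 1.5])
```

Each newly scored prompt simply extends `responses` and `difficulties`, so the same routine serves to update the current proficiency estimate as evidence accumulates.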
In recent years, significant advances have been made in automatic speech recognition (ASR) systems. In particular, speaking proficiency systems exist that can automatically score tasks in which response patterns can be anticipated, such as tasks that require responding orally to questions having a single anticipated response.
While a novice level of proficiency (such as pronunciation evaluation and training) can be measured using tasks that call for anticipated responses and thus elicit only a limited range of speech, higher levels of proficiency can be tested only by tasks whose responses require spontaneity and adaptability to unique situations. For example, in addition to pronunciation evaluation, assessing higher levels of proficiency can require determining the content of the speech and its qualitative characteristics, such as intonation or other prosodic features.
Moreover, automated recognition of speech from language learners is particularly challenging because such individuals are generally less proficient with the language and can have highly accented speech. A further complexity is that merely recognizing speech is not sufficient to characterize speaking proficiency. For example, prosodic characterizations of speech samples, such as intonation, are also required. Current systems for assessing speaking proficiency do not include the ability to perform such measurements while being able to recognize spontaneous speech.
What is needed is a system and method for analyzing and scoring spontaneous (high entropy) speech.
A need exists for an automatic system and method for determining the speaking proficiency of language learners.
A further need exists for applying assessment design principles to develop an automated system for scoring speaking proficiency based on tasks that are not limited to anticipated responses.
The present disclosure is directed to solving one or more of the above-listed problems.