Although pronunciation proficiency has received increasing emphasis in foreign language learning, conventional evaluation by human raters has several limitations. First, human rating suffers from inter- and intra-rater inconsistency: even a single rater frequently gives the same speech token a different score on reevaluation than the one he or she gave before. Second, human rating is time-consuming. When raters' scores for a test speech token do not agree, an additional rater may have to be invited, which delays the final result. Third, human rating is costly. When there is a large volume of test-taker speech to evaluate, hiring qualified, well-trained raters is difficult and expensive.
Consequently, automatic pronunciation evaluation systems would help alleviate these difficulties. Although major language testing services, including TOEFL and TOEIC, appear to have started using automated systems, their methods and quality have not yet been disclosed. Automatic scoring is not more widely used for two reasons. First, speech recognition technology has not matured enough to support automatic scoring systems: because non-native speakers often pronounce words incorrectly, converting their speech into text has been error-prone. This problem, however, is expected to be resolved by recent deep-learning-based AI technology trained on large amounts of data. Second, and more fundamentally, it is difficult to extract acoustic features that are effective for automated scoring. Efforts to devise features that simulate the rubrics used in human rating have not been successful, mainly because humans tend to evaluate qualitatively, relying on their linguistic knowledge and intuition, whereas machine scoring requires quantitative features.