Word-level transcriptions may be needed to train or adapt large vocabulary continuous speech recognition (LVCSR) systems. However, obtaining human transcriptions can be time-consuming and costly, especially for large training data sets. Automated speech assessment, a fast-growing area of speech research, may use an automatic speech recognition (ASR) system to recognize input speech responses, and the ASR outputs may be used to generate features for a scoring model. Because the recognition accuracy of the ASR system directly influences the quality of these features, especially features that depend on the recognized words (e.g., those measuring grammatical accuracy and vocabulary richness), it may be important to use ASR systems with high recognition accuracy.