Technical Field
The present disclosure relates to speech recognition. More particularly, the present disclosure relates to systems and methods for end-to-end speech recognition that may be used for vastly different languages.
Description of the Related Art
Automatic Speech Recognition (ASR) is an interdisciplinary sub-field of computational linguistics, which incorporates knowledge and research in the linguistics, computer science, and electrical engineering fields to develop methodologies and technologies that enable the recognition and translation of spoken language into text by computers and computerized devices, such as those categorized as smart technologies and robotics.
Neural networks emerged as an attractive acoustic modeling approach in ASR in the late 1980s. Since then, neural networks have been used in many aspects of speech recognition, such as phoneme classification, isolated word recognition, and speaker adaptation. Many aspects of speech recognition have been taken over by deep learning methods involving long short-term memory (LSTM) and recurrent neural networks (RNNs).
One of the challenges in speech recognition is the wide range of variability in speech and acoustics. It is challenging to build and tune a speech recognizer that can be adapted to support multiple language applications with acceptable accuracy, especially when the languages involved are quite different, such as English and Mandarin.
Accordingly, what is needed are improved systems and methods for end-to-end speech recognition.