1. Field
The following description relates to technology for recognizing speech based on pronunciation similarity. It also relates to an apparatus and method for recognizing speech based on pronunciation distances, an apparatus and method for generating a speech recognition engine, and a speech recognition engine obtained by such a method.
2. Description of Related Art
With the widespread use of digital devices, various forms of user interfaces have been proposed to allow users to operate such devices. For example, a flat panel display combined with a capacitive touch interface is widely used as a representative user interface that allows users to operate a variety of personal smart devices, such as smart phones and tablet personal computers (PCs).
A user may find a touch interface intuitive because the user receives immediate feedback on a selected command. However, a touch interface may be difficult to use in certain circumstances. For example, it is difficult to use when both of the user's hands are occupied, when a complicated command needs to be executed, when a command requires multi-step interactions, or when a long text needs to be input.
A speech interface may be natural and intuitive to the user, potentially compensating for the shortcomings of touch interfaces. Thus, speech interfaces are desirable in a wider range of applications, such as controlling devices while driving a vehicle or using voice assistants on smart devices.
However, known speech interfaces suffer from inaccuracy. Because recognition accuracy is considered an important issue in developing speech interfaces, various methods have been proposed to improve the accuracy of speech recognition.
While recurrent deep neural network (RDNN)-based speech recognition technology has been proposed as a way to improve the accuracy of speech recognition, several challenges remain for its widespread application.