One solution for reconciling these aims consists of using very reliable acoustic models, which make it possible to achieve a low error rate when calculating the acoustic probabilities. This is typically the solution implemented in the automatic speech recognition devices of modern personal assistants, in particular those known under the brands Siri® and Cortana®.
One drawback of this solution is that the acoustic models used require significant computing power to process very large databases. This makes the solution difficult to use in mobile situations without a connection to a server having the computing means and memory needed to implement it, as may be the case on board an aircraft.
Another solution consists of using automatic speech recognition devices with restricted syntax, i.e., devices for which the recognizable phrases belong to a predetermined set of possibilities. These recognition devices make it possible to achieve a very high recognition rate even with fairly unreliable acoustic models, and do not require large computing power or large databases; they are thus very well suited for use in mobile situations.
One drawback of these devices, however, is that they only make it possible to recognize a limited number of instructions.
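To illustrate why a restricted syntax tolerates an unreliable acoustic model, the following is a minimal sketch, not the decoder of any particular device. It assumes, for illustration only, that the acoustic model outputs a probability for each candidate word at each word position; the phrases, words and probabilities are hypothetical. Unconstrained decoding picks the most probable word at each position, while restricted decoding only scores the phrases of the predetermined set.

```python
import math

# Hypothetical restricted syntax: the only phrases the device can recognize.
GRAMMAR = [
    ("select", "radio", "one"),
    ("select", "radio", "two"),
    ("display", "fuel", "page"),
]

# Probability assumed for words the acoustic model did not propose at a position.
FLOOR = 1e-6


def free_decode(frames):
    """Unconstrained decoding: keep the most probable word at each position."""
    return tuple(max(f, key=f.get) for f in frames)


def restricted_decode(frames, grammar=GRAMMAR):
    """Return the grammar phrase with the highest total acoustic log-probability."""
    def score(phrase):
        if len(phrase) != len(frames):
            return -math.inf
        return sum(math.log(f.get(w, FLOOR)) for w, f in zip(phrase, frames))
    return max(grammar, key=score)


# Hypothetical noisy acoustic output: "radio" is confused with "ratio",
# and "one" with "won".
frames = [
    {"select": 0.6, "elect": 0.4},
    {"ratio": 0.55, "radio": 0.45},
    {"won": 0.5, "one": 0.3, "two": 0.2},
]

print(free_decode(frames))        # ('select', 'ratio', 'won')
print(restricted_decode(frames))  # ('select', 'radio', 'one')
```

The same scoring also shows the drawback noted above: a phrase outside `GRAMMAR` can never be returned, however well it matches the acoustic output.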
A third solution is disclosed in the document “Eye/voice mission planning interface (EVMPI)” (F. Hatfield, E. A. Jenkins and M. W. Jennings, December 1995). This solution consists of modifying the syntax model of the language decoder of an automatic speech recognition device based on the direction of the user's gaze. To that end, the automatic speech recognition device comprises a gaze detector for determining the point on a monitor at which the user's gaze is fixed, and a fusion engine suitable for modifying the syntax probability law of the syntax model based on information communicated by the application associated with the point targeted by the user's gaze on the monitor.
This automatic speech recognition device can thus recognize a large number of instructions, since it is able to recognize the instructions associated with each of the applications displayed on the monitor. At the same time, it obtains a good recognition rate, even with a fairly unreliable acoustic model, since the syntax model used at any given moment to recognize the oral instructions pronounced by the user has a vocabulary restricted to that of the application the user is looking at; there is therefore a low likelihood of confusion between two words with similar pronunciations.
However, recalculating the syntax probability law in real time in this way is a complex operation that is difficult to carry out, is slowed by the exchanges of information between the fusion engine and the applications, and prevents the linguistic engine from operating while the recalculation is in progress. This results in significant lag time. Furthermore, this solution may produce a high error rate if the user does not look in the direction of the application to which his instructions relate.
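The gaze-dependent restriction of the vocabulary described for the EVMPI device can be sketched as follows. This is a simplified illustration under stated assumptions, not the EVMPI implementation: instead of recalculating a syntax probability law, it simply selects the fixed phrase set of the application displayed under the gaze point. The application names, screen regions, phrases and acoustic probabilities are all hypothetical.

```python
import math
from dataclasses import dataclass

# Probability assumed for words the acoustic model did not propose at a position.
FLOOR = 1e-6


@dataclass
class App:
    name: str
    region: tuple[int, int, int, int]  # hypothetical (x_min, y_min, x_max, y_max) on the monitor
    phrases: list[tuple[str, ...]]     # restricted syntax of this application

    def contains(self, x, y):
        x0, y0, x1, y1 = self.region
        return x0 <= x < x1 and y0 <= y < y1


# Hypothetical monitor layout with two applications side by side.
APPS = [
    App("radio", (0, 0, 400, 300), [("tune", "vhf"), ("tune", "uhf")]),
    App("navigation", (400, 0, 800, 300), [("direct", "waypoint"), ("zoom", "in")]),
]


def app_at(x, y, apps=APPS):
    """Return the application displayed under the gaze point, or None."""
    return next((a for a in apps if a.contains(x, y)), None)


def phrase_score(phrase, frames):
    """Total acoustic log-probability of a phrase; -inf if the length differs."""
    if len(phrase) != len(frames):
        return -math.inf
    return sum(math.log(f.get(w, FLOOR)) for w, f in zip(phrase, frames))


def recognize(frames, gaze):
    """Decode using only the syntax of the application being looked at."""
    app = app_at(*gaze)
    if app is None:
        return None
    return max(app.phrases, key=lambda p: phrase_score(p, frames))


# Hypothetical acoustic output for the spoken instruction "tune uhf".
frames = [
    {"tune": 0.7, "zoom": 0.3},
    {"uhf": 0.4, "in": 0.35, "vhf": 0.25},
]

print(recognize(frames, gaze=(120, 80)))  # ('tune', 'uhf')  gaze on the radio
print(recognize(frames, gaze=(600, 80)))  # ('zoom', 'in')   gaze on the navigation
```

The second call reproduces the failure mode mentioned above: when the user looks at an application other than the one the instruction concerns, the decoder can only return a phrase from the wrong vocabulary.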
A final solution is disclosed in document FR-A-2,744,277. This solution consists of modifying the syntax model of the language decoder of an automatic speech recognition device based on various parameters, such as the parameters of the mobile carrier, the type, phase and progress of the mission, or the history of previously executed commands.
This solution has the same drawbacks as the third solution described above.