Augmented reality (AR) is a term for a live direct or indirect view of a physical real-world environment whose elements are augmented by virtual computer-generated sensory input such as sound or graphics.
Augmented reality is becoming an integral part of our everyday life. Today many augmented reality applications exist that simply display the scene captured by the camera of a device, with computer-generated layers superimposed. The device may be, for example, AR glasses. When a user uses one of these applications, he needs to hold the device half a meter from his face, with a straight arm. This is inconvenient, so the breakthrough for AR will probably not come until it is possible to buy regular-sized glasses that augment an image.
AR glasses do, however, come with an input problem. In today's augmented reality applications, handling input is easy since the user is looking at a touch display; he can just touch the display to interact with it. When interacting with a displayed image while wearing a pair of AR glasses, there are a few options:
1. Voice commands. The user may ask his device about what he is seeing, e.g. “What is the species of that tree that I am looking at?”.
2. Video capture of gestures, e.g. the user points at the object he wants to interact with and the video of his gesture is analysed to determine his intention.
3. Eye tracking, e.g. the user stares at the object he wants to interact with.
The problem with (2) and (3) is that they are usually socially unacceptable; a user may feel uncomfortable gesturing wildly or staring intensely at an object of interest.
The problem with (1) is that voice recognition is tricky; to perform speech recognition with any acceptable accuracy, it is necessary to have a context in which to interpret the speech. For example, when ordering tickets through an automated ticketing system, e.g. by telephone, there is a known context. The system knows what the potential destinations are; hence it can reduce the search space when interpreting the speech. Even so, as anyone who has used one of these telephony systems knows, they are far from perfect, and they cause a lot of frustration.
The problem with using speech recognition for AR is that, unlike in the case of the ticketing system, there is no particular context. The context is everything that is visible to the naked eye, which makes speech recognition very difficult to use for AR purposes.
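The search-space reduction described for the ticketing system can be illustrated with a minimal sketch. It is not taken from any particular ticketing or speech product. The destination list, the function name interpret and the similarity cutoff are assumptions made purely for illustration, and plain string similarity from the Python standard library stands in for a real acoustic model.

```python
import difflib
from typing import List, Optional

# Hypothetical context: the only destinations the ticketing system sells.
KNOWN_DESTINATIONS = ["stockholm", "gothenburg", "malmo", "uppsala"]


def interpret(heard: str, vocabulary: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a noisy transcript onto the closest entry of a known vocabulary.

    Restricting the candidates to `vocabulary` is the "context": anything
    that is not similar enough to a known entry is rejected (None).
    """
    matches = difflib.get_close_matches(heard.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # A slightly misheard destination still resolves within the known context.
    print(interpret("Gotenburg", KNOWN_DESTINATIONS))
    # An utterance outside the context cannot be resolved.
    print(interpret("xyzzy", KNOWN_DESTINATIONS))
```

With a small, closed vocabulary a misheard word can still be mapped to a plausible candidate. In the AR case no such closed vocabulary is available up front, so this kind of restriction cannot be applied directly.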