Mobile devices occupy an increasingly prominent niche in the evolving marketplace, serving as access points at various stages of a seemingly infinite number of activities. As this trend continues, mobile devices and the mobile network capabilities they provide are leveraged in an increasing number and breadth of scenarios. Recent examples include the extension of mobile technology to provide a host of financial services such as check deposit, bill payment, and account management. In addition, location data gathered via mobile devices are utilized in an increasing number of applications, e.g. to provide targeted advertising and situational awareness.
As the mobile development community finds new uses for mobile devices, users are presented with more numerous, complex, and specific opportunities to provide input that is required by, or advantageous to, the underlying process the mobile device is used to perform. In addition, the contexts in which a user may interact with, or provide input to, such a process continue to diversify.
This diversification naturally includes expansion into niches where the implemented technique may not be optimal, or even acceptable, from the user's perspective. In a culture where a fraction of a second can determine the difference between an acceptable and an unacceptable solution to a given challenge, developers seek every possible performance advantage to produce superior technology.
For example, several well-known inefficiencies exist with respect to user input received via a mobile device. A first inefficiency is the small screen size typical of mobile devices, particularly mobile phones. Since the conventional “smartphone” omits a physical keyboard and pointing device, relying instead on touchscreen technology, the physical space allocated to a given key on a virtual “keyboard” displayed on the mobile device screen is much smaller than a human finger can accurately and precisely select. As a result, typographical errors are common in textual user input received via a mobile device.
To combat this limitation, typical mobile devices employ powerful predictive analytics and dictionaries to “learn” a given user's input behavior. Based on the resulting predictive model, the mobile device can predict the user's intended text when the user's actual input does not fit within defined norms, patterns, etc. The most visible example of such predictive analysis and dictionaries is the conventional “autocorrect” functionality available on most mobile devices.
However, these “autocorrect” approaches are notorious in the mobile community for producing incorrect, or even inappropriate, predictions. While these inaccuracies are humorous in some contexts, the prevalence of erroneous predictions results in miscommunication and errors that frustrate both the underlying process and the user, and ultimately hinder the adoption and utility of mobile devices in the wide variety of contexts in which a mobile device could otherwise be leveraged to great benefit.
As a result, some developers have turned to alternative sources of input, and to alternative techniques for gathering input via a mobile device. For example, most solutions have focused on utilizing audio input as an alternative or supplement to textual input (i.e. tactile input received via a virtual keyboard shown on the mobile device display). In practice, this technique has conventionally been embodied by integrating the voice recognition functionality of the mobile device (e.g. as conferred via a “virtual assistant” such as “Siri” on an APPLE mobile device running iOS 5.0 or higher).
An illustrative example of this audio input extension added to a mobile keyboard is depicted in the figure below. While this figure displays an interface generated using APPLE's iOS mobile operating system, similar functionality may be found on other platforms such as ANDROID, MICROSOFT SURFACE RT, etc.
Audio input may be received by integrating into the mobile virtual keyboard an extension that allows the user to provide input other than the typical tactile input received via the mobile device display. In one approach, the audio extension appears as a button depicting a microphone icon or symbol, immediately adjacent to the space bar (at left). A user may interact with a field configured to accept textual input, e.g. a field on an online form, PDF, etc. In response to detecting the user's interaction with the field, the mobile device leverages the operating system to invoke the mobile virtual keyboard user interface. The user may then provide tactile input to enter the desired text, or may interact with the audio extension to invoke an audio input interface. In the art, this technique is commonly known as “speech-to-text” functionality, which accepts audio input and converts the received audio input into textual information.
Upon invoking the audio input interface, and optionally in response to receiving additional input from the user via the mobile device display (e.g. tapping the audio extension a second time to indicate initiation of audio input), the user provides audio input, which is analyzed by the voice recognition component of the mobile device, converted into text, and entered into the field with which the user interacted to invoke the mobile virtual keyboard.
By integrating audio input with the textual input/output capabilities of a mobile device, these approaches enable a user to input textual information hands-free, broadening the utility of the device to a host of contexts in which it could not otherwise be used. For example, according to these approaches a user may compose a text message exclusively using audio input. However, these approaches are also plagued by the similarly frustrating and performance-degrading inaccuracies and inconsistencies well known in existing voice recognition technology. As a result, current voice recognition approaches to supplementing or replacing textual input are unsatisfactory.
Currently available voice recognition is prone to failure; often, the voice recognition software is simply incapable of recognizing the unique vocalization exhibited by a particular individual. Similarly, voice recognition is prone to “audiographical” errors (i.e. errors analogous to “typographical” errors for audio input, such as falsely “recognizing” a vocalized word).
Furthermore, voice recognition is inherently limited by its reliance on a predetermined set of rules (e.g. a set of assumptions or conditions that may be defined based on the language being spoken). Further still, since usage often differs significantly between the spoken and written versions of the same language, it may not even be possible to utilize audio input as a supplement or alternative to textual input. For example, audio input is often an unworkable alternative to tactile input in circumstances where the expected form of expression and/or usage (which often defines the “rules” upon which voice recognition relies) corresponds to the written form of a language.
Voice recognition is also an inferior tool for acquiring or validating user input corresponding to information that is not typically expressed, or cannot be expressed, in words. The prototypical example of this limitation is user input that includes symbols, such as those often used to label units of measure. Even where these units of measure have commonly accepted vocalizations (e.g. the unit of currency known as “U.S. dollars” corresponds to the symbol “$”), these vocalizations do not necessarily map to a unique term (e.g. “pounds” may correspond either to a unit of weight, i.e. “lbs.”, or to a unit of currency, i.e. “£”, depending on context).
Voice recognition is also unsuitable for receiving and processing textual input that includes grammatical symbols (e.g. one or more “symbols” used to convey grammatical information, such as a comma “,”, semicolon “;”, period “.”, etc.) or formatting input, which includes symbols that do not necessarily have any corresponding physical representation in the language expressed (e.g. a carriage return, tab, space, particular text alignment, etc.).
Other existing approaches use optical input as a supplement to textual input, but these techniques merely provide the ability to combine textual input with an image or video clip and distribute the combined input via a user's preferred form of communication (e.g. SMS text message, email, video chat, etc.). These conventional approaches typically include a combined input interface that facilitates receipt of tactile input via the mobile device virtual keyboard, and of optical input via a separate button placed on the input interface (but not necessarily included in the virtual keyboard, as is the case for the audio input functionality discussed above).
When the user interacts with this separate button, the device either retrieves previously captured optical input or invokes a capture interface to capture new optical input, and includes the previously or newly captured optical input along with any textual information the user entered by providing tactile input to the mobile virtual keyboard.
As a result of the foregoing, existing integration of optical and audio input on mobile devices is severely limited as a supplemental or alternative approach to receiving and processing user input. Existing strategies allow cumbersome input of audio for voice recognition, or input of an image to supplement textual input. However, these techniques fail to integrate the various input capabilities in a context-sensitive manner that provides an intelligent alternative and/or supplement to textual input via a mobile device.
Ensuring that the additional input capability is invoked productively, in a manner that assists rather than degrades the performance of the device and the user's interactions with it, is a complex task. It requires careful consideration of the various contexts in which optical input would be useful, and of the appropriate conditions for capturing and/or analyzing the optical input, in order to realize the context-sensitive benefits of intelligently integrating the mobile device camera as a source of input for receiving textual information from the user.
Therefore, it would be highly beneficial to provide new methods, systems and/or computer program product technologies configured to supplement and/or replace tactile and auditory input as a mechanism for receiving user input and generating output, especially output that is determined in whole or in part based upon the received input and upon the context in which the input was received or the purpose for which it is provided.