Human-computer interaction refers to the communication and cooperation between humans and computers. Traditionally, such interactions have occurred via WIMP (i.e., windows, icons, menus, and pointer) interfaces. The rapid growth in the diversity and complexity of computers (or computing devices) has expanded the range of interfaces through which humans and computers interact.
Mobile devices (e.g., mobile phones, PDAs, portable media players, e-readers, handheld game consoles) and wearable devices (e.g., smart glasses, watches, bands, jewelry, earbuds), for example, require more suitable, non-WIMP interfaces. The input devices on which WIMP interfaces rely, such as a keyboard or mouse, traditionally require a surface or the like on which they can be operated. Such interfaces are therefore poorly suited to many modern computers, including mobile and wearable devices. Instead, more suitable interfaces (e.g., hands-free interfaces) such as speech recognition, eye tracking, and lip reading interfaces are becoming more common for human-computer interactions, particularly with mobile and wearable devices.
Speech recognition, while frequently used with mobile and wearable devices, is also used with a wide variety of computing devices or machinery, including appliances, automobiles, aircraft, and the like. Such devices are often referred to as “voice command devices” because they can be controlled by means of human voice rather than using buttons, dials, switches, and the like. One common use of speech recognition is for voice user interfaces, which enable functionality such as voice dialing, call routing, appliance control, searching, data entry, document drafting, speech-to-text processing, aircraft operation, selecting radio stations, and playing music. Voice user interfaces have valuable applications in a range of industries including education, telephony, aerospace, video games, robotics, training, military, health care, and the like.
Voice user interfaces function by first recognizing audio input. Audio inputs may be prompts (e.g., “computer”), which activate and/or prepare the computing device for further input. Audio inputs may also and/or alternatively be commands (e.g., “send a text,” “call contact”), which instruct the computing device to take or perform one or more specified actions. The computing device, interacting with its software and/or an operating system, processes the prompt and/or command and, for example, retrieves information, carries out a task, or the like, based on (e.g., referencing, relying on) the lexical content of the audio input. Often, the audio input causes a mistrigger, which refers to a failure by the computing device to recognize and/or process the audio input. Mistriggers may be caused by poor audio quality, grammatical errors, incomplete prompts or commands, unrecognized accents, under-articulated speech, and the like.
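The prompt, command, and mistrigger flow described above can be illustrated with a minimal sketch. It assumes the audio input has already been converted to text by some speech recognizer, and every name in it (`VoiceInterface`, `Result`, `WAKE_WORD`, the status strings) is hypothetical and chosen for illustration only; it is not drawn from any particular voice user interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical prompt that prepares the device for a command.
WAKE_WORD = "computer"


@dataclass
class Result:
    # One of: "idle", "awaiting_command", "executed", "mistrigger".
    status: str
    output: Optional[str] = None


class VoiceInterface:
    """Toy dispatcher that acts only on the lexical content of a transcript."""

    def __init__(self, commands: Dict[str, Callable[[], str]]):
        self.commands = commands
        self.awake = False

    def handle(self, transcript: str) -> Result:
        text = transcript.strip().lower()
        if not self.awake:
            # Only the prompt activates the device; anything else is ignored.
            if text == WAKE_WORD:
                self.awake = True
                return Result("awaiting_command")
            return Result("idle")
        # The device is awake: treat the next input as a command, then reset.
        self.awake = False
        action = self.commands.get(text)
        if action is None:
            # The input could not be recognized as a known command.
            return Result("mistrigger")
        return Result("executed", action())


if __name__ == "__main__":
    vi = VoiceInterface({"send a text": lambda: "text sent"})
    print(vi.handle("computer"))      # device is now awaiting a command
    print(vi.handle("send a text"))   # known command is executed
    print(vi.handle("computer"))
    print(vi.handle("send a txt"))    # incomplete or garbled command
```

Because the sketch matches on exact text, it reports a mistrigger whenever the transcript deviates from a known command, and it has no access to how the user spoke.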
One technical challenge with the use of voice user interfaces or speech recognition involves the ability to obtain information from the audio input based on the non-lexical portions of the speech. There is a need, therefore, for systems and methods that can, for example, infer human emotional states, intentions, and behaviors from the non-lexical portions of human speech. More particularly, for instance, there is a need for systems and methods that can predict the probability that a mistrigger has occurred based on how a user speaks (e.g., the non-lexical characteristics of the speech) in an audio input to a voice user interface.
Further, there is a fast and continuous increase in the number and types of computing devices and systems that generate data. That is, data-generating devices and systems have evolved from common desktop and laptop computers to smartphones, tablets, mobile devices, wearable devices, and the like. In fact, just about any piece of machinery, structure, or good is now capable of generating data, for example, using embedded sensors.
Typically, sensors are systems that allow for eliciting and/or collecting information. Sensors can be embedded, fixed, adhesive, movable, or wearable. Moreover, sensors can be used to obtain (e.g., sense) information about almost any parameter, including humidity, temperature, pressure, force, light, images, speech, sound, gestures, touch, presence, proximity, activity, motion, location, and more.
Yet, data generated by sensors is merely an example of the vast amount of information that is being, and will be, generated and stored by computing devices. In fact, computing devices produce and/or store a number of different types of structured, semi-structured, and unstructured data including user information, interactions, files (e.g., audio, video), communications (e.g., email, calls, short message service (SMS)), transactions, and the like. All of this data is in turn multiplied by systems that generate additional data about the data (e.g., metadata). That is, the generated data is analyzed to identify, create, and/or store correlations, patterns, signs, and more, which in turn are used in a number of industries (e.g., business, medical, government) to, for example, make better decisions, increase efficiency, minimize risk, and prevent unwanted outcomes.
In other words, data that is produced by computing devices is being used in a plethora of ways, many of which include personalizing and/or targeting decisions, information, predictions and the like for users, based in part on data generated by or about each user. For example, data generated by a user's computing device or devices (e.g., mobile device, wearable device), such as transaction information, location information, and the like can be used to identify and/or infer that user's preferred shopping times, days, stores, price points, and more. In turn, the identified and/or inferred information can be used to deliver targeted or personalized coupons, sales, and promotions, particularly at times or locations most proximate to the user's preferences.
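As a rough illustration of the kind of inference described above, the following sketch estimates a user's preferred shopping day and hour from transaction timestamps by simple counting. The function name `preferred_shopping_slot` and the sample timestamps are hypothetical; a real system would likely combine many more signals (e.g., location, store, price point).

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def preferred_shopping_slot(
    timestamps: Iterable[datetime],
) -> Optional[Tuple[str, int]]:
    """Return the most frequent (day name, hour) pair, or None if empty."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    counts = Counter((DAY_NAMES[t.weekday()], t.hour) for t in timestamps)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up transaction times, for illustration only.
    transactions = [
        datetime(2024, 1, 6, 10, 15),
        datetime(2024, 1, 13, 10, 40),
        datetime(2024, 1, 3, 18, 5),
    ]
    print(preferred_shopping_slot(transactions))
```

The inferred slot could then be used, for example, to time the delivery of a coupon or promotion.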
Data generated by personal computing devices is also being used to diagnose, prevent and/or treat medical conditions. Wearable devices, for example, are often embedded with sensors such as heart rate and motion monitors. The data generated by these sensors can be used to track a user's physical well-being, monitor progress, and customize treatments. However, such data is typically used in connection with physical health states or conditions.
There is a need, therefore, for systems and methods for identifying mental health states based on data collected from computing devices. More particularly, for instance, there is a need for systems and methods for identifying symptoms and/or disorders of users based on behavioral data collected from the users' computing devices.