Hereinafter, a “conversation” is a verbal communication between two humans, and not a verbal instruction from a human to a machine, application, or system. For example, a conversation is a verbal communication occurring between two human occupants of a vehicle, e.g., between a driver and a passenger in a car, and not a trigger instruction, command, or phrase specifically configured to be spoken by a human, e.g., the driver, to invoke an intelligent assistant application on a device, e.g., on the driver's smartphone.
Passive listening is the process of receiving speech as input into an application executing on a device, and processing the speech to perform some action. The application processes the speech to detect the presence of a trigger word or phrase in the speech, and performs the action that is configured for that trigger word or phrase.
Many intelligent assistant applications, such as Siri in devices operating on Apple's software, Cortana in devices operating on Microsoft's software, Google Now in devices operating on Google's software, and Echo or Alexa in devices operating on Amazon's software, use passive listening to detect trigger phrases configured to launch their respective applications. (Apple, Siri, iOS or their combinations are trademarks owned by Apple Inc. in the United States and in other countries. Microsoft, Cortana, Windows, or their combinations are trademarks owned by Microsoft Corporation in the United States and in other countries. Google, Google Now, Android, or their combinations are trademarks owned by Google Inc. in the United States and in other countries. Amazon, Echo, Alexa, or their combinations are trademarks owned by Amazon.com Inc. in the United States and in other countries.)
For example, a verbal command such as “Hey Siri”, when detected during passive listening, invokes the Siri application on an iOS device. Similarly, the trigger phrase “OK Google” invokes the Google Now application on an Android device, the trigger phrase “Hi Cortana” invokes the Cortana application on a Windows device, and the trigger words “Alexa”, “Amazon”, or “Echo”, or combinations thereof, invoke the Amazon Echo application on an Amazon device.
Presently available intelligent assistant technology performs passive listening only to detect preconfigured trigger words or phrases in a speech input. Furthermore, presently available intelligent assistant technology must be preconfigured to associate specific actions with specific trigger words or phrases. For example, “Hey Siri, open maps” invokes the Siri intelligent assistant and launches a map application that has been preconfigured to correspond to the trigger phrase “open maps” when the Siri application is invoked.
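The trigger-driven behavior described above can be pictured with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the wake phrase "hey assistant", the `COMMANDS` table, and the `handle_transcript` function do not correspond to any vendor's actual interface, and the sketch assumes the speech has already been transcribed to text. It shows only the general pattern: speech without the preconfigured wake phrase is ignored, and speech with it is matched against a preconfigured table of command phrases.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical wake phrase; real assistants each define their own.
WAKE_PHRASE = "hey assistant"


def open_maps() -> str:
    return "launching map application"


def play_music() -> str:
    return "launching music application"


# Hypothetical table associating preconfigured phrases with preconfigured actions.
COMMANDS: Dict[str, Callable[[], str]] = {
    "open maps": open_maps,
    "play music": play_music,
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())


def handle_transcript(transcript: str) -> Optional[str]:
    """Return the result of a triggered action, or None if no wake phrase is present."""
    padded = f" {normalize(transcript)} "
    marker = f" {WAKE_PHRASE} "  # padding enforces whole-word matching
    idx = padded.find(marker)
    if idx == -1:
        return None  # ordinary conversation: nothing is done with it
    remainder = padded[idx + len(marker):].strip()
    for phrase, action in COMMANDS.items():
        if remainder == phrase or remainder.startswith(phrase + " "):
            return action()
    return "assistant invoked"  # wake phrase heard, no known command followed


if __name__ == "__main__":
    print(handle_transcript("Hey assistant, open maps"))
    print(handle_transcript("Did you see the game last night?"))
```

In this sketch, a conversation between two vehicle occupants that never contains the wake phrase simply returns `None`, which mirrors the limitation noted above: nothing outside the preconfigured phrases produces an action.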
Wireless data processing systems, wireless data communication devices, and wireless computing platforms are collectively and interchangeably referred to herein as a “mobile device” or “mobile devices”. Wearable devices are a category of mobile devices. A wearable device is essentially a mobile device, but has a form-factor that is suitable for wearing the device on a user's person. A user can wear such a device as an article of clothing, a clothing or fashion accessory, jewelry, a prosthetic or aiding apparatus, an item in an ensemble carried by or with a person, an article or gadget for convenience, and the like. Some examples of presently available wearable devices include, but are not limited to, smart watches, interactive eyewear, devices embedded in footwear, devices wearable as rings or pendants, and pedometers and other clip-ons.
Some wearable devices are independent wearable devices in that they can operate as stand-alone mobile devices. Such a wearable device either includes some or all of the capabilities of a mobile device described above, or does not need or use those capabilities. Other wearable devices are dependent wearable devices in that they operate in conjunction with a mobile device. Such a wearable device performs certain functions while in communication with a mobile device described above.
Natural language is written or spoken language having a form that is employed by humans primarily for communicating with other humans or with systems having a natural language interface.
Natural language processing (NLP) is a technique that facilitates exchange of information between humans and data processing systems. For example, one branch of NLP pertains to transforming human readable or human understandable content into machine usable data. NLP engines are presently usable to accept input content, such as a newspaper article or human speech, and produce structured data from it, such as an outline of the input content, most significant and least significant parts, a subject, a reference, dependencies within the content, and the like.
Shallow parsing is a term used to describe lexical parsing of a given content using NLP. For example, given a sentence, shallow parsing is the process by which an NLP engine determines what the sentence semantically means according to the grammar of the language of the sentence. In contrast, deep parsing is a process of recognizing the relationships, predicates, or dependencies, and thereby extracting new, hidden, indirect, or detailed structural information and contextual meaning from content portions in a given document or corpus.
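The contrast between the two kinds of parsing can be pictured with a deliberately simplified sketch. It is a toy under stated assumptions, not how any NLP engine is implemented: the `LEXICON` table, the tag names, and the functions `shallow_parse` and `deep_parse` are all hypothetical, and the relation extraction handles only sentences of the shape noun, verb, noun. The shallow step labels each word in isolation; the deep step goes further and recovers a relationship among the words.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon; a real NLP engine would rely on trained models.
LEXICON = {
    "the": "DET", "a": "DET",
    "driver": "NOUN", "passenger": "NOUN", "restaurant": "NOUN",
    "asked": "VERB", "recommended": "VERB",
}


def shallow_parse(sentence: str) -> List[Tuple[str, str]]:
    """Label each token with a lexical category (word-level information only)."""
    tokens = sentence.lower().rstrip(".?!").split()
    return [(tok, LEXICON.get(tok, "UNK")) for tok in tokens]


def deep_parse(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Recover a (subject, predicate, object) relationship.

    Only handles the simple shape NOUN VERB NOUN (determiners ignored);
    returns None for any other sentence shape.
    """
    content = [(tok, tag) for tok, tag in shallow_parse(sentence) if tag != "DET"]
    if [tag for _, tag in content] == ["NOUN", "VERB", "NOUN"]:
        return (content[0][0], content[1][0], content[2][0])
    return None


if __name__ == "__main__":
    sentence = "The passenger recommended a restaurant."
    print(shallow_parse(sentence))
    print(deep_parse(sentence))
```

The shallow result is a flat list of labeled words, while the deep result states who did what to what, which is the kind of relationship or dependency information the description above attributes to deep parsing.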