Semi-autonomous agents or bots can perform various actions in the physical and virtual worlds. For example, in the physical world, robots assemble cars and clean swimming pools, and unmanned aerial vehicles (UAVs) perform complex missions such as urban surveillance and fuel resupply. Likewise, in virtual worlds such as first-person shooter games, real-time strategy games, and military training simulators, bots may perform actions based on scripted rules and low-level user inputs.
State-of-the-art autonomous agents in the physical and virtual worlds have limited conversational abilities. Moreover, none are capable of answering locative queries, i.e., “where is . . . ” questions. Further, an agent's autonomy, i.e., its ability to operate with little or no human supervision or operation, is very limited. For example, UAVs require six operators to fly a mission, and virtual bots require complex and unnatural controller inputs. Additionally, natural language communication with bots is limited to simple commands. For example, a driver can turn on a car radio using voice commands but cannot ask questions and receive answers. These challenges inherently and substantially limit the practical uses of autonomous agents.
In one aspect, the present invention provides physical and virtual bots with a vastly higher level of autonomy than the current state of the art. Specifically, virtual bots (“communicative bots”) in accordance with the present invention are capable of acting on natural language commands, carrying on a task-based conversation, and answering locative questions. Such bots have a wide range of practical applications.
With regard to locative question answering, consider a common household scenario in which a farsighted elderly person asks his/her spouse, “Where are my reading glasses?”, and the spouse responds, “They are directly behind you on the dresser.” Such a response has two main components: first, the landmarks, such as you and the dresser; and second, the spatial terms, such as behind and on, that describe the spatial relations between the target object (the reading glasses) and the landmarks, and possibly among the landmarks themselves.
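The two components of such a response can be captured in a small data structure. The following Python sketch is illustrative only: the class names `SpatialClause` and `LocativeResponse` and the `render` method are hypothetical and are not part of the described system. It simply pairs each landmark with its spatial term and concatenates the pairs, in order, into a sentence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpatialClause:
    """One (spatial term, landmark) pair, e.g. ("on", "the dresser"). Hypothetical type."""
    relation: str  # spatial term such as "behind" or "on"
    landmark: str  # landmark phrase such as "you" or "the dresser"


@dataclass
class LocativeResponse:
    """Hypothetical container: the target object plus the clauses that locate it."""
    target: str
    clauses: List[SpatialClause] = field(default_factory=list)

    def render(self, pronoun: str = "It is") -> str:
        # Concatenate the clauses in the order they were added.
        body = " ".join(f"{c.relation} {c.landmark}" for c in self.clauses)
        return f"{pronoun} {body}."


response = LocativeResponse(
    target="reading glasses",
    clauses=[SpatialClause("directly behind", "you"), SpatialClause("on", "the dresser")],
)
print(response.render(pronoun="They are"))  # They are directly behind you on the dresser.
```

Keeping the relation and the landmark as separate fields lets a generator choose landmarks and spatial terms independently before producing the final wording.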
To create a response to such a locative question, an agent must decide which landmarks to use, how many to use, and which spatial relations go with them. Past efforts in the spatial reasoning field outline a set of criteria for landmark selection: for example, a landmark should be bigger and more complex in shape than the target object, immediately perceivable, and less mobile. Past efforts also describe an influence model for landmark selection that includes search space optimization, listener location, and brevity of communication. However, these efforts fall short of operationalizing their approach and do not present a model that clearly specifies how many landmarks to select. In contrast, the present invention provides a computational model for landmark selection and location description generation that minimizes the listener's mental cost in locating the target object.
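As an illustration of how the prior-work selection criteria might be operationalized, the following sketch treats each criterion as a hard filter over candidate objects. The numeric attributes (`volume_m3`, `shape_complexity`, `mobility`), the `visible_to_listener` flag, and the function names are assumptions made for this example; the criteria themselves are qualitative, and a filter of this kind still leaves open how many of the surviving landmarks to use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorldObject:
    name: str
    volume_m3: float           # assumed size measure
    shape_complexity: float    # assumed score; higher means a more complex shape
    visible_to_listener: bool  # assumed proxy for "immediately perceivable"
    mobility: float            # assumed score; 0.0 = fixed, 1.0 = highly mobile


def is_candidate_landmark(obj: WorldObject, target: WorldObject) -> bool:
    """Apply the qualitative criteria as simple hard filters (an assumption)."""
    return (
        obj.name != target.name
        and obj.volume_m3 > target.volume_m3               # bigger than the target
        and obj.shape_complexity > target.shape_complexity  # more complex in shape
        and obj.visible_to_listener                          # immediately perceivable
        and obj.mobility < target.mobility                   # less mobile than the target
    )


def candidate_landmarks(objects: List[WorldObject], target: WorldObject) -> List[str]:
    return [o.name for o in objects if is_candidate_landmark(o, target)]


target = WorldObject("reading glasses", 0.0005, 0.3, False, 0.9)
objects = [
    target,
    WorldObject("dresser", 0.8, 0.6, True, 0.05),
    WorldObject("cat", 0.02, 0.7, True, 1.0),    # rejected: more mobile than the target
    WorldObject("lamp", 0.01, 0.5, False, 0.2),  # rejected: not visible to the listener
]
print(candidate_landmarks(objects, target))  # ['dresser']
```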
One of the key insights employed by the present invention is that, in asking the locative question, the requester is trying to reach or find the target object, and the best response description is the one that minimizes the amount of search or looking around that the listener must do to locate the target object.
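One way to make this insight concrete is to score a description by how much space the listener still has to search after hearing it, plus a fixed penalty per landmark clause for the effort of processing a longer description. The sketch below is a greedy search over a grid of cells. The cost function, the `clause_cost` parameter, the grid representation, and the function name `select_landmarks` are all assumptions for illustration, not the model claimed here. Under these assumptions the number of landmarks is not fixed in advance: a clause is added only if it lowers the total cost.

```python
from typing import Dict, FrozenSet, List, Tuple

Cell = Tuple[int, int]


def select_landmarks(
    world: FrozenSet[Cell],
    target: Cell,
    clauses: Dict[str, FrozenSet[Cell]],
    clause_cost: float,
) -> Tuple[List[str], float]:
    """Greedily pick clauses minimizing (cells left to search) + clause_cost * (#clauses)."""
    # Only clauses that are true of the target's actual location can be used.
    usable = {phrase: cells for phrase, cells in clauses.items() if target in cells}
    region = set(world)            # cells the listener would still have to search
    chosen: List[str] = []
    best_cost = float(len(region))  # cost of saying nothing at all
    while True:
        best_phrase = None
        for phrase, cells in usable.items():
            if phrase in chosen:
                continue
            cost = len(region & cells) + clause_cost * (len(chosen) + 1)
            if cost < best_cost:
                best_cost, best_phrase = cost, phrase
        if best_phrase is None:    # no clause lowers the cost any further
            return chosen, best_cost
        chosen.append(best_phrase)
        region &= usable[best_phrase]


world = frozenset((x, y) for x in range(10) for y in range(10))
clauses = {
    "on the dresser": frozenset(c for c in world if c[1] == 1),
    "behind you": frozenset(c for c in world if c[0] < 4),
    "near the window": frozenset(c for c in world if c[0] >= 5),
}
chosen, cost = select_landmarks(world, target=(2, 1), clauses=clauses, clause_cost=3.0)
print(chosen, cost)  # ['on the dresser', 'behind you'] 10.0
```

Raising `clause_cost` (for example, to 50.0) makes brevity dominate, and the same call then returns only `["on the dresser"]`, since the second clause no longer pays for itself.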
For example, imagine a force protection unit asking a scout bot, “Where are the insurgents?” A scout bot in accordance with the present invention can respond, for example, “On the rooftop of the building two blocks ahead of you.” As another example, a worried parent whose child, previously tagged with a location sensor, is lost in a museum may use her cell phone to ask a locator bot in accordance with the present invention, “Where is Johnny?” The locator bot can respond, for example, “In front of the drinking fountain next to the dinosaur exhibit to your right.” As a further example, in a multiplayer video game such as Call of Duty: Black Ops™ (Call Of Duty™, 2011), a player could ask a virtual teammate, “Where are you?”, and the teammate would provide a suitable answer.
The same approach can be used to help shoppers locate products in a retail store, and the retailer can cross-sell and up-sell products to shoppers based on the questions they ask. No currently available systems or methods provide such abilities. While there are many commercial systems for outdoor wide-area navigation, there is little in the way of navigation support inside venues such as shopping malls, large ships, and public parks.
The present invention thus provides a computational approach for representing and reasoning about the locations of objects in the world to construct a response to locative queries. The present invention can be implemented in an embodied software bot (a bot that can sense the world around it, reason about what it senses, and act) that answers locative (e.g., “Where is . . . ?”) questions.
The present invention further provides, in part, an approach for implementing a Communicative Agent for Spatio-Temporal Reasoning (called CoASTeR™ in one embodiment) that responds to multi-modal inputs (speech, gesture, and sketch) to dramatically simplify and improve locative question answering in virtual worlds, among other tasks. In one aspect, the present invention provides a software system architecture workbench that includes CoASTeR. In accordance with one embodiment of the present invention, the components of an agent can include one or more sensors, actuators, and cognition elements, such as interpreters, executive function elements, working memory, long-term memory, and reasoners for responding to locative queries. Further, the present invention provides, in part, a locative question answering algorithm, along with the command structure, vocabulary, and dialog that an agent is designed to support in accordance with various embodiments of the present invention.
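The following Python sketch shows how the listed components might fit together in the simplest text-only case. Everything in it is hypothetical: the `Agent` and `Percept` classes, the regular-expression interpreter, and the alias table standing in for long-term memory are illustrative assumptions. It does not handle speech, gesture, or sketch input, nor does it perform real spatial reasoning. It only traces the flow from sensor to working memory, from interpreter to executive, and from reasoner to actuator.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Percept:
    """A sensor report: an object and where it was observed (hypothetical format)."""
    obj: str
    relation: str  # e.g. "on"
    landmark: str  # e.g. "the dresser"


@dataclass
class Agent:
    working_memory: Dict[str, Percept] = field(default_factory=dict)  # latest percepts
    long_term_memory: Dict[str, str] = field(default_factory=dict)    # alias -> canonical name
    spoken: List[str] = field(default_factory=list)                   # actuator output log

    def sense(self, percept: Percept) -> None:
        """Sensor path: record the newest observation of each object."""
        self.working_memory[percept.obj] = percept

    @staticmethod
    def interpret(utterance: str) -> Optional[Tuple[str, str]]:
        """Interpreter: recognizes only 'where is/are ...' questions."""
        m = re.match(
            r"\s*where\s+(?:is|are)\s+(?:my\s+|the\s+)?(.+?)\s*\?*\s*$",
            utterance,
            re.IGNORECASE,
        )
        return ("locate", m.group(1).lower()) if m else None

    def locate(self, name: str) -> str:
        """Reasoner: resolve aliases, then look the object up in working memory."""
        name = self.long_term_memory.get(name, name)
        p = self.working_memory.get(name)
        return f"{p.relation} {p.landmark}" if p else "I don't know where that is."

    def hear(self, utterance: str) -> str:
        """Executive: route the interpreted input to a reasoner, then act on the result."""
        parsed = self.interpret(utterance)
        if parsed is not None and parsed[0] == "locate":
            reply = self.locate(parsed[1])
        else:
            reply = "Sorry, I can only answer 'where is' questions."
        self.spoken.append(reply)  # actuator: stand-in for speech output
        return reply


agent = Agent(long_term_memory={"glasses": "reading glasses"})
agent.sense(Percept("reading glasses", "on", "the dresser"))
print(agent.hear("Where are my glasses?"))  # on the dresser
```

Separating the interpreter from the executive in this way means new question types could be supported by adding intents and reasoners without changing the sensor or actuator paths.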