(1) Field of the Invention
The present invention relates to a combined dialog and vision navigation system, and more particularly to a navigation system capable of identifying salient features of the local environment and using them as semantic content within the dialog to assist a user in navigating.
(2) Description of Related Art
While driving an automobile, people make constant reference to the local environment in their speech. This is especially true during navigation tasks. For example, consider the following real exchange between a driver and a human navigator.
<driver> “Where am I going now?”
<navigator> “About another three blocks, you'll see it up on the left side, it's a plain brown building, it's got a big red sign that says Savinar and Company Luggage.”
<navigator> “In fact here, this is it, take a left. Not here, sorry, up where that white truck is taking a left.”
In previous open dialog systems, a driver is unable to make reference to local objects. However, in navigation tasks between a driver and a human navigator, a large percentage of verbal interactions involve references to local objects from the perspective of the speaker (called deixis) with words like “that,” “this,” or “there.” Thus, what is needed is a navigational system which allows references to be made to local objects.
In previous in-vehicle dialog systems, dialog construction is not as natural as if the driver were talking with a passenger. As a result, the driver must pay more attention while interfacing with the in-vehicle dialog system, which decreases the attention the driver can give to external events such as pedestrians, dense traffic, or other moving objects. In present systems, the user has to respond to the navigation system in a particular way that is necessary for the system to work. What is needed is an open dialog system that allows visible references, which makes it much easier to communicate and requires less cognitive attention.
Commercial navigation systems enable a user to follow visual directions on a map or receive verbal directions from the system (via synthesized speech). Such commercially available navigation systems include the Garmin StreetPilot®, the NavMan Tracker, the Magellan® 750Nav, and Autodesk® LocationLogic. Garmin StreetPilot® is produced by Garmin International, Inc., located at 1200 East 151st Street, Olathe, Kans. 66062-3426. The NavMan Tracker is produced by Navman Europe Ltd., located at 4G Gatwick House, Peeks Brook Lane, Horley, Surrey, RH6 9ST, United Kingdom. The Magellan® 750Nav is produced by Thales Navigation Inc., located at 960 Overland Court, San Dimas, Calif. 91773. Autodesk® LocationLogic is produced by Autodesk, Inc., located at 111 McInnis Parkway, San Rafael, Calif. 94903.
Most systems use static digital maps produced by Navteq, which is located at 222 Merchandise Mart, Suite 900, Chicago, Ill. 60654. Further, commercial navigation systems support turn-by-turn directions cued from global positioning system (GPS) information, sometimes with additional cues from wheel sensors and other on-board sensors that increase position accuracy; however, they do not support references to objects in the local vicinity of the vehicle, either static or dynamic.
Autodesk® Location Services and TargaSys, a division of Fiat Auto, are marketing the Autodesk® LocationLogic platform as a telematics solution. Fiat Auto is located in Turin, Italy. Among other services (e.g., real-time weather and traffic information), Fiat Auto offers a “Follow Me” navigation service to get a user to their destination efficiently. However, this service cannot make reference to local features of the environment that are changing or not represented in their static maps since it has no vision capability and no corresponding dialog model.
IBM researchers have developed an automatic in-car dialog system that carries on a conversation with the driver on various topics to keep the driver awake (an artificial passenger). IBM Corporation is located at 1133 Westchester Avenue, White Plains, N.Y. 10604. The system developed by IBM is disclosed in U.S. Pat. No. 6,236,968. The system analyzes the driver's answers together with his or her voice patterns to determine if the driver is alert while driving. The system warns the driver or changes the topic of conversation if it determines that the driver is about to fall asleep. Such a system can be used for voice-activated operation of audio equipment, such as in-car CD/DVD players, radios, and telephones, but cannot refer to anything in the local vicinity of the vehicle.
Robotics researchers have developed systems that attempt to learn from a human teacher through the use of dialog. Bugmann et al., in “Instruction-based learning for mobile robots,” Proceedings Adaptive Computing in Design and Manufacture (ACDM) 2002, describe the design of a practical system that uses unconstrained speech to teach a vision-based robot how to navigate in a miniature town. It allows the human controller to state commands to the robot, such as “Take the second right.” The robot's environment is a static, simulated tabletop world that it perceives one image at a time. Such vision-based navigation only operates in the model environment. Bugmann et al. also describe methods for extracting “action chunks” from the dialog, and converting those into forms to which the robot can respond.
Project VITRA (Visual Translator) researched the relationship between natural language and vision, including answering questions about observations in traffic scenes on a three-dimensional (3D) model of a town. Project VITRA was disclosed in Schirra et al., “From Image Sequences to Natural Language: A First Step Towards Automatic Perception and Description of Motions,” Applied Artificial Intelligence, 1, 287-305 (1987). Also as part of VITRA, Maaß et al., in “Visual Grounding of Route Descriptions in Dynamic Environments” in Proceedings of the AAAI Fall Symposium on Computational Models for Integrating Language and Vision, Cambridge, Mass., 1995, describe a computational model for an agent that can give incremental route descriptions.
Laengle et al., in “KANTRA—A Natural Language Interface for Intelligent Robots,” published in the Proceedings of the 4th International Conference on Intelligent Autonomous Systems, in March 1995, describe a robot with a manipulator arm and vision camera that can be controlled using dialog to make references to objects on a table based on visual input. Objects are described in terms of their location and orientation.
U.S. Pat. No. 4,638,445, entitled, “Autonomous Mobile Robot,” describes a system with near and far ultrasonic object sensing. Behavior of the robot can be controlled via a limited set of typed commands from a human operator. The invention does not address spoken human-computer dialog, navigation, external object references, precise sensing of object characteristics, or associated dialog modeling.
U.S. Pat. No. 5,870,701, entitled, “Control Signal Processing Method and Apparatus Having Natural Language Interfacing Capabilities,” describes a signal processing method and apparatus to enable use of simple command-based language to control a switching system. The vocabulary for this system is limited to approximately 100 words, with combinations of words for different control means. Related patents for similar command-language control of systems and databases are numerous (such as U.S. Pat. Nos. 6,035,267; 5,377,103; 5,321,608; 5,197,005; and 5,115,390) and typically use text-based input.
U.S. Pat. No. 5,748,974, entitled, “Multimodal Natural Language for Cross-Application Tasks,” describes a method for using natural language input (spoken, typed, or handwritten), entered by any standard means in an application the user is running (the current application), to perform a task in another application (the auxiliary application) without leaving the current application. Similar database retrieval systems include U.S. Pat. No. 5,442,780.
U.S. Pat. No. 5,721,845, entitled, “Topically Organized Interface with Realistic Dialogue,” describes a method and apparatus for formulating and responding to an inquiry through an interface which is topically organized. The dialogue system interface is comprised of various topical objects wherein each domain object has a set of object values. Selection of a desired object value of a domain object yields a set of potential inquiries, corresponding to the selected value, for selection. A selected inquiry is transmitted to an underlying system for formulation of a response. The formulated response of the underlying system is then transmitted to the user through the dialogue system interface. The dialogue system generates and displays further domain objects, object values, and inquiries that are logically anticipated from the selected inquiry.
U.S. Pat. No. 6,604,022, entitled, “Robot for Autonomous Operation,” describes a robot which incorporates a body, arms with a hand grip, legs, several sensors, light elements, an audio system, and a video system. The sensors allow the robot to interact with objects in the room, and prevent the robot from traveling off an edge or bumping into obstacles. The light elements allow the robot to express moods. The audio system allows the robot to detect and transmit sounds. The video system allows a user to remotely view the area in front of the robot.
Additionally, the robot may operate in a plurality of modes, including modes that allow the robot to operate autonomously. The robot may operate autonomously in an automatic mode, a security mode, a greet mode, and a monitor mode. Further, the robot is manipulated remotely.
U.S. Pat. No. 6,539,284, entitled, “Socially Interactive Autonomous Robot,” describes a robotic system that can accept an input from a human, select dynamic content from a database wherein the dynamic content is responsive to the input, and present the human with a response corresponding to the dynamic content selection. This system does not enable a dialog between the human and the robot, only a command language (preferably in the form of a menu-based touch screen). “Dynamic content” consists of data such as weather, news, entertainment, etc., obtained from a database, and thus is not of the form that can be used to reference moving objects.
U.S. Pat. No. 6,584,377, entitled, “Legged Robot and Method for Teaching Motions thereof,” describes a legged robot that can learn a series of expressive motions by recognizing spoken language. The robot processes instructions from a user in the form of voice input, and extracts and combines at least one basic motion in a time series.
U.S. Pat. No. 6,556,892, entitled, “Control Device and Control Method for Robot,” describes a system for scheduling a robot's behaviors. Such behaviors include, among other capabilities, the ability for a user to issue a command, and a dialog management unit for supervising the dialog with the user based on the user input command. This is not an open dialog interaction and the robot does not have the ability to sense specific objects and refer to them in the dialog.
U.S. Pat. No. 6,122,593, entitled, “Method and System for Providing a Preview of a Route Calculated with a Navigation System,” describes the NAVTEQ® system capability for referencing elements of a geographic database, such as roads. Other NAVTEQ® patents, such as U.S. Pat. No. 6,438,561, include the ability to use real-time traffic information in a navigation system. However, none of these describe the capability to reference moving objects in the near vicinity of the vehicle for navigation purposes.
U.S. Pat. No. 6,427,119, entitled, “Method and System for Providing Multiple Entry Points to a Vehicle Navigation Route,” relates to a method and system for providing multiple beginning instructions for navigating a vehicle from a route generator. It does not contemplate the use of a dialog in conjunction with the navigation route planning, or references to local objects. Other NAVTEQ® patents include U.S. Pat. Nos. 6,047,280; 6,122,593; and 6,173,277.
U.S. Pat. No. 6,424,912, entitled, “Method for Providing Vehicle Navigation Instructions,” describes a system using a database of road segments and a vehicle tracked with GPS. As the vehicle moves, the database is accessed for roads in the vicinity of the vehicle where a maneuver may be required.
U.S. Pat. No. 6,184,823, entitled, “Geographic Database Architecture for Representation of Named Intersections and Complex Intersections and Methods for Formation thereof and Use in a Navigation Application Program,” describes using named intersections. The intersections facilitate certain functions for an end-user of the navigation application program and enhance performance of the navigation system. The invention does not contemplate the use of dynamically tracked objects such as pedestrians and vehicles in a navigation dialog with the driver.
U.S. Pat. No. 5,177,685, entitled, “Automobile Navigation System using Real Time Spoken Driving Instructions,” describes an automobile navigation system that provides spoken instructions to guide a driver along a route. A map database, route finding algorithms, a vehicle location system, discourse generating programs, and speech generating programs are described. Based on the current position of an automobile and its route, the discourse-generating programs compose driving instructions and other messages in real-time and then speech is generated for the driver.
U.S. Pat. No. 6,498,985, entitled, “Method for Multimedia Supported Navigation and Navigational Device,” describes a device for providing driving directions. The device refers to prominent objects along the travel route, using everyday language and is supported by an optical readout. The prominent objects include only those objects that are known in advance (e.g., landmarks, monuments) and stored in the system database. Similar patents include U.S. Pat. No. 6,477,460.
Other similar patents are related to the use of a voice output to provide navigation directions to a driver. Examples of such patents include U.S. Pat. Nos. 6,587,786; 6,526,351; 6,351,698; 6,292,743; 5,736,941; and 5,406,492. However, these systems only provide references to stationary objects that are in a database and that are known in advance. Additionally, the driver cannot reference moving or stationary objects that are not already contained in the database.
Other patents relate to vehicle guidance systems that use specialized off-vehicle infrastructure, such as in an intersection. Examples of such patents include U.S. Pat. Nos. 5,126,941 and 4,907,159. These systems can track multiple vehicles in a restricted zone, such as an intersection, but do not enable the driver or system to reference these objects.
Thus, what is needed is a system and method for using visual context in navigation dialog, where the navigation system can interpret and respond to local environmental conditions. Further, what is needed is a navigation system that enables the driver of a vehicle engaged in dialog with an automated navigation system to make reference to objects in the local vicinity of the vehicle. Such objects should include those that are moving, such as pedestrians and other vehicles, and those that are salient but not recorded in a database.