Automated agents can provide important services for humans. An automated agent can be, for example, a simple autonomous robot that provides services to an elderly person or patrols a workplace at night. An automated agent can also be a phone or vehicle that provides navigation using captured images. A component of accomplishing navigation tasks is the ability to localize, that is, to estimate the current location of the agent, and to navigate reliably to reach locations in the environment.
The more affordable these agents are, the more likely they are to become commonly used. Some current robust techniques for agent localization and navigation employ high-precision laser sensors that provide reliable metric readings for surrounding objects. However, such high-end lasers are typically expensive, and can be too expensive for an agent whose cost is constrained.
A less expensive alternative is to use cameras. While modern cameras provide excellent images at relatively low cost, using these images for localization and navigation is challenging because the images do not directly provide metric information about the environment. Instead of relying on metric information, agents can avoid maintaining a metric map of the environment and operate directly in image space. A topological navigation approach can construct a graph of locations, where nodes represent locations and edges denote direct access between location nodes. Locations can be identified by sensor readings, typically pre-recorded images from a camera that are assigned to specific locations. An image representation can also be replaced by a condensed set of features extracted from the image to support rapid similarity computations. The agent can then navigate from image to image using a technique called visual homing. A well-known problem that arises when using imperfect sensors is perceptual aliasing, where multiple locations appear similar or even identical.
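As an illustration only, the topological approach described above can be sketched as a small graph whose nodes carry condensed feature vectors. The class name `TopologicalMap`, the three-element feature vectors, the cosine similarity measure, the `tolerance` parameter, and the breadth-first route search are all assumptions made for this sketch and are not taken from any particular system. Visual homing itself is not modeled; the sketch only covers matching an observation to stored locations and finding a sequence of nodes to traverse.

```python
from collections import deque
from math import sqrt


def cosine_similarity(a, b):
    """Similarity between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TopologicalMap:
    """Hypothetical graph of locations, each identified by image features."""

    def __init__(self):
        self.features = {}  # location name -> condensed feature vector
        self.edges = {}     # location name -> set of directly reachable names

    def add_location(self, name, feature):
        self.features[name] = list(feature)
        self.edges.setdefault(name, set())

    def connect(self, a, b):
        # An edge denotes direct access between two location nodes.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def localize(self, observed, tolerance=0.02):
        """Return every location whose score is within `tolerance` of the best.

        More than one result indicates perceptual aliasing.
        """
        scores = {n: cosine_similarity(observed, f)
                  for n, f in self.features.items()}
        best = max(scores.values())
        return sorted(n for n, s in scores.items() if best - s <= tolerance)

    def route(self, start, goal):
        """Breadth-first search for a shortest node sequence, or None."""
        parents = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in sorted(self.edges[node]):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None


# Made-up feature values; the two hallways are deliberately identical.
m = TopologicalMap()
m.add_location("lobby", [0.9, 0.1, 0.0])
m.add_location("hall_a", [0.1, 0.9, 0.1])
m.add_location("hall_b", [0.1, 0.9, 0.1])
m.add_location("lab", [0.0, 0.2, 0.9])
m.connect("lobby", "hall_a")
m.connect("hall_a", "hall_b")
m.connect("hall_b", "lab")

print(m.localize([0.1, 0.88, 0.12]))
print(m.route("lobby", "lab"))
```

In this sketch the two hallway nodes share the same feature vector, so a hallway observation matches both of them. This is the perceptual aliasing problem in miniature: the feature comparison alone cannot tell the agent which of the two nodes it occupies.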
Furthermore, when a global positioning system (GPS) or other global localization scheme is not available, navigation has to be robust to illumination changes or any other changes in the scene, particularly with indoor navigation. Visible image sensors used for scene recognition can suffer from increased error when the scene illumination changes or an object in the scene moves, and an agent may not be able to properly match scenes even with relatively minor changes.