The present invention is concerned with the problem of enabling a human operator of a teleoperated system (such as a mobile explosive ordnance disposal robot) or of a remote surveillance system, to visualise efficiently the location and orientation of various objects and obstacles in a remote environment, and to visualise efficiently the location and orientation of the remote system, or teleoperator, itself with respect to the remote environment, that is, with respect to various objects and obstacles in that environment, and to operate efficiently the various functions of the remote system, that is, to control its locomotion and to operate any of the teleoperator's effectors, such as robotic arms, grippers, weapons, etc., with respect to the remote environment. The terms "teleoperator" or "robot" include any system, such as a mobile robot, which can be controlled at a distance and from which visual information is fed by means of a video signal to a human controller. The terms also include a video camera system alone, without remote vehicle platform or telemanipulator, as used for remote surveillance.
In conventional systems, the ability to carry out these functions is limited primarily by the ability of the human operator to view the remote environment. Typically, a closed circuit monoscopic video system is used with such systems. A closed circuit monoscopic video system includes a single video camera mounted on or near the mobile robot and the human operator views the remote environment via a single video monitor. The term "remote" is used here in a general sense, to refer to any separation between the observer and the camera(s), that is, either a physical or a temporal or a functional separation. There are a number of visualisation problems which commonly accompany such viewing systems and these arise from the factors briefly discussed below.
First, the resolution of the closed circuit video system is typically about 330-360 horizontally resolvable lines, depending on the quality of the (colour, solid state) video camera, optics, and monitor. This is much less than that of the human visual system during direct viewing and therefore limits the ability of the human operator to detect and recognise details. Second, unless expensive coupling hardware between the human operator's head movements and the remote camera's pan and tilt unit has been provided, which is typically not the case at present, the ability of the human operator to "look around" and assess the remote environment comfortably is greatly restricted. Third, the relatively small field of view afforded by the camera lenses being used, which is typically around 30°-40° depending on the focal length of the lens, is much less than the natural field of view of about 120° of the human binocular visual system. Further, the usual reduction in scale due to the size of the viewing screen restricts the ability of the human operator to assimilate important information from the remote visual environment, such as estimating the rate at which objects are streaming through the camera's visual field, information which is necessary for the operator to estimate robot speed and to control robot locomotion accurately. Fourth, single camera video systems can, under many circumstances, severely restrict the ability of the human operator to estimate the distances between objects in the remote environment, as well as to detect the presence of objects or obstacles which otherwise tend to blend into the visual background.
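By way of illustration only, the dependence of the field of view on the focal length of the lens can be sketched with the standard relation for an ideal, distortion-free lens, in which the horizontal field of view equals twice the arctangent of half the sensor width divided by the focal length. The function name and the sensor and lens dimensions below are assumptions chosen for the example and are not taken from any particular camera.

```python
import math


def horizontal_fov_deg(sensor_width_mm, focal_length_mm):
    """Horizontal field of view, in degrees, of an ideal distortion-free lens."""
    half_angle = math.atan(sensor_width_mm / (2.0 * focal_length_mm))
    return math.degrees(2.0 * half_angle)


# Hypothetical values: a 4.8 mm wide sensor behind an 8 mm lens.
fov = horizontal_fov_deg(4.8, 8.0)
print(f"horizontal field of view: {fov:.1f} degrees")
```

With these assumed dimensions the result falls within the 30°-40° range mentioned above. A shorter focal length widens the field of view at the cost of making each object smaller on the screen.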
The present invention is particularly concerned with the fourth problem addressed above, although it does have implications for the other viewing problems mentioned. In order to estimate "depth" information with monoscopic video systems, i.e. the relative distance of objects in the direction perpendicular to the plane of the viewing screen, the main visual cues available include relative object size wherein objects closer to the camera appear larger, motion parallax involving relative change of visual angle of moving objects, occlusion wherein closer objects block off farther objects located behind them, surface texture and lighting. Stereopsis, the important ability to perceive volumetric information by means of binocular disparity, i.e. the differences between the projections of the parts of an object onto the two retinas of an observer's eyes, is not achievable with monoscopic television systems.
In some operations carried out with remotely manipulated systems, it is necessary to estimate the distance from the robot, or from the remote cameras, to a particular object or, more particularly, to estimate the spatial coordinates of a specified object relative to the robot. Furthermore, in some operations, it is necessary to estimate the distance between two particular objects or specific points in the remote vicinity of the teleoperator. For example, an operator might want to know the distance to a particular object for purposes of orientation, weapon aiming, manoeuvring, etc. Similarly, the operator might want to indicate a particular point in space in order to issue some kind of "go to" command to the locomotion or manipulator control system, in a higher order control mode than is presently possible. In the case of a mobile explosive ordnance disposal robot, for example, instead of aiming the robot's weapon at a target manually, if the operator were to have the relative spatial coordinates of the designated target available, it would be a straightforward matter to design a microprocessor based system to direct the weapon towards the specified target.
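As a minimal sketch of why such computations are straightforward once the relative spatial coordinates are available, the following example computes the distance between two designated points and the pan and tilt angles needed to point an effector at a target. The coordinate frame (x forward, y to the left, z up, origin at the effector mount) and all function names are assumptions made for this illustration; they are not part of any existing teleoperator control system.

```python
import math


def distance_between(p, q):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.dist(p, q)


def pan_tilt_range(target):
    """Pan and tilt angles (degrees) and range to a target at (x, y, z).

    Assumed frame: x forward, y to the left, z up, origin at the effector
    mount. Pan is measured in the horizontal plane from the x axis, and
    tilt is the elevation above that plane.
    """
    x, y, z = target
    pan = math.degrees(math.atan2(y, x))
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))
    rng = math.sqrt(x * x + y * y + z * z)
    return pan, tilt, rng


# Hypothetical target 3 m ahead, 3 m to the left, level with the mount.
pan, tilt, rng = pan_tilt_range((3.0, 3.0, 0.0))
print(f"pan {pan:.1f} deg, tilt {tilt:.1f} deg, range {rng:.2f} m")
```

The arithmetic itself is trivial; the difficulty, as discussed below, lies in obtaining the target coordinates in the first place.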
For all of the above operations, the basic objective is to automate various teleoperator functions and thereby to improve operational efficiency, by taking advantage of the ability to make precise numerical computations afforded by available computing power. The problem in all of these applications, however, is the lack of an adequate means to communicate accurately to the computer system the essential information about spatial coordinates of objects of interest in the robot's surroundings.
Present techniques for addressing the problems outlined above consider separately two levels of problems. The first problem is with respect to the human operator's perception of the spatial relationship among various objects in the vicinity of the robot and the second problem is that of communicating the spatial coordinates of designated perceived objects or locations to the local computer system.
At present, the most common means of addressing the first problem is to continue to use monoscopic video and to rely on the various monoscopic depth cues listed above. A more advanced means of addressing the problem is to install a stereoscopic viewing capability on the mobile robot. Under many circumstances this will greatly improve the human operator's perception of the remote environment and should especially enhance operations involving, for example, (negative) obstacle avoidance, gripping and detection of camouflaged objects.
Stereoscopic video systems are used in practice to allow an observer to perceive volumetric information about all three dimensions of a (remote) environment. That is, instead of the two dimensional images displayed on the surface of a conventional video monitor, the viewer of a stereoscopic display is able also to perceive depth and distance directly within the image. In order to accomplish this, the two images produced by the two cameras at different viewpoints must be presented to the corresponding eyes of the observer separately, on either one or more than one display surface. The term "display surface" will therefore be taken here to refer to one or more display devices which are used to present left and right eye information separately to the observer's left and right eye respectively.
With respect to the second problem mentioned, there is at present no adequate practical means for the human operator to estimate an object's spatial coordinates, other than by estimating this solely on the basis of visual observation (either monoscopically or stereoscopically). On the other hand, it is possible to accomplish such measurements automatically, by making use of suitable machine vision equipment. Typically this would comprise suitably arranged remote camera(s), hardware and software for digitising camera images, pattern recognition software for recognising object features in the camera images, and software for computing the requisite spatial coordinates of designated objects of interest.
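The final step of such a machine vision arrangement, computing spatial coordinates once the same object feature has been located in both camera images, might be sketched as follows. The sketch assumes an idealised pair of identical, parallel, rectified pinhole cameras; the function name and the numerical parameters (focal length in pixels, baseline, principal point) are hypothetical, and the difficult steps of digitising the images and recognising the feature are not shown.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Spatial coordinates of a feature seen by two parallel cameras.

    x_left, x_right: horizontal pixel position of the feature in each image.
    y: vertical pixel position (the same in both images when rectified).
    focal_px: focal length expressed in pixels; baseline_m: camera separation.
    cx, cy: pixel coordinates of the principal point (image centre).
    Returns (X, Y, Z) in metres relative to the left camera, Z pointing forward.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    Z = focal_px * baseline_m / disparity   # depth is inversely proportional to disparity
    X = (x_left - cx) * Z / focal_px
    Y = (y - cy) * Z / focal_px
    return X, Y, Z


# Hypothetical cameras: 800 px focal length, 0.1 m apart, 640 x 480 images.
X, Y, Z = triangulate(360.0, 320.0, 240.0, 800.0, 0.1, 320.0, 240.0)
print(f"X = {X:.2f} m, Y = {Y:.2f} m, Z = {Z:.2f} m")
```

Because depth varies inversely with disparity, a given error in locating the feature produces a larger depth error for distant objects than for near ones.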
The obvious drawback to achieving the automated solution to the second problem outlined above is the expense involved in adding the necessary hardware and software components. Equally important, however, is the reliability of such an arrangement. Although great progress has been made in the area of machine vision, the general problem cannot as yet be considered to be "solved". In real operational environments, potentially under poor lighting conditions, problems associated with using computer software to identify integral objects, whose features may not be easily distinguishable within a noisy and possibly complex visual environment, can be great and could impede performance of the teleoperator system as a whole. Furthermore, even if the computing power of the system is able to identify individual objects within the stereoscopic camera images, the problem still remains of how to enable the human operator to indicate to the computer system which of those objects in the visual scene are of interest to the human operator.