Electronic and computerized systems for helping visually impaired and blind people are well known in the art. For example, U.S. Pat. No. 5,687,136 to Borenstein describes an ultrasound-based system that gives a proximity warning and is embodied in a cane. The cane is equipped with sensors that calculate the distance to nearby obstacles and guide the user away from potential collisions. The user pushes the cane ahead of himself, and it maneuvers itself around obstacles by means of two steerable wheels. However, when the user approaches an object through which he wishes to pass, such as a closed door, the cane tends to guide him away from it. To address this problem, the robotic cane's guidance can be overridden or modified via a joystick incorporated into the handle. Where the commands of the computer and the human directly conflict, as when walking up to a door, the cane is set to respond to the human's commands. In less clear situations, the computer's and the human's commands are combined. For example, where a person directs the cane to turn right but the cane senses an object in the way, the cane would turn slightly to the right, accommodating both the user and sensor inputs. One task that appears beyond the capabilities of the disclosed cane is preventing a user from stepping off a curb into traffic.
U.S. Pat. No. 5,936,528 to Kubayashi teaches a guidance and support apparatus for a visually handicapped person, comprising a resonance tag, an antenna and a portable scanner. The antenna transmits microwaves from 5 to 60 MHz. The system provides voice messages, such as "go," "stop" and "turn to left."
Arkenstone is a provider of reading systems for people with visual and reading disabilities, and its products can be used to find out exactly how to get from here to there and what is encountered along the way. A product called The Talking Map adds software to a user's talking PC, with which the user can explore his neighborhood or city. Once the route is determined in the privacy of the home, the directions are saved to a tape recorder. However, the product provides no assistance in physical, step-by-step navigation.
In general, the upper limit on the amount and quality of information that can be generated by systems that sense and communicate using waves is roughly proportional to the frequency used. The spectrum available to such systems extends from sound waves and ultrasonic frequencies at the low end (ultrasound lies just beyond audible sound, at about 30 kHz), up through microwave frequencies, and then to lasers at trillions of hertz, or cycles per second. Thus, laser-based communication has enormous potential for conveying detailed information.
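As a rough illustration of the proportionality described above, the following Python sketch compares a few carrier frequencies with 30 kHz ultrasound. Only the 30 kHz figure comes from the text; the 10 GHz microwave and 650 nm laser values are hypothetical examples, and reading the frequency ratio as a capacity ratio is the same rough assumption the paragraph makes.

```python
# Compare illustrative carrier frequencies, assuming (as stated above) that the
# achievable information is roughly proportional to the carrier frequency.
SPEED_OF_LIGHT_M_S = 299_792_458.0


def laser_frequency_hz(wavelength_m: float) -> float:
    """Optical frequency from wavelength: f = c / wavelength."""
    return SPEED_OF_LIGHT_M_S / wavelength_m


# Hypothetical example carriers; only the 30 kHz figure comes from the text.
CARRIERS_HZ = {
    "ultrasound (30 kHz)": 30e3,
    "microwave (10 GHz)": 10e9,
    "red laser (650 nm)": laser_frequency_hz(650e-9),
}

reference = CARRIERS_HZ["ultrasound (30 kHz)"]
for name, freq in CARRIERS_HZ.items():
    # Ratio to ultrasound = rough relative capacity under the stated assumption.
    print(f"{name:20s} {freq:9.3e} Hz   x{freq / reference:.1e}")
```

A 650 nm laser works out to roughly 4.6 × 10^14 Hz, about ten orders of magnitude above the ultrasonic carrier.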
State-of-the-art laser-based sensing systems include 3D laser imaging sensors and computerized vision sensors. A 3D laser imaging sensor may consist of a laser-based triangulation scanner designed to produce realistic 3D models rapidly. The resulting models are integrated with 3D editing tools. Laser triangulation is one of the most common techniques of 3D data acquisition. It is an active stereoscopic technique where the distance to the object is computed by means of a directional light source and a video camera.
A laser beam is deflected from a mirror onto the object being scanned. The complete object is scanned by incrementally changing first the horizontal angle, α, across a line, and then adding lines by incrementally changing the vertical angle, β. The process is the same as that used to scan an image onto a standard television receiver. For example, there may be 100 by 100 pixels, or scanned points, per frame, with five frames scanned per second. The object scatters the light, which is then collected by a video camera located at a known triangulation distance from the laser. For simplicity, the camera is modeled as an ideal lens, and the charge-coupled device (CCD) detector is modeled as flat. The angle of the scattered light is then related to the pixel position at which it strikes the detector. Since the focal length of the camera lens is known, analysis of the resulting video image can determine the angle of the scattered light.
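The raster scan and the camera-angle measurement can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the angular spans (40° by 30°), the pixel pitch and the focal length are hypothetical values, and the camera follows the ideal-lens, flat-CCD model described above, so a spot imaged some number of pixels off the optical axis arrives at an angle of atan(offset × pitch / focal length).

```python
import math
from typing import Iterator, Tuple


def raster_scan(n_cols: int, n_rows: int,
                alpha_span: float, beta_span: float) -> Iterator[Tuple[float, float]]:
    """Yield (alpha, beta) mirror angles in raster order.

    alpha (horizontal) is stepped across a line; beta (vertical) is then stepped
    to start the next line, as in a television raster. Angles are in radians,
    centered on zero.
    """
    if n_cols < 2 or n_rows < 2:
        raise ValueError("need at least 2 columns and 2 rows")
    for row in range(n_rows):
        beta = -beta_span / 2 + beta_span * row / (n_rows - 1)
        for col in range(n_cols):
            alpha = -alpha_span / 2 + alpha_span * col / (n_cols - 1)
            yield alpha, beta


def pixel_to_angle(pixel_offset: float, pixel_pitch_m: float,
                   focal_length_m: float) -> float:
    """Angle (radians) between the optical axis and the ray to a spot imaged
    pixel_offset pixels from the image center (ideal lens, flat CCD)."""
    return math.atan(pixel_offset * pixel_pitch_m / focal_length_m)


# 100 x 100 points per frame at 5 frames per second (figures from the text);
# the 40 x 30 degree spans are hypothetical.
points_per_frame = sum(1 for _ in raster_scan(100, 100, math.radians(40), math.radians(30)))
print(points_per_frame, "points/frame,", points_per_frame * 5, "points/s")
# prints: 10000 points/frame, 50000 points/s
```

With the figures given above, the system must therefore resolve 50,000 surface points per second.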
A pixel is defined as a 2D (two-dimensional) picture element, the smallest resolvable rectangular area of an image, either on a screen or stored in memory. Each pixel in a color image has its own brightness and color, usually represented as a triple of red, green and blue intensities. By contrast, a voxel is the smallest distinguishable box-shaped part of a three-dimensional space. A particular voxel is identified by the x, y and z coordinates of one of its eight corners, or of its center. The term is used in 3D modeling.
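The pixel and voxel definitions map directly onto simple data structures. The sketch below assumes a regular grid of cubic voxels with a hypothetical origin and edge length, and identifies each voxel by integer indices from which its minimum corner or its center can be recovered. The names `Pixel`, `voxel_index`, `voxel_corner` and `voxel_center` are illustrative and are not taken from any particular 3D modeling package.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Pixel:
    """One 2D picture element of a color image: red, green and blue intensities."""
    r: int
    g: int
    b: int


def voxel_index(point: Vec3, origin: Vec3, size: float) -> Tuple[int, int, int]:
    """Integer (i, j, k) index of the cubic voxel of edge `size` containing `point`."""
    return tuple(math.floor((p - o) / size) for p, o in zip(point, origin))


def voxel_corner(index, origin: Vec3, size: float) -> Vec3:
    """Minimum corner (one of the eight) of the voxel with the given index."""
    return tuple(o + i * size for i, o in zip(index, origin))


def voxel_center(index, origin: Vec3, size: float) -> Vec3:
    """Center of the voxel with the given index."""
    return tuple(o + (i + 0.5) * size for i, o in zip(index, origin))
```

For example, with a 0.5 unit grid anchored at the origin, the point (0.25, 0.75, 1.5) falls in voxel (0, 1, 3), whose center is (0.25, 0.75, 1.75).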
The other angle of the triangle, at the laser, is also known, since it is the projection angle of the laser beam, with two angular components, α and β. Thus, using simple trigonometry, the 3D spatial (XYZ) coordinates of a surface point can be determined; hence the name triangulation.
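The triangulation step can be written out with the law of sines. The geometry here is a simplifying assumption for illustration: the laser sits at the origin and the camera at a distance b along the X axis; α is the angle between the laser beam and the baseline, γ is the angle between the camera's line of sight and the baseline (recovered from the pixel position), and the vertical angle β is modeled as tilting the whole laser-camera triangle about the baseline. Since the triangle's third angle is π − α − γ, the range from the laser is r = b·sin γ / sin(α + γ).

```python
import math
from typing import Tuple


def triangulate(baseline_m: float, alpha: float, gamma: float,
                beta: float = 0.0) -> Tuple[float, float, float]:
    """(X, Y, Z) of the laser spot under the simplified geometry described above.

    baseline_m: laser-to-camera distance b (laser at origin, camera on +X axis)
    alpha:      laser beam angle to the baseline, radians
    gamma:      camera line-of-sight angle to the baseline, radians
    beta:       vertical laser angle, modeled as a tilt of the triangle about X
    """
    if not (alpha > 0.0 and gamma > 0.0 and alpha + gamma < math.pi):
        raise ValueError("laser and camera rays do not meet in front of the baseline")
    # Law of sines: r / sin(gamma) = b / sin(pi - alpha - gamma) = b / sin(alpha + gamma)
    r = baseline_m * math.sin(gamma) / math.sin(alpha + gamma)
    u = r * math.cos(alpha)  # component along the baseline
    v = r * math.sin(alpha)  # in-plane component perpendicular to the baseline
    # Rotate the triangle's plane about the X axis by beta.
    return (u, v * math.cos(beta), v * math.sin(beta))
```

With a 1 m baseline and both angles at 45°, the spot lies at (0.5, 0.5, 0) m when β = 0, halfway between laser and camera and 0.5 m in front of them.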
A simple 3D scan captures a single projection from a single viewpoint only. Thus, for 360° model creation, a scan fusion, or "gluing," algorithm is critical.
The first step in gluing is to manually select three corresponding points on the two scans to be glued. This provides a starting translation and rotation vector for the scan fusion algorithm. Since the two scans must share corresponding points, the scans to be glued must overlap by at least ½″. Once the points are selected, the scan fusion algorithm provides a preview mode that lets the user see a rough alignment of the fusion. Before starting the fusion process, the user may select either the scans' geometry or their texture as the overriding factor for fusion. Once activated, the fusion algorithm finds the best rotation and translation matrices for the models, "glues" them together to create a single geometry for the object, and finally re-triangulates it. The result is a new seamless geometric model with properly blended textures. This approach permits gluing of almost any object with distinguishing characteristics in either texture or geometry.
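The three manually selected point pairs are enough to compute a closed-form starting pose. The following sketch builds an orthonormal frame on each triangle of points, takes the rotation that carries one frame onto the other, and adds the translation that aligns the centroids. It is one possible way to obtain the initial rotation and translation, assumed for illustration; it is not necessarily the method used by the scan fusion algorithm described above, and the subsequent refinement (for example an iterative closest point search, which is an assumption here) is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec3) -> Vec3:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    if n < 1e-12:
        raise ValueError("points are coincident or collinear")
    return (a[0] / n, a[1] / n, a[2] / n)


def frame(p0: Vec3, p1: Vec3, p2: Vec3) -> Tuple[Vec3, Vec3, Vec3]:
    """Right-handed orthonormal axes (e1, e2, e3) attached to a triangle of points."""
    e1 = normalize(sub(p1, p0))
    e3 = normalize(cross(e1, sub(p2, p0)))
    e2 = cross(e3, e1)
    return e1, e2, e3


def centroid(pts: Sequence[Vec3]) -> Vec3:
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(3))


def apply(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Rigid transform R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def three_point_alignment(src: Sequence[Vec3], dst: Sequence[Vec3]) -> Tuple[Mat3, Vec3]:
    """Initial rotation R and translation t mapping three picked points in one
    scan (src) onto the corresponding points in the other scan (dst)."""
    fs, fd = frame(*src), frame(*dst)
    # R = Fd @ Fs^T, where the columns of F are the frame axes; R sends each
    # source axis onto the matching destination axis.
    R = [[sum(fd[k][i] * fs[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    # Translation mapping the rotated source centroid onto the destination centroid.
    rotated_centroid = apply(R, (0.0, 0.0, 0.0), centroid(src))
    t = sub(centroid(dst), rotated_centroid)
    return R, t
```

With exactly corresponding points the result is exact; with hand-picked, slightly misplaced points it is only approximate, which is consistent with the text's description of it as a starting point for the fusion algorithm.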
Computerized vision sensors with 3D sensing software should include:
- automated digitization with best-next-view and possibly path-planning algorithms, coupled to automated motion when the sensor is able to move around the digitized object autonomously;
- automated, or at least semi-automated, transformation and modeling of the raw data, in the form of:
  - point clouds,
  - triangular meshes,
  - parametric surfaces, and
  - high-level geometric primitives such as cylinders, planes or spheres; and
- visualization and editing of the raw and processed data.
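As one small example of turning raw data into a high-level geometric primitive, the sketch below fits a plane to a point cloud by linear least squares, solving the 3×3 normal equations with Cramer's rule. The z = ax + by + c parametrization is an assumption chosen for brevity: it cannot represent vertical planes, for which a total-least-squares fit of the plane normal would be needed instead, and fitting cylinders or spheres requires separate models.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m: Sequence[Sequence[float]], rhs: Sequence[float]) -> Tuple[float, float, float]:
    """Solve the 3x3 system m @ x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("singular system: too few or collinear points")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]  # replace one column with the right-hand side
        solution.append(det3(mc) / d)
    return solution[0], solution[1], solution[2]


def fit_plane(points: Iterable[Point]) -> Tuple[float, float, float]:
    """Least-squares coefficients (a, b, c) of the plane z = a*x + b*y + c.

    Builds the normal equations for rows [x, y, 1] against z. This form cannot
    represent vertical planes (parallel to the z axis).
    """
    sxx = sxy = syy = sx = sy = sxz = syz = sz = 0.0
    n = 0
    for x, y, z in points:
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
        n += 1
    normal_matrix = [[sxx, sxy, sx],
                     [sxy, syy, sy],
                     [sx, sy, float(n)]]
    return solve3(normal_matrix, [sxz, syz, sz])
```

A 3 by 3 grid of points sampled from z = 2x − y + 3 returns coefficients (2, −1, 3), while collinear input is rejected as singular.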
Such ideal software does not exist today, but subsets of it do. They are often specialized for a particular application and prove efficient enough in that context. The film industry was among the first to use 3D geometry capture and modeling techniques.
The general principle of 3D sensing is that sensors receive light, or other forms of energy such as ultrasound or x-rays, from their environment and are meant to construct a model of this environment. The model depends on the application:
- In 3D sensing, the goal is usually a precise 3D geometrical model of the surfaces of objects, obtained by detecting light (or other energy) reflected from those surfaces;
- Sometimes the 'color' properties of a surface must also be digitized; and
- For inspection purposes, the aim is to map the physical properties inside an object by interpreting the changes those properties cause in the incoming signal.
A few years ago, mobile robotics was the main field of application for 3D sensing. Computer vision scientists were mainly interested in passive sensing techniques that were supposed to reflect the way the human eye works. Passive means that no energy is emitted for the purpose of sensing; it is only received. Such passive techniques include stereo vision and monocular shape-from-X techniques such as shape-from-contour and shape-from-shading. The problem is that recovering 3D information from a single 2D image is an ill-posed problem. Thus, monocular techniques must add a priori information, such as surface smoothness, to recover 3D data, a process known as regularization. Regularization is most often context-dependent and is difficult to apply outside the lab.
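For stereo vision, the basic depth relation for a rectified camera pair is Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras and d the disparity, i.e. the horizontal shift of a point between the two images. The sketch below applies it with hypothetical parameter values. The hard part of passive stereo is finding d in the first place, since it requires matching the same surface point in both images, which fails on textureless surfaces; that matching step is not shown.

```python
def stereo_depth(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z (meters) of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is a mismatch.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 500 px focal length, 10 cm baseline. Depth falls off as 1/d,
# so a one-pixel matching error matters far more for distant points (small d).
for d in (50, 25, 5):
    print(d, "px ->", round(stereo_depth(500, 0.1, d), 3), "m")
```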
Most automated video identification is done with two-dimensional modeling. Identification may be greatly enhanced by three-dimensional recognition of contours through depth perception.
To overcome the above problems of passive sensing, active sensing techniques have been developed. Active means that suitably formatted light, or another form of energy, is emitted and then received after it has interacted with the object to be digitized. Typically, light is emitted in the direction of an object, reflected from its surface and recovered by the sensor, and the distance to the surface is then calculated using triangulation or time-of-flight.
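Time-of-flight ranging converts the round-trip time of an emitted pulse into distance, d = c·t/2, since the light travels to the surface and back. The short sketch below also inverts the relation, which shows the timing resolution an active sensor needs: resolving 1 cm in range requires distinguishing round-trip times of about 67 picoseconds. The function names are illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the surface from the pulse's round-trip time: d = c * t / 2."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2


def round_trip_time_s(distance_m: float) -> float:
    """Inverse relation: t = 2 * d / c."""
    return 2 * distance_m / SPEED_OF_LIGHT_M_S


# A 20 ns echo corresponds to roughly 3 m; 1 cm of range is about 67 ps of time.
print(tof_distance_m(20e-9), round_trip_time_s(0.01))
```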
Volume digitization techniques like computerized tomography (CT), magnetic resonance imaging (MRI) or ultrasound imaging also fall in this category.
Effective television mini-cameras are also generally expensive, costing $50,000. To be useful for helping visually impaired and blind people, it is important to be able to manufacture an effective camera for approximately $400.
Thus, the trend is toward the integration, or fusion, of 3D techniques, and there is therefore a need in the art for a system that provides increased efficiency, adaptability and versatility in fusing 3D and computer vision sensing for the benefit of visually impaired and blind people.