Various sensors are known for use in automatic object detection and control systems. For example, photovoltaic sensors detect objects interrupting a beam of visible or UV light. Mechanical switches and load cells detect objects through direct or indirect contact or by detecting an object's weight. Thermal sensors detect objects radiating heat, and electromagnetic sensors detect objects, such as metal objects, that alter electromagnetic fields. These sensors typically send signals to logic circuits which control mechanical actuators, record the object's presence and/or alert an operator based on the presence or absence of an object.
Such sensors are not well suited for certain applications because they are easily circumvented. They only detect a certain class of objects moving through a narrowly constrained space. Similarly, they cannot directly determine an object's direction or velocity. These sensors often have problems maintaining uniform sensitivity throughout a monitored space or over time, and they can be prohibitively expensive.
In some applications, more than one sensor is necessary. For example, typical automatic door controllers used in most grocery stores use a microwave sensor or ultrasound sensor to detect a person approaching a door. An infra-red motion detector is often used to determine whether a person is loitering in a doorway before allowing the doors to close.
Various camera-based systems are also known for use in object detection systems and control systems. Camera-based systems have the additional advantage of providing an image of the monitored space which can be stored for later analysis. Such systems typically use an electronic still camera or an electronic video camera which captures images on an array of charge-coupled devices (CCDs) and converts the images into electronic data files for automatic analysis or storage. For example, automatic face recognition systems have long been the subject of experimentation and are now in use in several high-security applications. These systems can be too slow, expensive or unreliable for most common applications.
Motion detection systems have been developed using electronic video cameras and frame capturing processes which detect and track certain features in each frame of a captured video sequence. For example, automatic door control systems are known that track corners of an object from frame to frame and calculate a velocity vector for the object. The velocity vector is used to determine whether to open or close an automatic door.
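The velocity computation described above can be illustrated with a minimal sketch. It is not the method of any referenced system: it assumes corners have already been detected and matched between two frames, and the names `mean_velocity`, `should_open`, `toward_door` and `min_speed` are hypothetical, introduced only for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


def mean_velocity(prev_corners: List[Point],
                  curr_corners: List[Point],
                  dt: float) -> Point:
    """Average 2-D image-space velocity (pixels per second) of matched corners.

    Assumes prev_corners[i] and curr_corners[i] are the same physical corner
    seen in two frames captured dt seconds apart.
    """
    if not prev_corners or len(prev_corners) != len(curr_corners):
        raise ValueError("corner lists must be non-empty and of equal length")
    if dt <= 0:
        raise ValueError("dt must be positive")
    n = len(prev_corners)
    vx = sum(c[0] - p[0] for p, c in zip(prev_corners, curr_corners)) / (n * dt)
    vy = sum(c[1] - p[1] for p, c in zip(prev_corners, curr_corners)) / (n * dt)
    return (vx, vy)


def should_open(velocity: Point, toward_door: Point, min_speed: float) -> bool:
    """Hypothetical decision rule: open when the velocity component along the
    image direction pointing toward the door is at least min_speed."""
    norm = math.hypot(toward_door[0], toward_door[1])
    if norm == 0:
        raise ValueError("toward_door must be a non-zero direction")
    approach_speed = (velocity[0] * toward_door[0]
                      + velocity[1] * toward_door[1]) / norm
    return approach_speed >= min_speed
```

For example, two corners that each move 4 pixels down the image in 0.5 seconds give a mean velocity of 8 pixels per second in the y direction. Note that the result is expressed in image pixels, not in real-world units.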
Heretofore known feature tracking systems, like the referenced corner tracking system described in the Alexander article, extract data from a monocular image sequence. Such monocular systems provide only 2-dimensional (2-D) data from which to compute velocity vectors. Such monocular systems have difficulty distinguishing shadows and lighting effects from actual 3-dimensional objects. This problem is exacerbated in certain security systems wherein, for example, a pre-alarm condition triggers a warning strobe light that affects detected images of the monitored space.
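One way to see why a single camera cannot recover depth is the standard pinhole projection model, sketched below. The function name `project` and the focal length value are assumptions chosen for illustration, and the model ignores lens distortion.

```python
from typing import Tuple


def project(point: Tuple[float, float, float],
            focal_length_px: float) -> Tuple[float, float]:
    """Project a 3-D point (x, y, z) in camera coordinates onto the image
    plane of an ideal pinhole camera. z is the distance along the optical axis."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length_px * x / z, focal_length_px * y / z)


# Two different 3-D points on the same viewing ray land on the same pixel,
# so the image alone cannot tell a near surface from a far one.
near_point = (1.0, 2.0, 4.0)
far_point = (2.0, 4.0, 8.0)
same_pixel = project(near_point, 800.0) == project(far_point, 800.0)
```

Because every point along a viewing ray maps to one pixel, a dark patch on the floor and a dark object standing above the floor can produce identical image data in a single view.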
Monocular video monitoring systems operating on 2-D image data must tolerate blind spots or blind intervals during which regular obstructions appear in the camera's field of view. For example, some doors or doorframes being controlled by monocular video systems can come into the field of view of the monitoring cameras whenever they are opened. Some systems are programmed to ignore frames or frame segments whenever the door is opened. Other more refined systems use additional sensors to detect a door's actual position over time and ignore only the portions of a frame where the door or doorframe is expected to appear. See, for example, U.S. Patent Application No. US2001/0030689 to Spinelli.
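The idea of ignoring only the frame portion where the door is expected can be sketched as follows. This is an illustrative assumption, not the method of the cited application: the mapping from door position to image region in `door_region` is invented for the example (a real system would calibrate it), and `changed_pixels` is a simple frame-differencing stand-in for whatever detection the system performs.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale image as rows of pixel intensities
Rect = Tuple[int, int, int, int]  # (row_start, row_end, col_start, col_end), ends exclusive


def door_region(open_fraction: float, height: int, width: int) -> Rect:
    """Hypothetical mapping from a door position sensor reading (0.0 closed,
    1.0 fully open) to the image region the door is expected to cover.
    Assumes the door sweeps in from the left and covers at most half the width."""
    if not 0.0 <= open_fraction <= 1.0:
        raise ValueError("open_fraction must be between 0 and 1")
    max_cols = width // 2
    return (0, height, 0, int(open_fraction * max_cols))


def changed_pixels(reference: Frame, current: Frame,
                   ignore: Rect, threshold: int) -> int:
    """Count pixels that differ from the reference by more than threshold,
    skipping every pixel inside the ignore rectangle."""
    r0, r1, c0, c1 = ignore
    count = 0
    for r, (ref_row, cur_row) in enumerate(zip(reference, current)):
        for c, (a, b) in enumerate(zip(ref_row, cur_row)):
            if r0 <= r < r1 and c0 <= c < c1:
                continue  # door expected here, so this pixel is a blind spot
            if abs(a - b) > threshold:
                count += 1
    return count
```

The sketch also makes the cost visible: anything that appears inside the ignored rectangle goes undetected for as long as the door is expected there.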
When monocular vision motion detection systems are first installed they must be “trained” using a reference image in order to establish a frame of reference appropriate to the particular environment. Such training can often involve tedious and expensive procedures. Image coordinates are calculated, stored or output in 2-D image space because real 3-D coordinates are unavailable in monocular systems.