In general, multi-view reconstruction techniques first establish correspondences between the two-dimensional (2D) images of a scene as viewed by multiple sensors. For example, two cameras may be arranged to have overlapping fields of view, so that they image the same area with known angular and translational offsets. Three-dimensional (3D) scene reconstruction then involves triangulating the correspondences between the 2D images, given the geometric relationship between the different views.
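By way of illustration only, the triangulation step may be sketched for the simplest case of a rectified stereo pair with pinhole cameras, where depth follows from Z = f * B / d. The function names and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are assumptions introduced for this sketch and do not describe any particular apparatus.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


def triangulate(focal_px, baseline_m, cx, cy, u_left, v_left, u_right):
    """Return (X, Y, Z) in the left camera frame for one left/right correspondence.

    Assumes rectified images, so the matching pixel lies on the same row and
    the disparity is simply the horizontal offset u_left - u_right.
    """
    z = depth_from_disparity(focal_px, baseline_m, u_left - u_right)
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return (x, y, z)
```

For example, `triangulate(800.0, 0.1, 320.0, 240.0, 360.0, 240.0, 340.0)` corresponds to a disparity of 20 pixels and places the point about 4 m from the cameras. A general (non-rectified) arrangement would instead use the full projection matrices of both views.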
The task of establishing correspondence across multiple views relies on the extraction of robust image features in each related 2D view. Typically, such features are characterized by local variations in image contrast. These local contrast variations are commonly referred to as “texture.” In a general case, the viewed scene may contain regions corresponding to surfaces that lack sufficient texture for successful correspondence, thus making it difficult or impossible to obtain a complete 3D reconstruction of the scene. Such regions may be referred to as being “textureless.” Some common examples are flat, uniformly painted and illuminated walls, floors, or tables.
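One simple way to make the notion of "textureless" concrete is to measure the intensity variance in a small window around each pixel and flag windows whose variance falls below a threshold. The following is a minimal sketch under that assumption; the window radius, the `min_variance` threshold, and the function names are hypothetical, and practical systems may use other texture measures.

```python
def local_variance(image, row, col, radius=1):
    """Variance of intensities in a (2*radius+1)^2 window, clipped at the image edges."""
    rows = range(max(0, row - radius), min(len(image), row + radius + 1))
    cols = range(max(0, col - radius), min(len(image[0]), col + radius + 1))
    values = [image[r][c] for r in rows for c in cols]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def textureless_mask(image, min_variance=25.0, radius=1):
    """Return a grid of booleans; True marks pixels with too little local contrast."""
    return [
        [local_variance(image, r, c, radius) < min_variance for c in range(len(image[0]))]
        for r in range(len(image))
    ]
```

A uniformly painted and illuminated wall yields near-zero variance everywhere and is flagged in full, whereas a patterned surface is not.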
The risk of not being able to obtain image correspondence due to insufficient texture is often mitigated using at least one of two methods. A first method modifies the surfaces at issue, such as through painting or by affixing textured markers. Another approach involves the projection of structured light into the area during imaging, to “add” texture to otherwise textureless regions.
Applications that require a fully textured view at all times benefit most from the above mitigations. However, for large monitoring areas, such mitigations are often expensive and inconvenient. Imagine, for example, a sensing apparatus that is intended to monitor a given area defined by boundaries that are configured during system setup. Such boundaries may correspond to physical surfaces, or they may represent virtual surfaces defined by 3D volumes. A machine vision system based on the use of stereoscopic cameras provides a good working example of such a system.
An apparatus based on stereo vision includes some type of stereo vision sensor or range data analyzer that analyzes clusters of 3D points in the monitored volume, using criteria necessary to establish the presence and position of objects bigger than the minimum detectable size. The position of an object is compared against the configured boundary, which defines the monitored volume, and a decision is made as to whether or not an intrusion has occurred.
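The intrusion decision described above may be sketched as follows. This is a deliberately simplified illustration: the monitored volume is assumed to be an axis-aligned box, and a simple count of 3D points inside the box stands in for the cluster analysis and minimum-object-size criteria. The `Box` type, the `intrusion_detected` function, and the `min_points` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned monitored volume in sensor coordinates (assumed shape)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    z_min: float
    z_max: float

    def contains(self, point) -> bool:
        x, y, z = point
        return (self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max
                and self.z_min <= z <= self.z_max)


def intrusion_detected(points, volume: Box, min_points: int = 3) -> bool:
    """Report an intrusion when enough ranged 3D points fall inside the volume.

    The point count is a crude proxy for "object bigger than the minimum
    detectable size"; a real analyzer would cluster the points first.
    """
    inside = [p for p in points if volume.contains(p)]
    return len(inside) >= min_points
```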
One approach for boundary configuration relies on a user manually entering the boundary coordinates. For example, an authorized user has access to a PC-based configuration tool that provides an interface to the apparatus, for manually configuring the monitoring boundaries according to defined coordinates. This approach requires the user to calculate or know boundaries before system setup, and may not be convenient.
Another way to configure boundaries is to let the apparatus measure them. This approach is commonly called a “learning” mode, wherein the apparatus automatically learns the positions of physical surfaces already present in the viewed area. Boundaries are automatically configured to be positioned along the learned physical surfaces, separated by a specified tolerance/standoff. The user may then accept or modify the configured learned boundaries, as required for the application. However, to the extent that some portion of the boundaries to be learned automatically is textureless, the apparatus will miss such portions or will not range such portions with sufficient accuracy. The user may manually enter coordinates for the problem boundary portions, but doing so imposes an inconvenience on the user and interferes with fully automatic acquisition of boundary information.
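The learning mode, and the way textureless regions leave gaps in it, may be illustrated with a minimal sketch. It assumes that the apparatus produces one range measurement per viewing direction, with `None` or NaN where the surface could not be ranged, and that the boundary is placed in front of the surface by a fixed standoff. The function names and this data layout are assumptions made for illustration.

```python
import math


def learn_boundary(surface_ranges, standoff):
    """Place the boundary `standoff` closer to the sensor than each measured surface.

    Entries that could not be ranged (None or NaN, e.g. textureless regions)
    stay None so that they can be reported instead of silently guessed.
    """
    boundary = []
    for r in surface_ranges:
        if r is None or math.isnan(r):
            boundary.append(None)
        else:
            boundary.append(max(0.0, r - standoff))
    return boundary


def unlearned_indices(boundary):
    """Indices of boundary portions that would still need manual entry."""
    return [i for i, b in enumerate(boundary) if b is None]
```

With measured ranges `[3.0, None, 2.5]` and a standoff of 0.5, the learned boundary is `[2.5, None, 2.0]`, and index 1 is reported as a portion the user would have to supply by hand.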
Moreover, during setup of such an apparatus, care must be taken such that the configured boundaries (defining the monitored volume) are consistent with the projective nature of the sensor's field of view (e.g., within the field of view of a stereoscopic vision camera included within the apparatus). In other words, each part of the configured boundary must be viewable, without being shadowed, anywhere in the sensor field of view. Further, once boundaries are configured, the user usually must check their validity. The check may be done by moving special textured test pieces or test panels across each boundary. Such an approach is time-consuming and inconvenient, and therefore prone to error. Additionally, in some safety applications, it may be necessary to guarantee the presence of a physical background within a maximum distance from the sensor, or within a minimum distance of the configured boundary. The manual validation of these conditions also may be a challenging task for the operator.
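Two of the conditions above lend themselves to an automated check, sketched here under simplifying assumptions. The boundary and the physical background are each assumed to be given as one range per viewing ray; a boundary portion is treated as shadowed when it lies farther along its ray than the background surface, and the background is required to lie within a maximum range of the sensor. The function name, the per-ray layout, and `max_background_range` are hypothetical, and the condition relating the background to the boundary distance is not modeled.

```python
def validate_boundary(boundary_ranges, background_ranges, max_background_range):
    """Return a list of (ray_index, problem) pairs; an empty list means no issue found.

    boundary_ranges:   configured boundary distance along each viewing ray
    background_ranges: measured background distance along the same ray,
                       or None where no background could be ranged
    """
    problems = []
    for i, (b, g) in enumerate(zip(boundary_ranges, background_ranges)):
        if g is None:
            problems.append((i, "no background measured"))
        elif g > max_background_range:
            problems.append((i, "background too far"))
        elif b > g:
            # The boundary lies behind a physical surface on this ray,
            # so the sensor cannot view that portion of it.
            problems.append((i, "boundary shadowed by background"))
    return problems
```

A check of this kind would complement, rather than replace, the test-piece procedure, since it only examines the rays that the sensor actually ranges.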