(1) Field of Invention
The present invention relates to an object recognition system and, more particularly, to a system for filtering, segmenting, and recognizing objects in three-dimensional data.
(2) Description of Related Art
Recent advances in range sensor technology provide the ability to collect three-dimensional (3D) data over large areas with high resolution and accuracy. The precision of the collected 3D data is usually high enough to capture not only common large structures (e.g., buildings), but also smaller objects such as pedestrians and cyclists. This improvement enables 3D-based scene analysis for a wide range of applications.
3D scene analysis has been studied in quite a few different settings, including urban (see the List of Incorporated Literature References, Literature Reference No. 12), indoor (see Literature Reference No. 3), and aerial settings (see Literature Reference No. 4). Different techniques have been developed for labeling surfaces such as grass, walls, or pavement, and small sets of object types such as foliage, people, and cars in 3D outdoor scenes. Most of these approaches label individual 3D laser points using features describing local shape and appearance in combination with spatial and temporal smoothing via graphical model inference. Typically, features are either extracted from a fixed neighborhood around each 3D point or from small patches generated through an over-segmented scene. In the structured graphical model, a node in the graph is a random variable representing a 3D feature point's label, and edges are formed to model the scene context. In order to be effective, many interactions need to be considered, which results in a densely linked graph/random field. In general, exact inference over such a random field is intractable and only approximate methods can be used. This complicates the learning process further. In addition, the use of approximate inference makes the learned solutions somewhat arbitrary and sensitive to parameters.
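The per-point feature extraction described above can be illustrated with a short sketch. The eigenvalue-based shape descriptors (linearity, planarity, scattering) computed over a fixed k-nearest-neighbor neighborhood are one common choice of local 3D shape features; the function name, the brute-force neighbor search, and the specific descriptors below are illustrative assumptions, not a reproduction of any of the referenced systems.

```python
import numpy as np

def local_shape_features(points, k=10):
    """For each 3D point, compute eigenvalue-based shape descriptors
    (linearity, planarity, scattering) from the covariance of its
    k nearest neighbors. Brute-force neighbor search for clarity."""
    feats = np.zeros((len(points), 3))
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        nbrs = points[np.argsort(d)[:k]]          # k nearest neighbors (incl. p)
        cov = np.cov(nbrs.T)                      # 3x3 local covariance
        e = np.sort(np.linalg.eigvalsh(cov))[::-1]
        e = e / (e.sum() + 1e-12)                 # normalized l1 >= l2 >= l3
        l1, l2, l3 = e
        feats[i] = [(l1 - l2) / (l1 + 1e-12),     # linearity  (line-like)
                    (l2 - l3) / (l1 + 1e-12),     # planarity  (surface-like)
                    l3 / (l1 + 1e-12)]            # scattering (volumetric)
    return feats
```

Such per-point descriptors would then be fed to a classifier whose outputs are smoothed by the graphical model inference discussed above; the quadratic-cost neighbor search here is exactly the kind of step a production system would replace with a spatial index.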
The work by Douillard et al. (see Literature Reference No. 5), for example, addresses the use of 3D point clouds as a possible solution for classification. Douillard et al. propose a pipeline for fast segmentation of 3D point clouds and subsequent classification of the obtained 3D segments. However, the core of their classification module relies on aligning candidate segments with a set of pre-defined 3D templates via an Iterative Closest Point (ICP) algorithm. In other words, the work of Douillard et al. requires predefining 3D object templates (i.e., reference 3D point clouds), and the iterative alignment step requires substantial computational time.
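A minimal point-to-point ICP sketch illustrates the iterative alignment step and why it is costly: every iteration requires a nearest-neighbor search between the candidate segment and the template, followed by a best-fit rigid transform (Kabsch/SVD). This is a generic textbook ICP, not Douillard et al.'s specific pipeline; the function name and brute-force matching are illustrative assumptions.

```python
import numpy as np

def icp_align(src, tgt, iters=20):
    """Minimal point-to-point ICP: repeatedly match each source point to
    its nearest target point, then solve the best-fit rigid transform
    for the matched pairs via SVD (Kabsch). Returns the accumulated
    rotation R, translation t, and the aligned source points."""
    R, t = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # 1. Nearest-neighbor correspondences (brute force, O(N*M))
        d = np.linalg.norm(cur[:, None, :] - tgt[None, :, :], axis=2)
        match = tgt[d.argmin(axis=1)]
        # 2. Best-fit rotation/translation for the matched pairs (Kabsch)
        mu_s, mu_t = cur.mean(0), match.mean(0)
        H = (cur - mu_s).T @ (match - mu_t)
        U, _, Vt = np.linalg.svd(H)
        Ri = Vt.T @ U.T
        if np.linalg.det(Ri) < 0:   # guard against reflections
            Vt[-1] *= -1
            Ri = Vt.T @ U.T
        ti = mu_t - Ri @ mu_s
        # 3. Apply the incremental transform and accumulate it
        cur = cur @ Ri.T + ti
        R, t = Ri @ R, Ri @ t + ti
    return R, t, cur
```

The nearest-neighbor search inside the loop dominates the cost, and the whole loop must be repeated for every candidate segment against every template, which is the computational burden noted above.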
Thus, a continuing need exists for a system that can recognize objects without modeling a scene for object entity classification and without labeling individual 3D feature points. In other words, a continuing need exists for a system that provides unsupervised detection and segmentation of 3D candidate objects in an uncontrolled environment.