The invention relates to a method and system for implementing successive multi-layered feature recognition in multi-dimensional space in which recognition cells are dynamically generated to accommodate the input data and are adapted during the recognition process, wherein the recognition cells are structured into groups which have specific recognition features assigned to them.
Three-dimensional laser scan data generated in an industrial plant setting for the purposes of documentation, computation, and analysis has a usefulness that is directly related to the degree or complexity of recognition, the so-called intelligence. For example, recognizing the mere presence of a surface is accomplished by a laser scan, but is not very useful; to recognize that surface as a part of a cylinder is better, but to recognize it as part of a pipe of a certain size with specific end connections allows the generation of useful CAE models of existing plants.
Pattern recognition by means of feature extraction is well known in the art and is the general term applied to the process of recognizing complex objects based on primitive or simplified representations. Recent advances in this field have led to numerous applications of neural network techniques and architectures in commercially available software and progress in related research.
The problem to be solved in any recognition process is firstly the ability to distinguish and recognize the assigned features, and secondly to do so in the presence of disturbances such as noise or artefacts in the raw data, faulty data, or partially missing data. The first aspect of the recognition task may involve more than merely detecting the presence of a certain object at a particular location. Depending on the type of entity to be recognized, it may also be necessary to recognize further auxiliary features or parameters, such as the orientation or size of the object. The difficulties introduced by the second aspect of the recognition task have a detrimental effect on the quality of the first aspect, causing errors or partially incorrect solutions that need to be tested, confirmed, reworked, and optimized. Some type of confirmation mechanism would be advantageous to ensure that the proper solution is found.
Pattern and object recognition by means of successive computation steps generally begins with the loading of an input dataset into a pre-defined set of input variables or units which constitute the lowest recognition layer. During the recognition process, each recognition unit in higher layers generates a response based on the values or responses of a selected subset of units in lower layers (receptive field). The number of layers used, the sizes of the receptive fields, and the rule used by each recognition unit to compute its response vary depending on the type of information to be recognized, that is, the complexity and number of the patterns, and the intermediate features that must be recognized to successfully identify the overall pattern. Sufficiently fine-grained intermediate features, overlapping receptive fields, and strongly converging data paths enable distortion-invariant and position-tolerant recognition.
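The layered computation described above can be illustrated with a minimal sketch. It is not taken from the cited art or from the invention: the function name `layer_response`, the one-dimensional layout, and the averaging rule are assumptions chosen only to show how each higher-layer unit responds to a receptive field of lower-layer units, and how overlapping fields arise when the stride is smaller than the field size.

```python
from typing import List, Sequence


def layer_response(lower: Sequence[float], field_size: int, stride: int = 1) -> List[float]:
    """Compute one higher layer from the layer below it.

    Each unit responds with the mean of the lower-layer values inside its
    receptive field (an assumed rule, chosen only for simplicity).
    """
    if field_size < 1 or stride < 1:
        raise ValueError("field_size and stride must be positive")
    responses = []
    # Slide the receptive field across the lower layer; with stride < field_size
    # neighbouring fields overlap.
    for start in range(0, len(lower) - field_size + 1, stride):
        field = lower[start:start + field_size]
        responses.append(sum(field) / field_size)
    return responses


# Lowest layer: the input dataset loaded into pre-defined input units.
input_layer = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0]
layer_1 = layer_response(input_layer, field_size=2)
layer_2 = layer_response(layer_1, field_size=2)
```

In a practical system the response rule, the field sizes, and the number of layers would be chosen to suit the patterns to be recognized, as the paragraph above notes.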
In commonly available recognition systems of this kind, the structure and dimension of the recognition layers are generally fixed during the recognition process, requiring that each layer contain enough recognition units or cells to fill the N dimensions of the recognition space at the required resolution. For cases where N > 2, the resulting large number of cells makes computation of the cell responses infeasible. The process becomes particularly inefficient where the input data is sparsely distributed throughout a large input space.
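The scaling problem can be made concrete with a short sketch. The helper names `dense_cell_count` and `occupied_cells`, the cell size, and the sample points are all hypothetical. The dictionary keyed by grid index is only a generic way of counting how few cells sparse data actually touches; it is not the cell generation mechanism of the present invention.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]
CellKey = Tuple[int, ...]


def dense_cell_count(cells_per_axis: int, dims: int) -> int:
    """Number of cells needed to fill a fixed grid in `dims` dimensions."""
    return cells_per_axis ** dims


def occupied_cells(points: Iterable[Point], cell_size: float) -> Dict[CellKey, List[Point]]:
    """Group points by the grid cell they fall into, storing only non-empty cells."""
    cells: Dict[CellKey, List[Point]] = {}
    for point in points:
        # Integer grid index of the cell containing this point.
        key = tuple(math.floor(coord / cell_size) for coord in point)
        cells.setdefault(key, []).append(point)
    return cells


# Three sample points in a 3-D space (illustrative values only).
points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (5.5, 5.5, 5.5)]
sparse = occupied_cells(points, cell_size=1.0)

print(dense_cell_count(100, 3))  # cells in a fully populated 100 x 100 x 100 grid
print(len(sparse))               # cells that actually contain data
```

With 100 cells per axis, a fixed three-dimensional grid already requires one million cells, whereas the sample data occupies only two.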
The feature extraction process itself is intended to solve the problem of recognizing objects from input data that is partially missing or noisy. As detailed by Fukushima, the simple lower-layer representation of the object is replaced by the more intelligent, meaningful, and detailed representation in higher recognition layers. This step can succeed if there are merely enough lower-layer features present to initiate or suggest the appropriate higher-layer response, and if no conflicting or contradicting responses are generated. In existing methods, conflicts among members of the higher recognition layers are resolved by simple response magnitude comparisons: the stronger response, or the response representing the greater number of lower-level features, is considered the winner. The losers, which nevertheless represent some aspect of the recognizable features, are generally neglected. There is generally no procedure to reintegrate or reform the seemingly incorrect responses into the general recognition solution.
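The conflict resolution by magnitude comparison that is criticized above can be sketched as follows. The function name `winner_take_all`, the candidate labels, and the response values are hypothetical. The sketch shows only the generic winner-take-all rule, not any particular prior-art implementation.

```python
from typing import Dict, List, Tuple


def winner_take_all(responses: Dict[str, float]) -> Tuple[str, List[str]]:
    """Pick the candidate with the strongest response; the rest are the losers."""
    if not responses:
        raise ValueError("at least one response is required")
    winner = max(responses, key=responses.get)
    losers = [name for name in responses if name != winner]
    return winner, losers


# Competing higher-layer interpretations of the same lower-layer features.
candidates = {"cylinder": 0.8, "plane": 0.6, "cone": 0.3}
winner, losers = winner_take_all(candidates)
```

The `losers` list is returned here only to make the point visible: in the existing methods described above, that information is simply dropped and never fed back into the overall solution.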
Isshiki describes an optical character reading system in which a given character is read several times to produce multiple result signals, from which the final result is determined as the most frequently occurring signal. That method temporarily generates the above-mentioned multitude of response elements, which the current invention seeks to avoid by generating cells only where there is data to recognize. Furthermore, the technique of generating a multitude of overlapping responses is well known in the art and is not an aspect of the present invention.
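The selection by most frequent occurrence can be sketched in a few lines. The function name `majority_vote` and the sample reads are assumptions for illustration, and the sketch makes no claim about how Isshiki's system is actually built.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(reads: Sequence[Hashable]) -> Hashable:
    """Return the result signal that occurs most often among repeated reads."""
    if not reads:
        raise ValueError("at least one read is required")
    # most_common(1) yields a single (value, count) pair for the top value.
    return Counter(reads).most_common(1)[0][0]


# Five reads of the same character, two of them misread (illustrative values).
reads = ["B", "8", "B", "B", "8"]
result = majority_vote(reads)
```

Every read must be generated and stored before the vote can be taken, which is the multiplication of response elements that the paragraph above refers to.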
Maney describes a recognition technique in which multi-bit fields are collected into groups, but it has no provision for confirming the initial grouping based on the results of later or higher groupings. That method also lacks a mechanism for ensuring, or at least facilitating, that the lower-level multi-bit fields (which are analogous to the cells in the current embodiment) are collected into compatible groups. Furthermore, the method described by Maney finds the same pattern on all levels of the hierarchy and does not construct the final solution from collections of intermediate solutions. It is designed to detect the presence of a given signature somewhere within a larger signal space, and not to fully interpret, organize, and structure all of the components of signals that may be present in the signal space. Furthermore, the hierarchical grouping represented in FIG. 2 of U.S. Pat. No. 5,295,198 is an arbitrary grouping that is as such not necessary for the solution of the problem and does not represent a distinguishable point in a recognition hierarchy. The process of dividing a signal space into many subsets and recombining those subsets in various groups is well known in the art and was described by Fukushima to provide his Neocognitron network with the ability to recognize patterns independently of their location in the signal space. Furthermore, the system described by Maney is based on a discretization of the recognition space (as evidenced by the spatial cells which make up the parameter space his pattern recognizers use), which the present invention avoids by generating recognition cells where there is data to recognize.
Filipski describes a method of statistical pattern recognition in which a hierarchical structure is generated to assist in the decision process involved in recognizing various classes of objects. The classes used therein are, however, not structured in their nature, in that the constructed hierarchy does not imply a hierarchy of content but only a hierarchy of procedure or order of processing. Therefore it is not necessary to establish a mutually fortifying relationship between pairs of hierarchical levels, and there is no ownership relationship between elemental members of the different levels. It is therefore also not possible to implement the bottom-up/top-down information flow which is the subject of the present invention.
Kuratomi et al. describe a neural network system for the recognition of two-dimensional images which is also based on multi-layer feature extraction and represents the standard art described earlier by Fukushima. This method, however, does not generate recognition elements where there is data to recognize, but covers the recognition space with an independent grid which is fully populated with recognition elements. The described method also does not provide for the top-down information flow of the present invention.