This invention relates to pattern recognition and more particularly to preprocessing of information in the form of feature extraction. Feature extraction technologies are known in the various fields of pattern recognition, character recognition, speech recognition and so forth. Classifiers are employed to map extractable features into decision sets.
Classifiers of the prior art are characterized by inherent scaling. Since data are often inherently limited as to class and recognizable elements, the problem of addressing scale is frequently moot. However, if the data are not examined at an appropriate scale, no classifier, no matter how sophisticated, will be able to sort the source data.
Feature extraction techniques are known, including edge detection, fixed-size partitioning, region-based classification, and borrowed strength classification. Often these techniques are limited by inherent scaling assumptions or by an inability to incorporate domain-specific elements or expert knowledge, such as the knowledge that a detected polarity change in seismic data represents an impedance or density change, or that spoken words can be parsed into phonemes.
Pattern recognition and classification will benefit from improved feature extraction. The present invention is intended to provide such an improvement.
According to the invention, given a spatial dataset of n dimensions, in a data processing system, a data-driven partitioning of the dataset into topologically contiguous regions is effected using domain-specific indices (for example, by examining the change in polarity of the impedance of seismic data). Then, on each region, a set of features (which may be mathematical functions) is calculated (e.g., the mean value of all data in the region), wherein the features are considered sufficiently descriptive of the region. Thereupon, two or more regions which are topologically, and in a specific embodiment topographically, contiguous are grouped together, and the associated features are assembled in a structure (e.g., a vector, a matrix, a mathematical graph or a typecode) to be input to a classifier. (A classifier is a function which maps data into a decision.)
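By way of illustration only, the three steps described above (data-driven partitioning, per-region feature calculation, and grouping of contiguous regions into a structure for a classifier) may be sketched for a one-dimensional trace as follows. The sketch is not the claimed implementation. It assumes that a sign change of the sample value stands in for the polarity index, that the features are the region mean and the region length, and that two adjacent regions are grouped into a flat vector. All function names, and the threshold used in the toy classifier, are hypothetical.

```python
from statistics import mean


def partition_by_polarity(trace):
    """Split a 1-D trace into contiguous (start, stop) runs of same-sign samples.

    Assumption: a sign change stands in for the domain-specific polarity index.
    Zero is treated as non-negative.
    """
    regions = []
    start = 0
    for i in range(1, len(trace)):
        if (trace[i] >= 0) != (trace[start] >= 0):
            regions.append((start, i))
            start = i
    if trace:
        regions.append((start, len(trace)))
    return regions


def region_features(trace, region):
    """Return illustrative features for one region: (mean value, length)."""
    start, stop = region
    return (mean(trace[start:stop]), stop - start)


def grouped_feature_vectors(trace, group_size=2):
    """Concatenate the features of each run of `group_size` adjacent regions."""
    regions = partition_by_polarity(trace)
    feats = [region_features(trace, r) for r in regions]
    vectors = []
    for i in range(len(feats) - group_size + 1):
        vec = []
        for f in feats[i:i + group_size]:
            vec.extend(f)
        vectors.append(tuple(vec))
    return vectors


def toy_classifier(vector):
    """Hypothetical classifier: maps a two-region vector to a decision.

    The 1.0 threshold is arbitrary and chosen only for illustration.
    """
    mean_a, _, mean_b, _ = vector
    return "strong_contrast" if abs(mean_a - mean_b) > 1.0 else "weak_contrast"


if __name__ == "__main__":
    trace = [1.0, 2.0, -1.0, -3.0, -2.0, 0.5]
    for vec in grouped_feature_vectors(trace):
        print(vec, toy_classifier(vec))
```

In this sketch the region boundaries are determined by the data rather than by a fixed window size, so the scale at which the features are computed follows the data. A vector is only one of the structures mentioned above; a matrix, a graph or a typecode could be substituted in the grouping step.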
A characteristic of the present invention is that contiguous regions of the dataset that are heterogeneous can be classified by considering the homogeneous elements which they contain, and that the contiguous regions correspond in some way to a real-world characteristic or to a manifestation of a real-world process.
The invention will be better understood by reference to the following detailed description in connection with the accompanying drawings.