Automated pattern recognition systems compare the key features of input data with the key features of standard or expected objects to generate output decisions. “Patterns” cover a wide range of entities such as typed or hand-written characters, pictures and faces, weather (temperature, wind, and pressure measurements), fingerprints and iris scans, sounds and voice (waveforms), grammar and text sequences, and many other types of data that can be sensed or acquired and processed. The key features may be encoded according to familiar measurement metrics or via abstract mathematical transformations.
Typically, in pattern classification systems, a set of features (stored as arrays or vectors) is extracted via a predefined process from both prototype/training samples and new input data. These feature vectors may include numbers or characters representing physical attributes (measurements), time-dependent attributes like speech articulations (phonemes), digitally encoded bit streams, or mathematically encrypted patterns. The feature vectors may be (i) compared to ideal/desired values as in identification, inspection, and quality control applications, or (ii) compared against each other as in data clustering applications, or (iii) compared against m other feature vectors as in classification applications. In all cases, these methods require fixed-length feature vectors; that is, feature vectors with n elements are compared to other n-length feature vectors with the same ordering of elements, in order to compute a meaningful similarity (or distance) metric. [See refs 1-6 below]
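By way of illustration only, the fixed-length requirement can be sketched as follows. The function names `euclidean_distance` and `nearest_prototype`, and the choice of Euclidean distance as the metric, are assumptions made for this example and are not taken from the cited references.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two numeric feature vectors.

    The element-by-element comparison is only meaningful when both
    vectors have the same length n and the same ordering of elements.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_prototype(sample, prototypes):
    """Return the label of the prototype vector closest to the sample.

    `prototypes` is a hypothetical mapping of class label -> feature vector.
    """
    return min(
        prototypes,
        key=lambda label: euclidean_distance(sample, prototypes[label]),
    )
```

In this sketch, a call such as `nearest_prototype([1, 1], {"a": [0, 0], "b": [10, 10]})` selects label `"a"`, whereas passing vectors of different lengths raises an error because no element-wise correspondence exists.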
In some applications, a fixed number of features per sub-sample generates a variable-length feature vector because the number of sub-samples varies from one input pattern to the next. Where variable-length feature vectors have been encountered, solutions have involved converting the feature vectors to a common fixed-length reference before comparison operations are invoked. For example, when comparing color images, the size/length of feature vectors may vary (even when the size and resolution of the photos are the same) depending on the complexity and richness of the colors in different regions of a picture. A common solution is to map the feature vectors to a global color table (thereby generating a fixed-length feature vector) and to compute standard vector distances or similarity metrics thereafter. [See ref 7 below]
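The mapping to a global color table described above can be sketched as follows. This is a minimal illustration under stated assumptions: the five-entry `GLOBAL_PALETTE`, the nearest-color assignment rule, and the function names are hypothetical and are not the method of reference [7].

```python
# Hypothetical global color table shared by all images (RGB triples).
GLOBAL_PALETTE = [
    (0, 0, 0),        # black
    (255, 255, 255),  # white
    (255, 0, 0),      # red
    (0, 255, 0),      # green
    (0, 0, 255),      # blue
]


def nearest_palette_index(color, palette=GLOBAL_PALETTE):
    """Index of the palette entry with the smallest squared RGB distance."""
    return min(
        range(len(palette)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(color, palette[i])),
    )


def palette_histogram(colors, palette=GLOBAL_PALETTE):
    """Convert a variable-length list of colors to a fixed-length vector.

    Each output element is the fraction of input colors assigned to the
    corresponding palette entry, so the result always has len(palette)
    elements regardless of how many colors the image contributed.
    """
    counts = [0] * len(palette)
    for color in colors:
        counts[nearest_palette_index(color, palette)] += 1
    total = len(colors)
    if total == 0:
        return [0.0] * len(palette)
    return [count / total for count in counts]
```

Two images contributing two and three dominant colors respectively both yield five-element vectors, which can then be compared with a standard vector distance.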
Other cases where variable-length feature vectors are encountered include time-variant problem domains such as speech recognition, on-line handwriting recognition, time-series data, and click-stream analysis in web mining. In these cases, solutions involve the application of machine learning algorithms such as hidden Markov models [See ref 8 below], recurrent neural networks [See ref 9 below], and dynamic time warping [See ref 10 below] to find a warping function that optimally matches two (or more) feature vector sequences, such that a time-normalized distance between the variable-length feature sequences can then be calculated. Dynamic programming methods [See ref 11 below] are also known for computing time- or length-normalized distances between numeric or symbolic sequences.
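For illustration, a textbook dynamic-programming formulation of dynamic time warping on one-dimensional numeric sequences may be sketched as follows. The absolute-difference local cost and the normalization by the sum of the two sequence lengths are assumptions chosen for this example (other normalizations are in use); the sketch is not the specific procedure of references [10] or [11].

```python
def dtw_distance(seq_a, seq_b):
    """Length-normalized dynamic time warping distance.

    cost[i][j] holds the minimum accumulated cost of aligning the first
    i elements of seq_a with the first j elements of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three permitted predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both sequences
            )

    # Assumed normalization: divide by the combined sequence length.
    return cost[n][m] / (n + m)
```

Under these assumptions, the sequences `[1, 2, 3]` and `[1, 2, 2, 3]` have a distance of zero even though their lengths differ, because the warping path aligns the repeated element with its counterpart.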
In the methods set out in references [7] to [14] below, which are believed to represent the most relevant prior disclosures, the problems involve variable-length feature vectors, and the solutions (in refs [7] to [13]) include some type of normalization to a reference/global vector, or conversion of the variable-length feature vectors to fixed-length representations. P. Somervuo [ref 14] does not convert variable-length symbol sequences to fixed-length feature vectors in his research, in which he investigated the learning of symbol sequences by use of self-organizing maps (SOMs). SOMs are well suited for data clustering, visualization of large data sets, and initialization (data pre-processing) for pattern recognition tasks, but are not suited for targeted/customized pattern detection [See ref 15 below].
Other than reference [7], all of the documents referred to above deal with variable-length feature vectors from temporal or sequential (time-variant) data. The document believed to be most relevant to the problems addressed by the present invention is reference [7] (mapping to a global reference vector), but the approach set out in that document is not always efficient or practical, as described below.
In problem domains that deal with heterogeneous data and natural language text, there is no standard/global basis vector to serve as a normalization base. For example, a feature element describing device/product configurations has no “global table” to use as a normalization reference, as there are many different types of products yielding different numbers and types of configuration parameters. Similarly, a feature element comprising a “Customer Complaints” or a “Frequently Asked Questions” (FAQ) list has no standard reference vector, as natural language descriptions are unstructured, and the complexities of products vary widely. Arbitrary limitations on the number of parameters, or simplified analysis (e.g. restricted to some maximum number of keywords), lead to loss of information, context, and semantics. Padding of feature vectors to an arbitrary maximum length introduces memory and processing inefficiencies. System designers have resorted to these artificial constraints in the past because alternative solutions have not been available.