The term “indexed data” or “spectrum” refers to a collection of measured values called responses. Each response may or may not be related to one or more of its neighboring elements. When a unique index, either one-dimensional or multi-dimensional, is assigned to each response, the data are considered to be indexed. The index values represent values of a physical parameter such as time, distance, frequency, mass, weight or category. For index values that are measurable, the distance between consecutive index values (the interval) can be uniform or non-uniform. In addition, the indices of different indexed data may be assigned at standard or non-standard values. The responses of the indexed data can include, but are not limited to, signal intensity, item counts, or concentration measurements. By way of example, the present invention is applicable to any of the aforementioned types of spectra.
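By way of illustration only, the definition above can be sketched as a small one-dimensional data structure. The class name `IndexedData`, its fields, and the tolerance parameter `tol` are hypothetical and are not part of the disclosure; the sketch merely shows one response per unique index value and a check for uniform versus non-uniform intervals.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class IndexedData:
    """Hypothetical container: exactly one response per unique index value."""
    index: Sequence[float]      # e.g. time, frequency, or mass values
    response: Sequence[float]   # e.g. signal intensity or item counts

    def __post_init__(self) -> None:
        if len(self.index) != len(self.response):
            raise ValueError("each response needs exactly one index value")
        if len(set(self.index)) != len(self.index):
            raise ValueError("index values must be unique")

    def intervals(self) -> List[float]:
        """Distances between consecutive index values."""
        return [b - a for a, b in zip(self.index, self.index[1:])]

    def is_uniform(self, tol: float = 1e-9) -> bool:
        """True when all consecutive intervals are equal within tol."""
        steps = self.intervals()
        return all(abs(s - steps[0]) <= tol for s in steps)
```

For instance, `IndexedData([0.0, 0.5, 1.0, 1.5], [3, 7, 2, 9])` has uniform intervals, whereas `IndexedData([100.0, 100.2, 100.9], [12, 40, 5])` does not.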
Indexed data are used in a variety of pattern recognition applications. In general, the purpose of those pattern recognition applications is to distinguish indexed data that are collected from samples from subjects experiencing different external circumstances or influences; undergoing different internal modes or states; or originating from different species or types. The subjects include, but are not limited to, substances, compounds, cells, tissues, living organisms, physical phenomena, and chemical reactions. As used herein, the term “conditions” refers generally to such circumstances, influences, modes, states, species, types or combinations thereof. The conditions are usually application-dependent. The underlying rationale of a pattern recognition application is that a response at each index may react differently to different conditions. If a response increases with the existence of a condition, the condition “upregulates” the response; if a response decreases with the existence of a condition, the condition “downregulates” the response. Typically, a collected spectrum comprises at least (1) the responses of interest, which are correlated to the conditions of interest; and (2) common characteristics and noise, which are uncorrelated to the conditions of interest, but which may be correlated to other conditions of non-interest.
In such applications of indexed data, a set of indexed data is usually collected under different conditions, thus forming an “indexed dataset”. Within the dataset, a category is a set of labels that represent a condition or a combination of conditions. The separation of indexed data from the indexed dataset into categories is usually performed by a pattern recognition system. In this regard, the end-to-end objective of a pattern recognition system is to associate an unlabeled spectrum sample with one of several pre-specified categories (‘hard’ clustering), or alternatively to compute a degree of membership of the sample to each one of the categories (‘soft’ clustering).
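By way of illustration only, the difference between the two objectives can be sketched as follows. The sketch assumes that each category is summarized by a centroid spectrum and that membership is derived from Euclidean distance; neither assumption is drawn from the disclosure, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def distances(sample: Sequence[float],
              centroids: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Euclidean distance from the sample to each category centroid."""
    return {name: math.dist(sample, c) for name, c in centroids.items()}


def hard_assign(sample, centroids) -> str:
    """'Hard' result: the single category whose centroid is nearest."""
    d = distances(sample, centroids)
    return min(d, key=d.get)


def soft_assign(sample, centroids) -> Dict[str, float]:
    """'Soft' result: a degree of membership per category, summing to 1.

    Uses a softmax over negative distances (an assumed choice); the
    smallest distance is subtracted first for numerical stability.
    """
    d = distances(sample, centroids)
    nearest = min(d.values())
    weights = {name: math.exp(-(v - nearest)) for name, v in d.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

Given the same sample and centroids, `hard_assign` returns one category label, while `soft_assign` returns a membership value for every category.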
However, present pattern recognition systems include a normalization module that is a major bottleneck, because the amount of information that a feature extraction module can extract is limited by the amount of information that is retained by the normalization module. The performance degradation due to errors in the normalization module often cannot be corrected by subsequent modules. Consequently, present pattern recognition systems suffer from a variety of deficiencies, which include lower processing speed and lower discriminatory power than desired or, in certain instances, needed.
Applicants have discovered that one source of these deficiencies is the failure to remove the common characteristics and noise before normalization. Consequently, in many applications where the response of interest is weaker than the common characteristics, normalization to the common characteristics instead of the response of interest reduces the discriminatory power of the spectra significantly. In addition, the large dimension of common characteristics and noise retained after normalization places an extra burden on a feature extraction module and lowers the processing speed of the pattern recognizer significantly. Accordingly, there is a need for removal of non-discriminatory indices before the feature extraction module, to permit increased discriminatory power while minimizing the computational cost of the pattern recognition system.
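By way of illustration only, the effect described above can be sketched with two made-up spectra. The sketch assumes normalization to total intensity and assumes that the positions carrying the common characteristics are already known; the values, the function names, and the set `common` are hypothetical and are not taken from the disclosure.

```python
def normalize(values):
    """Scale values so they sum to 1 (total-intensity normalization, assumed)."""
    total = sum(values)
    return [v / total for v in values]


def drop_indices(values, common):
    """Keep only the responses whose position is not in `common`."""
    return [v for i, v in enumerate(values) if i not in common]


# Made-up spectra: positions 0-2 carry a strong common characteristic,
# positions 3-4 carry the weak response of interest.
spectrum_x = [100.0, 100.0, 100.0, 4.0, 1.0]
spectrum_y = [300.0, 300.0, 300.0, 1.0, 4.0]
common = {0, 1, 2}  # assumed known for this illustration

# Normalizing first: the totals are dominated by the common characteristic.
norm_first_x = normalize(spectrum_x)
norm_first_y = normalize(spectrum_y)

# Removing the common positions first, then normalizing.
removed_first_x = normalize(drop_indices(spectrum_x, common))
removed_first_y = normalize(drop_indices(spectrum_y, common))

# Separation between the two spectra at the first position of interest.
gap_norm_first = abs(norm_first_x[3] - norm_first_y[3])
gap_removed_first = abs(removed_first_x[0] - removed_first_y[0])
```

With these values, the separation at the first position of interest is about 0.012 when normalization is applied to the full spectra and about 0.6 when the common positions are removed first. In addition, only two of the five positions are passed on to later processing.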