Data Classification
Data classification assigns a predefined label to a data item based on quantitative information extracted from the item and on a training set of previously labeled items. For example, an email classification system can label a specific email as “spam” or “no-spam” based on the email's content and a training dataset of emails that are known to be “spam” or “no-spam.” The performance of a classifier depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all classification problems. The performance also depends on the quality of the training data. A well-trained classifier requires large training datasets that have labeled samples with varying characteristics.
Classifiers can vary based on the mathematical models used to extract information from data items, the amount of training data, and the model complexity. The choice of a classifier often depends on the data characteristics and on the classifier's computational resource requirements, e.g., CPU usage and memory. For example, some classifiers might be unsuitable when categorization results are required in real-time.
Embedded System
An embedded system is usually integrated into some other device or machine. The embedded system can be designed to perform dedicated functions, often in real-time. Embedded systems are common in many devices, such as portable video players, cameras, traffic lights, factory controllers and surveillance systems. Because many embedded systems perform dedicated functions, they can be optimized for size, cost, reliability and performance.
Embedded systems that include sensors and perform classification can be trained using training data. The trained embedded system can then have improved functionality and performance. For example, a classifier on a camera can raise an alarm when an intruder is present in a “no-trespassing” surveillance area. However, embedded systems typically have limited memory and cannot store a large training data set.
One solution to the limited memory problem is to store only a small number of carefully selected “exemplars” from the training data that are sufficient for effective classification. As defined herein, an exemplar is sample data that are characteristic of a larger training data set.
Exemplar Learning (EL) Methods
An exemplar learning (EL) method can be used to select a small set of training data from a large training dataset. EL, as the name implies, learns by exemplars. That is, ‘good’ samples that reduce the error rate of the classifier are retained, while ‘bad’ samples are discarded. Thus, EL can be used to generate a small training data set for a memory-based classifier in an embedded system that has limited memory.
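As an illustration of what a memory-based classifier over a small exemplar set might look like, the following is a minimal sketch that assumes a 1-nearest-neighbor rule with Euclidean distance. The function name `nearest_exemplar_label`, the two-dimensional feature vectors, and the sample values are hypothetical and are not taken from any particular EL method.

```python
import math

def nearest_exemplar_label(exemplars, x):
    """Return the label of the stored exemplar closest to x (1-nearest-neighbor).

    exemplars: list of (feature_vector, label) pairs kept in memory.
    x: feature vector of the item to classify.
    """
    best_label, best_dist = None, math.inf
    for features, label in exemplars:
        d = math.dist(features, x)  # Euclidean distance (assumed metric)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical exemplar set: only two samples are stored.
exemplars = [((0.0, 0.0), "no-spam"), ((1.0, 1.0), "spam")]
print(nearest_exemplar_label(exemplars, (0.9, 0.8)))
```

The memory cost of such a classifier grows with the number of stored exemplars, which is why the size of the retained set matters on an embedded system.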
Conventional EL methods learn exemplars based on some neighborhood structure. Then, the methods measure a loss or gain in performance due to a sample being removed, using conventional misclassification rates.
The EL method can continuously adjust the training data set as samples are processed; that is, good new samples are retained and bad new samples are discarded. Thus, the classifier can dynamically adapt to a changing environment in which the embedded system operates. Almost all EL methods discard samples based on the following hypothesis:
Hypothesis 0 (H0): If the removal of a sample from a given training data set does not increase the number of misclassifications, or the error rate, of the remaining samples, then the sample can be discarded.
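One possible reading of hypothesis H0 can be sketched as follows. The sketch assumes a 1-nearest-neighbor classifier and counts how many of the remaining samples are misclassified by their nearest other sample, with and without the candidate in the set. The helper names `nn_label`, `count_errors` and `can_discard`, and the one-dimensional data, are hypothetical.

```python
import math

def nn_label(samples, x):
    """Label of the sample nearest to x (Euclidean distance, assumed)."""
    return min(samples, key=lambda s: math.dist(s[0], x))[1]

def count_errors(train, targets, pool):
    """Count target samples misclassified by 1-NN over the pool.

    targets and pool are collections of indices into train; a target
    is never allowed to act as its own neighbor.
    """
    errors = 0
    for i in targets:
        x, y = train[i]
        neighbors = [train[j] for j in pool if j != i]
        if nn_label(neighbors, x) != y:
            errors += 1
    return errors

def can_discard(train, candidate):
    """H0 test: True if removing the candidate does not increase the
    number of misclassifications among the remaining samples."""
    everyone = range(len(train))
    remaining = [i for i in everyone if i != candidate]
    before = count_errors(train, remaining, everyone)   # candidate present
    after = count_errors(train, remaining, remaining)   # candidate removed
    return after <= before

# Hypothetical 1-D training set with two classes.
train = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
         ((1.0,), "b"), ((1.1,), "b")]
print(can_discard(train, 1))  # redundant sample inside class "a"
print(can_discard(train, 4))  # one of only two samples in class "b"
```

In this toy set, removing the middle "a" sample leaves every remaining sample correctly classified, whereas removing one of the two "b" samples causes the other "b" sample to be misclassified.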
Conventional EL methods have several disadvantages, described below.
Incremental Update
Conventional EL methods are computationally intensive, operate offline, and are not incremental in nature. They require that the entire training data set be stored in memory throughout the execution of the EL method. This makes those methods inapplicable to embedded systems where memory is limited and the training data are updated regularly.
Class-Imbalance
Misclassification rates used in the hypothesis H0 are insensitive to class-imbalance. This problem is critical in EL, where removal of a sample changes the class population. For example, in a set with 90 positive class samples and 10 negative class samples, removing a positive class sample gives nine positive sample misclassifications and one negative sample misclassification, while discarding a negative sample gives one positive and nine negative sample misclassifications.
Because the error rate is the same in both cases, i.e., 10%, the class imbalance could cause the negative samples to be discarded, leaving only positive class samples in the training data set.
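The arithmetic of the 90/10 example can be checked with a short sketch. It assumes that the rates are computed over the 99 samples that remain after one removal, and it adds per-class error rates and their average purely for comparison; those extra measures and the function name `rates` are illustrative assumptions.

```python
def rates(pos_total, neg_total, pos_wrong, neg_wrong):
    """Return (overall, positive-class, negative-class, averaged) error rates."""
    overall = (pos_wrong + neg_wrong) / (pos_total + neg_total)
    pos_rate = pos_wrong / pos_total
    neg_rate = neg_wrong / neg_total
    averaged = (pos_rate + neg_rate) / 2  # mean of the per-class rates
    return overall, pos_rate, neg_rate, averaged

# One positive sample removed: 89 positive and 10 negative samples remain,
# with 9 positive and 1 negative misclassifications.
case_a = rates(89, 10, 9, 1)

# One negative sample removed: 90 positive and 9 negative samples remain,
# with 1 positive and 9 negative misclassifications.
case_b = rates(90, 9, 1, 9)

for name, r in (("remove positive", case_a), ("remove negative", case_b)):
    print(name, [round(v, 3) for v in r])
```

Under these assumptions the overall error rate is 10/99 in both cases, yet in the second case every remaining negative sample is misclassified, which the overall rate does not reveal.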
It is desired to have an EL method that relies on an estimate of the classifier's ability to discriminate between the two classes, rather than on the overall classification accuracy, to produce better results.
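One widely used measure of how well a classifier separates two classes is the area under the ROC curve (AUC), which equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample. The text does not name a specific measure, so the following sketch is only an assumed example of such an estimate; the function name `auc` and the scores are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The result does not depend on
    how many samples each class has relative to the other.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores (higher means "more positive").
pos_scores = [0.9, 0.8, 0.4]
neg_scores = [0.5, 0.2]
print(auc(pos_scores, neg_scores))
```

Because each class is normalized by its own size, a measure of this kind is not dominated by the majority class in the way the overall error rate is.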
Ordered Removal
Most removal procedures are either ad hoc or return a training data set whose size is determined only at run time. Those methods do not remove samples in an order designed to find the best training data set for a given memory size. It is desired to have an EL method that produces an optimal training data set that satisfies predetermined memory size constraints, such as those typically found in embedded systems.
Validation Consistency
Conventional EL methods remove a sample if the classification error rate for the samples remaining in the training data set does not increase. Accordingly, a sample plays a dual role. That is, the sample participates both in the training data set being updated and in the testing set being classified. As the removal progresses, the size and nature of the training data set vary dynamically, and thus the error rates are determined over different sets that are not consistent and have lower statistical significance. It is desired to have a separate validation data set, disjoint from the training data set, which remains unchanged during the removal process.
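The desire for a fixed validation set and a predetermined memory size can be illustrated together with a simple greedy sketch. It assumes a 1-nearest-neighbor classifier and removes, one at a time, the sample whose removal yields the fewest errors on an unchanging validation set, until a memory budget is met. This is only an illustration under those assumptions, not a description of any specific EL method; the names `validation_errors` and `prune_to_budget` and the data are hypothetical.

```python
import math

def nn_label(exemplars, x):
    """Label of the exemplar nearest to x (Euclidean distance, assumed)."""
    return min(exemplars, key=lambda s: math.dist(s[0], x))[1]

def validation_errors(exemplars, validation):
    """Misclassifications on a validation set that never changes."""
    return sum(nn_label(exemplars, x) != y for x, y in validation)

def prune_to_budget(train, validation, budget):
    """Greedily remove samples until only `budget` (>= 1) remain.

    Every candidate removal is scored on the same disjoint validation
    set, so error counts stay comparable from step to step.
    """
    kept = list(train)
    while len(kept) > budget:
        best_i = min(
            range(len(kept)),
            key=lambda i: validation_errors(kept[:i] + kept[i + 1:], validation),
        )
        del kept[best_i]
    return kept

# Hypothetical 1-D data; the validation set shares no samples with train.
train = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
         ((1.0,), "b"), ((1.1,), "b")]
validation = [((0.05,), "a"), ((0.9,), "b"), ((1.2,), "b")]

kept = prune_to_budget(train, validation, budget=2)
print(len(kept), validation_errors(kept, validation))
```

A greedy search of this kind does not guarantee an optimal exemplar set; it only shows how a fixed budget and a fixed validation set make successive removal decisions comparable.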