1. Technical Field
The present disclosure relates generally to optimizing the performance of cascade-based topologies, and more specifically to a method for optimizing the performance of cascade-based topologies using multiple operating points.
2. Discussion of Related Art
Classifiers label data objects by grouping them into one or more classes based on certain measured features. When the properties of a data object have a hierarchical structure, it can be more efficient to construct a cascade of low-complexity binary classifiers than to use a single complex M-ary classifier. For example, the cascade may be more efficient when operating on continuous data streams. A binary classifier classifies a data object as belonging to one of two particular groups, based on whether or not the data object has a particular property or feature of that group. By using a cascade of classifiers, data of no interest can be filtered out or discarded at an early stage of the classification process, thereby decreasing the processing load on downstream classifiers and reducing overall resource consumption.
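The early-discard behavior of such a cascade can be illustrated with a minimal sketch. The names used here (run_cascade, is_sports, is_tennis) and the sample data are hypothetical and chosen only for illustration; each binary classifier is modeled as a simple function returning True ("yes") or False ("no").

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration: an item is a small dict of labels,
# and a binary classifier is any function mapping an item to True or False.
Item = Dict[str, str]
BinaryClassifier = Callable[[Item], bool]


def run_cascade(
    items: Sequence[Item], stages: Sequence[BinaryClassifier]
) -> Tuple[List[Item], List[int]]:
    """Pass each item through the stages in order, dropping it at the first 'no'.

    Returns the items accepted by every stage and, per stage, how many
    items that stage had to evaluate.
    """
    survivors: List[Item] = []
    evaluations = [0] * len(stages)
    for item in items:
        for index, stage in enumerate(stages):
            evaluations[index] += 1
            if not stage(item):
                break  # discarded early; later stages never see this item
        else:
            survivors.append(item)  # every stage answered "yes"
    return survivors, evaluations


def is_sports(item: Item) -> bool:
    return item["topic"] == "sports"


def is_tennis(item: Item) -> bool:
    return item.get("sport") == "tennis"


items = [
    {"topic": "news"},
    {"topic": "sports", "sport": "tennis"},
    {"topic": "weather"},
    {"topic": "sports", "sport": "golf"},
]
survivors, evaluations = run_cascade(items, [is_sports, is_tennis])
print(survivors)    # items accepted by both stages
print(evaluations)  # number of items each stage processed
```

In this sketch the second stage evaluates only the two items that the first stage accepted, which is the load reduction on downstream classifiers described above.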
Conventional stream and data mining applications use cascades of classifiers to improve the confidence of an end-to-end classification. The applications are quite diverse and include face recognition and rapid object detection in video scenes, analysis of land cover maps for crop detection, extraction of concepts from image data, and digital recognition. The resources that are available to an application making use of such a cascade can vary considerably. Prior research on resource-constrained stream mining applications falls into two broad categories.
A first set of approaches relies on load shedding, where applications determine a discard policy given the observed data characteristics (e.g., data is received in bursts) and the desired Quality of Service (QoS) requirements. Several of these approaches are limited by their assumption that the impact of load shedding on performance is known a priori. Further, such approaches often consider only simple data management jobs such as aggregation, for which the quality of the job result depends only on the sample size. Other load shedding approaches attempt to maximize certain Quality of Decision (QoD) measures based on a predicted distribution of feature values in future time units.
A second set of approaches formulates resource management as a filtering problem, in which the designed solutions filter out unrelated data as early as possible to conserve resources. Such approaches may have a simpler design and may be flexibly implemented. However, they often consider only locally available information and metrics, such as data characteristics, accuracy, or a QoD metric at a particular point in the system. This may lead to sub-optimal end-to-end performance when the data discarded at one stage is essential for a downstream (later stage) classifier.
Further, the classifiers in both sets of approaches make use of a single operating point, thereby coupling the data rates of the outgoing “yes” and “no” edges. A “yes” edge indicates that the classifier has positively identified a particular feature within the data, while a “no” edge indicates that the classifier has determined that the particular feature is not present within the data. Because of this rate coupling, a single operating point may not be able to partition the data so as to satisfy resource constraints, especially when the resource constraints are tight or the load is heavy.
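The rate coupling can be made concrete with a small sketch. It assumes, purely for illustration, that a classifier produces a confidence score per data object and that an operating point is a threshold on that score; the function name edge_rates, the scores, and the threshold values are hypothetical and are not taken from the disclosure.

```python
from typing import Sequence, Tuple


def edge_rates(
    scores: Sequence[float], yes_threshold: float, no_threshold: float
) -> Tuple[float, float]:
    """Return the fraction of items forwarded on the "yes" and "no" edges.

    Assumption for this sketch: an item goes out on the "yes" edge when its
    score is at least yes_threshold, and on the "no" edge when its score is
    below no_threshold. Items satisfying neither condition are dropped.
    """
    total = len(scores)
    yes_rate = sum(score >= yes_threshold for score in scores) / total
    no_rate = sum(score < no_threshold for score in scores) / total
    return yes_rate, no_rate


scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Single operating point: one threshold serves both edges, so the two
# rates always sum to 1 and lowering one necessarily raises the other.
single = edge_rates(scores, yes_threshold=0.5, no_threshold=0.5)

# Separate operating point per edge: each rate can be set independently,
# here reducing the load on both edges at the same time.
dual = edge_rates(scores, yes_threshold=0.7, no_threshold=0.3)

print(single)
print(dual)
```

With the single threshold the two edge rates are complementary, so a tight budget on one downstream branch forces extra load onto the other. With one threshold per edge, both rates can be lowered together, at the cost of dropping the items whose scores fall between the two thresholds.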
Thus, there is a need for methods of configuring binary classifiers under resource constraints to have multiple operating points.