1. Technical Field
The present disclosure relates generally to cascaded classifier/filter based topologies, and more specifically to methods for configuring cascaded classifier/filter based topologies and classifying data.
2. Discussion of Related Art
Real-time processing, mining, and classification of continuous, high volume data streams are increasingly important for many applications including financial analysis, real-time manufacturing process control, search engines, spam filters, medical services, etc. Distributed stream mining systems have recently been developed to support such stream processing applications. Applications may be decomposed into flow-graphs or topologies of distributed processing operators that are deployed on a set of resource constrained nodes to meet scalability, reliability, and performance objectives of large-scale, real-time stream mining.
Stream classification and mining applications implement topologies of low-complexity binary classifiers to accomplish the task of complex classification. Such classifiers may be implemented as software executing on one or more computer processors to perform the intended classification function. Classifiers label data objects by grouping them into one or more classes based on certain measured features. When the properties of a data object have a hierarchical structure, rather than making use of a single complex classifier, it can be more efficient to construct a cascade or tree of low complexity binary classifiers. However, managing such complex topologies of operators under dynamically changing resources and data characteristics to maximize application relevant performance can be challenging.
Conventional approaches in stream mining use load shedding to deal with large data volumes or limited system resources. While naïve load shedding may perform well for simple data management jobs, such as aggregation, it generally does not perform well on jobs involving sophisticated data classification. Intelligent and Quality of Service (QoS) driven load-shedding measures based on predicted feature values have been developed. However, because such load shedding is applied locally at each stage, it can be highly suboptimal in terms of end-to-end performance, as data discarded at one stage may be needed for a later (downstream) stage.
In one approach, each of the classifiers includes a single operating point, which corresponds to a single probability of correct detection PD and a single probability of false detection PF. For example, FIG. 1 illustrates a cascade of classifiers for classifying speech, where each of the classifiers 101 and 102 has a single operating point (e.g., {PD=0.9, PF=0.3}). Each of the classifiers 101 and 102 includes a single outgoing “yes” output branch and a single outgoing “no” output branch. For example, data classified as being speech by the classifier 101, whether correctly or incorrectly (e.g., a false alarm), is output along the “yes” output branch and the rest of the data is output along the “no” output branch. However, a single operating point per classifier may not be able to partition data to satisfy resource constraints when the resource constraints are tight or the load is heavy.
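The limitation of a single operating point can be illustrated with a short calculation. The following is a minimal sketch, not part of the disclosed method: it assumes that a fraction `prior` of the incoming data truly belongs to the class of interest, so that the fraction forwarded along the “yes” output branch is prior·PD + (1 − prior)·PF. The input rate, the prior, and the downstream capacity are hypothetical values chosen only for illustration; the operating point {PD=0.9, PF=0.3} is the example given above.

```python
def yes_branch_fraction(prior, pd, pf):
    """Fraction of input forwarded along the "yes" output branch.

    prior: assumed fraction of input that truly belongs to the class.
    pd:    probability of correct detection at the operating point.
    pf:    probability of false detection at the operating point.
    """
    # Correct detections plus false detections (false alarms).
    return prior * pd + (1.0 - prior) * pf


def downstream_load(input_rate, prior, pd, pf):
    """Rate of data objects delivered to the next classifier in the cascade."""
    return input_rate * yes_branch_fraction(prior, pd, pf)


# Hypothetical figures, not taken from the disclosure.
INPUT_RATE = 1000.0          # data objects per second entering the classifier
PRIOR = 0.4                  # assumed share of true positives in the input
DOWNSTREAM_CAPACITY = 400.0  # objects per second the next stage can process

# Operating point from the example above: {PD=0.9, PF=0.3}.
load = downstream_load(INPUT_RATE, PRIOR, pd=0.9, pf=0.3)
fits = load <= DOWNSTREAM_CAPACITY

print(f"downstream load: {load:.1f} objects/s, fits capacity: {fits}")
```

Under these assumptions the forwarded rate is fully determined by the one operating point, so when it exceeds the downstream capacity the classifier has no setting of its own that reduces the load, and data must be shed by some other means.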
A classifier of another approach employs a single operating point for each output branch. For example, FIG. 2 illustrates a classifier tree 200 for classifying sports images, which has been configured to employ a single operating point for each output branch. The classifier tree 200 includes a parent classifier 200-1, a first child classifier 200-1-1 and a second child classifier 200-1-2. The parent classifier 200-1 receives images and classifies whether the received images represent a team sport (e.g., football, baseball, etc.). Each of the classifiers 200-1, 200-1-1, and 200-1-2 includes two operating points, one for its “yes” output branch and one for its “no” output branch. For example, the parent classifier 200-1 includes a positive classifying portion 200-1A having a first operating point of (PD=0.9, PF=0.3) and a negative classifying portion 200-1B having a second operating point of (PD=0.7, PF=0.1); the first child classifier 200-1-1 includes a positive classifying portion 200-1-1A having a first operating point of (PD=0.8, PF=0.2) and a negative classifying portion 200-1-1B having a second operating point of (PD=0.6, PF=0.15); and the second child classifier 200-1-2 includes a positive classifying portion 200-1-2A having a first operating point of (PD=0.85, PF=0.2) and a negative classifying portion 200-1-2B having a second operating point of (PD=0.65, PF=0.1). However, from the perspective of optimization and performance analysis, this approach may lead to a non-convexity in a utility surface. As a result, one cannot guarantee convergence of this approach to a global optimum or guarantee the quality of its solution. Further, the approach may be suboptimal in resource-constrained processing environments.
Thus, there is a need for improved methods of configuring classifier networks.