Recent growth in the amount, importance, and richness of unstructured information has created a need for more sophisticated data analysis and classification techniques. However, as the number and variety of data analysis functions applied to information sources by a given technique increase, so do the resource requirements in terms of computing, storage, communication, and human effort. The result is large-scale data analysis systems that are often less responsive to a user's needs and unable to provide quick answers or operate under constrained conditions. It is therefore desirable for users and systems to be able to trade off data analysis processing efficiently and effectively along a number of possible dimensions in order to make the best use of resources under constrained conditions.
In large-scale classification systems, there is a requirement to process large numbers of data sources, often simultaneously, using large numbers of classifiers. Examples of such systems include real-time speech analysis and speaker identification in call centers. Typically, constraints on total processing power or total response time prevent the complete analysis that the classification system would otherwise perform. As a result, there is a need to allocate resources among a large number of classifiers or data analysis functions in a constrained environment. Resource allocation strategies help adapt large-scale classification systems to applications that have limited resources and overwhelmingly large amounts of data and analysis functions. Such strategies are also useful when less accurate results are acceptable in exchange for speed, that is, when accuracy is traded for reduced processing time.
Large-scale classification is a significant problem in video surveillance analysis, where many simultaneous information feeds or video streams must be analyzed and indexed. It is often not possible to fully classify the contents of all streams simultaneously in real time. However, it may be possible to select from a variety of classification algorithms, such as K-Nearest Neighbor (KNN), Support Vector Machines (SVMs), Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), and Decision Trees, to best exercise the trade-off between computation and classification accuracy given the overall processing constraints of the system or the response time required by the user.
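One way to make this selection problem concrete is to pose it as a multiple-choice knapsack: each stream or concept detector is assigned exactly one algorithm, each algorithm has a processing cost and an expected accuracy, and the summed cost must stay within a budget. The minimal sketch below solves that formulation exactly with dynamic programming. The knapsack formulation, the function name `allocate`, the integer cost units, and all cost and accuracy figures are hypothetical illustrations, not values or a method taken from the source.

```python
from typing import List, Tuple

# Hypothetical option record: (algorithm name, integer cost units, expected accuracy).
Option = Tuple[str, int, float]


def allocate(detectors: List[List[Option]], budget: int) -> Tuple[float, List[str]]:
    """Choose exactly one option per detector so that total cost <= budget and the
    summed expected accuracy is maximal (multiple-choice knapsack, solved by DP)."""
    neg = float("-inf")
    # best[b]: highest summed accuracy for the detectors seen so far with cost <= b.
    best = [0.0] * (budget + 1)
    picks: List[List[int]] = []  # picks[i][b]: option index chosen for detector i at budget b
    for options in detectors:
        new_best = [neg] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b in range(budget + 1):
            for k, (_, cost, acc) in enumerate(options):
                if cost <= b and best[b - cost] > neg and best[b - cost] + acc > new_best[b]:
                    new_best[b] = best[b - cost] + acc
                    pick[b] = k
        best = new_best
        picks.append(pick)
    if best[budget] == neg:
        raise ValueError("no assignment fits within the budget")
    # Walk the pick tables backwards to recover which option each detector used.
    chosen: List[str] = []
    b = budget
    for i in range(len(detectors) - 1, -1, -1):
        name, cost, _ = detectors[i][picks[i][b]]
        chosen.append(name)
        b -= cost
    chosen.reverse()
    return best[budget], chosen


# Hypothetical streams with made-up costs and accuracies, for illustration only.
stream_a = [("KNN", 5, 0.92), ("SVM", 4, 0.90), ("DecisionTree", 1, 0.70)]
stream_b = [("GMM", 2, 0.80), ("DecisionTree", 1, 0.75)]
total, plan = allocate([stream_a, stream_b], budget=6)
print(plan, round(total, 2))  # ['SVM', 'GMM'] 1.7
```

With these made-up numbers, the sketch prefers SVM plus GMM over spending most of the budget on KNN for the first stream. Running time grows with the number of detectors times the budget times the options per detector, so coarse integer cost units keep the search cheap.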
Other examples may include the analysis of Internet data, such as chat rooms, blogs, and streaming video, in which it is important to analyze multiple modalities, such as text, image, audio, speech, and XML. This type of data analysis involves significant processing for feature extraction, clustering, classification, and semantic concept detection. As a result, there is a need for an interactive real-time system in which analysts or users can explore this type of data as well as run batch-mode analysis methods that apply large numbers of classifiers or data analysis functions.
M. Naphade et al., “Modeling Semantic Concepts to Support Query by Keywords in Video,” IEEE Proc. Int. Conf. Image Processing (ICIP), September 2002, teaches a system for modeling semantic concepts in video to allow searching based on automatically generated labels. This technique requires that video shots be analyzed by visual feature extraction, which characterizes colors, textures, shapes, etc., followed by semantic concept detection, which automatically labels video contents with labels such as “indoors,” “outdoors,” “face,” and “people.” Furthermore, new hybrid approaches, such as model vectors, allow similarity searching based on semantic models. For example, J. R. Smith et al., “Multimedia Semantic Indexing Using Model Vectors,” IEEE Intl. Conf. on Multimedia and Expo (ICME), 2003, teaches a method for indexing multimedia documents using model vectors that describe the detection of concepts across a semantic lexicon. This approach requires that the full lexicon of concepts be analyzed in the video in order to build the model vector index.
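As a rough illustration of the model-vector idea, the sketch below treats a model vector as the list of confidence scores obtained by running every concept detector in a lexicon on a shot, and ranks indexed shots by their distance to a query's model vector. The four-concept lexicon reuses the example labels above; the detector stubs, the scores, the function names, and the use of Euclidean distance are assumptions for illustration and are not taken from the cited papers.

```python
import math
from typing import Callable, Dict, List

# Small illustrative lexicon built from the example labels in the text.
LEXICON = ["indoors", "outdoors", "face", "people"]

Detector = Callable[[Dict[str, float]], float]


def model_vector(shot: Dict[str, float], detectors: Dict[str, Detector]) -> List[float]:
    """One detection confidence per lexicon concept. Every detector must run,
    so building the index costs one detection per concept per shot."""
    return [detectors[concept](shot) for concept in LEXICON]


def rank_by_similarity(query: List[float], index: Dict[str, List[float]]) -> List[str]:
    """Return indexed shot IDs ordered from most to least similar (smallest distance first)."""
    return sorted(index, key=lambda shot_id: math.dist(query, index[shot_id]))


# Hypothetical stand-in detectors that simply read precomputed scores from the shot.
detectors = {c: (lambda shot, c=c: shot[c]) for c in LEXICON}

# Hypothetical model-vector index (confidences are made up).
index = {
    "shot_a": [0.9, 0.1, 0.8, 0.7],
    "shot_b": [0.2, 0.9, 0.1, 0.3],
    "shot_c": [0.7, 0.3, 0.5, 0.9],
}
query = model_vector({"indoors": 0.85, "outdoors": 0.15, "face": 0.7, "people": 0.8}, detectors)
print(rank_by_similarity(query, index))  # ['shot_a', 'shot_c', 'shot_b']
```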
These large-scale classification systems need to support a trade-off between the analysis quality of their detectors and the resources those detectors consume. In both of the cases above, it is possible to choose from a variety of algorithms for the feature extraction and concept detection processes. For example, concepts may be detected using statistical models of the extracted features. One known modeling approach is based on SVMs, which describe a discriminating boundary between concept classes in a high-dimensional feature space. While SVMs may provide good classification accuracy, they also require significant resources to represent the model and its parameters. GMMs, on the other hand, provide a more compact model representation that requires fewer resources but may not provide the same level of classification accuracy as SVMs. As another example, a batch-mode analysis may be able to use a computationally expensive but high-quality KNN classification algorithm to detect hundreds of different types of events in video, whereas an interactive system may need to apply a faster but lower-quality Decision Tree classifier in order to return results quickly.
Known solutions to this allocation problem either apply only a subset of the classifiers given the resource constraints, or use hierarchical classification structures that apply progressively more expensive, higher-quality detectors to smaller sets of data in order to use resources more efficiently. The problem with the first solution is that choosing not to run certain classifiers is not optimal when classification results are desired or needed for all detectors. The problem with the second is that hierarchical classification does not give quick results when high-quality classification is not needed.
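The hierarchical approach can be pictured as a classifier cascade, sketched minimally below: each stage scores only the items that passed the previous stage, so the more expensive stage runs on fewer items. The stage functions, thresholds, and integer items are hypothetical placeholders. The `calls` list records how many items each stage processed, which shows both the saving and the limitation described above: every item that survives still has to pass through every stage, so the cascade itself offers no shortcut to a quick, lower-quality answer.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is (scoring function, minimum score needed to pass to the next stage).
Stage = Tuple[Callable[[int], float], float]


def cascade(items: Sequence[int], stages: List[Stage]) -> Tuple[List[int], List[int]]:
    """Run stages in order (cheap first); return final survivors and per-stage call counts."""
    survivors = list(items)
    calls: List[int] = []
    for score, threshold in stages:
        calls.append(len(survivors))  # this stage scores every current survivor once
        survivors = [x for x in survivors if score(x) >= threshold]
    return survivors, calls


# Hypothetical stages: a cheap coarse filter, then an "expensive" precise check.
cheap: Stage = (lambda x: x / 10, 0.5)                          # passes 5..9
expensive: Stage = (lambda x: 1.0 if x % 2 == 0 else 0.0, 0.5)  # passes even numbers
result, calls = cascade(range(10), [cheap, expensive])
print(result, calls)  # [6, 8] [10, 5]
```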
Given these varied analysis approaches in large-scale classification systems, there is a need for a system that provides an efficient or optimal trade-off among the important dimensions of the classifiers and collateral processing elements in order to best meet various constraints.