Data mining extracts information from large-scale databases and the Internet. Data mining has been applied to the analysis of market, financial, sensor, and biological data. Data mining should not be confused with pattern matching, in which data are searched for known patterns.
Fundamentally, data mining discovers “interesting” and previously unknown patterns in data. Interesting patterns are usually defined in terms of the reoccurrence rate of a particular pattern. Because data mining does not presume any pre-defined patterns, it is frequently described as unsupervised learning.
Data mining derives rules, trends, regularities, and correlations from a large volume of data. Often, data mining is based on artificial intelligence (AI), memory-based reasoning (MBR), association rule generation, decision trees (DT), neural analysis, statistical analysis, clustering, and time series analysis.
Clustering identifies homogeneous groups of related information in data. Prior art clustering assumes that relationships among the data are known. Clustering has been studied extensively in statistics, pattern recognition, and machine learning. Examples of clustering applications include customer segmentation for marketing analysis, and identification of sub-categories of signal databases in sensed data.
Clustering techniques can be broadly classified into partitional techniques and hierarchical techniques. Partitional clustering separates data into K clusters such that the data in each cluster are more similar to each other than to data in different clusters. The value of K can be assigned by a user, or iteratively determined so as to minimize the clustering criterion.
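As a minimal sketch of partitional clustering, the following uses K-means (Lloyd's algorithm), a widely known partitional technique that is not named in the text above and is chosen here only for illustration. The function name `kmeans`, its parameters, and the sample points are assumptions. The clustering criterion minimized is the sum of squared distances from each point to its cluster center, and K is assigned by the user.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters with Lloyd's algorithm (illustrative)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct points drawn at random.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the partition is stable.
        assignment = new_assignment
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```

For these two well-separated groups, the first three points receive one label and the last three receive the other. Which group is labeled 0 depends on the random initialization.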
Hierarchical clustering is a nested sequence of partitions. Agglomerative hierarchical clustering places data in atomic clusters and then merges the atomic clusters into larger and larger clusters until all data are in a single large cluster. Divisive hierarchical clustering reverses the process by starting with all data in one cluster and subdividing the cluster into smaller clusters. See, for example, Jain et al., “Algorithms for Clustering Data,” Prentice Hall, 1988; Piramuthu et al., “Comparison of SOM neural network and hierarchical clustering methods,” European Journal of Operational Research, 93(2):402–417, September 1996; Michaud, “Four clustering techniques,” FGCS Journal, Special Issue on Data Mining, 1997; and Zait et al., “A Comparative study of clustering methods,” FGCS Journal, Special Issue on Data Mining, 1997.
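The agglomerative procedure can be sketched as follows for one-dimensional data. This is an illustration only: the function name `agglomerative`, the `target_clusters` parameter, and the single-linkage merge rule (distance between the two closest members) are assumptions, since the text above does not fix a particular linkage.

```python
def agglomerative(values, target_clusters=1):
    """Merge the two closest clusters until target_clusters remain.

    Returns the remaining clusters and the list of merges in the order
    they were made, which records the nested sequence of partitions.
    """
    clusters = [[v] for v in values]  # Every value starts as an atomic cluster.
    history = []

    def linkage(a, b):
        # Single linkage (an assumption): distance of the closest pair.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > target_clusters:
        candidates = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(candidates, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        history.append(merged)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, history


clusters, history = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], target_clusters=3)
print(clusters)
```

Stopping at three clusters groups 1.0 with 1.2 and 5.0 with 5.1, and leaves 9.0 alone. Running to `target_clusters=1` continues merging until all values are in a single cluster.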
Most data mining methods reduce the dimensionality of the input data.
Clusters that are formed in a high-dimensional data space are not likely to be meaningful clusters because the expected average density of points anywhere in the high-dimensional data space is low. Known techniques for reducing the dimensionality of data include principal component analysis (PCA), factor analysis, singular value decomposition (SVD), and wavelets. Principal component analysis, also known as the Karhunen-Loeve expansion, finds a lower-dimensional representation that explains variances of data attributes, whereas factor analysis finds correlations among the data attributes. Jain et al., in “Algorithms for feature selection: An evaluation,” Technical report, Department of Computer Science, Michigan State University, East Lansing, Mich., 1996, describe a technique for image analysis.
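As a hedged illustration of dimensionality reduction by principal component analysis, the sketch below finds the first principal component of a small data set by power iteration on the sample covariance matrix and projects the data onto it. The function names `first_principal_component` and `project` and the sample rows are assumptions. The sketch further assumes that the data have non-zero variance and that the starting vector is not orthogonal to the principal direction.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction) of the direction of largest variance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the attributes (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Reduce each row to a single coordinate along the given direction."""
    return [
        sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
        for r in rows
    ]


rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, direction = first_principal_component(rows)
print(direction, project(rows, mean, direction))
```

Because the sample rows lie exactly on the line y = 2x, the recovered direction is proportional to (1, 2), and the one-dimensional projection preserves all of the variance in the two original attributes.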
A popular data mining method used for analysis of consumer buying patterns is the identification of non-obvious associations, or association rules. An example of an obvious association is that consumers who buy baby formula also buy diapers at the same time. However, it was discovered in 1992 that beer and diapers are often purchased together in the evening hours. Such an association is a good example of a non-obvious association. Normally, one would not regard diapers and beer as strongly related purchase items, because beer is not a baby-care product. The analysis that uncovers such associations has also been termed market-basket analysis.
An association is defined as follows. If there is a set of n items I1, . . . , In, and a transaction, e.g., a database or operation, that selects a subset of the n items, then an association between two items Ii and Ij is defined as a rule R for any transaction in which both items Ii and Ij are selected into the subset. A condition of the rule R is defined as the occurrence of item Ii. A result of the rule R is defined as the occurrence of the item Ij. A support of the rule R is defined as a percentage of the transactions that have both items Ii and Ij. A combination of the rule R is defined as the occurrence of both items Ii and Ij in the same transaction. A confidence of the rule R is defined as a ratio of the support of the combination and the support of the condition. Finally, an improvement of the rule R is defined as a ratio of the support of the rule over the product of the support of the condition Ii and the support of the result Ij.
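The definitions above can be sketched in a few lines. The function name `rule_metrics`, the dictionary keys, and the sample transactions are hypothetical and serve only to illustrate the arithmetic. Supports are expressed as fractions of the transactions rather than percentages, and the sketch assumes that the condition and the result each occur in at least one transaction.

```python
def rule_metrics(transactions, condition, result):
    """Support, confidence, and improvement of the rule condition -> result."""
    n = len(transactions)
    # Fraction of transactions containing the condition item Ii.
    support_condition = sum(condition in t for t in transactions) / n
    # Fraction of transactions containing the result item Ij.
    support_result = sum(result in t for t in transactions) / n
    # Fraction of transactions containing both items (the combination).
    support_rule = sum(condition in t and result in t for t in transactions) / n
    return {
        "support": support_rule,
        "confidence": support_rule / support_condition,
        "improvement": support_rule / (support_condition * support_result),
    }


# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer", "milk"},
    {"milk"},
    {"milk", "cards"},
]
print(rule_metrics(transactions, "diapers", "beer"))
# {'support': 0.5, 'confidence': 1.0, 'improvement': 2.0}
```

In this toy data, diapers and beer each appear in half of the transactions and always appear together, so the improvement of 2.0 indicates that the rule predicts the combination twice as well as chance would.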
An association is strong when both the support and the confidence of the association are high. For example, for a grocery store transaction, milk is strongly correlated with every other item sold so the support is high, because most transactions include milk as an item. However, for some items, such as greeting cards, the confidence is low because these are bought infrequently.
Finally, the improvement of the association needs to be strong as well, because an improvement less than 1 indicates that the condition does not predict the combination with any better accuracy than by using the raw probability of the combination itself. So even if the support and confidence of the rule are high, without a corresponding improvement greater than 1, the rule offers no advantage over pure chance. Below, we describe the use of associations for “labeled clusters” of video features, instead of consumer “items.”
Time series analysis correlates data values as a function of their temporal separation. For example, time series analysis has been used to discover patterns in stock prices, sales volumes, climate data, and EKG signals. It is generally assumed that there is an underlying deterministic process that generated the time series and that that process is not random. However, time series data of real-world phenomena are often intermixed with non-deterministic data, for example, unavoidable random noise.
Typically, time series are compared using a similarity measure such as the Euclidean distance, or some variation thereof. However, Euclidean distance measurements tend to be unreliable when two series are shifted or locally stretched in time relative to one another. A more robust similarity measure is based on dynamic time warping (DTW), see Berndt et al., “Finding patterns in time series: a dynamic programming approach,” Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Menlo Park, Calif., pp. 229–248, 1996. DTW attempts to align time series data by selectively stretching and shrinking the time axis.
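A minimal sketch of the textbook dynamic-programming form of DTW, compared against the Euclidean distance, is given below. The function names `dtw_distance` and `euclidean_distance`, the absolute-difference local cost, and the sample series are assumptions, and no warping-window constraint is applied.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance between two series of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Cost of the cheapest monotonic alignment of series a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Stretch a, stretch b, or advance both series by one sample.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# The same pulse, shifted by one sample.
a = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
print(euclidean_distance(a, b))  # 2.0
print(dtw_distance(a, b))        # 0.0
```

The two series contain an identical pulse offset by one sample. The Euclidean distance penalizes the offset, whereas DTW absorbs it by repeating the leading and trailing zero samples, so the warped distance is zero.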
Up to now, most data mining techniques have focused on textual data, numeric data and linear (one-dimensional) signals. However, a huge amount of information is now readily available in the form of multi-dimensional images, movies, and videos that have both space and time dimensions.
Some prior art techniques perform a rudimentary type of content analysis on videos. The most common approach trains an expert system using a set of labeled samples. Hence, those techniques are based on supervised learning, and not on unsupervised data mining, see Xu et al., “Algorithms and Systems for Segmentation and Structure Analysis in Soccer Video,” IEEE International Conference on Multimedia and Expo, Tokyo, Japan, Aug. 22–25, 2001, U.S. patent application Ser. No. 09/839,924 “Method and System for High-Level Structure Analysis and Event Detection in Domain Specific Videos,” filed by Xu et al., on Apr. 20, 2001, and Naphade et al., “Probabilistic multimedia objects (multijects): A novel approach to indexing and retrieval in multimedia systems,” Proceedings of the fifth IEEE International Conference on Image Processing, vol. 3, pp. 536–540, 1998.
Prior art unsupervised video analysis techniques are mostly content neutral. For example, videos have been summarized by selecting key frames from identified segments. There, the segments are determined by detecting scene or “shot” changes, e.g., fades or sudden changes in audio volume. What the scenes depict is immaterial. The particular frame selected is usually a good representative of the other frames in the shot according to some criterion. Other techniques exploit changes in camera angles or field of view, e.g., zooming. Such content-neutral techniques have had moderate success and require supplementary content-specific techniques for semantically satisfactory performance.
However, in general, the problems associated with content-based video mining are not well understood.
For example, it is unclear whether well-known classification and regression trees (CART) are applicable to video mining without considerable modification. The CART method splits independent variables into small groups of data sets, and fits a constant function to the small data sets. In categorical trees, the constant function is one that takes a finite small set of values, e.g., yes and no, or low, medium, and high. In regression trees, the mean value of the response is fit to small connected data sets.
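The regression-tree idea of fitting a constant to each small group can be sketched with a single split on one independent variable. This is an illustration under stated assumptions and not a full CART implementation: the function name `best_split`, the squared-error criterion, the midpoint threshold, and the sample data are all assumptions.

```python
def best_split(xs, ys):
    """Find the threshold on x that best separates y into two constant fits.

    Returns (sse, threshold, left_mean, right_mean), or None when all
    x values are identical and no split is possible.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # Cannot place a threshold between equal x values.
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        # The constant fitted to each group is the mean of its responses.
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_split(xs, ys))
# (0.0, 6.5, 1.0, 5.0)
```

A full tree would apply the same search recursively to each side of the threshold. A categorical tree would replace the mean with a class label such as yes or no.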
Therefore, it is desired to provide a system and method for mining multi-dimensional time series data sensed from a scene, i.e., a video, which is a sequence of frames acquired by a camera. Video mining would be particularly useful for discovering interesting patterns in videos where an a priori model of the domain and content, such as an editing model, the characteristics of a news video, or the patterns in a sports video, is not readily available.