Clustering is a process by which we search for patterns in data and classify these patterns into categories such that the degree of association is high among the data within each category. Clustering is generally accomplished by automatic characterization, detection, and classification of input data. However, the applicability of clustering is not restricted to pattern recognition. It applies, for example, to taxonomies in biology, to the classification of documents in information retrieval, and to social groupings based on various criteria. Clustering can also be used in data mining, for example to organize the training data of prediction tools that are then used to predict unknown data.
Current approaches to clustering broadly fall under statistical, machine learning, and fuzzy-logic-based techniques. Statistical techniques analyze the linear characteristics of the data and classify the data accordingly. A machine learning technique such as an Artificial Neural Network (ANN) is trained to capture the non-linear characteristics of the data, which can result in better classification. Fuzzy set theory introduces robustness into the classification process by modeling uncertainty in a way similar to human reasoning.
An ANN attempts to emulate the architecture and information representation schemes of the human brain, and its architecture depends on the goal to be achieved. Learning in an ANN can be either supervised or unsupervised. In supervised learning (SL), the desired result is known in advance (like a teacher instructing a pupil). An input is presented, the output is checked against the desired result, and the connection strengths (weights) between inputs and outputs are adjusted until the correct output is produced. This is repeated for all inputs until the network's error is as low as possible. SL therefore requires an output class label for each input.
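The supervised loop described above (present an input, compare the output with the known answer, adjust the connection strengths) can be sketched with a single-layer perceptron, one of the simplest ANNs. This is an illustrative assumption rather than the method of this document; the names `train_perceptron` and `predict` and the parameter defaults are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Threshold unit: output 1 if the weighted sum exceeds 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples: List[Tuple[List[float], int]],
                     learning_rate: float = 1.0,
                     max_epochs: int = 100) -> Tuple[List[float], float]:
    """Hypothetical sketch of supervised learning with the perceptron rule.

    Each sample pairs an input vector with its known class label (0 or 1).
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # the "teacher" signal
            if error != 0:
                errors += 1
                # Adjust connection strengths in proportion to the error.
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if errors == 0:  # every input now yields the correct output
            break
    return weights, bias


# Usage: learn logical AND from labeled examples.
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```

A single-layer perceptron can only separate linearly separable classes; capturing the non-linear characteristics mentioned earlier requires networks with hidden layers.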
Unsupervised learning (USL), or learning without a teacher, refers to situations where the input (training) samples are not classified before they are presented to the network. In USL the network recognizes features of the input data by itself (self-organizes) and represents its findings in a usable form. This is a much more demanding task for the network. Two commonly used USL approaches are the parametric approach, which combines classification and parameterization, and the non-parametric approach, which partitions the unclassified data into subsets using Adaptive Resonance Theory (ART). ART encompasses a wide variety of neural networks (NNs) grounded in neurophysiology that retain prior knowledge while remaining adaptive enough to acquire new knowledge. This trade-off between preserving learned knowledge and acquiring new knowledge, known as the stability-plasticity dilemma, forms the basis for competitive learning.
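The vigilance test that lets ART balance stability and plasticity can be illustrated with a simplified, fast-learning ART1-style sketch for binary patterns. It omits the full layer dynamics of an ART1 network; the function names `art1_cluster` and `_intersect` and the default values of `vigilance` and `alpha` are hypothetical.

```python
from typing import List


def _intersect(a: List[int], b: List[int]) -> List[int]:
    """Element-wise logical AND of two binary vectors."""
    return [x & y for x, y in zip(a, b)]


def art1_cluster(patterns: List[List[int]], vigilance: float = 0.7,
                 alpha: float = 0.001) -> List[int]:
    """Simplified ART1-style clustering of binary patterns (fast learning).

    vigilance (0 < vigilance <= 1) sets how similar an input must be to a
    stored category before it is absorbed; alpha is a small choice parameter.
    """
    prototypes: List[List[int]] = []  # learned binary template per category
    labels: List[int] = []
    for pattern in patterns:
        size = sum(pattern)
        if size == 0:
            raise ValueError("each pattern must contain at least one 1")
        # Choice function: try existing categories from best to worst match.
        ranked = sorted(
            range(len(prototypes)),
            key=lambda j: sum(_intersect(prototypes[j], pattern))
            / (alpha + sum(prototypes[j])),
            reverse=True,
        )
        chosen = None
        for j in ranked:
            common = _intersect(prototypes[j], pattern)
            if sum(common) / size >= vigilance:
                # Resonance: refine the stored template (stability).
                prototypes[j] = common
                chosen = j
                break
            # Otherwise reset this category and try the next one.
        if chosen is None:
            # No category is close enough: commit a new one (plasticity).
            prototypes.append(list(pattern))
            chosen = len(prototypes) - 1
        labels.append(chosen)
    return labels


patterns = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
print(art1_cluster(patterns, vigilance=0.6))  # [0, 0, 1, 1]
print(art1_cluster(patterns, vigilance=0.9))  # [0, 1, 2, 3]
```

No class labels are supplied; the number of categories emerges from the data. Raising the vigilance yields more, finer-grained categories, while stored templates change only by intersection with inputs that pass the vigilance test.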
The fuzzy logic approach defines relations between data based on factors such as similarity, preference, and uncertainty. Each such relationship is represented by a degree of membership, a value between 0 and 1 indicating how strongly a data item belongs to a category.
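One common way to compute degrees of membership in clustering is the membership update of fuzzy c-means, sketched below for fixed cluster centers. Fuzzy c-means is named here only as an illustrative technique, not necessarily the one intended; `fuzzy_memberships` and the fuzzifier default `m = 2.0` are hypothetical choices. The membership of a point in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m−1)), where d_i is its distance to center i.

```python
import math
from typing import List, Sequence


def fuzzy_memberships(point: Sequence[float],
                      centers: List[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Fuzzy c-means style degree of membership of `point` in each cluster.

    m > 1 is the fuzzifier; larger values give softer memberships.
    """
    distances = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = sum(1 for d in distances if d == 0.0)
    if zeros:
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]


u = fuzzy_memberships((2.0,), [(0.0,), (10.0,)])
print([round(x, 3) for x in u])  # [0.941, 0.059]
```

Unlike a hard assignment, the point belongs partly to both clusters, and its memberships sum to 1.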
Current statistical approaches to clustering do not capture nonlinear relationships that may exist within the input data, and can therefore classify such data poorly. ANN and fuzzy methods, on the other hand, can handle the static and real-time characteristics, uncertainty, and robustness required of the input data in an integrated manner. However, for a clustering technique to be efficient, it should be able to provide a generic solution for static and real-time data that covers characteristics such as linearity, non-linearity, uncertainty, and robustness.
Therefore, there is a need in the art for an efficient technique that can provide a generic solution to clustering input data (static as well as real-time data) having characteristics such as linearity, non-linearity, uncertainty, and robustness.